Add initializeShadow() to TerrainRenderer that creates a depth-only
shadow pipeline reusing the existing shadow.vert/frag shaders (same
path as WMO/M2/character renderers). renderShadow() draws all terrain
chunks with sphere culling against the shadow coverage radius. Wire
both init and draw calls into Renderer so terrain now casts shadows
alongside buildings and NPCs.
The boneDescPool_ had MAX_BONE_SETS=2048 but sets were never freed when
instances were removed (only when clear() reset the whole pool on map load).
As tiles streamed in/out, each new animated instance consumed 2 pool slots
(one per frame index) permanently. After ~1024 animated instances created
total, vkAllocateDescriptorSets began failing silently and returning
VK_NULL_HANDLE. render() skips instances with null boneSet[frameIndex],
making them invisible — appearing as per-frame flicker as the culling pass
included them but the render pass excluded them.
Fix: destroyInstanceBones() now calls vkFreeDescriptorSets() for each
non-null boneSet before destroying the bone SSBO. The pool already had
VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT set for this purpose.
Also increased MAX_BONE_SETS from 2048 to 8192 for extra headroom.
texture.hpp / texture.cpp implemented an unfinished OpenGL texture loader
(loadFromFile was a TODO stub) that had no callers — the project's texture
loading is entirely handled by VkTexture (vk_texture.hpp/cpp) after the
Vulkan migration. Remove both files and their CMakeLists entries.
- AmdFsr3Runtime now probes both the legacy ffxFsr3* API and the newer
generic ffxCreateContext/ffxDispatch API; selects whichever the loaded
runtime library exports (GenericApi takes priority fallback)
- Generic API path implements full upscale + frame-generation context
creation, configure, dispatch, and destroy lifecycle
- dlopen error captured and surfaced in lastError_ on Linux so runtime
initialization failures are actionable
- FSR3 runtime init failure log now includes path kind, error string,
and loaded library path for easier debugging
- tools/generate_ffx_sdk_vk_permutations.sh added: auto-bootstraps
missing VK permutation headers; DXC auto-downloaded on Linux/Windows
MSYS2; macOS reads from PATH (CI installs via brew dxc)
- CMakeLists: add upscalers/include to probe include dirs, invoke
permutation script before SDK build, scope FFX pragma/ODR warning
suppressions to affected TUs, add runtime-copy dependency on wowee
- UI labels updated from "FSR2" → "FSR3" in settings, tuning panel,
performance HUD, and combo boxes
- CI macOS job now installs dxc via Homebrew for permutation codegen
- Motion vectors: single unjittered reprojection matrix (80 bytes) instead of
two jittered matrices (160 bytes), eliminating numerical instability from
jitter amplification through large world coordinates
- Sharpen pass: fix Y-flip for correct UV sampling, double-buffer descriptor
sets to avoid race with in-flight command buffers
- MSAA: auto-disable when FSR2 enabled, grey out AA setting in UI
- Accumulation: variance-based neighborhood clamping in YCoCg space,
correct history layout transitions
- Frame index: wrap at 256 for stable Halton sequence
Full FSR 2.2 pipeline with depth-based motion vector reprojection,
temporal accumulation with YCoCg neighborhood clamping, and RCAS
contrast-adaptive sharpening.
Architecture (designed for FSR 3.x frame generation readiness):
- Camera: Halton(2,3) sub-pixel jitter with unjittered projection
stored separately for motion vector computation
- Motion vectors: compute shader reconstructs world position from
depth + inverse VP, reprojects with previous frame's VP
- Temporal accumulation: compute shader blends 5-10% current frame
with 90-95% clamped history, adaptive blend for disocclusion
- History: ping-pong R16G16B16A16 buffers at display resolution
- Sharpening: RCAS fragment pass with contrast-adaptive weights
Integration:
- FSR2 replaces both FSR1 and MSAA when enabled
- Scene renders to internal resolution framebuffer (no MSAA)
- Compute passes run between scene and swapchain render passes
- Camera cut detection resets history on teleport
- Quality presets shared with FSR1 (0.50-0.77 scale factors)
- UI: "Upscaling" combo with Off/FSR 1.0/FSR 2.2 options
Root cause: bonesDirty was a single bool shared across both
double-buffered frame indices. When bones were copied to frame 0's
SSBO and bonesDirty cleared, frame 1's newly-allocated SSBO would
contain garbage/zeros and never get populated — causing animated
M2 instances to flash invisible on alternating frames.
Fix: Make bonesDirty per-frame-index (bool[2]) so each buffer
independently tracks whether it needs bone data uploaded. When
bones are recomputed, both indices are marked dirty. When uploaded
during render, only the current frame index is cleared. New buffer
allocations in prepareRender force their frame index dirty.
Add 2-second cooldown timer before re-checking for unloaded tiles
when workers are idle, preventing excessive streamTiles() calls
that caused frame hitches right after world load.
- Overlap M2 and character animation updates via std::async (~2-5ms saved)
- Thread-local collision scratch buffers for concurrent floor queries
- Parallel terrain/WMO/M2 floor queries in camera controller
- Seed new M2 instance bones from existing siblings to eliminate pop-in flash
- Fix shadow flicker: snap center along stable light axes instead of in view space
- Increase shadow distance default to 300 units (slider max 500)
- Normal map CPU work (luminance→blur→Sobel) moved to background threads,
main thread only does GPU upload (~1-2ms vs 15-22ms per texture)
- Load screen warmup now waits until ALL spawn/equipment/gameobject queues
are drained before transitioning (prevents naked character, NPC pop-in)
- Exit condition: min 2s + 5 consecutive empty iterations, hard cap 15s
- Equipment queue processes 8 items per warmup iteration instead of 1
- Added LoadingScreen::renderOverlay() for future world-behind-loading use
Each loadTexture call was generating a normal/height map inline (3 full-image
passes: luminance + blur + Sobel). For models with 15-20 textures this added
30-40ms to the 70ms model upload. Now deferred to a per-frame budget (2/frame
in-game, 10/frame during load screen). Models render without POM until their
normal maps are ready.
Move CPU-heavy BLP texture decoding from main thread to background worker
threads for all hot paths: terrain M2 models, WMO doodad M2s, WMO textures,
creature models, and gameobject WMOs. Each renderer (M2, WMO, Character) now
accepts a pre-decoded BLP cache that loadTexture() checks before falling back
to synchronous decode.
Defer WMO normal/height map generation (3 per-pixel passes: luminance, box
blur, Sobel) during terrain streaming finalization — this was the dominant
remaining bottleneck after BLP pre-decoding.
Terrain streaming stalls: 1576ms → 124ms worst case.
- Replace per-frame VMA alloc/free of material UBOs with a ring buffer in
CharacterRenderer (~500 allocations/frame eliminated)
- Batch all ready terrain tiles into a single GPU upload during load screen
(processAllReadyTiles instead of one-at-a-time with individual fence waits)
- Lift per-frame creature/GO spawn budgets during load screen warmup phase
- Add background world preloader: saves last world position to disk, pre-warms
AssetManager file cache with ADT files starting at app init (login screen)
so terrain workers get instant cache hits when Enter World is clicked
- Distance-filter expensive collision guard to 8-unit melee range
- Merge 3 CharacterRenderer update loops into single pass
- Time-budget instrumentation for slow update stages (>3ms threshold)
- Count-based async creature model upload budget (max 3/frame in-game)
- 1-per-frame game object spawn + per-doodad time budget for transport loading
- Use deque for creature spawn queue to avoid O(n) front-erase
- Worker threads: use (cores - 1-2) instead of cores/2, minimum 4
- Outer upload batch in processReadyTiles: ALL model/texture uploads per
frame share a single command buffer submission + fence wait
- Upload multiple models per finalization step: 8 M2s, 4 WMOs, 16 doodads
per call instead of 1 each (all within same GPU batch)
- Terrain chunks: 64 per step instead of 16
- Skip redundant M2 file I/O: thread-safe uploadedM2Ids_ set lets
background workers skip re-reading+parsing models already on GPU
- processAllReadyTiles (loading screen) and processOneReadyTile also
wrapped in outer upload batches
Every uploadBuffer/VkTexture::upload called immediateSubmit which did a
separate vkQueueSubmit + vkWaitForFences. Loading a single creature model
with textures caused 4-8+ fence waits; terrain chunks caused 80+ per batch.
Added beginUploadBatch/endUploadBatch to VkContext: records all upload
commands into a single command buffer, submits once with one fence wait.
Staging buffers are deferred for cleanup after the batch completes.
Wrapped in batch mode:
- CharacterRenderer::loadModel (creature VB/IB + textures)
- M2Renderer::loadModel (doodad VB/IB + textures)
- TerrainRenderer::loadTerrain/loadTerrainIncremental (chunk geometry + textures)
- TerrainRenderer::uploadPreloadedTextures
- WMORenderer::loadModel (group geometry + textures)
Terrain finalization was uploading all 256 chunks (GPU fence waits) in one
atomic advanceFinalization call that couldn't be interrupted by the 5ms time
budget. Now split into incremental batches of 16 chunks per call, allowing
the time budget to yield between batches.
M2 instance creation had O(N) dedup scans iterating ALL instances to check
for duplicates. In cities with 5000+ doodads, this caused O(N²) total work
during tile loading. Replaced with hash-based DedupKey map for O(1) lookups.
Changes:
- TerrainRenderer::loadTerrainIncremental: uploads N chunks per call
- FinalizingTile tracks terrainChunkNext for cross-frame progress
- TERRAIN phase yields after preload and after each chunk batch
- M2Renderer::DedupKey hash map replaces linear scan in createInstance
and createInstanceWithMatrix
- Dedup map maintained through rebuildSpatialIndex and clear paths
- Add magma/slime rendering path to water shader (fbm noise, crust/molten/core coloring)
- Fix WMO liquid height filter rejecting high-altitude zones like Ironforge (Z>300)
- Allow interior WMO magma/slime MLIQ groups to load (skip only water/ocean)
- Mark LAVASTEAM.m2 as spell effect for proper additive blend, hide emission mesh
- Add isLavaModel flag for M2 ForgeLava/LavaPots UV scroll fallback
- Add isLava material detection in WMO renderer for lava texture UV animation
- Fix WMO material UBO colors for magma (was blue, now orange-red)
- Add case-insensitive "glass" detection for WMO window materials
- Make instance (WMO-only) glass highly transparent (12-35% alpha)
so underwater scenes are visible through Deeprun Tram windows
- Keep normal world windows at existing opacity (40-95% alpha)
- Disable shadow mapping for interior WMO groups to fix dark
indoor areas like Ironforge
- Add movementSuppressTimer to camera controller that forces all movement
keys to read as false, preventing held W key from carrying through
loading screens (fixes always-running-forward after instance portals)
- Increase shadow frustum default from 60 to 72 units (+20%)
- Make shadow distance configurable via setShadowDistance() (40-200 range)
- Add shadow distance slider in Video settings tab (persisted to config)
- Stronger wall collision push (0.35/0.15) and swept push (0.45/0.25)
for interior/exterior WMOs to reduce clipping through tunnel walls
- Use all triangles (not just pre-classified walls) for collision checks
- Allow invisible collidable triangles (MOPY 0x01 without 0x20) to block
- Pass insideWMO flag to all collision callers, match swim sweep to ground
- Widen swept hit detection radius from 0.15 to 0.25
- Restrict camera zoom to 12 units inside WMO interiors
- Fix /unstuck launching player above WMOs: remove +20 fallback, use
gravity when no floor found
- Slash and Enter keys always focus chat unless already typing
Fix VK_ERROR_DEVICE_LOST crash by allocating per-frame scene history
images (color + depth) instead of a single shared image that raced
between frames in flight. Water refraction can now be toggled via
Settings > Video > Water Refraction.
Without refraction: richer blue base colors, animated caustic shimmer,
and normal-based color shifts give the water visible life. With
refraction: clean screen-space refraction with Beer-Lambert absorption.
Disabling clears scene history to black for immediate fallback.
WMO interior doodads (gears, decorations) were blocking player movement
via M2 collision. Skip collision for all WMO doodad M2 instances since
the WMO itself handles wall collision.
Also filter WMO wall collision using MOPY per-triangle flags: only
rendered+collidable triangles block the player, skipping invisible
collision hulls.
Revert tram portal extended range (no longer needed with collision fix).