When unloadTile() was called for a tile still in finalizingTiles_
(mid-incremental-finalization), terrain chunks already uploaded to the
GPU (terrainMeshDone=true) were not being cleaned up. The early-return
path correctly removed water and M2/WMO instances but missed calling
terrainRenderer->removeTile(), causing descriptor sets to leak.
After ~20 minutes of play the VkDescriptorPool (MAX_MATERIAL_SETS=16384)
filled up, causing all subsequent terrain material allocations to fail
and the log to flood with "failed to allocate material descriptor set".
Fix: check fit->terrainMeshDone before the early return and call
terrainRenderer->removeTile() to free those descriptor sets.
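A minimal sketch of the corrected early-return path, assuming simplified stand-in types. FakeTerrainRenderer, TileCoord and the TileStreamer wrapper are hypothetical; only finalizingTiles_, terrainMeshDone, unloadTile() and removeTile() come from the description above.

```cpp
#include <algorithm>
#include <deque>
#include <set>
#include <utility>

using TileCoord = std::pair<int, int>;  // hypothetical tile key

// Hypothetical stand-in for the terrain renderer: tracks which tiles
// currently hold GPU resources (descriptor sets in the real code).
struct FakeTerrainRenderer {
    std::set<TileCoord> uploaded;
    void removeTile(const TileCoord& c) { uploaded.erase(c); }
};

struct FinalizingTile {
    TileCoord coord;
    bool terrainMeshDone;  // true once chunks have been uploaded to the GPU
};

struct TileStreamer {
    FakeTerrainRenderer* terrainRenderer;
    std::deque<FinalizingTile> finalizingTiles_;

    // Returns true if the tile was found mid-finalization and dropped.
    bool unloadTile(const TileCoord& c) {
        auto fit = std::find_if(
            finalizingTiles_.begin(), finalizingTiles_.end(),
            [&](const FinalizingTile& t) { return t.coord == c; });
        if (fit == finalizingTiles_.end()) return false;
        // The fix: release terrain GPU resources before the early return.
        if (fit->terrainMeshDone) terrainRenderer->removeTile(c);
        // (water and M2/WMO instance cleanup would also happen here)
        finalizingTiles_.erase(fit);
        return true;
    }
};
```

The point of the sketch is only the ordering: the terrainMeshDone check runs before the tile leaves finalizingTiles_, so nothing uploaded for it is orphaned.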
Hang/GPU device lost fix:
- M2_INSTANCES and WMO_INSTANCES finalization phases now create instances
incrementally (32 per step / 4 per step) instead of all at once, eliminating
the >1s main-thread stalls that caused GPU fence timeouts and device loss
M2 two-pass transparent rendering:
- Opaque/alpha-test batches render in pass 1, transparent/additive in pass 2
(back-to-front sorted) to fix wing transparency showing terrain instead of
trees — adds hasTransparentBatches flag to skip models with no transparency
Tile streaming improvements:
- Sort new load queue entries nearest-first so critical tiles load before
distant ones during fast taxi flight
- Increase taxi load radius 6→8 tiles, unload 9→12 for better coverage
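Nearest-first ordering of the load queue could look roughly like this. TileCoord, distSq and sortNearestFirst are illustrative names, not the project's actual API.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

using TileCoord = std::pair<int, int>;  // hypothetical (x, y) tile index

// Squared distance in tile units; no sqrt is needed just for ordering.
inline int distSq(const TileCoord& a, const TileCoord& b) {
    const int dx = a.first - b.first;
    const int dy = a.second - b.second;
    return dx * dx + dy * dy;
}

// Orders newly queued tiles so the ones closest to the player come first.
// stable_sort keeps the original order among equally distant tiles.
inline void sortNearestFirst(std::vector<TileCoord>& queue,
                             const TileCoord& player) {
    std::stable_sort(queue.begin(), queue.end(),
                     [&](const TileCoord& a, const TileCoord& b) {
                         return distSq(a, player) < distSq(b, player);
                     });
}
```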
Water refraction gated on FSR:
- Disable water refraction when FSR is not active (refraction is buggy without upscaling)
- Auto-disable refraction if FSR is turned off while refraction was on
- Reduce max finalization steps per frame: 2→1 (normal), 8→4 (taxi)
- Reduce terrain chunk upload batch: 32→16 chunks per step
- Reduce idle M2 model upload budget: 16→6 per step
- Reduce idle WMO model upload budget: 4→2 per step
Tiles still stream in quickly but spread GPU upload work across
more frames, eliminating the frame spikes right after spawning.
Add 2-second cooldown timer before re-checking for unloaded tiles
when workers are idle, preventing excessive streamTiles() calls
that caused frame hitches right after world load.
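A sketch of such a cooldown, assuming a per-frame tick with a delta time in seconds. IdleRecheckTimer is a hypothetical helper, and resetting the timer while workers are busy is an assumption; only the 2-second value comes from the text above.

```cpp
// Hypothetical helper that rate-limits idle re-checks.
struct IdleRecheckTimer {
    float cooldownSeconds = 2.0f;  // from the description above
    float elapsed = 0.0f;

    // Call once per frame. Returns true when streamTiles() should re-run.
    bool tick(float dtSeconds, bool workersIdle) {
        if (!workersIdle) {
            elapsed = 0.0f;  // assumption: busy workers restart the cooldown
            return false;
        }
        elapsed += dtSeconds;
        if (elapsed < cooldownSeconds) return false;
        elapsed = 0.0f;  // fire once, then start a new cooldown window
        return true;
    }
};
```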
- Re-check for unloaded tiles when workers are idle (no tile boundary needed)
- Increase M2 upload budget 4→16 and WMO 1→4 per frame when not under pressure
- Lower tree collision threshold from 40 to 6 units so large trees block movement
Move CPU-heavy BLP texture decoding from main thread to background worker
threads for all hot paths: terrain M2 models, WMO doodad M2s, WMO textures,
creature models, and gameobject WMOs. Each renderer (M2, WMO, Character) now
accepts a pre-decoded BLP cache that loadTexture() checks before falling back
to synchronous decode.
Defer WMO normal/height map generation (3 per-pixel passes: luminance, box
blur, Sobel) during terrain streaming finalization — this was the dominant
remaining bottleneck after BLP pre-decoding.
Terrain streaming stalls: 1576ms → 124ms worst case.
- Replace per-frame VMA alloc/free of material UBOs with a ring buffer in
CharacterRenderer (~500 allocations/frame eliminated)
- Batch all ready terrain tiles into a single GPU upload during load screen
(processAllReadyTiles instead of one-at-a-time with individual fence waits)
- Lift per-frame creature/GO spawn budgets during load screen warmup phase
- Add background world preloader: saves last world position to disk, pre-warms
AssetManager file cache with ADT files starting at app init (login screen)
so terrain workers get instant cache hits when Enter World is clicked
- Distance-filter expensive collision guard to 8-unit melee range
- Merge 3 CharacterRenderer update loops into single pass
- Time-budget instrumentation for slow update stages (>3ms threshold)
- Count-based async creature model upload budget (max 3/frame in-game)
- 1-per-frame game object spawn + per-doodad time budget for transport loading
- Use deque for creature spawn queue to avoid O(n) front-erase
- Worker threads: use cores - 1 or cores - 2 (leaving 1-2 cores free) instead of cores/2, minimum 4
- Outer upload batch in processReadyTiles: ALL model/texture uploads per
frame share a single command buffer submission + fence wait
- Upload multiple models per finalization step: 8 M2s, 4 WMOs, 16 doodads
per call instead of 1 each (all within same GPU batch)
- Terrain chunks: 64 per step instead of 16
- Skip redundant M2 file I/O: thread-safe uploadedM2Ids_ set lets
background workers skip re-reading+parsing models already on GPU
- processAllReadyTiles (loading screen) and processOneReadyTile also
wrapped in outer upload batches
Terrain finalization was uploading all 256 chunks (GPU fence waits) in one
atomic advanceFinalization call that couldn't be interrupted by the 5ms time
budget. Now split into incremental batches of 16 chunks per call, allowing
the time budget to yield between batches.
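The batching described above can be sketched as follows. This FinalizingTile is a reduced stand-in and the actual upload call is elided; the 256-chunk and 16-chunk figures are taken from the text.

```cpp
#include <algorithm>

constexpr int kChunksPerTile = 256;  // chunks per tile, as stated above
constexpr int kChunksPerCall = 16;   // batch size per call, as stated above

// Reduced stand-in: only the cross-frame progress cursor is modelled.
struct FinalizingTile {
    int terrainChunkNext = 0;  // index of the first chunk not yet uploaded
};

// Uploads at most maxChunks chunks, then returns so the caller can check
// its time budget. Returns true once every chunk of the tile is uploaded.
inline bool loadTerrainIncremental(FinalizingTile& tile,
                                   int maxChunks = kChunksPerCall) {
    const int end =
        std::min(tile.terrainChunkNext + maxChunks, kChunksPerTile);
    for (int i = tile.terrainChunkNext; i < end; ++i) {
        // The real code would upload chunk i to the GPU here.
    }
    tile.terrainChunkNext = end;
    return end == kChunksPerTile;
}
```

With these numbers a tile takes 16 calls to finish, and the time budget gets a chance to yield after each one.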
M2 instance creation had O(N) dedup scans iterating ALL instances to check
for duplicates. In cities with 5000+ doodads, this caused O(N²) total work
during tile loading. Replaced with hash-based DedupKey map for O(1) lookups.
Changes:
- TerrainRenderer::loadTerrainIncremental: uploads N chunks per call
- FinalizingTile tracks terrainChunkNext for cross-frame progress
- TERRAIN phase yields after preload and after each chunk batch
- M2Renderer::DedupKey hash map replaces linear scan in createInstance
and createInstanceWithMatrix
- Dedup map maintained through rebuildSpatialIndex and clear paths
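A hedged sketch of a hash-based dedup map. The key layout (model id plus position quantized to 0.1 units) and the InstanceRegistry wrapper are assumptions for illustration; the real M2Renderer::DedupKey may use different fields.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <initializer_list>
#include <unordered_map>

// Hypothetical key: model id plus quantized world position.
struct DedupKey {
    std::uint32_t modelId;
    std::int32_t qx, qy, qz;
    bool operator==(const DedupKey& o) const {
        return modelId == o.modelId && qx == o.qx && qy == o.qy && qz == o.qz;
    }
};

struct DedupKeyHash {
    std::size_t operator()(const DedupKey& k) const {
        std::size_t h = std::hash<std::uint32_t>{}(k.modelId);
        for (std::int32_t v : {k.qx, k.qy, k.qz}) {
            // Common hash-combine mixing step.
            h ^= std::hash<std::int32_t>{}(v) + 0x9e3779b9u + (h << 6) + (h >> 2);
        }
        return h;
    }
};

class InstanceRegistry {
public:
    // Returns the existing id for a duplicate placement, otherwise a new id.
    // Average O(1) per call instead of scanning every existing instance.
    std::uint32_t createInstance(std::uint32_t modelId, float x, float y, float z) {
        const DedupKey key{modelId, quantize(x), quantize(y), quantize(z)};
        auto res = dedup_.emplace(key, nextId_);
        if (res.second) ++nextId_;
        return res.first->second;
    }
    std::size_t size() const { return dedup_.size(); }
    void clear() { dedup_.clear(); }  // must mirror the renderer's clear path

private:
    static std::int32_t quantize(float v) {
        return static_cast<std::int32_t>(std::lround(v * 10.0f));
    }
    std::unordered_map<DedupKey, std::uint32_t, DedupKeyHash> dedup_;
    std::uint32_t nextId_ = 1;
};
```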
- Add magma/slime rendering path to water shader (fbm noise, crust/molten/core coloring)
- Fix WMO liquid height filter rejecting high-altitude zones like Ironforge (Z>300)
- Allow interior WMO magma/slime MLIQ groups to load (skip only water/ocean)
- Mark LAVASTEAM.m2 as spell effect for proper additive blend, hide emission mesh
- Add isLavaModel flag for M2 ForgeLava/LavaPots UV scroll fallback
- Add isLava material detection in WMO renderer for lava texture UV animation
- Fix WMO material UBO colors for magma (was blue, now orange-red)
The glm::quat(w,x,y,z) constructor was receiving swapped X/Y components,
causing doodads like the Deeprun Tram gears to be oriented horizontally
instead of vertically. Also use createInstanceWithMatrix for instance WMO
doodads to preserve full rotation from the quaternion.
WMO interior doodads (gears, decorations) were blocking player movement
via M2 collision. Skip collision for all WMO doodad M2 instances since
the WMO itself handles wall collision.
Also filter WMO wall collision using MOPY per-triangle flags: only
rendered+collidable triangles block the player, skipping invisible
collision hulls.
Revert tram portal extended range (no longer needed with collision fix).
- Remove bogus 2-byte skip after materialId in MLIQ parser that shifted
all vertex heights and tile flags by 2 bytes (garbage data)
- Skip liquid loading for interior WMO groups (flag 0x2000) to prevent
indoor water from rendering as outdoor canal water
- Clear movement inputs on teleport/portal to prevent auto-running after
zone transfer (held keys persist through loading screen)
- Fix Stormwind barracks floor: interior WMO groups named "facade" were
incorrectly marked as LOD shells and hidden when close. Add !isIndoor
guard to all LOD detection conditions so interior groups always render.
- Fix water exit stair clipping: anchor lastGroundZ to current position
on swim exit, set grounded=true for full step-up budget, add upward
velocity boost to clear stair lip geometry.
- Re-enable NPC humanoid equipment geosets (kEnableNpcHumanoidOverrides)
so guards render with proper armor instead of underwear.
- Keep instance portal GameObjects animated (spinning/glowing) instead
of freezing all GO animations indiscriminately.
- Fix equipment disappearing after instance round-trip by resetting
dirty tracking on world reload.
- Fix multi-doodad-set loading: load both set 0 (global) and placement-
specific doodad set, with dedup to avoid double-loading.
- Clear placedWmoIds in softReset/unloadAll to prevent stale dedup.
- Apply MODF rotation to instance WMOs, snap player to WMO floor.
- Re-enable rebuildSpatialIndex in setInstanceTransform.
- Store precomputeFloorCache results in precomputed grid.
- Add F8 debug key for WMO floor diagnostics at player position.
- Expand mapIdToName with all Classic/TBC/WotLK instance map IDs.
The previous commit changed std::move to copy for terrain/mesh data to fix
the empty-cache bug. But copying ~8 MB per tile × 81 tiles caused a 60s
streaming timeout.
The tile cache was already broken before — putCachedTile stored a shared_ptr
to the same PendingTile whose data was moved out, so cached tiles always had
empty meshes. Remove the putCachedTile call entirely; tiles re-parse from
ADT files (asset manager file cache hit) when they re-enter streaming range.
The softReset cache clear from the previous commit remains as safety for
map transitions.
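The aliasing problem can be reproduced in isolation. The struct layouts and demonstrateAliasedCache are hypothetical; the sketch only shows why a cached shared_ptr to a moved-from PendingTile ends up with an empty mesh.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

struct PendingTile { std::vector<float> mesh; };  // simplified stand-in
struct TerrainTile { std::vector<float> mesh; };  // simplified stand-in

struct CacheDemoResult {
    std::size_t liveMeshSize;    // mesh size in the finalized TerrainTile
    std::size_t cachedMeshSize;  // mesh size seen through the cache entry
};

inline CacheDemoResult demonstrateAliasedCache() {
    auto pending = std::make_shared<PendingTile>();
    pending->mesh.assign(1024, 0.0f);

    // The cache stores a shared_ptr to the SAME PendingTile, not a copy.
    std::map<std::pair<int, int>, std::shared_ptr<PendingTile>> tileCache;
    tileCache[std::make_pair(32, 48)] = pending;

    // Finalization move-constructs the mesh out of the pending tile.
    // A std::vector that was the source of a move construction is empty.
    TerrainTile live{std::move(pending->mesh)};

    return {live.mesh.size(), tileCache[std::make_pair(32, 48)]->mesh.size()};
}
```

The cache entry and the pending tile are one object, so moving the data out empties both views at once.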
Three fixes:
1. Water captureSceneHistory gated on hasSurfaces() — the image layout
transitions (PRESENT_SRC→TRANSFER_SRC→PRESENT_SRC) were running every
frame even on WMO-only maps with no water, causing VK_ERROR_DEVICE_LOST.
2. Tile cache invalidation: softReset() now clears tileCache_ since cache
keys are (x,y) without map name — prevents stale cross-map cache hits.
3. Copy terrain/mesh into TerrainTile instead of std::move — the moved-from
PendingTile was cached with empty data, so subsequent map loads returned
tiles with 0 valid chunks from cache.
Also adds diagnostic skip env vars (WOWEE_SKIP_TERRAIN, WOWEE_SKIP_SKY,
WOWEE_SKIP_PREPASSES) and a 0-chunk warning in loadTerrain.
- Fix shutdown hang: skip vmaDestroyAllocator (walked thousands of allocations),
replace unsafe pthread_timedjoin_np with plain join + early-exit checks in workers
- Bank window: full icon rendering, click-and-hold pickup (0.10s), drag-drop for
all bank slots including bank bag equip slots, same-slot drop detection
- Loading screen: process one tile per frame for live progress updates
- Camera reset: trust server position in online mode to avoid spawning under WMOs
- Fix PLAYER_BYTES/PLAYER_BYTES_2 field indices, preserve purchasedBankBagSlots
across inventory rebuilds, fix bank slot purchase result codes
unloadAll() now uses a 500ms deadline with pthread_timedjoin_np to
avoid blocking indefinitely when worker threads are mid-prepareTile
(reading MPQ archives / parsing ADT files). Threads that don't finish
within the deadline are detached so the app can exit promptly.
unloadAll() joins worker threads, which blocks if they're mid-tile
(prepareTile can take seconds for heavy ADTs). Replace with softReset()
which clears tile data, queues, and water surfaces without stopping
worker threads — workers find empty queues and idle naturally.
- Add VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT to water material
descriptor pool so individual sets can be freed when tiles are unloaded
- Free descriptor sets in destroyWaterMesh() instead of leaking them
- Add terrain manager unloadAll() during logout to properly clear stale
tiles, water surfaces, and queues between sessions
- Add diagnostic logging for water surface loading, material allocation
failures, and render skip reasons to investigate missing water
- Windows: SetThreadAffinityMask to pin main thread to core 0 and
exclude workers from core 0
- macOS: thread_policy_set with THREAD_AFFINITY_POLICY tags to hint
scheduler separation (tag 1 for main, tag 2 for workers)
Water deduplication: merge per-chunk water surfaces into per-tile surfaces
to reduce Vulkan descriptor set usage from ~8900 to ~100-200. Uses hybrid
approach — groups with ≤4 chunks stay per-chunk (preserving shore detail),
larger groups merge into 128×128 tile-wide surfaces.
Re-add incremental tile finalization state machine (reverted in 9b90ab0)
to spread GPU uploads across frames and prevent city stuttering.
Pin main thread to CPU core 0 and exclude worker threads from core 0
to reduce scheduling jitter on the render/game loop.
The incremental advanceFinalization state machine broke water rendering
in ways that couldn't be resolved. Reverted to the original monolithic
finalizeTile approach. The other performance optimizations (bone SSBO
pre-allocation, WMO distance culling, M2 adaptive distance tiers)
are kept.
Phase-splitting across frames caused water surfaces to not render
correctly. Changed processReadyTiles to run all phases for each tile
before moving to the next, with time budget checked between tiles.
Replace monolithic finalizeTile() with a phased state machine that spreads
GPU upload work across multiple frames (TERRAIN→M2→WMO→WATER→AMBIENT→DONE).
Each advanceFinalization() call does one bounded unit of work within the
per-frame time budget, eliminating 50-300ms frame hitches when entering cities.
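A reduced sketch of the phased state machine. The phase names follow the sequence above, but the per-phase work is elided and a step count stands in for the real per-frame time budget.

```cpp
#include <cstddef>
#include <deque>

enum class Phase { TERRAIN, M2, WMO, WATER, AMBIENT, DONE };

struct FinalizingTile {
    Phase phase = Phase::TERRAIN;
};

// Performs one bounded unit of work and moves to the next phase.
// Returns true once the tile is fully finalized.
inline bool advanceFinalization(FinalizingTile& tile) {
    switch (tile.phase) {
        case Phase::TERRAIN: tile.phase = Phase::M2;      break;  // upload terrain
        case Phase::M2:      tile.phase = Phase::WMO;     break;  // create M2 instances
        case Phase::WMO:     tile.phase = Phase::WATER;   break;  // create WMO instances
        case Phase::WATER:   tile.phase = Phase::AMBIENT; break;  // load water surfaces
        case Phase::AMBIENT: tile.phase = Phase::DONE;    break;  // register ambient data
        case Phase::DONE:    break;
    }
    return tile.phase == Phase::DONE;
}

// Advances the front tile until the step budget is spent.
// Assumption: the real code checks elapsed time rather than a step count.
inline std::size_t processReadyTiles(std::deque<FinalizingTile>& tiles,
                                     int stepBudget) {
    std::size_t committed = 0;
    while (stepBudget-- > 0 && !tiles.empty()) {
        if (advanceFinalization(tiles.front())) {
            tiles.pop_front();
            ++committed;
        }
    }
    return committed;
}
```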
Additional performance improvements:
- Pre-allocate bone SSBOs at M2 instance creation instead of lazily during
first render frame, preventing hitches when many skinned characters appear
- Enable WMO distance culling (800 units) with active-group exemption so
the player's current floor/neighbors are never culled
- Add 4-tier adaptive M2 render distance (250/400/600/1000 based on count)
- Remove dead PendingM2Upload queue code superseded by incremental system
Fix tile re-enqueueing bug: keep tiles in pendingTiles until committed to
loadedTiles (not when moved to finalizingTiles_) so streamTiles() doesn't
re-enqueue tiles mid-finalization. Also handle unloadTile() for tiles in
the finalizingTiles_ deque to prevent orphaned water/M2/WMO resources.
- Fix StormLib package name: libstormlib-dev → libstorm-dev (correct
Ubuntu package name) across all CI workflows and extract_assets.sh
- Build StormLib from source on Windows CI (no MSYS2 package exists),
ensuring asset_extract.exe is included in release archives
- Update extract_assets.sh/.ps1 to prefer pre-built asset_extract
binary next to the script (release archives) before trying build dir
- Move ADTTerrain allocations from stack to heap in prepareTile() to
fix stack overflow on macOS (worker threads default to 512 KB stack,
two ADTTerrain structs ≈ 560 KB exceeded that)
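The stack-to-heap move can be illustrated as below. The ADTTerrain layout here is made up and sized at 280 KB only to mirror the figure quoted above, and this prepareTile() is a placeholder, not the real function body.

```cpp
#include <array>
#include <cstddef>
#include <memory>

// Hypothetical layout: 70 * 1024 floats = 280 KB per struct.
struct ADTTerrain {
    std::array<float, 70 * 1024> heights;
};

inline std::size_t prepareTile() {
    // Heap allocations: only two pointers live on the worker's stack,
    // so a 512 KB thread stack is no longer exceeded.
    auto terrain = std::make_unique<ADTTerrain>();
    auto scratch = std::make_unique<ADTTerrain>();
    terrain->heights[0] = 1.0f;
    scratch->heights = terrain->heights;
    // Bytes that previously would have been placed on the stack.
    return sizeof(*terrain) + sizeof(*scratch);
}
```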
- reduce per-tile ground clutter generation pressure and enforce tighter caps to avoid spikes
- remove expensive detail dedupe scans from the hot render path
- add progressive/lazy clutter updates around player movement to smooth frame pacing
- lower noisy runtime INFO logging to DEBUG/throttled paths
- keep terrain/game screen updates responsive while preserving existing behavior
Vanilla (v256) and TBC (v263) M2 files embed skin data directly, parsed
during M2Loader::load(). The code unconditionally loaded external .skin
files afterwards, which resolved to WotLK-format .skin files (48-byte
submeshes) from the base manifest — overwriting the correctly parsed
embedded skin (32-byte submeshes) and causing mesh corruption on all
character models. Guard all 13 loadSkin() call sites with version >= 264
so external .skin files are only loaded for WotLK M2s that need them.
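The guard amounts to a single version comparison. The helper name and constant below are illustrative; the version numbers come from the description above.

```cpp
#include <cstdint>

// First M2 version that stores skin data in external .skin files (WotLK).
constexpr std::uint32_t kFirstExternalSkinVersion = 264;

// Vanilla (256) and TBC (263) models embed their skin data, so an
// external .skin file must not overwrite it.
constexpr bool usesExternalSkin(std::uint32_t m2Version) {
    return m2Version >= kFirstExternalSkinVersion;
}
```

Each loadSkin() call site would then be wrapped in a check of this form.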
Mount Animation System:
- Property-based jump animation discovery using sequence metadata
- Chain linkage scoring (nextAnimation/aliasNext) for accurate detection
- Correct loop detection: (flags & 0x01) == 0 means looping
- Avoids brake/stop animations via blendTime penalties
- Works on any mount model without hardcoded animation IDs
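The loop-flag rule from the list above, written as a helper. In C++ the == operator binds tighter than &, so the test needs explicit parentheses; without them the expression is always 0. The function name is illustrative.

```cpp
#include <cstdint>

// A sequence loops when bit 0 of its flags is clear.
constexpr bool isLoopingSequence(std::uint32_t flags) {
    return (flags & 0x01u) == 0;
}
```

Under this rule flags of 0x20 denote a looping sequence and 0x21 a non-looping one.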
Mount Physics:
- Physics-based jump height: vz = sqrt(2 * g * h)
- Configurable MOUNT_JUMP_HEIGHT constant (1.0m default)
- Procedural lean into turns for ground mounts
- Smooth roll based on turn rate (±14° max, 6x/sec blend)
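The jump formula can be checked with a few lines. The gravity value below is an assumption (the engine's constant may differ); MOUNT_JUMP_HEIGHT and the formula come from the list above.

```cpp
#include <cmath>

constexpr float MOUNT_JUMP_HEIGHT = 1.0f;  // from the description above
constexpr float kGravity = 9.81f;          // assumption: engine gravity may differ

// Initial vertical speed needed to reach `height`: vz = sqrt(2 * g * h).
inline float jumpVelocity(float height = MOUNT_JUMP_HEIGHT,
                          float gravity = kGravity) {
    return std::sqrt(2.0f * gravity * height);
}

// Inverse check: apex reached with initial speed vz is vz^2 / (2 * g).
inline float apexHeight(float vz, float gravity = kGravity) {
    return vz * vz / (2.0f * gravity);
}
```

Deriving the velocity from the height keeps the jump apex fixed if gravity is later retuned.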
Audio Improvements:
- State-machine driven mount sounds (jump, land, rear-up)
- Semantic sound methods (no animation ID dependencies)
- Debug logging for missing sound files
Bug Fixes:
- Fixed mount animation sequencing (JumpStart → JumpLoop → JumpEnd)
- Fixed animation loop flag interpretation (0x20 loops, 0x21 does not)
- Rider bone attachment working correctly during all mount actions
Added deduplication for WMO instances based on uniqueId, matching the
existing M2 doodad deduplication logic. This prevents creating multiple
instances of the same WMO when it's referenced from multiple ADT tiles.
Before: STORMWIND.WMO (uniqueId=10047) was being rendered 16 times
(one instance per ADT tile that references it)
After: Only 1 instance is created and shared across all tiles
Changes:
- Added placedWmoIds set to TerrainManager (like placedDoodadIds)
- Check uniqueId before creating WMO instance
- Skip duplicate WMO placements across tile boundaries
- Log dedup statistics: 'X instances, Y dedup skipped'
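A sketch of the uniqueId check, assuming reduced stand-in types. WmoPlacement, DedupStats and WmoPlacer are hypothetical; placedWmoIds and the created/skipped counters mirror the changes listed above.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

struct WmoPlacement {
    std::uint32_t uniqueId;  // placement id shared by every tile that references it
};

struct DedupStats {
    std::size_t created = 0;
    std::size_t skipped = 0;
};

class WmoPlacer {
public:
    // Called once per ADT tile with that tile's WMO placements.
    void placeTile(const std::vector<WmoPlacement>& placements) {
        for (const WmoPlacement& p : placements) {
            // insert().second is false when the id was already placed.
            if (!placedWmoIds.insert(p.uniqueId).second) {
                ++stats.skipped;
                continue;
            }
            ++stats.created;  // the real code would create the WMO instance here
        }
    }

    // The set must be cleared on reset, or later loads skip valid placements.
    void softReset() { placedWmoIds.clear(); }

    DedupStats stats;

private:
    std::unordered_set<std::uint32_t> placedWmoIds;
};
```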
This should fix the floating cathedral visual issue if it was caused by
rendering artifacts from 16x overdraw, and will massively improve
performance in Stormwind.
- Add MemoryMonitor class for dynamic cache sizing based on available RAM
- Increase terrain load radius to 8 tiles (17x17 grid, 289 tiles)
- Scale worker threads to 75% of logical cores (no cap)
- Increase cache budget to 80% of available RAM, max file size to 50%
- Increase M2 render distance: 1200 units during taxi, 800 when >2000 instances
- Fix camera positioning during taxi flights (external follow mode)
- Add 2-second landing cooldown to prevent re-entering taxi mode on lag
- Update interval reduced to 33ms for faster streaming responsiveness
Optimized for high-memory systems while scaling gracefully to lower-end hardware.
Cache and render distances now fully utilize available VRAM on minimum spec GPUs.
Modern GPUs have 8-16GB VRAM - leverage this to cache all M2 models permanently.
Changes:
- Disabled cleanupUnusedModels() call when tiles unload
- Models now stay in VRAM after initial load, even when tiles unload
- Increased taxi mounting delay from 3s to 5s for more precache time
- Added logging: M2 model count, instance count, and GPU upload duration
- Added debug logging when M2 models are uploaded per tile
This fixes the "building pops up then pause" issue - models were being:
1. Loaded when tile loads
2. Unloaded when tile unloads (behind taxi)
3. Re-loaded when flying through again (causing hitch)
Now models persist in VRAM permanently (few hundred MB for typical session).
First pass loads to VRAM, subsequent passes are instant.
Major improvements:
- Load TaxiPathNode.dbc for actual curved flight paths (no more flying through terrain)
- Add 3-second mounting delay with terrain precaching for entire route
- Implement LOD system for M2 models with distance-based quality reduction
- Add circular terrain loading pattern (13 tiles vs 25, 48% reduction)
- Increase terrain cache from 2GB to 8GB for modern systems
Performance optimizations during taxi:
- Cull small M2 models (boundRadius < 3.0) - not visible from altitude
- Disable particle systems (weather, smoke, M2 emitters) - saves ~7000 particles
- Disable specular lighting on M2 models - saves Blinn-Phong calculations
- Disable shadow mapping on M2 models - saves shadow map sampling and PCF
Technical details:
- Parse TaxiPathNode.dbc spline waypoints for curved paths around terrain
- Build full path from node pairs using TaxiPathEdge lookup
- Precache callback triggers during mounting delay for smooth takeoff
- Circular tile loading uses Euclidean distance check (dx²+dy² <= r²)
- LOD fallback to base mesh when higher LODs unavailable
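The circular loading pattern reduces to the distance check quoted above. The function name is illustrative; with radius 2 it yields 13 offsets, against 25 for the full 5x5 square.

```cpp
#include <utility>
#include <vector>

// Returns the (dx, dy) tile offsets whose Euclidean distance from the
// centre tile is at most `radius`, using dx^2 + dy^2 <= r^2.
inline std::vector<std::pair<int, int>> circularTileOffsets(int radius) {
    std::vector<std::pair<int, int>> offsets;
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            if (dx * dx + dy * dy <= radius * radius) {
                offsets.emplace_back(dx, dy);
            }
        }
    }
    return offsets;
}
```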
Result: Buttery smooth taxi flights with no terrain clipping or performance hitches
Load BLP texture data during prepareTile() and upload to GL cache in
finalizeTile(), eliminating file I/O stalls on the main thread. Reduce
ready tiles per frame to 1. Fix camera sweep to snap Z to ramp surfaces.
Change hearthstone action bar slot from spell to item.
Guard pendingTiles.erase() with queueMutex in processReadyTiles and
unloadTile to prevent data race with worker threads. Add defensive null
checks in M2/WMO render and animation paths. Move cleanupUnusedModels
out of per-tile unload loop to run once after all tiles are removed.
Enrich online inventory from local DB when server data is incomplete, add
resolveOnlineItemGuid fallback for sell/equip/use, use async enqueueTile for
initial terrain load, improve walk/run animation fallbacks, clear target on
loot close, and broaden equipability detection to include armor/subclass.
Use getRemainingTileCount (pending + readyQueue) and processAllReadyTiles
to prevent loading screen from exiting before tiles are finalized. Auto-select
realm and character when only one is available.