Root cause: bonesDirty was a single bool shared across both
double-buffered frame indices. When bones were copied to frame 0's
SSBO and bonesDirty cleared, frame 1's newly-allocated SSBO would
contain garbage/zeros and never get populated — causing animated
M2 instances to flash invisible on alternating frames.
Fix: Make bonesDirty per-frame-index (bool[2]) so each buffer
independently tracks whether it needs bone data uploaded. When
bones are recomputed, both indices are marked dirty. When uploaded
during render, only the current frame index is cleared. New buffer
allocations in prepareRender force their frame index dirty.
Add 2-second cooldown timer before re-checking for unloaded tiles
when workers are idle, preventing excessive streamTiles() calls
that caused frame hitches right after world load.
- Re-check for unloaded tiles when workers are idle (no tile boundary needed)
- Increase M2 upload budget 4→16 and WMO 1→4 per frame when not under pressure
- Lower tree collision threshold from 40 to 6 units so large trees block movement
- Overlap M2 and character animation updates via std::async (~2-5ms saved)
- Thread-local collision scratch buffers for concurrent floor queries
- Parallel terrain/WMO/M2 floor queries in camera controller
- Seed new M2 instance bones from existing siblings to eliminate pop-in flash
- Fix shadow flicker: snap center along stable light axes instead of in view space
- Increase shadow distance default to 300 units (slider max 500)
- Normal map CPU work (luminance→blur→Sobel) moved to background threads,
main thread only does GPU upload (~1-2ms vs 15-22ms per texture)
- Load screen warmup now waits until ALL spawn/equipment/gameobject queues
are drained before transitioning (prevents naked character, NPC pop-in)
- Exit condition: min 2s + 5 consecutive empty iterations, hard cap 15s
- Equipment queue processes 8 items per warmup iteration instead of 1
- Added LoadingScreen::renderOverlay() for future world-behind-loading use
Loading screen now calls processCreatureSpawnQueue(unlimited=true) which
removes the 1-upload-per-frame cap and 2ms time budget, allowing all pending
creature models to upload to GPU in bulk. Also increases concurrent async
background loads from 4 to 16 during load screen. Replaces 40-line inline
duplicate of processAsyncCreatureResults with the shared function.
Each loadTexture call was generating a normal/height map inline (3 full-image
passes: luminance + blur + Sobel). For models with 15-20 textures this added
30-40ms to the 70ms model upload. Now deferred to a per-frame budget (2/frame
in-game, 10/frame during load screen). Models render without POM until their
normal maps are ready.
Move all DBC lookups (CharSections, ItemDisplayInfo), texture path resolution,
and BLP decoding for humanoid NPCs to background threads. Only GPU texture
uploads remain on the main thread via pre-decoded BLP cache.
Move CPU-heavy BLP texture decoding from main thread to background worker
threads for all hot paths: terrain M2 models, WMO doodad M2s, WMO textures,
creature models, and gameobject WMOs. Each renderer (M2, WMO, Character) now
accepts a pre-decoded BLP cache that loadTexture() checks before falling back
to synchronous decode.
Defer WMO normal/height map generation (3 per-pixel passes: luminance, box
blur, Sobel) during terrain streaming finalization — this was the dominant
remaining bottleneck after BLP pre-decoding.
Terrain streaming stalls: 1576ms → 124ms worst case.
- Replace per-frame VMA alloc/free of material UBOs with a ring buffer in
CharacterRenderer (~500 allocations/frame eliminated)
- Batch all ready terrain tiles into a single GPU upload during load screen
(processAllReadyTiles instead of one-at-a-time with individual fence waits)
- Lift per-frame creature/GO spawn budgets during load screen warmup phase
- Add background world preloader: saves last world position to disk, pre-warms
AssetManager file cache with ADT files starting at app init (login screen)
so terrain workers get instant cache hits when Enter World is clicked
- Distance-filter expensive collision guard to 8-unit melee range
- Merge 3 CharacterRenderer update loops into single pass
- Time-budget instrumentation for slow update stages (>3ms threshold)
- Count-based async creature model upload budget (max 3/frame in-game)
- 1-per-frame game object spawn + per-doodad time budget for transport loading
- Use deque for creature spawn queue to avoid O(n) front-erase
Was waiting for all ~50 tiles (radius 4) to fully prepare + finalize
before entering the game. Now loads only the immediate surrounding tiles
during the loading screen, then restores the full radius for in-game
streaming. setLoadRadius just sets an int — actual loading happens lazily
via background workers during the game loop.
- Worker threads: use (cores - 1-2) instead of cores/2, minimum 4
- Outer upload batch in processReadyTiles: ALL model/texture uploads per
frame share a single command buffer submission + fence wait
- Upload multiple models per finalization step: 8 M2s, 4 WMOs, 16 doodads
per call instead of 1 each (all within same GPU batch)
- Terrain chunks: 64 per step instead of 16
- Skip redundant M2 file I/O: thread-safe uploadedM2Ids_ set lets
background workers skip re-reading+parsing models already on GPU
- processAllReadyTiles (loading screen) and processOneReadyTile also
wrapped in outer upload batches
Every uploadBuffer/VkTexture::upload called immediateSubmit which did a
separate vkQueueSubmit + vkWaitForFences. Loading a single creature model
with textures caused 4-8+ fence waits; terrain chunks caused 80+ per batch.
Added beginUploadBatch/endUploadBatch to VkContext: records all upload
commands into a single command buffer, submits once with one fence wait.
Staging buffers are deferred for cleanup after the batch completes.
Wrapped in batch mode:
- CharacterRenderer::loadModel (creature VB/IB + textures)
- M2Renderer::loadModel (doodad VB/IB + textures)
- TerrainRenderer::loadTerrain/loadTerrainIncremental (chunk geometry + textures)
- TerrainRenderer::uploadPreloadedTextures
- WMORenderer::loadModel (group geometry + textures)
Terrain finalization was uploading all 256 chunks (GPU fence waits) in one
atomic advanceFinalization call that couldn't be interrupted by the 5ms time
budget. Now split into incremental batches of 16 chunks per call, allowing
the time budget to yield between batches.
M2 instance creation had O(N) dedup scans iterating ALL instances to check
for duplicates. In cities with 5000+ doodads, this caused O(N²) total work
during tile loading. Replaced with hash-based DedupKey map for O(1) lookups.
Changes:
- TerrainRenderer::loadTerrainIncremental: uploads N chunks per call
- FinalizingTile tracks terrainChunkNext for cross-frame progress
- TERRAIN phase yields after preload and after each chunk batch
- M2Renderer::DedupKey hash map replaces linear scan in createInstance
and createInstanceWithMatrix
- Dedup map maintained through rebuildSpatialIndex and clear paths
- Async creature model loading: M2 file I/O and parsing on background threads
via std::async, GPU upload on main thread when ready (MAX_ASYNC_CREATURE_LOADS=4)
- CharSections.dbc lookup cache: O(1) hash lookup instead of O(N) full DBC scan
per humanoid NPC spawn (was scanning thousands of records twice per spawn)
- Frame time budget: 4ms cap on creature spawn processing per frame
- Wolf/worg model name check cached per modelId (was doing tolower+find per
hostile creature per frame)
- Weapon attach throttle: max 2 per 1s tick (was attempting all unweaponized NPCs)
- Separate texture application tracking (displayIdTexturesApplied_) so async-loaded
models still get skin/equipment textures applied correctly
- Add water/lava/lighting rendering features to README
- Add transport riding to movement features
- Update status date and rendering capabilities
- Note interior shadow and lava steam known gaps
- Add magma/slime rendering path to water shader (fbm noise, crust/molten/core coloring)
- Fix WMO liquid height filter rejecting high-altitude zones like Ironforge (Z>300)
- Allow interior WMO magma/slime MLIQ groups to load (skip only water/ocean)
- Mark LAVASTEAM.m2 as spell effect for proper additive blend, hide emission mesh
- Add isLavaModel flag for M2 ForgeLava/LavaPots UV scroll fallback
- Add isLava material detection in WMO renderer for lava texture UV animation
- Fix WMO material UBO colors for magma (was blue, now orange-red)
- Add case-insensitive "glass" detection for WMO window materials
- Make instance (WMO-only) glass highly transparent (12-35% alpha)
so underwater scenes are visible through Deeprun Tram windows
- Keep normal world windows at existing opacity (40-95% alpha)
- Disable shadow mapping for interior WMO groups to fix dark
indoor areas like Ironforge
- Fix NULL renderer pointers by moving TransportManager connection after
initializeRenderers for WMO-only maps
- Fix tram direction by negating DBC TransportAnimation X/Y local offsets
before serverToCanonical conversion
- Implement client-side M2 transport boarding via proximity detection
(server doesn't send transport attachment for trams)
- Use position-delta approach: player keeps normal movement while
transport's frame-to-frame motion is applied on top
- Prevent server movement packets from clearing client-side M2 transport
state (isClientM2Transport guard)
- Fix getPlayerWorldPosition for M2 transports: simple canonical addition
instead of render-space matrix multiplication
- Add movementSuppressTimer to camera controller that forces all movement
keys to read as false, preventing held W key from carrying through
loading screens (fixes always-running-forward after instance portals)
- Increase shadow frustum default from 60 to 72 units (+20%)
- Make shadow distance configurable via setShadowDistance() (40-200 range)
- Add shadow distance slider in Video settings tab (persisted to config)
stepX/stepY were transposed: columns stepped south instead of east and
rows stepped east instead of south, mirroring all overworld water. Also
fix per-chunk origin to use layer.x for east offset and layer.y for south.
Remove frame throttling that skipped shadow updates in dense scenes,
causing visible flicker on player and NPCs. Reduce shadow half-extent
from 180 to 60 for 3x higher resolution on nearby shadows.
- Stronger wall collision push (0.35/0.15) and swept push (0.45/0.25)
for interior/exterior WMOs to reduce clipping through tunnel walls
- Use all triangles (not just pre-classified walls) for collision checks
- Allow invisible collidable triangles (MOPY 0x01 without 0x20) to block
- Pass insideWMO flag to all collision callers, match swim sweep to ground
- Widen swept hit detection radius from 0.15 to 0.25
- Restrict camera zoom to 12 units inside WMO interiors
- Fix /unstuck launching player above WMOs: remove +20 fallback, use
gravity when no floor found
- Slash and Enter keys always focus chat unless already typing
WMO origins can be far from their visible geometry, causing large city
buildings to be culled from the shadow pass. Use world bounding box for
instance culling and per-group AABB culling. Also increase WMO shadow
cull radius to match the shadow map coverage (180 units).
Fix VK_ERROR_DEVICE_LOST crash by allocating per-frame scene history
images (color + depth) instead of a single shared image that raced
between frames in flight. Water refraction can now be toggled via
Settings > Video > Water Refraction.
Without refraction: richer blue base colors, animated caustic shimmer,
and normal-based color shifts give the water visible life. With
refraction: clean screen-space refraction with Beer-Lambert absorption.
Disabling clears scene history to black for immediate fallback.
The glm::quat(w,x,y,z) constructor was receiving swapped X/Y components,
causing doodads like the Deeprun Tram gears to be oriented horizontally
instead of vertically. Also use createInstanceWithMatrix for instance WMO
doodads to preserve full rotation from the quaternion.
Classify light shafts, portals, spotlights, bubbles, and similar M2
doodads as spell effects so they render with additive blending instead
of as solid opaque objects.
Set camera yaw from server orientation on world load so teleports face
the correct direction.
Reduce area trigger minimum radius (3.0 sphere, 4.0 box) to prevent
premature portal firing near tram entrances.
WMO interior doodads (gears, decorations) were blocking player movement
via M2 collision. Skip collision for all WMO doodad M2 instances since
the WMO itself handles wall collision.
Also filter WMO wall collision using MOPY per-triangle flags: only
rendered+collidable triangles block the player, skipping invisible
collision hulls.
Revert tram portal extended range (no longer needed with collision fix).
Parse MOPY per-triangle flags in WMO groups and exclude detail/decorative
triangles (flag 0x04) from collision detection. This prevents invisible
walls from objects like gears and railings in WMO interiors.
Add WotLK area trigger IDs 2173/2175 to extended-range tram triggers.
- Equipment-driven geoset selection: read GeosetGroup1 from ItemDisplayInfo
for legs/feet/chest to pick covered mesh variants (1302+ pants, 402+ boots,
802+ sleeves) instead of always defaulting to bare geosets
- Prevent per-instance skin override from replacing baked/composited armor
textures on equipped NPCs
- Set bald NPC hair texture slot to skin texture so scalp isn't white
- Skip white fallback textures in per-instance hair overrides
- Remove debug texture dump, reduce NPC logging to DEBUG level
- Remove bogus 2-byte skip after materialId in MLIQ parser that shifted
all vertex heights and tile flags by 2 bytes (garbage data)
- Skip liquid loading for interior WMO groups (flag 0x2000) to prevent
indoor water from rendering as outdoor canal water
- Clear movement inputs on teleport/portal to prevent auto-running after
zone transfer (held keys persist through loading screen)
- Fix Stormwind barracks floor: interior WMO groups named "facade" were
incorrectly marked as LOD shells and hidden when close. Add !isIndoor
guard to all LOD detection conditions so interior groups always render.
- Fix water exit stair clipping: anchor lastGroundZ to current position
on swim exit, set grounded=true for full step-up budget, add upward
velocity boost to clear stair lip geometry.
- Re-enable NPC humanoid equipment geosets (kEnableNpcHumanoidOverrides)
so guards render with proper armor instead of underwear.
- Keep instance portal GameObjects animated (spinning/glowing) instead
of freezing all GO animations indiscriminately.
- Fix equipment disappearing after instance round-trip by resetting
dirty tracking on world reload.
- Fix multi-doodad-set loading: load both set 0 (global) and placement-
specific doodad set, with dedup to avoid double-loading.
- Clear placedWmoIds in softReset/unloadAll to prevent stale dedup.
- Apply MODF rotation to instance WMOs, snap player to WMO floor.
- Re-enable rebuildSpatialIndex in setInstanceTransform.
- Store precomputeFloorCache results in precomputed grid.
- Add F8 debug key for WMO floor diagnostics at player position.
- Expand mapIdToName with all Classic/TBC/WotLK instance map IDs.
Low-vertex groups (<100 verts) were incorrectly marked as distance-only
LOD shells in small WMOs like Stockades. Now only applies this heuristic
to large WMOs (50+ groups) where it's needed for city exterior shells.
- NPC hair/skin textures now use per-instance overrides instead of shared
model-level textures, so each NPC shows its own hair color/style
- Hair/skin DBC lookup runs for every NPC instance (including cached models)
rather than only on first load
- Fix keyframe binary search to use float comparison matching original
linear scan semantics
- Replace O(n) linear keyframe search with O(log n) binary search in both
M2 and Character renderers (runs thousands of times per frame)
- Smoke particle removal: swap-and-pop instead of O(n²) vector erase
- Character render backface cull: eliminate sqrt via squared comparison
- Quaternion validation: use length² instead of sqrt-based length check
Use cached model flags (isValid, isSmoke, isInvisibleTrap, isGroundDetail,
disableAnimation, boundRadius) on M2Instance instead of models.find() in
the hot culling paths. Also complete cached flag initialization in
createInstanceWithMatrix().
- Glow sprites now use dedicated vertex buffer (glowVB_) separate from
M2 particle buffer to prevent data race when renderM2Particles()
overwrites glow data mid-flight
- Move fadeAlpha from shared material UBO to per-draw push constants,
eliminating cross-instance alpha race on non-double-buffered UBOs
- Smooth adaptive render distance transitions to prevent pop-in/out
at instance count thresholds (1000/2000)
- Distance-tiered character bone throttling: near (<30u) every frame,
mid (30-60u) every 3rd, far (60-120u) every 6th frame
- Skip weapon instance animation updates (transforms set by parent bones)
- Split M2 instances into fast-path index lists (animated, particle-only,
particle-all, smoke) to avoid iterating all 46K instances per frame
- Cache model flags (hasAnimation, disableAnimation, isSmoke, etc.) on
M2Instance struct to eliminate per-frame hash lookups
- Replace full rebuildSpatialIndex on position/transform updates with
incremental grid cell remove+add, preventing 8.5ms/frame rebuild cost
- Advance animTime for all instances (texture UV animation) but only
compute bones and particles for the ~3K that need it
M2_UPDATE: 10.7ms → 2.0ms, FPS: 35 → 55-59