M2 GPU instancing
- M2InstanceGPU SSBO (96 B/entry, double-buffered, 16384 max)
- Group opaque instances by (modelId, LOD); single vkCmdDrawIndexed per group
- boneBase field indexes into mega bone SSBO via gl_InstanceIndex
Indirect terrain drawing
- 24 MB mega index buffer (6M uint32) + 64 MB mega vertex buffer
- CPU builds VkDrawIndexedIndirectCommand per visible chunk
- Single VB/IB bind per frame; shadow pass reuses mega buffers
- Replaced vkCmdDrawIndexedIndirect with direct vkCmdDrawIndexed to fix
host-mapped buffer race condition that caused terrain flickering
GPU frustum culling (compute shader)
- m2_cull.comp.glsl: 64-thread workgroups, sphere-vs-6-planes + distance cull
- CullInstanceGPU SSBO input, uint visibility[] output, double-buffered
- dispatchCullCompute() runs before main pass via render graph node
Consolidated bone matrix SSBOs
- 16 MB double-buffered mega bone SSBO (2048 instances × 128 bones)
- Eliminated per-instance descriptor sets; one megaBoneSet_ per frame
- prepareRender() packs bone matrices consecutively into current frame slot
Render graph / frame graph
- RenderGraph: RGResource handles, RGPass nodes, Kahn topological sort
- Automatic VkImageMemoryBarrier/VkBufferMemoryBarrier between passes
- Passes: minimap_composite, worldmap_composite, preview_composite,
shadow_pass, reflection_pass, compute_cull
- beginFrame() uses buildFrameGraph() + renderGraph_->execute(cmd)
Pipeline derivatives
- PipelineBuilder::setFlags/setBasePipeline for VK_PIPELINE_CREATE_DERIVATIVE_BIT
- M2 opaque = base; alphaTest/alpha/additive are derivatives
- Applied to terrain (wireframe) and WMO (alpha-test) renderers
Rendering bug fixes:
- fix(shadow): compute lightSpaceMatrix before updatePerFrameUBO to eliminate
one-frame lag that caused shadow trails and flicker on moving objects
- fix(shadow): scale depth bias with shadowDistance_ instead of hardcoded 0.8f
to prevent acne at close range and gaps at far range
- fix(visibility): WMO group distance threshold 500u → 1200u to match terrain
view distance; buildings were disappearing on the horizon
- fix(precision): camera near plane 0.05 → 0.5 (ratio 600K:1 → 60K:1),
eliminating Z-fighting and improving frustum plane extraction stability
- fix(streaming): terrain load radius 4 → 6 tiles (~2133u → ~3200u) to exceed
M2 render distance (2800u) and eliminate pop-in when camera turns;
unload radius 7 → 9; spawn radius 3 → 4
- fix(visibility): ground-detail M2 distance multiplier 0.75 → 0.9 to reduce
early pop of grass and debris
Terrain pool 16384→65536, WMO pool 8192→32768. The previous sizes
were too small for the load/unload radii, causing pool exhaustion
and a hard crash when streaming terrain on large maps.
Add initializeShadow() to TerrainRenderer that creates a depth-only
shadow pipeline reusing the existing shadow.vert/frag shaders (same
path as WMO/M2/character renderers). renderShadow() draws all terrain
chunks with sphere culling against the shadow coverage radius. Wire
both init and draw calls into Renderer so terrain now casts shadows
alongside buildings and NPCs.
- Worker threads: use (cores - 1-2) instead of cores/2, minimum 4
- Outer upload batch in processReadyTiles: ALL model/texture uploads per
frame share a single command buffer submission + fence wait
- Upload multiple models per finalization step: 8 M2s, 4 WMOs, 16 doodads
per call instead of 1 each (all within same GPU batch)
- Terrain chunks: 64 per step instead of 16
- Skip redundant M2 file I/O: thread-safe uploadedM2Ids_ set lets
background workers skip re-reading+parsing models already on GPU
- processAllReadyTiles (loading screen) and processOneReadyTile also
wrapped in outer upload batches
Terrain finalization was uploading all 256 chunks (GPU fence waits) in one
atomic advanceFinalization call that couldn't be interrupted by the 5ms time
budget. Now split into incremental batches of 16 chunks per call, allowing
the time budget to yield between batches.
M2 instance creation had O(N) dedup scans iterating ALL instances to check
for duplicates. In cities with 5000+ doodads, this caused O(N²) total work
during tile loading. Replaced with hash-based DedupKey map for O(1) lookups.
Changes:
- TerrainRenderer::loadTerrainIncremental: uploads N chunks per call
- FinalizingTile tracks terrainChunkNext for cross-frame progress
- TERRAIN phase yields after preload and after each chunk batch
- M2Renderer::DedupKey hash map replaces linear scan in createInstance
and createInstanceWithMatrix
- Dedup map maintained through rebuildSpatialIndex and clear paths
Both passes were rendering the entire loaded scene (17×17 tile radius)
into a shadow map that only covers 360×360 world units — submitting
10-50× more geometry than the shadow frustum can actually use.
- TerrainRenderer::renderShadow: skip chunks whose bounding sphere
doesn't overlap the shadow frustum AABB in XY. Reduces terrain draw
calls from O(all loaded chunks) to O(chunks within ~180 units).
- M2Renderer::renderShadow: skip instances whose world AABB doesn't
overlap the shadow frustum in XY. Reduces M2 draw calls similarly.
- Both functions now take shadowCenter + halfExtent parameters.
Load BLP texture data during prepareTile() and upload to GL cache in
finalizeTile(), eliminating file I/O stalls on the main thread. Reduce
ready tiles per frame to 1. Fix camera sweep to snap Z to ramp surfaces.
Change hearthstone action bar slot from spell to item.