M2 GPU instancing
- M2InstanceGPU SSBO (96 B/entry, double-buffered, 16384 max)
- Group opaque instances by (modelId, LOD); single vkCmdDrawIndexed per group
- boneBase field indexes into mega bone SSBO via gl_InstanceIndex
Indirect terrain drawing
- 24 MB mega index buffer (6M uint32) + 64 MB mega vertex buffer
- CPU builds VkDrawIndexedIndirectCommand per visible chunk
- Single VB/IB bind per frame; shadow pass reuses mega buffers
- Replaced vkCmdDrawIndexedIndirect with direct vkCmdDrawIndexed to fix
host-mapped buffer race condition that caused terrain flickering
GPU frustum culling (compute shader)
- m2_cull.comp.glsl: 64-thread workgroups, sphere-vs-6-planes + distance cull
- CullInstanceGPU SSBO input, uint visibility[] output, double-buffered
- dispatchCullCompute() runs before main pass via render graph node
Consolidated bone matrix SSBOs
- 16 MB double-buffered mega bone SSBO (2048 instances × 128 bones)
- Eliminated per-instance descriptor sets; one megaBoneSet_ per frame
- prepareRender() packs bone matrices consecutively into current frame slot
Render graph / frame graph
- RenderGraph: RGResource handles, RGPass nodes, Kahn topological sort
- Automatic VkImageMemoryBarrier/VkBufferMemoryBarrier between passes
- Passes: minimap_composite, worldmap_composite, preview_composite,
shadow_pass, reflection_pass, compute_cull
- beginFrame() uses buildFrameGraph() + renderGraph_->execute(cmd)
Pipeline derivatives
- PipelineBuilder::setFlags/setBasePipeline for VK_PIPELINE_CREATE_DERIVATIVE_BIT
- M2 opaque = base; alphaTest/alpha/additive are derivatives
- Applied to terrain (wireframe) and WMO (alpha-test) renderers
Rendering bug fixes:
- fix(shadow): compute lightSpaceMatrix before updatePerFrameUBO to eliminate
one-frame lag that caused shadow trails and flicker on moving objects
- fix(shadow): scale depth bias with shadowDistance_ instead of hardcoded 0.8f
to prevent acne at close range and gaps at far range
- fix(visibility): WMO group distance threshold 500u → 1200u to match terrain
view distance; buildings were disappearing on the horizon
- fix(precision): camera near plane 0.05 → 0.5 (ratio 600K:1 → 60K:1),
eliminating Z-fighting and improving frustum plane extraction stability
- fix(streaming): terrain load radius 4 → 6 tiles (~2133u → ~3200u) to exceed
M2 render distance (2800u) and eliminate pop-in when camera turns;
unload radius 7 → 9; spawn radius 3 → 4
- fix(visibility): ground-detail M2 distance multiplier 0.75 → 0.9 to reduce
early pop of grass and debris
deferAfterFrameFence only waits for one frame slot's fence, but shared
resources (material descriptor sets, vertex/index buffers) are bound by
both in-flight frames' command buffers. On AMD RADV this caused
vkFreeDescriptorSets errors and eventual SIGSEGV.
Add deferAfterAllFrameFences: queues to every frame slot with a shared
counter so cleanup runs exactly once, after the last slot is fenced.
Use it for WMO, terrain, water, and character model shared resources.
Per-frame bone sets keep using deferAfterFrameFence (already correct).
Also fix character renderer vertex format: R8G8B8A8_UINT -> _SINT to
match shader's ivec4 input (RADV validation rejects the mismatch).
M2 particle/ribbon/batch, terrain layer, and WMO material texture
resolution paths were silently falling back to white textures when
indices were out of range — making missing texture issues hard to
diagnose. Add LOG_WARNING at each silent failure point with model
name, index details, and array sizes.
- renderer: remove no-op assignment (mountAnims_.stand = 0 when already 0)
- renderer: add why-comments on blacksmith WMO ID 96048 (ambient forge
sounds) with TODO for other smithy buildings
- terrain_renderer: replace 1e30f sentinel with numeric_limits::max(),
name terrain view distance constant (1200 units ≈ 9 ADT tiles)
- social_handler: add missing LFG case 15, document case 0 nullptr
return (success = no error message), add enum name comments
vmaCreateBuffer return value was silently discarded in both loadTerrain
and loadTerrainIncremental. If allocation failed (OOM/fragmentation),
the chunk proceeded with a VK_NULL_HANDLE UBO, causing the GPU to read
from an invalid descriptor on the next draw call. Now checks the return
value and skips the chunk on failure.
Create a VkPipelineCache at device init, loaded from disk if available.
All 65 pipeline creation calls across 19 renderer files now use the
shared cache. On shutdown, the cache is serialized to disk so subsequent
launches skip redundant shader compilation.
Cache path: ~/.local/share/wowee/pipeline_cache.bin (Linux),
~/Library/Caches/wowee/ (macOS), %APPDATA%\wowee\ (Windows).
Stale/corrupt caches are handled gracefully (fallback to empty cache).
Move ShadowParamsUBO from 5 separate shadow rendering functions (2 in
m2_renderer, 1 in terrain_renderer, 1 in wmo_renderer) into shared
vk_frame_data.hpp header. Eliminates 5 identical local struct definitions
and improves consistency across all shadow pass implementations. Structure
layout matches shader std140 uniform buffer requirements.
Move envSizeMBOrDefault and envSizeOrDefault from 4 separate rendering
modules (character_renderer, m2_renderer, terrain_renderer, wmo_renderer)
into shared vk_utils.hpp header as inline functions. Use the most robust
version which includes overflow checking for MB-to-bytes conversion. This
eliminates 7 identical local function definitions and improves consistency
across all rendering modules.
Move ShadowPush from 4 separate rendering modules (character_renderer,
m2_renderer, terrain_renderer, wmo_renderer) into shared vk_frame_data.hpp
header. This eliminates 4 identical local struct definitions and ensures
consistency across all shadow rendering passes. Add vk_frame_data.hpp include
to character_renderer.cpp.
- terrain_renderer: add FREE_DESCRIPTOR_SET_BIT flag and vkFreeDescriptorSets
in destroyChunkGPU so material descriptor sets are returned to the pool;
prevents GPU device lost from pool exhaustion near populated areas
- game_screen: fix projectToMinimap to use the exact inverse of the minimap
shader transform so quest objective markers appear at the correct position
and orientation regardless of camera bearing
- inventory_screen: fix item comparison tooltip to not compare equipped items
against themselves (character screen); add item level diff line; show (=)
indicator when stats are equal rather than bare value which looked identical
to the item's own tooltip
- Parse bundled sub-packets from SMSG_MULTIPLE_PACKETS using the WotLK
standard wire format (uint16_be size + uint16_le opcode + payload),
dispatching each through handlePacket() instead of silently discarding.
Rate-limited warning for malformed sub-packet overruns.
- Remove unused cullRadiusSq variable in TerrainRenderer::renderShadow()
that produced a -Wunused-variable warning.
Add initializeShadow() to TerrainRenderer that creates a depth-only
shadow pipeline reusing the existing shadow.vert/frag shaders (same
path as WMO/M2/character renderers). renderShadow() draws all terrain
chunks with sphere culling against the shadow coverage radius. Wire
both init and draw calls into Renderer so terrain now casts shadows
alongside buildings and NPCs.
Every uploadBuffer/VkTexture::upload called immediateSubmit which did a
separate vkQueueSubmit + vkWaitForFences. Loading a single creature model
with textures caused 4-8+ fence waits; terrain chunks caused 80+ per batch.
Added beginUploadBatch/endUploadBatch to VkContext: records all upload
commands into a single command buffer, submits once with one fence wait.
Staging buffers are deferred for cleanup after the batch completes.
Wrapped in batch mode:
- CharacterRenderer::loadModel (creature VB/IB + textures)
- M2Renderer::loadModel (doodad VB/IB + textures)
- TerrainRenderer::loadTerrain/loadTerrainIncremental (chunk geometry + textures)
- TerrainRenderer::uploadPreloadedTextures
- WMORenderer::loadModel (group geometry + textures)
Terrain finalization was uploading all 256 chunks (GPU fence waits) in one
atomic advanceFinalization call that couldn't be interrupted by the 5ms time
budget. Now split into incremental batches of 16 chunks per call, allowing
the time budget to yield between batches.
M2 instance creation had O(N) dedup scans iterating ALL instances to check
for duplicates. In cities with 5000+ doodads, this caused O(N²) total work
during tile loading. Replaced with hash-based DedupKey map for O(1) lookups.
Changes:
- TerrainRenderer::loadTerrainIncremental: uploads N chunks per call
- FinalizingTile tracks terrainChunkNext for cross-frame progress
- TERRAIN phase yields after preload and after each chunk batch
- M2Renderer::DedupKey hash map replaces linear scan in createInstance
and createInstanceWithMatrix
- Dedup map maintained through rebuildSpatialIndex and clear paths
Three fixes:
1. Water captureSceneHistory gated on hasSurfaces() — the image layout
transitions (PRESENT_SRC→TRANSFER_SRC→PRESENT_SRC) were running every
frame even on WMO-only maps with no water, causing VK_ERROR_DEVICE_LOST.
2. Tile cache invalidation: softReset() now clears tileCache_ since cache
keys are (x,y) without map name — prevents stale cross-map cache hits.
3. Copy terrain/mesh into TerrainTile instead of std::move — the moved-from
PendingTile was cached with empty data, so subsequent map loads returned
tiles with 0 valid chunks from cache.
Also adds diagnostic skip env vars (WOWEE_SKIP_TERRAIN, WOWEE_SKIP_SKY,
WOWEE_SKIP_PREPASSES) and a 0-chunk warning in loadTerrain.
Remove negative cache for transient load failures in M2, terrain, and
character renderers. Failed textures return white and will be retried
on next model/tile load when assets may have finished streaming.
Raise all texture cache defaults from 1GB to 4GB to reduce rejections.
Cap cache-full warnings (texture + model) to 3 messages per renderer,
and cap update block parse errors to 5 messages.
Strip 26 chrono::now() timing calls per frame from renderer update loop,
periodic LOG_INFO/LOG_DEBUG from terrain/character/quest/heartbeat paths,
and dead m2ProfileCounter variable.
- ensure player transform sync is not gated by third-person so player shadow stays consistent
- hold shadow projection center during movement and snap once on stop to remove delayed catch-up
- smooth foliage caster transitions using blended phase-shifted UV samples
- tune foliage motion frequencies/amplitudes for less popping
- increase shadow map resolution to 2048 and retune terrain PCF texel scale
- increase global shadow strength from 0.62 to 0.68 for stronger, clearer shadows
- keep shadow projection center fixed while moving to remove per-frame projection churn flicker
- replace delayed post-move catch-up with immediate stop transition and idle smoothing
- rework foliage shadow caster motion to use blended phase-shifted UV samples for continuous position transitions
- reduce high-frequency foliage threshold popping by removing threshold warping path
- sharpen terrain receive filtering with tuned 5-tap PCF weights/offset for more detailed shadows
- raise shadow map resolution to 1536 and keep light-space texel snapping for stable sampling
- set shadows enabled by default and lower global shadow strength from 0.65 to 0.62
- keep foliage animation speed consistent between moving and idle at 80%
Both passes were rendering the entire loaded scene (17×17 tile radius)
into a shadow map that only covers 360×360 world units — submitting
10-50× more geometry than the shadow frustum can actually use.
- TerrainRenderer::renderShadow: skip chunks whose bounding sphere
doesn't overlap the shadow frustum AABB in XY. Reduces terrain draw
calls from O(all loaded chunks) to O(chunks within ~180 units).
- M2Renderer::renderShadow: skip instances whose world AABB doesn't
overlap the shadow frustum in XY. Reduces M2 draw calls similarly.
- Both functions now take shadowCenter + halfExtent parameters.
Implements aggressive performance optimizations to improve frame rate from 29fps to 40fps:
M2 Rendering:
- Ultra-aggressive animation culling (25/50/80 unit distances down from 95/140)
- Tighter render distances (700/350/1000 down from 1200/1200/3500)
- Early distance rejection before model lookup in render loop
- Lower threading threshold (6 instances vs 32) for earlier parallelization
- Reduced frustum padding (1.5x vs 2.5x) for tighter culling
- Better memory reservation based on expected visible count
Terrain Rendering:
- Early distance culling at 1200 units before frustum checks
- Skips ~11,500 distant chunks per frame (12,500 total chunks loaded)
- Saves 5-6ms on render pass
Performance Impact:
- Render time: 20ms → 14-15ms (30% faster)
- Frame rate: 29fps → 40fps (+11fps)
- Total savings: ~9ms per frame
Load BLP texture data during prepareTile() and upload to GL cache in
finalizeTile(), eliminating file I/O stalls on the main thread. Reduce
ready tiles per frame to 1. Fix camera sweep to snap Z to ramp surfaces.
Change hearthstone action bar slot from spell to item.
Drop specular lighting from terrain shader — ground materials (dirt, grass,
stone) are purely diffuse and specular highlights made them look plastic.
Replace Reinhard tonemapper with a shoulder-only compressor that is identity
below 0.9 and softly rolls off HDR values above, preserving scene contrast.
Replace hardcoded specular multipliers with uLightColor * uSpecularIntensity
uniforms in all 4 world shaders (terrain, WMO, M2, character), set HDR sun
color (1.5, 1.4, 1.3) and specular intensity 0.5 so highlights can exceed
1.0, and switch the post-process pass from passthrough to exposure-compensated
Reinhard (exposure=1.8) for soft highlight roll-off without clipping.
Anisotropic filtering now queries GL_MAX_TEXTURE_MAX_ANISOTROPY_EXT once
and applies via a single applyAnisotropicFiltering() utility, replacing
hardcoded calls across all renderers. Fog (sky horizon color, 100-600
range) and Blinn-Phong specular highlights are added to WMO, M2, and
character shaders for visual parity with terrain. Shadow sampling
plumbing (sampler2DShadow with 3x3 PCF) is wired into all three shaders
gated by uShadowEnabled, ready for a future shadow map pass.
- Implement GPU bone skinning for M2 doodads/creatures (gryphons, birds)
- Store bone hierarchy and animation keyframes per model
- Compute bone matrices per-instance with keyframe interpolation
- Upload bone weights/indices in vertex buffer, skinning in vertex shader
- Fix terrain texture rendering: restore sampler-to-unit uniform bindings
removed during texture bind optimization (roads were invisible)
- Add spellbook screen (P key) with Spell.dbc name lookup and action bar assignment
- Default Attack and Hearthstone spells available in single player
- Fix WMO floor clipping (gryphon roost) by tightening ceiling rejection threshold
- Darken ocean water, increase wave motion and opacity
- Add M2 model distance fade-in to prevent pop-in
- Reposition chat window, add slash/enter key focus
- Remove debug key commands (keep only F1 perf HUD, N minimap)
- Performance: return chat history by const ref, use deque for O(1) pop_front