Implement GPU-driven Hierarchical-Z occlusion culling for M2 doodads
using a depth pyramid built from the previous frame's depth buffer.
The cull shader projects bounding spheres via prevViewProj (temporal
reprojection) and samples the HiZ pyramid to reject hidden objects
before the main render pass.
Key implementation details:
- Separate early compute submission (beginSingleTimeCommands + fence
wait) eliminates 2-frame visibility staleness
- Conservative safeguards prevent false culls: screen-edge guard,
full VP row-vector AABB projection (Cauchy-Schwarz), 50% sphere
inflation, depth bias, mip+1, min screen size threshold, camera
motion dampening (auto-disable on fast rotations), and per-instance
previouslyVisible flag tracking
- Graceful fallback to frustum-only culling if HiZ init fails
Fix dark WMO interiors by gating shadow map sampling on isInterior==0
in the WMO fragment shader. Interior groups (flag 0x2000) now rely
solely on pre-baked MOCV vertex-color lighting + MOHD ambient color.
Disable interiorDarken globally (was incorrectly darkening outdoor M2s
when camera was inside a WMO). Use isInsideInteriorWMO() instead of
isInsideWMO() for correct indoor detection.
New files:
- hiz_system.hpp/cpp: pyramid image management, compute pipeline,
descriptors, mip-chain build dispatch, resize handling
- hiz_build.comp.glsl: MAX-depth 2x2 reduction compute shader
- m2_cull_hiz.comp.glsl: frustum + HiZ occlusion cull compute shader
- test_indoor_shadows.cpp: 14 unit tests for shadow/interior contracts
Modified:
- CullUniformsGPU expanded 128->272 bytes (HiZ params, viewProj,
prevViewProj)
- Depth buffer images gain VK_IMAGE_USAGE_SAMPLED_BIT for HiZ reads
- wmo.frag.glsl: interior branch before unlit, shadow skip for 0x2000
- Render graph: hiz_build + compute_cull disabled (run in early compute)
- .gitignore: ignore compiled .spv binaries
- MEGA_BONE_MAX_INSTANCES: 2048 -> 4096
Signed-off-by: Pavel Okhlopkov <pavel.okhlopkov@flant.com>
Add game_interfaces.hpp with five narrow domain contracts that GameHandler now
publishes to its domain handlers, replacing the previous friend-class anti-pattern.
Changes:
- include/game/game_interfaces.hpp (new): IConnectionState, ITargetingState,
IEntityAccess, ISocialState, IPvpState — each interface exposes only the state
its consumer legitimately needs
- include/game/game_handler.hpp: GameHandler inherits all five interfaces;
include of game_interfaces.hpp added
- include/game/movement_handler.hpp: remove `friend class GameHandler`; add
public named accessors for previously-private fields (monsterMovePacketsThisTickRef,
timeSinceLastMoveHeartbeatRef, resetMovementClock, setFalling, setFallStartMs)
- include/game/spell_handler.hpp: remove `friend class GameHandler/InventoryHandler/
CombatHandler/EntityController`; promote private packet handlers (handlePetSpells,
handleListStabledPets, pet stable commands, DBC loaders) to public; add accessor
methods for aura cache, known spells, and player aura slot mutation
- src/game/game_handler.cpp, game_handler_callbacks.cpp, game_handler_packets.cpp:
replace direct private field access with the new accessor API
(e.g. casting_ → isCasting(), monsterMovePacketsThisTick_ → ...ThisTickRef())
- src/game/inventory_handler.cpp, combat_handler.cpp, entity_controller.cpp:
replace friend-class private access with public accessor calls
No behaviour change. All 13 test suites pass. Zero build warnings.
Auto-reply was sent on every incoming whisper with no dedup, causing
infinite loops when both players had auto-reply enabled. Now tracks
which senders have been replied to and only sends one auto-reply per
sender per AFK/DND session.
The decomposition PRs moved mail state to InventoryHandler but the GO
interaction code still set stale GameHandler fields. Add openMailbox()
on InventoryHandler and forward from GameHandler so the correct
mailboxGuid_/mailboxOpen_ are set and refreshMailList() works.
Calculate repair costs client-side using DurabilityCosts.dbc and
DurabilityQuality.dbc. Block repair when player can't afford it and
only apply optimistic durability/gold updates when cost is verified.
Show repair cost next to the Repair All button in the vendor window.
MAX_CULL_INSTANCES was 16384 but game object instances were allocated
at indices 20000+, beyond the cull buffer. Increased to 65536.
Also compute fallback boundRadius from vertex extents when M2 header
reports 0, and seed bones for global-sequence-only animated models.
- VkTexture::isValid() now checks both image AND sampler handles. Previously
it only checked the image, so a texture with a valid image but NULL sampler
would pass validation and get bound to a descriptor set. On MoltenVK (macOS)
this renders as pink/magenta boxes; the fallback white texture is now
correctly used instead.
- Fix fs::path to std::string implicit conversion in asset extractor that
broke the Windows (MSYS2/clang) CI build.
M2 GPU instancing
- M2InstanceGPU SSBO (96 B/entry, double-buffered, 16384 max)
- Group opaque instances by (modelId, LOD); single vkCmdDrawIndexed per group
- boneBase field indexes into mega bone SSBO via gl_InstanceIndex
Indirect terrain drawing
- 24 MB mega index buffer (6M uint32) + 64 MB mega vertex buffer
- CPU builds VkDrawIndexedIndirectCommand per visible chunk
- Single VB/IB bind per frame; shadow pass reuses mega buffers
- Replaced vkCmdDrawIndexedIndirect with direct vkCmdDrawIndexed to fix
host-mapped buffer race condition that caused terrain flickering
GPU frustum culling (compute shader)
- m2_cull.comp.glsl: 64-thread workgroups, sphere-vs-6-planes + distance cull
- CullInstanceGPU SSBO input, uint visibility[] output, double-buffered
- dispatchCullCompute() runs before main pass via render graph node
Consolidated bone matrix SSBOs
- 16 MB double-buffered mega bone SSBO (2048 instances × 128 bones)
- Eliminated per-instance descriptor sets; one megaBoneSet_ per frame
- prepareRender() packs bone matrices consecutively into current frame slot
Render graph / frame graph
- RenderGraph: RGResource handles, RGPass nodes, Kahn topological sort
- Automatic VkImageMemoryBarrier/VkBufferMemoryBarrier between passes
- Passes: minimap_composite, worldmap_composite, preview_composite,
shadow_pass, reflection_pass, compute_cull
- beginFrame() uses buildFrameGraph() + renderGraph_->execute(cmd)
Pipeline derivatives
- PipelineBuilder::setFlags/setBasePipeline for VK_PIPELINE_CREATE_DERIVATIVE_BIT
- M2 opaque = base; alphaTest/alpha/additive are derivatives
- Applied to terrain (wireframe) and WMO (alpha-test) renderers
Rendering bug fixes:
- fix(shadow): compute lightSpaceMatrix before updatePerFrameUBO to eliminate
one-frame lag that caused shadow trails and flicker on moving objects
- fix(shadow): scale depth bias with shadowDistance_ instead of hardcoded 0.8f
to prevent acne at close range and gaps at far range
- fix(visibility): WMO group distance threshold 500u → 1200u to match terrain
view distance; buildings were disappearing on the horizon
- fix(precision): camera near plane 0.05 → 0.5 (ratio 600K:1 → 60K:1),
eliminating Z-fighting and improving frustum plane extraction stability
- fix(streaming): terrain load radius 4 → 6 tiles (~2133u → ~3200u) to exceed
M2 render distance (2800u) and eliminate pop-in when camera turns;
unload radius 7 → 9; spawn radius 3 → 4
- fix(visibility): ground-detail M2 distance multiplier 0.75 → 0.9 to reduce
early pop of grass and debris
Whisper sender name may not be in the player name cache when the packet
arrives. Store the sender GUID and lazily resolve the name from the
cache in getLastWhisperSender(). Also backfill lastWhisperSender_ when
the SMSG_NAME_QUERY_RESPONSE arrives.
Global sequence bones (hair, cape, physics) need time values spanning
their full duration (up to ~968733ms), but animationTime wraps at the
current animation's sequence duration (~2000ms for walk). This caused
vertex spikes projecting from fingers/neck/ponytail as bones got stuck
in the first ~2s of their loop. Add a separate globalSequenceTime
accumulator that is not wrapped at the animation duration.
buildDefaultPlayerGeosets() was inserting all submeshIds 0-99 into
activeGeosets, showing every hair variation simultaneously. Now uses
the hairGeosetMap_ (from CharHairGeosets.dbc) to select only the
correct hair scalp geoset for the player's race/sex/style, matching
the existing NPC geoset filtering logic in EntitySpawner.
Hair, cape, and other physics bones use global sequences (continuously
looping timers independent of the character's current animation). The
character renderer was ignoring globalSequence entirely, causing these
bones to fall back to identity transforms and produce deformed/spiked
hair geometry. Added resolveTrackTime() to wrap global sequence time
correctly, matching the M2 renderer's existing behavior.
Revert static JSON layout changes (15-22 back to 14-21) since WotLK
loads the Classic 23-field DBC. Add getItemDisplayInfoTextureFields()
helper that detects field count at runtime and adjusts the texture
base index accordingly (14 for 23-field, 15 for 25-field).
Instance was created with Vulkan 1.1 but depthResolveSupported_ was gated
on the physical device's API version (1.2+ on RADV). This caused
vkCreateRenderPass2 (core 1.2) to dispatch through a null function pointer
when MSAA was enabled. Now requests 1.2 instance with 1.1 minimum fallback
and gates depth resolve on the actual instance API version. Also removes
all diagnostic crash-phase instrumentation from the previous investigation.
deferAfterFrameFence only waits for one frame slot's fence, but shared
resources (material descriptor sets, vertex/index buffers) are bound by
both in-flight frames' command buffers. On AMD RADV this caused
vkFreeDescriptorSets errors and eventual SIGSEGV.
Add deferAfterAllFrameFences: queues to every frame slot with a shared
counter so cleanup runs exactly once, after the last slot is fenced.
Use it for WMO, terrain, water, and character model shared resources.
Per-frame bone sets keep using deferAfterFrameFence (already correct).
Also fix character renderer vertex format: R8G8B8A8_UINT -> _SINT to
match shader's ivec4 input (RADV validation rejects the mismatch).
CharacterRenderer::destroyModelGPU now defers vertex/index buffer
destruction when replacing models mid-stream, preventing use-after-free
on AMD RADV. FXAA descriptor sets are now per-frame to eliminate
write-read races between in-flight command buffers. Water reflection
descriptor update narrowed to current frame only.
CharacterRenderer::destroyInstanceBones had the same immediate-free bug
as M2Renderer — freeing bone descriptor sets and buffers while in-flight
command buffers still reference them. Applies the same deferred pattern
via deferAfterFrameFence for the removeInstance streaming path.
auctionSellItem now resolves the item GUID internally via
backpackSlotGuids_ with resolveOnlineItemGuid fallback, matching the
pattern used by vendor sell and item use. Previously the UI passed
the GUID directly from getBackpackItemGuid() with no fallback, so
items with unset slot GUIDs silently failed to list.
Also gates CMSG_AUCTION_SELL_ITEM format by expansion: Classic/TBC
omits the itemCount and stackCount fields that WotLK requires.
M2 destroyInstanceBones and WMO destroyGroupGPU freed descriptor sets
and buffers immediately during tile streaming, while in-flight command
buffers still referenced them — causing DEVICE_LOST on AMD RADV.
Now defers GPU resource destruction via deferAfterFrameFence in streaming
paths (removeInstance, removeInstances, unloadModel). Immediate
destruction preserved for shutdown/clear paths that vkDeviceWaitIdle
first.
Also: vkDeviceWaitIdle before WMO backfillNormalMaps descriptor rebinds,
and fillModeNonSolid added to required device features for wireframe
pipelines on AMD.
Avoid semaphore reuse while the presentation engine still holds a
reference by switching from per-frame-slot to per-swapchain-image
semaphores with a rotating free semaphore for acquire.
Replace the R8G8B8A8_UNORM dummy white texture in CharacterPreview
with a proper D16_UNORM depth texture cleared to 1.0, matching the
sampler2DShadow expectation in shaders. AMD RADV enforces strict
format/sampler type compatibility.
Three bugs found via AMD RADV crash log:
1. Water reflection render pass used BOTTOM_OF_PIPE as srcStageMask but
pipelines were created against the main pass (EARLY_FRAGMENT_TESTS |
COLOR_ATTACHMENT_OUTPUT). AMD enforces strict render pass compatibility
→ SIGSEGV when scene renders into reflection texture.
2. samplerAnisotropy was never enabled during device creation despite being
used in sampler creation — now requested via PhysicalDeviceSelector.
3. Shadow texture descriptor pool was reset each frame while prior frame's
command buffers might still reference it. Split into per-frame-slot pools
so each reset is fence-guarded.
DRY up renderAuraRemaining, fmtDurationCompact, classColorVec4,
classColorU32, entityClassId, classNameStr, kDispelNames, and
kRaidMarkNames — duplicated across game_screen, social_panel,
and combat_ui after the panel extraction refactors.
- moved chat panel logic out of `game_screen` into `chat_panel`
- added chat_panel.hpp and chat_panel.cpp
- updated game_screen.hpp and game_screen.cpp to integrate new `ChatPanel` component
- updated build config in CMakeLists.txt to include new UI module sources
Replace the incorrectly extracted RSA-2048 modulus (which contained
the exponent bytes embedded inside it) with the verified Blizzard
public key used across all pre-Cataclysm clients (1.12.1, 2.4.3,
3.3.5a).
Key confirmed against two independent sources:
- namreeb/WardenSigning ClientKey.hpp (72 verified sniffed modules)
- SkullSecurity wiki Warden_Modules documentation
The modulus starts with 0x6BCE F52D... and ends with ...03F4 AFC7.
Exponent remains 65537 (0x010001).
Verification algorithm: SHA1(module_data + "MAIEV.MOD"), 0xBB-padded
to 256 bytes, RSA verify-recover with raw (no-padding) mode.
Signature failures are non-fatal (log warning, continue loading) so
private-server modules signed with custom keys still work. This is
necessary because servers like ChromieCraft/AzerothCore may use their
own signing keys.
Also update warden_module.hpp status: all implementation items now ✅.
Complete the last major Warden stub — the import table parser that
resolves Windows API calls in loaded modules. This is the critical
missing piece for strict servers like Warmane.
Implementation:
- Parse Warden module import table from decompressed data (after
relocation entries): alternating libraryName\0 / functionName\0
pairs, terminated by null library name
- For each import, look up the emulator's pre-registered stub address
(VirtualAlloc, GetTickCount, ReadProcessMemory, etc.)
- Auto-stub unrecognized APIs with a no-op returning 0 — prevents
module crashes on unimplemented Windows functions
- Patch each IAT slot (sequential dwords at module image base) with
the resolved stub address
- Add WardenEmulator::getAPIAddress() public accessor for IAT lookups
- Fix initialization order: bindAPIs() now runs inside initializeModule()
after emulator setup but before entry point call
The full Warden pipeline is now: RC4 decrypt → RSA verify → zlib
decompress → parse executable → relocate → create emulator → register
API hooks → bind imports (IAT patch) → call entry point → extract
exported functions (packetHandler, tick, generateRC4Keys, unload).