The AMD RADV crash is reported at renderPhase=beginFrame with
faultAddr=(nil), i.e. a NULL pointer dereference somewhere in the pre-pass
chain. Add granular markers (bf:ubo, bf:minimap, bf:worldmap, bf:preview,
bf:shadow, bf:reflection, bf:renderpass) to pinpoint the exact call.
Add signal-safe render-phase markers throughout GameScreen::render() and
Application::render() so the crash handler can report which render call was
active when a SIGSEGV occurs. The AMD RADV crash backtrace only shows 2
frames due to missing frame pointers, making it impossible to identify the
actual crash site.
Changes:
- Add volatile g_crashRenderPhase marker updated before each major render call
- Upgrade Linux signal handler to sigaction with SA_SIGINFO for faulting address
- Set ImGui CheckVkResultFn to log silent Vulkan errors in ImGui backend
- Enable -fno-omit-frame-pointer in all build configs (not just Debug/RelWithDebInfo)
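A minimal sketch of the marker plus SA_SIGINFO handler described above, for POSIX systems. Only g_crashRenderPhase is named in the changes; setRenderPhase, installCrashHandler, the safeWrite helpers and the exact log format are illustrative assumptions, not the project's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <signal.h>
#include <unistd.h>

// Points at a string literal naming the render call currently in progress.
// The pointer itself is volatile so the store is not optimized away.
static const char* volatile g_crashRenderPhase = "idle";

// Hypothetical helper: call right before each major render call,
// e.g. setRenderPhase("bf:shadow").
inline void setRenderPhase(const char* phase) { g_crashRenderPhase = phase; }

// Only async-signal-safe calls are used below: write(), signal(), raise().
static void safeWrite(const char* s, std::size_t n) {
    ssize_t ignored = write(STDERR_FILENO, s, n);
    (void)ignored;
}

static void safeWriteStr(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    safeWrite(s, n);
}

static void safeWriteHex(std::uintptr_t v) {
    char buf[2 + sizeof(v) * 2];
    buf[0] = '0';
    buf[1] = 'x';
    for (std::size_t i = 0; i < sizeof(v) * 2; ++i) {
        const unsigned shift = static_cast<unsigned>((sizeof(v) * 2 - 1 - i) * 4);
        buf[2 + i] = "0123456789abcdef"[(v >> shift) & 0xF];
    }
    safeWrite(buf, sizeof(buf));
}

static void crashHandler(int sig, siginfo_t* info, void* /*ucontext*/) {
    safeWriteStr("SIGSEGV renderPhase=");
    safeWriteStr(g_crashRenderPhase);
    safeWriteStr(" faultAddr=");
    // si_addr is the faulting address; it is only filled in with SA_SIGINFO.
    safeWriteHex(reinterpret_cast<std::uintptr_t>(info ? info->si_addr : nullptr));
    safeWriteStr("\n");
    // Restore the default action and re-raise so the process still dies
    // with the original signal (and a core dump, if enabled).
    signal(sig, SIG_DFL);
    raise(sig);
}

// Hypothetical installer; returns false if sigaction() fails.
bool installCrashHandler() {
    struct sigaction sa = {};
    sa.sa_sigaction = crashHandler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    return sigaction(SIGSEGV, &sa, nullptr) == 0;
}
```

Storing a pointer to a string literal keeps each marker update to a single word write, which is why the handler can read it without locking.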
destroyInstanceBones loops over both frame slots (i=0,1), deferring
both boneSet[0] and boneSet[1] to the current frame's fence. When
currentFrame=0, boneSet[1] is freed after only slot 0's fence completes
while slot 1's command buffer may still be using it.
Switch both M2Renderer and CharacterRenderer bone destruction from
deferAfterFrameFence to deferAfterAllFrameFences to ensure all
in-flight frames have completed before freeing cross-slot resources.
deferAfterFrameFence only waits for one frame slot's fence, but shared
resources (material descriptor sets, vertex/index buffers) are bound by
both in-flight frames' command buffers. On AMD RADV this caused
vkFreeDescriptorSets errors and eventual SIGSEGV.
Add deferAfterAllFrameFences: queues to every frame slot with a shared
counter so cleanup runs exactly once, after the last slot is fenced.
Use it for WMO, terrain, water, and character model shared resources.
Per-frame bone sets keep using deferAfterFrameFence (already correct).
Also fix character renderer vertex format: R8G8B8A8_UINT -> _SINT to
match shader's ivec4 input (RADV validation rejects the mismatch).
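The shared-counter idea behind deferAfterAllFrameFences can be sketched as follows. This is a single-threaded model under stated assumptions: two frames in flight, and a hypothetical onFenceSignaled hook that the renderer would call after waiting on a slot's fence. The class name and everything except the two defer function names are invented for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Assumed: two frames in flight, matching the i=0,1 loop described above.
constexpr std::size_t kFramesInFlight = 2;

class DeferredCleanup {
public:
    using Fn = std::function<void()>;

    // Runs fn the next time `slot`'s fence is known to have signaled.
    // Suitable for resources that only that slot's command buffer uses.
    void deferAfterFrameFence(std::size_t slot, Fn fn) {
        queues_[slot].push_back(std::move(fn));
    }

    // Queues a wrapper on every slot. Each wrapper decrements a shared
    // counter; only the last one to run invokes fn, so cleanup happens
    // exactly once, after every slot has been fenced.
    void deferAfterAllFrameFences(Fn fn) {
        auto remaining = std::make_shared<std::size_t>(kFramesInFlight);
        auto shared = std::make_shared<Fn>(std::move(fn));
        for (auto& q : queues_) {
            q.push_back([remaining, shared] {
                if (--*remaining == 0) (*shared)();
            });
        }
    }

    // Hypothetical hook: call after waiting on `slot`'s fence.
    void onFenceSignaled(std::size_t slot) {
        std::vector<Fn> ready;
        ready.swap(queues_[slot]);  // callbacks may enqueue more work safely
        for (auto& fn : ready) fn();
    }

private:
    std::array<std::vector<Fn>, kFramesInFlight> queues_;
};
```

With this shape, a resource bound by both slots' command buffers is freed only once the second fence has been observed, regardless of which slot was current when the destruction was requested.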
The WotLK spline parser tries 6 format variants and accepts the first
that passes minimal validation (pointCount<=256, splineMode<=3). A wrong
format can pass by coincidence, consuming incorrect bytes and corrupting
all subsequent UPDATE_OBJECT blocks (e.g. maskBlockCount=219 garbage).
Add endPoint coordinate validation: reject spline parses where the
endpoint is non-finite or outside world bounds (65k). Also harden the
Turtle parser to keep successfully-parsed blocks on mid-packet failure
instead of discarding the entire packet.
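The endpoint check might look roughly like this. The Vec3 type, the function names, and the exact bound are assumptions: the message only says "65k", so 65000.0f is a placeholder for whatever constant the parser really uses.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

struct Vec3 { float x, y, z; };  // simplified stand-in for the parser's vector type

// Assumed bound; the commit message only says "65k".
constexpr float kMaxWorldCoord = 65000.0f;

inline bool isPlausibleCoord(float v) {
    // isfinite rejects NaN and +/-inf, which a misaligned parse can produce
    // when arbitrary bytes are reinterpreted as floats.
    return std::isfinite(v) && std::fabs(v) <= kMaxWorldCoord;
}

// Returns false when a spline format variant produced an endpoint that
// cannot be a real world position, so the caller can try the next variant.
inline bool isPlausibleSplineEndpoint(const Vec3& p) {
    return isPlausibleCoord(p.x) && isPlausibleCoord(p.y) && isPlausibleCoord(p.z);
}
```

A wrong format variant that happens to pass the pointCount and splineMode checks will usually decode garbage floats for the endpoint, which is what makes this a cheap extra discriminator.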
CharacterRenderer::destroyModelGPU now defers vertex/index buffer
destruction when replacing models mid-stream, preventing use-after-free
on AMD RADV. FXAA descriptor sets are now per-frame to eliminate
write-read races between in-flight command buffers. Water reflection
descriptor update narrowed to current frame only.
CharacterRenderer::destroyInstanceBones had the same immediate-free bug
as M2Renderer — freeing bone descriptor sets and buffers while in-flight
command buffers still reference them. Applies the same deferred pattern
via deferAfterFrameFence for the removeInstance streaming path.
auctionSellItem now resolves the item GUID internally via
backpackSlotGuids_ with resolveOnlineItemGuid fallback, matching the
pattern used by vendor sell and item use. Previously the UI passed
the GUID directly from getBackpackItemGuid() with no fallback, so
items with unset slot GUIDs silently failed to list.
Also gates the CMSG_AUCTION_SELL_ITEM format by expansion: Classic and TBC
omit the itemCount and stackCount fields that WotLK requires.
M2 destroyInstanceBones and WMO destroyGroupGPU freed descriptor sets
and buffers immediately during tile streaming, while in-flight command
buffers still referenced them — causing DEVICE_LOST on AMD RADV.
Now defers GPU resource destruction via deferAfterFrameFence in streaming
paths (removeInstance, removeInstances, unloadModel). Immediate
destruction is preserved for shutdown/clear paths that call
vkDeviceWaitIdle first.
Also: vkDeviceWaitIdle before WMO backfillNormalMaps descriptor rebinds,
and fillModeNonSolid added to required device features for wireframe
pipelines on AMD.
ListInventoryParser::parse() was resetting the entire ListInventoryData
struct, wiping the canRepair flag set by the gossip handler before the
server response arrived. Preserve it across the parse.
Also detect repair capability from UNIT_NPC_FLAG_REPAIR (0x1000) on the
vendor NPC entity, so direct vendors without gossip menus also show the
repair button.
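A rough sketch of both parts of the fix. The struct layout is a simplified stand-in and resetForParse / npcCanRepair are hypothetical helper names; only canRepair and the UNIT_NPC_FLAG_REPAIR value (0x1000) come from the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint32_t UNIT_NPC_FLAG_REPAIR = 0x1000;  // value as stated above

struct VendorItem { std::uint32_t itemId = 0; };  // simplified stand-in

struct ListInventoryData {  // simplified stand-in for the real struct
    std::uint64_t vendorGuid = 0;
    std::vector<VendorItem> items;
    bool canRepair = false;
};

// Hypothetical helper: reset everything the server response will refill,
// but carry canRepair across because it was set before the response arrived.
void resetForParse(ListInventoryData& data) {
    const bool keepRepair = data.canRepair;
    data = ListInventoryData{};
    data.canRepair = keepRepair;
}

// Hypothetical helper: vendors opened directly (no gossip menu) are
// detected from the NPC flags on the vendor entity.
bool npcCanRepair(std::uint32_t npcFlags) {
    return (npcFlags & UNIT_NPC_FLAG_REPAIR) != 0;
}
```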
The descriptor pool used by the ImGui Vulkan backend was exhausted by
cached spell/item/talent icon textures, causing vkAllocateDescriptorSets
to fail inside ImGui_ImplVulkan_AddTexture. The NVIDIA driver crashed on
the subsequent invalid descriptor write.
Also add a null-check on the returned descriptor set so pool exhaustion
gracefully returns VK_NULL_HANDLE instead of crashing.
During shutdown, VkContext::runDeferredCleanup() was executing lambdas
that called vkFreeDescriptorSets on descriptor pools already destroyed
by Renderer::shutdown(). This corrupted the validation layer's internal
state, causing a SIGSEGV during process exit on AMD RADV.
Clear the deferred queues without executing them — vkDestroyDevice
reclaims all device-child resources anyway. Also guard against the
double shutdown() call (explicit + destructor).
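The shutdown behaviour can be modelled like this. It is a sketch, not VkContext itself: the class, defer, pending and teardownCount are invented names, and the real teardown work is reduced to a counter.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

class DeferredQueues {
public:
    void defer(std::size_t slot, std::function<void()> fn) {
        queues_[slot].push_back(std::move(fn));
    }

    // Idempotent: the explicit call and the destructor may both reach here.
    void shutdown() {
        if (shutDown_) return;
        shutDown_ = true;
        // Drop the callbacks without running them. Their targets (pools,
        // buffers) may already be gone, and destroying the device reclaims
        // device-child resources anyway.
        for (auto& q : queues_) q.clear();
        ++teardownCount_;  // stands in for the real one-time teardown work
    }

    ~DeferredQueues() { shutdown(); }

    std::size_t pending() const {
        std::size_t n = 0;
        for (const auto& q : queues_) n += q.size();
        return n;
    }
    int teardownCount() const { return teardownCount_; }

private:
    std::array<std::vector<std::function<void()>>, 2> queues_;
    bool shutDown_ = false;
    int teardownCount_ = 0;
};
```

One caveat with this approach: clearing the queues still runs the destructors of anything the lambdas captured by value, so captures should be plain handles rather than owning wrappers that free GPU objects.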
Avoid semaphore reuse while the presentation engine still holds a
reference by switching from per-frame-slot to per-swapchain-image
semaphores with a rotating free semaphore for acquire.
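One common way to implement a rotating free semaphore is a swap after each acquire, sketched below with integer handles standing in for VkSemaphore. Whether the actual change uses exactly this swap is an assumption; the class and method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using SemaphoreHandle = std::uint64_t;  // stand-in for VkSemaphore

class AcquireSemaphores {
public:
    // Expects one handle per swapchain image plus one spare.
    AcquireSemaphores(std::vector<SemaphoreHandle> perImage, SemaphoreHandle spare)
        : perImage_(std::move(perImage)), free_(spare) {}

    // Pass this to the acquire call: the image index is not known yet,
    // so the acquire cannot use a per-image semaphore directly.
    SemaphoreHandle forNextAcquire() const { return free_; }

    // Call once the acquire has returned imageIndex. The semaphore that was
    // just handed to the acquire becomes that image's semaphore, and the
    // image's previous one becomes the new spare.
    SemaphoreHandle onImageAcquired(std::size_t imageIndex) {
        std::swap(free_, perImage_[imageIndex]);
        return perImage_[imageIndex];  // wait on this in the frame's submit
    }

private:
    std::vector<SemaphoreHandle> perImage_;
    SemaphoreHandle free_;
};
```

Tying the semaphore to the image rather than the frame slot means a semaphore only comes back into rotation when its image is acquired again, rather than whenever the frame slot index wraps around.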
Replace the R8G8B8A8_UNORM dummy white texture in CharacterPreview
with a proper D16_UNORM depth texture cleared to 1.0, matching the
sampler2DShadow expectation in shaders. AMD RADV enforces strict
format/sampler type compatibility.
Three bugs found via AMD RADV crash log:
1. Water reflection render pass used BOTTOM_OF_PIPE as srcStageMask but
   pipelines were created against the main pass (EARLY_FRAGMENT_TESTS |
   COLOR_ATTACHMENT_OUTPUT). AMD enforces strict render pass compatibility
   → SIGSEGV when scene renders into reflection texture.
2. samplerAnisotropy was never enabled during device creation despite being
   used in sampler creation; now requested via PhysicalDeviceSelector.
3. Shadow texture descriptor pool was reset each frame while prior frame's
   command buffers might still reference it. Split into per-frame-slot pools
   so each reset is fence-guarded.
AMD RDNA4 (9070XT) crashes with SIGSEGV when MSAA is enabled because the
driver optimizes TRANSIENT images for tile-only storage. Without lazily
allocated memory backing, the MSAA resolve reads unbacked memory. Now we
only set TRANSIENT+LAZILY_ALLOCATED when the device actually exposes that
memory type.
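The capability check can be sketched as below. To keep the example buildable without the Vulkan SDK, the flag constants and the MemoryType struct are local mirrors of the Vulkan definitions (VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT, VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT, VK_IMAGE_USAGE_TRANSIENT_ATTACHMENT_BIT, VkMemoryType); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Local mirrors of Vulkan flag values (see lead-in).
constexpr std::uint32_t kMemoryPropertyLazilyAllocatedBit = 0x00000010;
constexpr std::uint32_t kImageUsageColorAttachmentBit     = 0x00000010;
constexpr std::uint32_t kImageUsageTransientAttachmentBit = 0x00000040;

struct MemoryType {  // mirrors VkMemoryType
    std::uint32_t propertyFlags;
    std::uint32_t heapIndex;
};

// True if any memory type reported by the device is lazily allocated.
// In real code the array would come from vkGetPhysicalDeviceMemoryProperties.
bool hasLazilyAllocatedMemory(const MemoryType* types, std::uint32_t count) {
    for (std::uint32_t i = 0; i < count; ++i) {
        if (types[i].propertyFlags & kMemoryPropertyLazilyAllocatedBit) return true;
    }
    return false;
}

// Usage flags for the MSAA color target: TRANSIENT is only requested when
// lazily allocated memory is available to back it.
std::uint32_t msaaColorUsage(bool lazySupported) {
    std::uint32_t usage = kImageUsageColorAttachmentBit;
    if (lazySupported) usage |= kImageUsageTransientAttachmentBit;
    return usage;
}
```

The same boolean would also decide whether the allocation requests a lazily allocated memory type or falls back to ordinary device-local memory.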
M2 particle/ribbon/batch, terrain layer, and WMO material texture
resolution paths were silently falling back to white textures when
indices were out of range — making missing texture issues hard to
diagnose. Add LOG_WARNING at each silent failure point with model
name, index details, and array sizes.
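As an illustration of the pattern, a bounds-checked lookup that warns before falling back might look like this. logWarning stands in for the project's LOG_WARNING macro, whose real signature is not shown here; TextureHandle, kWhiteTexture and resolveTexture are likewise hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

using TextureHandle = std::uint32_t;        // stand-in for the real texture type
constexpr TextureHandle kWhiteTexture = 0;  // assumed fallback handle

// Stand-in for LOG_WARNING.
void logWarning(const std::string& msg) { std::cerr << "[warn] " << msg << '\n'; }

// Returns the texture at `index`, or the white fallback with a warning that
// names the model, the offending index, and the array size.
TextureHandle resolveTexture(const std::vector<TextureHandle>& textures,
                             std::size_t index,
                             const std::string& modelName) {
    if (index >= textures.size()) {
        logWarning("texture index out of range: model=" + modelName +
                   " index=" + std::to_string(index) +
                   " count=" + std::to_string(textures.size()));
        return kWhiteTexture;
    }
    return textures[index];
}
```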
After bundling dylibs, verify with otool -L that every non-system
dylib referenced by wowee_bin is present in the app bundle. Fails
the build if any are missing — prevents silent repeat of #36/#41.
Added to both build.yml and release.yml.
extern/catch2 was covered by the extern/* gitignore pattern without
an exception, causing CI to fail with a missing source file error.
Added !extern/catch2 exception and committed the amalgamated files.
DRY up renderAuraRemaining, fmtDurationCompact, classColorVec4,
classColorU32, entityClassId, classNameStr, kDispelNames, and
kRaidMarkNames — duplicated across game_screen, social_panel,
and combat_ui after the panel extraction refactors.
The Lua refactor branch was based before the cleanup commit and
brought back allMacroCommands, getMacroShowtooltipArg (game_screen),
lfgJoinResultString, lfgTeleportDeniedString (game_handler).
The ubuntu-24.04-arm runner is memory-constrained and the full
parallel Release build was being killed by the OOM killer, causing
the Build step to fail silently with no log output. Cap at 2 jobs.
Remove 685 lines of unused code that were duplicated into extracted
handler files (entity_controller, spell_handler, quest_handler,
warden_handler, social_handler, action_bar_panel, chat_panel,
window_manager) during PRs #33-#38. Build is now warning-free.
The warmup loop waited up to 20 seconds for getHeightAt() to return a
terrain height within 15 units of spawn Z before accepting the ground
as ready. In practice, the terrain was loaded and the character was
visibly standing on it, but the height sample didn't match closely
enough (terrain LOD, chunk boundary, or server Z vs client height
mismatch).
Reduce the tile-count fallback timeout from 20s to 5s: if at least 4
tiles are loaded after 5 seconds, accept the ground as ready. The
exact height check still runs in the first 5 seconds for fast-path
cases where it does match.
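The resulting readiness rule can be written as a small predicate. The thresholds (15 units, 5 seconds, 4 tiles) are from the description above; the function shape, the HeightSample type, and the choice to keep honouring a height match after the 5 second mark are assumptions.

```cpp
#include <cassert>
#include <cmath>

constexpr float kHeightTolerance     = 15.0f;  // units around spawn Z
constexpr float kTileFallbackSeconds = 5.0f;   // previously 20 s
constexpr int   kMinTilesForFallback = 4;

// Hypothetical result of a getHeightAt() query; valid is false when no
// terrain was found under the spawn point.
struct HeightSample {
    bool  valid;
    float z;
};

bool groundReady(float elapsedSeconds, HeightSample sample, float spawnZ, int tilesLoaded) {
    // Fast path: the sampled terrain height is close enough to spawn Z.
    if (sample.valid && std::fabs(sample.z - spawnZ) <= kHeightTolerance) return true;
    // Fallback: enough tiles are loaded and the short timeout has passed.
    return elapsedSeconds >= kTileFallbackSeconds && tilesLoaded >= kMinTilesForFallback;
}
```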