Wait on in-flight upload batch fences at the start of each frame and
insert a memory barrier (transfer→fragment shader) so the graphics
queue sees completed layout transitions from the transfer queue.
Fixes VK_IMAGE_LAYOUT_UNDEFINED validation errors for freshly loaded
textures.
AMD RADV validation flagged missing shaderStorageImageWriteWithoutFormat,
shaderInt16, shaderFloat16, and deviceCoherentMemory. The first two are
now required device features; shaderFloat16 is optionally enabled via
Vulkan 1.2 feature query; AMD device coherent memory extension and
feature are enabled when available to prevent VMA memory type errors.
Instance was created with Vulkan 1.1 but depthResolveSupported_ was gated
on the physical device's API version (1.2+ on RADV). This caused
vkCreateRenderPass2 (core 1.2) to dispatch through a null function pointer
when MSAA was enabled. Now requests 1.2 instance with 1.1 minimum fallback
and gates depth resolve on the actual instance API version. Also removes
all diagnostic crash-phase instrumentation from the previous investigation.
deferAfterFrameFence only waits for one frame slot's fence, but shared
resources (material descriptor sets, vertex/index buffers) are bound by
both in-flight frames' command buffers. On AMD RADV this caused
vkFreeDescriptorSets errors and eventual SIGSEGV.
Add deferAfterAllFrameFences: queues to every frame slot with a shared
counter so cleanup runs exactly once, after the last slot is fenced.
Use it for WMO, terrain, water, and character model shared resources.
Per-frame bone sets keep using deferAfterFrameFence (already correct).
Also fix character renderer vertex format: R8G8B8A8_UINT -> _SINT to
match shader's ivec4 input (RADV validation rejects the mismatch).
M2 destroyInstanceBones and WMO destroyGroupGPU freed descriptor sets
and buffers immediately during tile streaming, while in-flight command
buffers still referenced them — causing DEVICE_LOST on AMD RADV.
Now defers GPU resource destruction via deferAfterFrameFence in streaming
paths (removeInstance, removeInstances, unloadModel). Immediate
destruction preserved for shutdown/clear paths that vkDeviceWaitIdle
first.
Also: vkDeviceWaitIdle before WMO backfillNormalMaps descriptor rebinds,
and fillModeNonSolid added to required device features for wireframe
pipelines on AMD.
The pool was exhausted by cached spell/item/talent icon textures,
causing vkAllocateDescriptorSets to fail inside ImGui_ImplVulkan_AddTexture.
The NVIDIA driver crashed on the subsequent invalid descriptor write.
Also add a null-check on the returned descriptor set so pool exhaustion
gracefully returns VK_NULL_HANDLE instead of crashing.
During shutdown, VkContext::runDeferredCleanup() was executing lambdas
that called vkFreeDescriptorSets on descriptor pools already destroyed
by Renderer::shutdown(). This corrupted the validation layer's internal
state, causing a SIGSEGV during process exit on AMD RADV.
Clear the deferred queues without executing them — vkDestroyDevice
reclaims all device-child resources anyway. Also guard against the
double shutdown() call (explicit + destructor).
Avoid semaphore reuse while the presentation engine still holds a
reference by switching from per-frame-slot to per-swapchain-image
semaphores with a rotating free semaphore for acquire.
Replace the R8G8B8A8_UNORM dummy white texture in CharacterPreview
with a proper D16_UNORM depth texture cleared to 1.0, matching the
sampler2DShadow expectation in shaders. AMD RADV enforces strict
format/sampler type compatibility.
Three bugs found via AMD RADV crash log:
1. Water reflection render pass used BOTTOM_OF_PIPE as srcStageMask but
pipelines were created against the main pass (EARLY_FRAGMENT_TESTS |
COLOR_ATTACHMENT_OUTPUT). AMD enforces strict render pass compatibility
→ SIGSEGV when scene renders into reflection texture.
2. samplerAnisotropy was never enabled during device creation despite being
used in sampler creation — now requested via PhysicalDeviceSelector.
3. Shadow texture descriptor pool was reset each frame while prior frame's
command buffers might still reference it. Split into per-frame-slot pools
so each reset is fence-guarded.
AMD RDNA4 (9070XT) crashes with SIGSEGV when MSAA is enabled because the
driver optimizes TRANSIENT images for tile-only storage. Without lazily
allocated memory backing, the MSAA resolve reads unbacked memory. Now we
only set TRANSIENT+LAZILY_ALLOCATED when the device actually exposes that
memory type.
- vk_context: name FNV-1a hash constants (kFnv1aOffsetBasis/kFnv1aPrime)
with why-comment on algorithm choice for sampler cache
- transport_manager: collapse redundant if/else that both set
looping=false into single unconditional assignment, add why-comment
explaining the time-closed path design
- transport_manager: hoist duplicate kMinFallbackZOffset constants out
of separate if-blocks, add why-comment on icebreaker Z clamping
- entity: expand velocity smoothing comment — explain 65/35 EMA ratio
and its tradeoff (jitter suppression vs direction change lag)
The findMemType/findMemoryType helper in auth_screen, loading_screen,
and vk_context returned 0 on failure — a valid memory type index.
Changed to return UINT32_MAX and log an error, so vkAllocateMemory
receives an invalid index and fails cleanly rather than silently
using the wrong memory type.
Add [[nodiscard]] to VkBuffer::uploadToGPU/createMapped and
VkContext::initialize/recreateSwapchain so callers that ignore
failure are flagged at compile time. Suppress with (void) cast at
3 call sites where failure is non-actionable (resize best-effort).
Replace ~37 remaining C-style casts with static_cast across 16 files.
Extract named color constants (kColorRed/Green/Yellow/Gray) and dialog
window flags (kDialogFlags) in game_screen.cpp, replacing 72 inline
literals. Normalize keybinding_manager.hpp to #pragma once.
Request 2 queues from the graphics family when available (NVIDIA
exposes 16, AMD 2+). Upload batches now submit to queue[1] while
rendering uses queue[0], enabling parallel GPU transfers without
queue-family ownership transfer barriers (same family).
Falls back to single-queue path on GPUs with only 1 queue in the
graphics family. Transfer command pool is separate to avoid contention.
Validation layers revealed 9965 VkSamplers allocated against a device
limit of 4000 — every VkTexture created its own sampler even when
configurations were identical. This exhausted NVIDIA's sampler pool
and caused intermittent SIGSEGV in vkCmdBeginRenderPass.
Add a thread-safe sampler cache in VkContext that deduplicates samplers
by FNV-1a hash of all 14 VkSamplerCreateInfo fields. All texture,
render target, renderer, water, and loading screen sampler creation
now goes through getOrCreateSampler(). Textures set ownsSampler_=false
so shared samplers aren't double-freed.
Also auto-disable anisotropy in the cache when the physical device
doesn't support the samplerAnisotropy feature, fixing the validation
error VUID-VkSamplerCreateInfo-anisotropyEnable-01070.
VkPipelineCache causes vkCmdBeginRenderPass to SIGSEGV inside
libnvidia-glcore.so on NVIDIA 590.x drivers. Skip pipeline cache
creation on NVIDIA GPUs — NVIDIA drivers already provide built-in
shader disk caching, so the Vulkan-level cache is redundant.
Pipeline cache still works on AMD and other vendors.
Log GPU name and vendor ID during VkContext initialization for easier
debugging of GPU-specific issues (FSR3, driver compat, etc.). Add
isAmdGpu()/isNvidiaGpu() accessors.
Temporarily log SMSG_PLAY_SOUND and SMSG_PLAY_OBJECT_SOUND at WARN
level (sound ID, name, file path) to diagnose unidentified ambient
NPC sounds reported by the user.
Create a VkPipelineCache at device init, loaded from disk if available.
All 65 pipeline creation calls across 19 renderer files now use the
shared cache. On shutdown, the cache is serialized to disk so subsequent
launches skip redundant shader compilation.
Cache path: ~/.local/share/wowee/pipeline_cache.bin (Linux),
~/Library/Caches/wowee/ (macOS), %APPDATA%\wowee\ (Windows).
Stale/corrupt caches are handled gracefully (fallback to empty cache).
- Animation stutter: skip playAnimation(Run) for the local player in the
server movement callback — the player renderer state machine already manages
it; resetting animTime on every movement packet caused visible stutter
- Resolution crash: reorder swapchain recreation so old swapchain is only
destroyed after confirming the new build succeeded; add null-swapchain
guard in beginFrame to survive the retry window
- Memory cap: reduce cache budget from 80% uncapped to 50% hard-capped at
16 GB to prevent excessive RAM use on high-memory systems
- Spell tooltip: suppress "Drag to action bar / Double-click to cast" hints
when the tooltip is shown from the action bar (showUsageHints=false)
- M2 collision: add watermelon/melon/squash/gourd to foliage (no-collision);
exclude chair/bench/stool/seat/throne from smallSolidProp so invisible chair
bounding boxes no longer trap the player
Move CPU-heavy BLP texture decoding from main thread to background worker
threads for all hot paths: terrain M2 models, WMO doodad M2s, WMO textures,
creature models, and gameobject WMOs. Each renderer (M2, WMO, Character) now
accepts a pre-decoded BLP cache that loadTexture() checks before falling back
to synchronous decode.
Defer WMO normal/height map generation (3 per-pixel passes: luminance, box
blur, Sobel) during terrain streaming finalization — this was the dominant
remaining bottleneck after BLP pre-decoding.
Terrain streaming stalls: 1576ms → 124ms worst case.
Every uploadBuffer/VkTexture::upload called immediateSubmit which did a
separate vkQueueSubmit + vkWaitForFences. Loading a single creature model
with textures caused 4-8+ fence waits; terrain chunks caused 80+ per batch.
Added beginUploadBatch/endUploadBatch to VkContext: records all upload
commands into a single command buffer, submits once with one fence wait.
Staging buffers are deferred for cleanup after the batch completes.
Wrapped in batch mode:
- CharacterRenderer::loadModel (creature VB/IB + textures)
- M2Renderer::loadModel (doodad VB/IB + textures)
- TerrainRenderer::loadTerrain/loadTerrainIncremental (chunk geometry + textures)
- TerrainRenderer::uploadPreloadedTextures
- WMORenderer::loadModel (group geometry + textures)
Root cause: LOGIN_VERIFY_WORLD path did not set areaTriggerCheckTimer_ or
areaTriggerSuppressFirst_, so the Stockades exit portal (AT 503) fired
immediately on login, teleporting the player back to Stormwind and crashing
the GPU during the unexpected map transition.
Fixes:
- Set 5s area trigger cooldown + suppress-first in handleLoginVerifyWorld
(same as SMSG_NEW_WORLD handler already did for teleports)
- Add deviceLost_ flag to VkContext so beginFrame returns immediately once
VK_ERROR_DEVICE_LOST is detected, preventing infinite retry loops
- Track device lost from both fence wait and queue submit paths
- Fix shutdown hang: skip vmaDestroyAllocator (walked thousands of allocations),
replace unsafe pthread_timedjoin_np with plain join + early-exit checks in workers
- Bank window: full icon rendering, click-and-hold pickup (0.10s), drag-drop for
all bank slots including bank bag equip slots, same-slot drop detection
- Loading screen: process one tile per frame for live progress updates
- Camera reset: trust server position in online mode to avoid spawning under WMOs
- Fix PLAYER_BYTES/PLAYER_BYTES_2 field indices, preserve purchasedBankBagSlots
across inventory rebuilds, fix bank slot purchase result codes
Replace all glGenTextures/glTexImage2D calls in UI code with Vulkan texture
uploads via new VkContext::uploadImGuiTexture() helper. This fixes segfaults
from calling OpenGL functions without a GL context (null GLEW function pointers).
- Add uploadImGuiTexture() to VkContext with staging buffer upload pattern
- Convert game_screen, inventory_screen, spellbook_screen, talent_screen
from GLuint/GL calls to VkDescriptorSet/Vulkan uploads
- Fix loading_screen clearValueCount (was 1, needs 2 or 3 for MSAA)
- Add error handling: revert to 1x if recreateSwapchain fails
- Clamp requested MSAA to device maximum before applying
- Retry MSAA color image allocation without TRANSIENT on failure
- Remove redundant vkDeviceWaitIdle from WMO/M2/Character recreatePipelines
(caller already waits once, was causing ~13 stalls instead of 1)