- Selective neighborhood clamp: only modify history when there's actual
motion or disocclusion — static pixels pass history through untouched,
preventing jitter-chasing from the shifting variance box
- Tonemapped accumulation: Reinhard tonemap before blend compresses bright
edges so they don't disproportionately cause jitter
- Jitter-aware sample weighting: blend 3-20% based on sample proximity
- Soft MV dead zone: smoothstep instead of step avoids spatial discontinuity
- Aggressive velocity response: 30%/px motion, 50% cap, 80% disocclusion
- Terrain loading: radius 3 (49 tiles) to prevent spawn hitches,
processOneReadyTile for smooth progress bar updates
FSR2 temporal upscaling:
- De-jitter scene color sampling (outUV - jitterUV) for frame-to-frame
consistency, eliminating the primary source of temporal jitter
- Remove luminance instability dampening (was causing excessive blur)
- Simplify to uniform 8% blend (de-jittered values are consistent)
- Gamma 2.0 for moderate neighborhood clamping
- Motion vector dead zone: zero sub-0.01px motion from float precision noise
Loading screen:
- Reduce tile load radius from 3 to 2 (25 tiles) for faster loading
- Process one tile per iteration for smooth progress bar updates
- Motion shader: zero out sub-0.01px motion to eliminate float precision
noise in reprojection (distant geometry with large world coords)
- Accumulate: tighten neighborhood clamp gamma 3.0→1.5 to catch slightly
misaligned history causing ghost doubles
- Reduce max jitter-aware blend 30%→20% for less visible oscillation
- Add luminance instability dampening: reduce blend when current frame
disagrees with history to prevent shimmer on small/distant features
Adds a wowee-git PKGBUILD suitable for submission to the Arch User
Repository. Tracks the main branch HEAD; pkgver is auto-generated from
the commit count + short hash so no manual bumping is needed on new
releases.
Key design decisions:
- Real binaries installed to /usr/lib/wowee/ to avoid PATH clutter
- /usr/bin/wowee wrapper sets WOW_DATA_PATH to the user's XDG data dir
(~/.local/share/wowee/Data) so the asset path works without any
user configuration
- /usr/bin/wowee-extract-assets helper runs asset_extract pointed at
the same XDG data dir; users run this once against their WoW client
- Submodules (imgui, vk-bootstrap) fetched from local git mirrors
during prepare() as required by AUR source array rules
- vulkan-headers listed as makedepend (required by imgui Vulkan backend
and vk-bootstrap at compile time; not needed at runtime)
Note: stormlib is an AUR dependency (aur/stormlib). Users will need
an AUR helper (yay, paru) to install it, or install it manually first.
vulkan-headers provides <vulkan/vulkan.h> which is required at compile
time by imgui (imgui_impl_vulkan.cpp) and vk-bootstrap. On Arch,
vulkan-devel is not a package name — the headers must be installed
explicitly via vulkan-headers.
Also replace vulkan-devel with the correct individual packages:
vulkan-headers (build-time headers)
vulkan-icd-loader / vulkan-tools (runtime + utilities)
Fixes build failure: fatal error: vulkan/vulkan.h: No such file or directory
- Motion vectors: single unjittered reprojection matrix (80 bytes) instead of
two jittered matrices (160 bytes), eliminating numerical instability from
jitter amplification through large world coordinates
- Sharpen pass: fix Y-flip for correct UV sampling, double-buffer descriptor
sets to avoid race with in-flight command buffers
- MSAA: auto-disable when FSR2 enabled, grey out AA setting in UI
- Accumulation: variance-based neighborhood clamping in YCoCg space,
correct history layout transitions
- Frame index: wrap at 256 for stable Halton sequence
Full FSR 2.2 pipeline with depth-based motion vector reprojection,
temporal accumulation with YCoCg neighborhood clamping, and RCAS
contrast-adaptive sharpening.
Architecture (designed for FSR 3.x frame generation readiness):
- Camera: Halton(2,3) sub-pixel jitter with unjittered projection
stored separately for motion vector computation
- Motion vectors: compute shader reconstructs world position from
depth + inverse VP, reprojects with previous frame's VP
- Temporal accumulation: compute shader blends 5-10% current frame
with 90-95% clamped history, adaptive blend for disocclusion
- History: ping-pong R16G16B16A16 buffers at display resolution
- Sharpening: RCAS fragment pass with contrast-adaptive weights
Integration:
- FSR2 replaces both FSR1 and MSAA when enabled
- Scene renders to internal resolution framebuffer (no MSAA)
- Compute passes run between scene and swapchain render passes
- Camera cut detection resets history on teleport
- Quality presets shared with FSR1 (0.50-0.77 scale factors)
- UI: "Upscaling" combo with Off/FSR 1.0/FSR 2.2 options
- Reduce max finalization steps per frame: 2→1 (normal), 8→4 (taxi)
- Reduce terrain chunk upload batch: 32→16 chunks per step
- Reduce idle M2 model upload budget: 16→6 per step
- Reduce idle WMO model upload budget: 4→2 per step
Tiles still stream in quickly but spread GPU upload work across
more frames, eliminating the frame spikes right after spawning.
Root cause: bonesDirty was a single bool shared across both
double-buffered frame indices. When bones were copied to frame 0's
SSBO and bonesDirty cleared, frame 1's newly-allocated SSBO would
contain garbage/zeros and never get populated — causing animated
M2 instances to flash invisible on alternating frames.
Fix: Make bonesDirty per-frame-index (bool[2]) so each buffer
independently tracks whether it needs bone data uploaded. When
bones are recomputed, both indices are marked dirty. When uploaded
during render, only the current frame index is cleared. New buffer
allocations in prepareRender force their frame index dirty.
Add 2-second cooldown timer before re-checking for unloaded tiles
when workers are idle, preventing excessive streamTiles() calls
that caused frame hitches right after world load.
- Re-check for unloaded tiles when workers are idle (no tile boundary needed)
- Increase M2 upload budget 4→16 and WMO 1→4 per frame when not under pressure
- Lower tree collision threshold from 40 to 6 units so large trees block movement
- Overlap M2 and character animation updates via std::async (~2-5ms saved)
- Thread-local collision scratch buffers for concurrent floor queries
- Parallel terrain/WMO/M2 floor queries in camera controller
- Seed new M2 instance bones from existing siblings to eliminate pop-in flash
- Fix shadow flicker: snap center along stable light axes instead of in view space
- Increase shadow distance default to 300 units (slider max 500)
- Normal map CPU work (luminance→blur→Sobel) moved to background threads,
main thread only does GPU upload (~1-2ms vs 15-22ms per texture)
- Load screen warmup now waits until ALL spawn/equipment/gameobject queues
are drained before transitioning (prevents naked character, NPC pop-in)
- Exit condition: min 2s + 5 consecutive empty iterations, hard cap 15s
- Equipment queue processes 8 items per warmup iteration instead of 1
- Added LoadingScreen::renderOverlay() for future world-behind-loading use
Loading screen now calls processCreatureSpawnQueue(unlimited=true) which
removes the 1-upload-per-frame cap and 2ms time budget, allowing all pending
creature models to upload to GPU in bulk. Also increases concurrent async
background loads from 4 to 16 during load screen. Replaces 40-line inline
duplicate of processAsyncCreatureResults with the shared function.
Each loadTexture call was generating a normal/height map inline (3 full-image
passes: luminance + blur + Sobel). For models with 15-20 textures this added
30-40ms to the 70ms model upload. Now deferred to a per-frame budget (2/frame
in-game, 10/frame during load screen). Models render without POM until their
normal maps are ready.
Move all DBC lookups (CharSections, ItemDisplayInfo), texture path resolution,
and BLP decoding for humanoid NPCs to background threads. Only GPU texture
uploads remain on the main thread via pre-decoded BLP cache.
Move CPU-heavy BLP texture decoding from main thread to background worker
threads for all hot paths: terrain M2 models, WMO doodad M2s, WMO textures,
creature models, and gameobject WMOs. Each renderer (M2, WMO, Character) now
accepts a pre-decoded BLP cache that loadTexture() checks before falling back
to synchronous decode.
Defer WMO normal/height map generation (3 per-pixel passes: luminance, box
blur, Sobel) during terrain streaming finalization — this was the dominant
remaining bottleneck after BLP pre-decoding.
Terrain streaming stalls: 1576ms → 124ms worst case.
- Replace per-frame VMA alloc/free of material UBOs with a ring buffer in
CharacterRenderer (~500 allocations/frame eliminated)
- Batch all ready terrain tiles into a single GPU upload during load screen
(processAllReadyTiles instead of one-at-a-time with individual fence waits)
- Lift per-frame creature/GO spawn budgets during load screen warmup phase
- Add background world preloader: saves last world position to disk, pre-warms
AssetManager file cache with ADT files starting at app init (login screen)
so terrain workers get instant cache hits when Enter World is clicked
- Distance-filter expensive collision guard to 8-unit melee range
- Merge 3 CharacterRenderer update loops into single pass
- Time-budget instrumentation for slow update stages (>3ms threshold)
- Count-based async creature model upload budget (max 3/frame in-game)
- 1-per-frame game object spawn + per-doodad time budget for transport loading
- Use deque for creature spawn queue to avoid O(n) front-erase