Commit graph

332 commits

Author SHA1 Message Date
Kelsi
137b25f318 rendering: fix NPC movement animation IDs and remove redundant player anim
The renderer's CharAnimState machine already drives player character
animations (Run=5, Walk=4, Jump, Swim, etc.) — remove the conflicting
camera controller code added in the previous commit.

Fix creature movement animations to use the correct WoW M2 IDs:
4=Walk, 5=Run. Both the per-frame sync loop and the SMSG_MONSTER_MOVE
spline callback now use Run (5) for NPC movement.
2026-03-10 10:19:13 -07:00
Kelsi
4e137c4061 rendering: drive Run/Stand animations from actual movement state
CameraController now transitions the player character to Run (anim 4)
on movement start and back to Stand (anim 0) on stop, guarded by a
prevPlayerMoving_ flag so animation time is not reset every frame.
Death animation (anim 1) is never overridden.

Application creature sync similarly switches creature models to Run (4)
when they move between server positions and Stand (0) when they stop,
with per-guid creatureWasMoving_ tracking to avoid per-frame resets.
2026-03-10 10:06:56 -07:00
Kelsi
c8d9d6b792 rendering/game: make player model semi-transparent in ghost form
Add GhostStateCallback to GameHandler, fired when PLAYER_FLAGS_GHOST
transitions on or off in UPDATE_OBJECT / login detection. Add
setInstanceOpacity() to CharacterRenderer to directly set opacity
without disturbing fade-in state. Application wires the callback to
set opacity 0.5 on ghost entry and 1.0 on resurrect.
2026-03-10 09:57:24 -07:00
Kelsi
366321042f rendering/game: sync camera sit state from server-confirmed stand state
Add CameraController::setSitting() and call it from the StandStateCallback
so the camera blocks movement when the server confirms the player is
sitting or kneeling (stand states 1-6, 8). This prevents the player
from sliding across the ground after sitting.

Death (state 7) deliberately leaves sitting=false so the player can
still respawn/move after death without input being blocked.
2026-03-10 09:51:15 -07:00
Kelsi
b2dccca58c rendering: re-enable WMO camera collision with asymmetric smoothing
Previously disabled because the per-frame raycast caused erratic zoom
snapping at doorway transitions.  Re-enable using an asymmetrically-
smoothed collision limit: pull-in reacts quickly (τ≈60 ms) to prevent
the camera from ever visibly clipping through walls, while recovery is
slow (τ≈400 ms) so walking through a doorway zooms back out gradually
instead of snapping.

Uses wmoRenderer->raycastBoundingBoxes() which already has strict wall
filters (|normal.z|<0.20, surface-alignment check, ±0.9 height band)
to ignore floors, ramps, and arch geometry.
2026-03-10 09:13:31 -07:00
Kelsi
ea9c7e68e7 rendering,ui: sync selection circle to renderer instance position
The selection circle was positioned using the entity's game-logic
interpolator (entity->getX/Y/Z), while the actual M2 model is
positioned by CharacterRenderer's independent interpolator (moveInstanceTo).
These two systems can drift apart during movement, causing the circle
to appear under the wrong position relative to the visible model.

Fix: add CharacterRenderer::getInstancePosition / Application::getRenderPositionForGuid
and use the renderer's inst.position for XY (with footZ override for Z)
so the circle always tracks the rendered model exactly. Falls back to
the entity game-logic position when no CharacterRenderer instance exists.
2026-03-10 06:33:44 -07:00
Kelsi
edd7e5e591 Fix shadow flashing: per-frame shadow depth images and framebuffers
Single shadow depth image shared across MAX_FRAMES=2 in-flight GPU frames
caused a race: frame N's main pass reads shadow map while frame N+1's
shadow pass clears and writes it, producing visible flashing standing
still and while moving.

Fix: give each in-flight frame its own VkImage, VmaAllocation, VkImageView,
and VkFramebuffer for the shadow depth attachment. renderShadowPass() now
indexes all shadow resources by getCurrentFrame(), and layout transitions
track per-frame state in shadowDepthLayout_[frame]. Cleanup loops over
MAX_FRAMES=2. Descriptor sets already written per-frame; updated shadow
image view binding to use the matching per-frame view.
2026-03-09 22:14:32 -07:00
Kelsi
4d1be18c18 wmo: apply MOHD ambient color to interior group lighting
Read the ambient color from the MOHD chunk (BGRA uint32) and store it
on WMOModel as a normalized RGB vec3.  Pass it through ModelData into
the per-batch WMOMaterialUBO (replacing the unused pad[3] bytes, keeping
the struct at 64 bytes).  The GLSL interior branch now floors vertex
colors against the WMO ambient instead of a hardcoded 0.5, so dungeon
interiors respect the artist-specified ambient tint from the WMO root
rather than always clamping to grey.
2026-03-09 21:27:01 -07:00
Kelsi
e0d47040d3 Fix main-thread hang from terrain finalization; two-pass M2 rendering; tile streaming improvements
Hang/GPU device lost fix:
- M2_INSTANCES and WMO_INSTANCES finalization phases now create instances
  incrementally (32 per step / 4 per step) instead of all at once, eliminating
  the >1s main-thread stalls that caused GPU fence timeouts and device loss

M2 two-pass transparent rendering:
- Opaque/alpha-test batches render in pass 1, transparent/additive in pass 2
  (back-to-front sorted) to fix wing transparency showing terrain instead of
  trees — adds hasTransparentBatches flag to skip models with no transparency

Tile streaming improvements:
- Sort new load queue entries nearest-first so critical tiles load before
  distant ones during fast taxi flight
- Increase taxi load radius 6→8 tiles, unload 9→12 for better coverage

Water refraction gated on FSR:
- Disable water refraction when FSR is not active (bugged without upscaling)
- Auto-disable refraction if FSR is turned off while refraction was on
2026-03-09 20:58:49 -07:00
Kelsi
6a681bcf67 Implement terrain shadow casting in shadow depth pass
Add initializeShadow() to TerrainRenderer that creates a depth-only
shadow pipeline reusing the existing shadow.vert/frag shaders (same
path as WMO/M2/character renderers). renderShadow() draws all terrain
chunks with sphere culling against the shadow coverage radius. Wire
both init and draw calls into Renderer so terrain now casts shadows
alongside buildings and NPCs.
2026-03-09 18:34:27 -07:00
Kelsi
caea24f6ea Fix animated M2 flicker: free bone descriptor sets on instance removal
The boneDescPool_ had MAX_BONE_SETS=2048 but sets were never freed when
instances were removed (only when clear() reset the whole pool on map load).
As tiles streamed in/out, each new animated instance consumed 2 pool slots
(one per frame index) permanently. After ~1024 animated instances created
total, vkAllocateDescriptorSets began failing silently and returning
VK_NULL_HANDLE. render() skips instances with null boneSet[frameIndex],
making them invisible — appearing as per-frame flicker as the culling pass
included them but the render pass excluded them.

Fix: destroyInstanceBones() now calls vkFreeDescriptorSets() for each
non-null boneSet before destroying the bone SSBO. The pool already had
VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT set for this purpose.
Also increased MAX_BONE_SETS from 2048 to 8192 for extra headroom.
2026-03-09 18:09:33 -07:00
Kelsi
b23cf06f1c Remove dead legacy GL Texture class
texture.hpp / texture.cpp implemented an unfinished OpenGL texture loader
(loadFromFile was a TODO stub) that had no callers — the project's texture
loading is entirely handled by VkTexture (vk_texture.hpp/cpp) after the
Vulkan migration. Remove both files and their CMakeLists entries.
2026-03-09 16:07:08 -07:00
Kelsi
bae32c1823 Add FSR3 Generic API path and harden runtime diagnostics
- AmdFsr3Runtime now probes both the legacy ffxFsr3* API and the newer
  generic ffxCreateContext/ffxDispatch API; selects whichever the loaded
  runtime library exports (GenericApi takes priority fallback)
- Generic API path implements full upscale + frame-generation context
  creation, configure, dispatch, and destroy lifecycle
- dlopen error captured and surfaced in lastError_ on Linux so runtime
  initialization failures are actionable
- FSR3 runtime init failure log now includes path kind, error string,
  and loaded library path for easier debugging
- tools/generate_ffx_sdk_vk_permutations.sh added: auto-bootstraps
  missing VK permutation headers; DXC auto-downloaded on Linux/Windows
  MSYS2; macOS reads from PATH (CI installs via brew dxc)
- CMakeLists: add upscalers/include to probe include dirs, invoke
  permutation script before SDK build, scope FFX pragma/ODR warning
  suppressions to affected TUs, add runtime-copy dependency on wowee
- UI labels updated from "FSR2" → "FSR3" in settings, tuning panel,
  performance HUD, and combo boxes
- CI macOS job now installs dxc via Homebrew for permutation codegen
2026-03-09 12:51:59 -07:00
Kelsi
e916ef9bda Fix Windows frustum enum macro collision 2026-03-09 04:41:04 -07:00
Kelsi
1c7b87ee78 Remove FSR3 wrapper path and keep official Path-A runtime only 2026-03-09 04:33:05 -07:00
Kelsi
78fa10c6ba Gate bridge interop export by wrapper capability bits
Some checks are pending
Build / Build (arm64) (push) Waiting to run
Build / Build (x86-64) (push) Waiting to run
Build / Build (macOS arm64) (push) Waiting to run
Build / Build (windows-arm64) (push) Waiting to run
Build / Build (windows-x86-64) (push) Waiting to run
Security / CodeQL (C/C++) (push) Waiting to run
Security / Semgrep (push) Waiting to run
Security / Sanitizer Build (ASan/UBSan) (push) Waiting to run
2026-03-09 02:45:37 -07:00
Kelsi
40289c5f8e Add wrapper capability query and runtime capability gating 2026-03-09 02:19:11 -07:00
Kelsi
45c2ed7a64 Expose wrapper backend mode in runtime diagnostics 2026-03-09 02:15:55 -07:00
Kelsi
1bcc2c6b85 Use monotonic interop fence values for FSR3 bridge dispatch 2026-03-09 02:03:03 -07:00
Kelsi
1c7908f02d Add ABI v3 fence-value sync for DX12 bridge dispatch 2026-03-09 01:58:45 -07:00
Kelsi
53a692b9ef Add wrapper last-error API and DX12 handle-import probe 2026-03-09 01:36:26 -07:00
Kelsi
61cb2df400 Add ABI v2 external-handle plumbing for FSR3 bridge dispatch 2026-03-09 01:31:01 -07:00
Kelsi
a1c4244a27 Implement clean Path-B FSR3 wrapper ABI for framegen 2026-03-09 00:08:22 -07:00
Kelsi
93850ac6dc Add Path A/B/C FSR3 runtime detection with clear FG fallback status 2026-03-09 00:01:45 -07:00
Kelsi
5ad4b9be2d Add FSR3 frame generation runtime stats to performance HUD 2026-03-08 23:35:39 -07:00
Kelsi
aa43aa6fc8 Bridge FSR3 Vulkan framegen dispatch and route sharpen to interpolated output 2026-03-08 23:20:50 -07:00
Kelsi
538a1db866 Fix FSR3 runtime wrapper for local SDK API and real Vulkan resource dispatch 2026-03-08 23:13:08 -07:00
Kelsi
f1099f5940 Add FSR3 runtime library probing and readiness status 2026-03-08 23:03:45 -07:00
Kelsi
e1c93c47be Stage FSR3 framegen dispatch hook in frame pipeline 2026-03-08 22:57:35 -07:00
Kelsi
bdfec103ac Add persisted AMD FSR3 framegen runtime toggle plumbing 2026-03-08 22:53:21 -07:00
Kelsi
a49decd9a6 Add AMD FSR3 framegen interface probe and CI validation 2026-03-08 22:47:46 -07:00
Kelsi
94ad89c764 Set native-focused FSR defaults and reorder quality UI 2026-03-08 21:34:31 -07:00
Kelsi
eccbfb6a5f Tune FSR2 defaults and simplify jitter controls 2026-03-08 21:08:17 -07:00
Kelsi
2e71c768db Add live FSR2 motion/jitter tuning controls and HUD readout 2026-03-08 20:56:22 -07:00
Kelsi
51a8cf565f Integrate AMD FSR2 backend and document SDK bootstrap 2026-03-08 19:56:52 -07:00
Kelsi
a24ff375fb Add AMD FSR2 SDK detection and backend integration scaffolding 2026-03-08 19:33:07 -07:00
Kelsi
e2a2316038 Stabilize FSR2 path and refine temporal pipeline groundwork 2026-03-08 18:52:04 -07:00
Kelsi
e94eb7f2d1 FSR2 temporal upscaling fixes: unjittered reprojection, sharpen Y-flip, MSAA guard, descriptor double-buffering
Some checks are pending
Build / Build (arm64) (push) Waiting to run
Build / Build (x86-64) (push) Waiting to run
Build / Build (macOS arm64) (push) Waiting to run
Build / Build (windows-arm64) (push) Waiting to run
Build / Build (windows-x86-64) (push) Waiting to run
Security / CodeQL (C/C++) (push) Waiting to run
Security / Semgrep (push) Waiting to run
Security / Sanitizer Build (ASan/UBSan) (push) Waiting to run
- Motion vectors: single unjittered reprojection matrix (80 bytes) instead of
  two jittered matrices (160 bytes), eliminating numerical instability from
  jitter amplification through large world coordinates
- Sharpen pass: fix Y-flip for correct UV sampling, double-buffer descriptor
  sets to avoid race with in-flight command buffers
- MSAA: auto-disable when FSR2 enabled, grey out AA setting in UI
- Accumulation: variance-based neighborhood clamping in YCoCg space,
  correct history layout transitions
- Frame index: wrap at 256 for stable Halton sequence
2026-03-08 01:22:15 -08:00
Kelsi
52317d1edd Implement FSR 2.2 temporal upscaling
Full FSR 2.2 pipeline with depth-based motion vector reprojection,
temporal accumulation with YCoCg neighborhood clamping, and RCAS
contrast-adaptive sharpening.

Architecture (designed for FSR 3.x frame generation readiness):
- Camera: Halton(2,3) sub-pixel jitter with unjittered projection
  stored separately for motion vector computation
- Motion vectors: compute shader reconstructs world position from
  depth + inverse VP, reprojects with previous frame's VP
- Temporal accumulation: compute shader blends 5-10% current frame
  with 90-95% clamped history, adaptive blend for disocclusion
- History: ping-pong R16G16B16A16 buffers at display resolution
- Sharpening: RCAS fragment pass with contrast-adaptive weights

Integration:
- FSR2 replaces both FSR1 and MSAA when enabled
- Scene renders to internal resolution framebuffer (no MSAA)
- Compute passes run between scene and swapchain render passes
- Camera cut detection resets history on teleport
- Quality presets shared with FSR1 (0.50-0.77 scale factors)
- UI: "Upscaling" combo with Off/FSR 1.0/FSR 2.2 options
2026-03-07 23:13:01 -08:00
Kelsi
0ffeabd4ed Revert "Further reduce tile streaming aggressiveness"
This reverts commit f681a8b361.
2026-03-07 23:02:25 -08:00
Kelsi
f681a8b361 Further reduce tile streaming aggressiveness
- Load radius: 4→3 (normal), 6→5 (taxi)
- Terrain chunks per step: 16→8
- M2 models per step: 6→2 (removed idle boost)
- WMO models per step: 2→1 (removed idle boost)
- WMO doodads per step: 4→2
- All budgets now constant (no idle-vs-busy branching)
2026-03-07 22:55:02 -08:00
Kelsi
ac3c90dd75 Fix M2 animated instance flashing (deer/bird/critter pop-in)
Root cause: bonesDirty was a single bool shared across both
double-buffered frame indices. When bones were copied to frame 0's
SSBO and bonesDirty cleared, frame 1's newly-allocated SSBO would
contain garbage/zeros and never get populated — causing animated
M2 instances to flash invisible on alternating frames.

Fix: Make bonesDirty per-frame-index (bool[2]) so each buffer
independently tracks whether it needs bone data uploaded. When
bones are recomputed, both indices are marked dirty. When uploaded
during render, only the current frame index is cleared. New buffer
allocations in prepareRender force their frame index dirty.
2026-03-07 22:47:07 -08:00
Kelsi
6cf08fbaa6 Throttle proactive tile streaming to reduce post-load hitching
Add 2-second cooldown timer before re-checking for unloaded tiles
when workers are idle, preventing excessive streamTiles() calls
that caused frame hitches right after world load.
2026-03-07 22:40:07 -08:00
Kelsi
4cb03c38fe Parallel animation updates, thread-safe collision, M2 pop-in fix, shadow stabilization
- Overlap M2 and character animation updates via std::async (~2-5ms saved)
- Thread-local collision scratch buffers for concurrent floor queries
- Parallel terrain/WMO/M2 floor queries in camera controller
- Seed new M2 instance bones from existing siblings to eliminate pop-in flash
- Fix shadow flicker: snap center along stable light axes instead of in view space
- Increase shadow distance default to 300 units (slider max 500)
2026-03-07 22:29:06 -08:00
Kelsi
a4966e486f Fix WMO wall collision, normal mapping, POM backfill, and M2/WMO rendering performance
- Fix MOPY flag check (0x08 not 0x01) for proper wall collision detection
- Cap MAX_PUSH to PLAYER_RADIUS to prevent gradual clip-through
- Fix WMO doodad quaternion component ordering (X/Y swap)
- Linear normal map strength blend in shader for smooth slider control
- Enable shadow sampling for interior WMO groups (covered outdoor areas)
- Backfill deferred normal/height maps after streaming with descriptor rebind
- M2: prepareRender only iterates animated instances, bone dirty flag
- M2: remove worker thread VMA allocation, skip unready bone instances
- WMO: persistent visibility vectors, sequential culling
- Add FSR EASU/RCAS shaders
2026-03-07 22:03:28 -08:00
Kelsi
02cf0e4df3 Background normal map generation, queue-draining load screen warmup
- Normal map CPU work (luminance→blur→Sobel) moved to background threads,
  main thread only does GPU upload (~1-2ms vs 15-22ms per texture)
- Load screen warmup now waits until ALL spawn/equipment/gameobject queues
  are drained before transitioning (prevents naked character, NPC pop-in)
- Exit condition: min 2s + 5 consecutive empty iterations, hard cap 15s
- Equipment queue processes 8 items per warmup iteration instead of 1
- Added LoadingScreen::renderOverlay() for future world-behind-loading use
2026-03-07 18:40:24 -08:00
Kelsi
24f2ec75ec Defer normal map generation to reduce GPU model upload stalls by ~50%
Some checks are pending
Build / Build (arm64) (push) Waiting to run
Build / Build (x86-64) (push) Waiting to run
Build / Build (macOS arm64) (push) Waiting to run
Build / Build (windows-arm64) (push) Waiting to run
Build / Build (windows-x86-64) (push) Waiting to run
Security / CodeQL (C/C++) (push) Waiting to run
Security / Semgrep (push) Waiting to run
Security / Sanitizer Build (ASan/UBSan) (push) Waiting to run
Each loadTexture call was generating a normal/height map inline (3 full-image
passes: luminance + blur + Sobel). For models with 15-20 textures this added
30-40ms to the 70ms model upload. Now deferred to a per-frame budget (2/frame
in-game, 10/frame during load screen). Models render without POM until their
normal maps are ready.
2026-03-07 17:16:38 -08:00
Kelsi
7ac990cff4 Background BLP texture pre-decoding + deferred WMO normal maps (12x streaming perf)
Move CPU-heavy BLP texture decoding from main thread to background worker
threads for all hot paths: terrain M2 models, WMO doodad M2s, WMO textures,
creature models, and gameobject WMOs. Each renderer (M2, WMO, Character) now
accepts a pre-decoded BLP cache that loadTexture() checks before falling back
to synchronous decode.

Defer WMO normal/height map generation (3 per-pixel passes: luminance, box
blur, Sobel) during terrain streaming finalization — this was the dominant
remaining bottleneck after BLP pre-decoding.

Terrain streaming stalls: 1576ms → 124ms worst case.
2026-03-07 15:46:56 -08:00
Kelsi
0313bd8692 Performance: ring buffer UBOs, batched load screen uploads, background world preloader
- Replace per-frame VMA alloc/free of material UBOs with a ring buffer in
  CharacterRenderer (~500 allocations/frame eliminated)
- Batch all ready terrain tiles into a single GPU upload during load screen
  (processAllReadyTiles instead of one-at-a-time with individual fence waits)
- Lift per-frame creature/GO spawn budgets during load screen warmup phase
- Add background world preloader: saves last world position to disk, pre-warms
  AssetManager file cache with ADT files starting at app init (login screen)
  so terrain workers get instant cache hits when Enter World is clicked
- Distance-filter expensive collision guard to 8-unit melee range
- Merge 3 CharacterRenderer update loops into single pass
- Time-budget instrumentation for slow update stages (>3ms threshold)
- Count-based async creature model upload budget (max 3/frame in-game)
- 1-per-frame game object spawn + per-doodad time budget for transport loading
- Use deque for creature spawn queue to avoid O(n) front-erase
2026-03-07 13:44:09 -08:00
Kelsi
25bb63c50a Faster terrain/model loading: more workers, batched finalization, skip redundant I/O
- Worker threads: use (cores - 1-2) instead of cores/2, minimum 4
- Outer upload batch in processReadyTiles: ALL model/texture uploads per
  frame share a single command buffer submission + fence wait
- Upload multiple models per finalization step: 8 M2s, 4 WMOs, 16 doodads
  per call instead of 1 each (all within same GPU batch)
- Terrain chunks: 64 per step instead of 16
- Skip redundant M2 file I/O: thread-safe uploadedM2Ids_ set lets
  background workers skip re-reading+parsing models already on GPU
- processAllReadyTiles (loading screen) and processOneReadyTile also
  wrapped in outer upload batches
2026-03-07 12:32:39 -08:00