Background BLP texture pre-decoding + deferred WMO normal maps (12x streaming perf)

Move CPU-heavy BLP texture decoding from main thread to background worker
threads for all hot paths: terrain M2 models, WMO doodad M2s, WMO textures,
creature models, and gameobject WMOs. Each renderer (M2, WMO, Character) now
accepts a pre-decoded BLP cache that loadTexture() checks before falling back
to synchronous decode.

Defer WMO normal/height map generation (3 per-pixel passes: luminance, box
blur, Sobel) during terrain streaming finalization — this was the dominant
remaining bottleneck after BLP pre-decoding.

Terrain streaming stalls: 1576ms → 124ms worst case.
This commit is contained in:
Kelsi 2026-03-07 15:46:56 -08:00
parent 0313bd8692
commit 7ac990cff4
13 changed files with 573 additions and 109 deletions

View file

@ -50,9 +50,12 @@ public:
// Batch upload mode: records multiple upload commands into a single
// command buffer, then submits with ONE fence wait instead of one per upload.
void beginUploadBatch();
void endUploadBatch();
void endUploadBatch(); // Async: submits but does NOT wait for fence
void endUploadBatchSync(); // Sync: submits and waits (for load screens)
bool isInUploadBatch() const { return inUploadBatch_; }
void deferStagingCleanup(AllocatedBuffer staging);
void pollUploadBatches(); // Check completed async uploads, free staging buffers
void waitAllUploads(); // Block until all in-flight uploads complete
// Accessors
VkInstance getInstance() const { return instance; }
@ -157,6 +160,14 @@ private:
VkCommandBuffer batchCmd_ = VK_NULL_HANDLE;
std::vector<AllocatedBuffer> batchStagingBuffers_;
// Async upload: in-flight batches awaiting GPU completion
struct InFlightBatch {
VkFence fence = VK_NULL_HANDLE;
VkCommandBuffer cmd = VK_NULL_HANDLE;
std::vector<AllocatedBuffer> stagingBuffers;
};
std::vector<InFlightBatch> inFlightBatches_;
// Depth buffer (shared across all framebuffers)
VkImage depthImage = VK_NULL_HANDLE;
VkImageView depthImageView = VK_NULL_HANDLE;