Compare commits

...

9 commits

Author SHA1 Message Date
Kelsi
24f2ec75ec Defer normal map generation to reduce GPU model upload stalls by ~50%
Each loadTexture call was generating a normal/height map inline (3 full-image
passes: luminance + blur + Sobel). For models with 15-20 textures this added
30-40ms to the 70ms model upload. Now deferred to a per-frame budget (2/frame
in-game, 10/frame during load screen). Models render without POM until their
normal maps are ready.
2026-03-07 17:16:38 -08:00
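The per-frame budget pattern this commit describes — queue the expensive normal-map work at load time, then drain a few items per frame — can be sketched roughly as follows. Type and method names are illustrative, not the engine's actual API:

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <vector>

// Hypothetical sketch: expensive per-texture work (luminance + blur + Sobel)
// is queued instead of run inline, then processed N items per frame.
struct PendingNormalMap {
    std::string cacheKey;
    std::vector<uint8_t> pixels;  // RGBA source for the three passes
    uint32_t width = 0, height = 0;
};

class NormalMapQueue {
public:
    void enqueue(PendingNormalMap job) { pending_.push_back(std::move(job)); }

    // Process up to `budget` jobs; returns how many completed this call.
    // Call with a small budget in-game and a larger one on the load screen.
    int processPending(int budget) {
        int done = 0;
        while (done < budget && !pending_.empty()) {
            // The real generateNormalHeightMap() pass would run here.
            pending_.pop_front();
            ++done;
        }
        return done;
    }

    std::size_t remaining() const { return pending_.size(); }

private:
    std::deque<PendingNormalMap> pending_;
};
```

Models whose entries are still queued simply render without POM until their turn comes, matching the behavior described above.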
Kelsi
faca22ac5f Async humanoid NPC texture pipeline to eliminate 30-150ms main-thread stalls
Move all DBC lookups (CharSections, ItemDisplayInfo), texture path resolution,
and BLP decoding for humanoid NPCs to background threads. Only GPU texture
uploads remain on the main thread via pre-decoded BLP cache.
2026-03-07 16:54:58 -08:00
Kelsi
7ac990cff4 Background BLP texture pre-decoding + deferred WMO normal maps (12x streaming perf)
Move CPU-heavy BLP texture decoding from main thread to background worker
threads for all hot paths: terrain M2 models, WMO doodad M2s, WMO textures,
creature models, and gameobject WMOs. Each renderer (M2, WMO, Character) now
accepts a pre-decoded BLP cache that loadTexture() checks before falling back
to synchronous decode.

Defer WMO normal/height map generation (3 per-pixel passes: luminance, box
blur, Sobel) during terrain streaming finalization — this was the dominant
remaining bottleneck after BLP pre-decoding.

Terrain streaming stalls: 1576ms → 124ms worst case.
2026-03-07 15:46:56 -08:00
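The "check a pre-decoded cache before falling back to synchronous decode" flow can be sketched like this. `DecodedImage`, `decodeSync`, and `loadTextureImage` are stand-in names for the real `BLPImage`/decoder types, used only to illustrate the lookup order:

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for pipeline::BLPImage.
struct DecodedImage { int width = 0, height = 0; };

using PredecodedCache = std::unordered_map<std::string, DecodedImage>;

// Placeholder for the CPU-heavy synchronous decode path.
DecodedImage decodeSync(const std::string& /*path*/) {
    return DecodedImage{256, 256};
}

// Renderer-side lookup: prefer the cache filled by background workers,
// fall back to synchronous decode only on a miss.
DecodedImage loadTextureImage(const std::string& path,
                              const PredecodedCache* cache,
                              bool* wasCacheHit = nullptr) {
    if (cache) {
        auto it = cache->find(path);
        if (it != cache->end()) {
            if (wasCacheHit) *wasCacheHit = true;
            return it->second;
        }
    }
    if (wasCacheHit) *wasCacheHit = false;
    return decodeSync(path);
}
```

Keeping the fallback path intact means renderers behave identically whether or not a background worker got to the texture first.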
Kelsi
0313bd8692 Performance: ring buffer UBOs, batched load screen uploads, background world preloader
- Replace per-frame VMA alloc/free of material UBOs with a ring buffer in
  CharacterRenderer (~500 allocations/frame eliminated)
- Batch all ready terrain tiles into a single GPU upload during load screen
  (processAllReadyTiles instead of one-at-a-time with individual fence waits)
- Lift per-frame creature/GO spawn budgets during load screen warmup phase
- Add background world preloader: saves last world position to disk, pre-warms
  AssetManager file cache with ADT files starting at app init (login screen)
  so terrain workers get instant cache hits when Enter World is clicked
- Distance-filter expensive collision guard to 8-unit melee range
- Merge 3 CharacterRenderer update loops into single pass
- Time-budget instrumentation for slow update stages (>3ms threshold)
- Count-based async creature model upload budget (max 3/frame in-game)
- 1-per-frame game object spawn + per-doodad time budget for transport loading
- Use deque for creature spawn queue to avoid O(n) front-erase
2026-03-07 13:44:09 -08:00
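The ring-buffer UBO change replaces per-draw VMA alloc/free with sub-allocation from one pre-allocated buffer per frame slot. The offset math can be sketched as below (illustrative names; the real code stores the mapped pointer and VkBuffer alongside the offset):

```cpp
#include <cstdint>

// Hypothetical sketch of per-frame ring-buffer sub-allocation: one large
// UBO per frame slot; each draw takes an aligned offset instead of a
// fresh allocation. Reset once the frame's fence has signaled.
class MaterialRing {
public:
    MaterialRing(uint32_t capacityBytes, uint32_t minAlignment)
        : capacity_(capacityBytes), alignment_(minAlignment ? minAlignment : 1) {}

    // Returns the byte offset for this draw's UBO, or UINT32_MAX when full.
    // Offsets are rounded up to minUniformBufferOffsetAlignment.
    uint32_t allocate(uint32_t sizeBytes) {
        uint32_t aligned = (sizeBytes + alignment_ - 1) & ~(alignment_ - 1);
        if (offset_ + aligned > capacity_) return UINT32_MAX;
        uint32_t at = offset_;
        offset_ += aligned;
        return at;
    }

    void resetForNewFrame() { offset_ = 0; }

private:
    uint32_t capacity_;
    uint32_t alignment_;
    uint32_t offset_ = 0;
};
```

With ~500 material UBOs per frame, this turns 500 allocator round-trips into 500 pointer bumps.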
Kelsi
71e8ed5b7d Reduce initial load to radius 1 (~5 tiles) for fast game entry
Was waiting for all ~50 tiles (radius 4) to fully prepare + finalize
before entering the game. Now loads only the immediate surrounding tiles
during the loading screen, then restores the full radius for in-game
streaming. setLoadRadius just sets an int — actual loading happens lazily
via background workers during the game loop.
2026-03-07 12:39:38 -08:00
Kelsi
25bb63c50a Faster terrain/model loading: more workers, batched finalization, skip redundant I/O
- Worker threads: use cores minus 1–2 instead of cores/2, with a minimum of 4
- Outer upload batch in processReadyTiles: ALL model/texture uploads per
  frame share a single command buffer submission + fence wait
- Upload multiple models per finalization step: 8 M2s, 4 WMOs, 16 doodads
  per call instead of 1 each (all within same GPU batch)
- Terrain chunks: 64 per step instead of 16
- Skip redundant M2 file I/O: thread-safe uploadedM2Ids_ set lets
  background workers skip re-reading+parsing models already on GPU
- processAllReadyTiles (loading screen) and processOneReadyTile also
  wrapped in outer upload batches
2026-03-07 12:32:39 -08:00
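The thread-safe `uploadedM2Ids_` set lets background workers skip re-reading and re-parsing models already on the GPU. A minimal sketch of that check, assuming first-claim-wins semantics (the exact locking discipline in the engine may differ):

```cpp
#include <cstdint>
#include <mutex>
#include <unordered_set>

// Hypothetical sketch of the "skip redundant I/O" guard: a mutex-protected
// set of model IDs already uploaded to the GPU, consulted by workers
// before doing file I/O + parsing.
class UploadedModelSet {
public:
    // Returns true if the caller should load the model (first claim wins);
    // false means another worker already loaded or claimed it.
    bool tryClaim(uint32_t modelId) {
        std::lock_guard<std::mutex> lock(mutex_);
        return uploaded_.insert(modelId).second;
    }

    bool contains(uint32_t modelId) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return uploaded_.count(modelId) != 0;
    }

private:
    mutable std::mutex mutex_;
    std::unordered_set<uint32_t> uploaded_;
};
```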
Kelsi
16b4336700 Batch GPU uploads to eliminate per-upload fence waits (stutter fix)
Every uploadBuffer/VkTexture::upload called immediateSubmit which did a
separate vkQueueSubmit + vkWaitForFences. Loading a single creature model
with textures caused 4-8+ fence waits; terrain chunks caused 80+ per batch.

Added beginUploadBatch/endUploadBatch to VkContext: records all upload
commands into a single command buffer, submits once with one fence wait.
Staging buffers are deferred for cleanup after the batch completes.

Wrapped in batch mode:
- CharacterRenderer::loadModel (creature VB/IB + textures)
- M2Renderer::loadModel (doodad VB/IB + textures)
- TerrainRenderer::loadTerrain/loadTerrainIncremental (chunk geometry + textures)
- TerrainRenderer::uploadPreloadedTextures
- WMORenderer::loadModel (group geometry + textures)
2026-03-07 12:19:59 -08:00
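The batching idea — record many upload commands, submit once, wait on one fence — can be shown with a CPU-only sketch. The depth counter mirrors the nesting-safe design the VkContext header describes; actual Vulkan submission and fencing are stand-ins here:

```cpp
#include <functional>
#include <vector>

// Hypothetical CPU-side sketch of nesting-safe upload batching: commands
// recorded while a batch is open are flushed with a single submit when
// the outermost endUploadBatch() runs.
class UploadBatcher {
public:
    void beginUploadBatch() { ++depth_; }

    void record(std::function<void()> cmd) {
        if (depth_ > 0) recorded_.push_back(std::move(cmd));
        else { cmd(); ++submits_; }  // no batch open: immediate submit + wait
    }

    void endUploadBatch() {
        if (depth_ > 0 && --depth_ == 0) {
            for (auto& cmd : recorded_) cmd();  // one command buffer...
            recorded_.clear();
            ++submits_;                         // ...one submit, one fence wait
        }
    }

    int submitCount() const { return submits_; }

private:
    int depth_ = 0;
    int submits_ = 0;
    std::vector<std::function<void()>> recorded_;
};
```

This is why a creature model that previously cost 4-8+ fence waits now costs one, and a terrain chunk batch drops from 80+ to one.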
Kelsi
884b72bc1c Incremental terrain upload + M2 instance dedup hash for city stutter
Terrain finalization was uploading all 256 chunks (GPU fence waits) in one
atomic advanceFinalization call that couldn't be interrupted by the 5ms time
budget. Now split into incremental batches of 16 chunks per call, allowing
the time budget to yield between batches.

M2 instance creation had O(N) dedup scans iterating ALL instances to check
for duplicates. In cities with 5000+ doodads, this caused O(N²) total work
during tile loading. Replaced with hash-based DedupKey map for O(1) lookups.

Changes:
- TerrainRenderer::loadTerrainIncremental: uploads N chunks per call
- FinalizingTile tracks terrainChunkNext for cross-frame progress
- TERRAIN phase yields after preload and after each chunk batch
- M2Renderer::DedupKey hash map replaces linear scan in createInstance
  and createInstanceWithMatrix
- Dedup map maintained through rebuildSpatialIndex and clear paths
2026-03-07 11:59:19 -08:00
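The hash-based dedup works by quantizing instance positions to 0.1 units so near-identical placements collapse to the same key, turning the O(N) duplicate scan into one hash lookup. The key and hash below match the `DedupKey`/`DedupHash` structs in the M2 renderer header diff; `makeKey` is a hypothetical helper showing the quantization:

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Key: model ID plus position quantized to 0.1 units.
struct DedupKey {
    uint32_t modelId;
    int32_t qx, qy, qz;
    bool operator==(const DedupKey& o) const {
        return modelId == o.modelId && qx == o.qx && qy == o.qy && qz == o.qz;
    }
};

struct DedupHash {
    size_t operator()(const DedupKey& k) const {
        size_t h = std::hash<uint32_t>()(k.modelId);
        h ^= std::hash<int32_t>()(k.qx) * 2654435761u;
        h ^= std::hash<int32_t>()(k.qy) * 40503u;
        h ^= std::hash<int32_t>()(k.qz) * 12289u;
        return h;
    }
};

// Hypothetical helper: quantize a world position into a dedup key.
DedupKey makeKey(uint32_t modelId, float x, float y, float z) {
    auto q = [](float v) { return static_cast<int32_t>(std::lround(v * 10.0f)); };
    return DedupKey{modelId, q(x), q(y), q(z)};
}
```

With 5000+ doodads per city tile, N hash lookups replace N linear scans, eliminating the O(N²) behavior during tile loading.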
Kelsi
f9410cc4bd Fix city NPC stuttering: async model loading, CharSections cache, frame budgets
- Async creature model loading: M2 file I/O and parsing on background threads
  via std::async, GPU upload on main thread when ready (MAX_ASYNC_CREATURE_LOADS=4)
- CharSections.dbc lookup cache: O(1) hash lookup instead of O(N) full DBC scan
  per humanoid NPC spawn (was scanning thousands of records twice per spawn)
- Frame time budget: 4ms cap on creature spawn processing per frame
- Wolf/worg model name check cached per modelId (was doing tolower+find per
  hostile creature per frame)
- Weapon attach throttle: max 2 per 1s tick (was attempting all unweaponized NPCs)
- Separate texture application tracking (displayIdTexturesApplied_) so async-loaded
  models still get skin/equipment textures applied correctly
2026-03-07 11:44:14 -08:00
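The async loading split — I/O and parsing via `std::async` on background threads, with the main thread polling ready futures each frame and doing only the GPU upload — can be sketched as below. `ParsedModel` and the loader class are illustrative stand-ins for the engine's real types:

```cpp
#include <chrono>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed M2 model.
struct ParsedModel { std::string path; bool valid = false; };

// Background-thread work: in the real engine this reads and parses the file.
ParsedModel parseModelOffThread(std::string path) {
    return ParsedModel{std::move(path), true};
}

class AsyncModelLoader {
public:
    static constexpr std::size_t MAX_IN_FLIGHT = 4;  // mirrors MAX_ASYNC_CREATURE_LOADS

    bool submit(const std::string& path) {
        if (inFlight_.size() >= MAX_IN_FLIGHT) return false;  // backpressure
        inFlight_.push_back(std::async(std::launch::async, parseModelOffThread, path));
        return true;
    }

    // Main-thread poll: collect finished parses without blocking the frame.
    std::vector<ParsedModel> collectReady() {
        std::vector<ParsedModel> ready;
        for (std::size_t i = 0; i < inFlight_.size();) {
            if (inFlight_[i].wait_for(std::chrono::seconds(0)) ==
                std::future_status::ready) {
                ready.push_back(inFlight_[i].get());
                inFlight_.erase(inFlight_.begin() + i);
            } else {
                ++i;
            }
        }
        return ready;
    }

private:
    std::vector<std::future<ParsedModel>> inFlight_;
};
```

The zero-timeout `wait_for` is what keeps the main thread from ever stalling on a parse that is still in progress.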
18 changed files with 2460 additions and 707 deletions

@@ -3,13 +3,19 @@
 #include "core/window.hpp"
 #include "core/input.hpp"
 #include "game/character.hpp"
+#include "pipeline/blp_loader.hpp"
 #include <memory>
 #include <string>
 #include <vector>
+#include <deque>
 #include <unordered_map>
 #include <unordered_set>
 #include <array>
 #include <optional>
+#include <future>
+#include <mutex>
+#include <thread>
+#include <atomic>
 namespace wowee {
@@ -18,7 +24,7 @@ namespace rendering { class Renderer; }
 namespace ui { class UIManager; }
 namespace auth { class AuthHandler; }
 namespace game { class GameHandler; class World; class ExpansionRegistry; }
-namespace pipeline { class AssetManager; class DBCLayout; }
+namespace pipeline { class AssetManager; class DBCLayout; struct M2Model; struct WMOModel; }
 namespace audio { enum class VoiceType; }
 namespace core {
@@ -90,6 +96,7 @@ private:
 static const char* mapIdToName(uint32_t mapId);
 void loadOnlineWorldTerrain(uint32_t mapId, float x, float y, float z);
 void buildFactionHostilityMap(uint8_t playerRace);
+pipeline::M2Model loadCreatureM2Sync(const std::string& m2Path);
 void spawnOnlineCreature(uint64_t guid, uint32_t displayId, float x, float y, float z, float orientation);
 void despawnOnlineCreature(uint64_t guid);
 bool tryAttachCreatureVirtualWeapons(uint64_t guid, uint32_t instanceId);
@@ -181,8 +188,39 @@ private:
 std::unordered_map<uint64_t, glm::vec3> creatureRenderPosCache_; // guid -> last synced render position
 std::unordered_set<uint64_t> creatureWeaponsAttached_; // guid set when NPC virtual weapons attached
 std::unordered_map<uint64_t, uint8_t> creatureWeaponAttachAttempts_; // guid -> attach attempts
+std::unordered_map<uint32_t, bool> modelIdIsWolfLike_; // modelId → cached wolf/worg check
+static constexpr int MAX_WEAPON_ATTACHES_PER_TICK = 2; // limit weapon attach work per 1s tick
+// CharSections.dbc lookup cache to avoid O(N) DBC scan per NPC spawn.
+// Key: (race<<24)|(sex<<16)|(section<<12)|(variation<<8)|color → texture path
+std::unordered_map<uint64_t, std::string> charSectionsCache_;
+bool charSectionsCacheBuilt_ = false;
+void buildCharSectionsCache();
+std::string lookupCharSection(uint8_t race, uint8_t sex, uint8_t section,
+uint8_t variation, uint8_t color, int texIndex = 0) const;
+// Async creature model loading: file I/O + M2 parsing on background thread,
+// GPU upload + instance creation on main thread.
+struct PreparedCreatureModel {
+uint64_t guid;
+uint32_t displayId;
+uint32_t modelId;
+float x, y, z, orientation;
+std::shared_ptr<pipeline::M2Model> model; // parsed on background thread
+std::unordered_map<std::string, pipeline::BLPImage> predecodedTextures; // decoded on bg thread
+bool valid = false;
+bool permanent_failure = false;
+};
+struct AsyncCreatureLoad {
+std::future<PreparedCreatureModel> future;
+};
+std::vector<AsyncCreatureLoad> asyncCreatureLoads_;
+void processAsyncCreatureResults();
+static constexpr int MAX_ASYNC_CREATURE_LOADS = 4; // concurrent background loads
 std::unordered_set<uint64_t> deadCreatureGuids_; // GUIDs that should spawn in corpse/death pose
 std::unordered_map<uint32_t, uint32_t> displayIdModelCache_; // displayId → modelId (model caching)
+std::unordered_set<uint32_t> displayIdTexturesApplied_; // displayIds with per-model textures applied
+std::unordered_map<uint32_t, std::unordered_map<std::string, pipeline::BLPImage>> displayIdPredecodedTextures_; // displayId → pre-decoded skin textures
 mutable std::unordered_set<uint32_t> warnedMissingDisplayDataIds_; // displayIds already warned
 mutable std::unordered_set<uint32_t> warnedMissingModelPathIds_; // modelIds/displayIds already warned
 uint32_t nextCreatureModelId_ = 5000; // Model IDs for online creatures
@@ -250,7 +288,7 @@ private:
 uint32_t displayId;
 float x, y, z, orientation;
 };
-std::vector<PendingCreatureSpawn> pendingCreatureSpawns_;
+std::deque<PendingCreatureSpawn> pendingCreatureSpawns_;
 static constexpr int MAX_SPAWNS_PER_FRAME = 3;
 static constexpr int MAX_NEW_CREATURE_MODELS_PER_FRAME = 1;
 static constexpr uint16_t MAX_CREATURE_SPAWN_RETRIES = 300;
@@ -275,6 +313,49 @@ private:
 // Deferred equipment compositing queue — processes max 1 per frame to avoid stutter
 std::vector<std::pair<uint64_t, std::pair<std::array<uint32_t, 19>, std::array<uint8_t, 19>>>> deferredEquipmentQueue_;
 void processDeferredEquipmentQueue();
+// Async equipment texture pre-decode: BLP decode on background thread, composite on main thread
+struct PreparedEquipmentUpdate {
+uint64_t guid;
+std::array<uint32_t, 19> displayInfoIds;
+std::array<uint8_t, 19> inventoryTypes;
+std::unordered_map<std::string, pipeline::BLPImage> predecodedTextures;
+};
+struct AsyncEquipmentLoad {
+std::future<PreparedEquipmentUpdate> future;
+};
+std::vector<AsyncEquipmentLoad> asyncEquipmentLoads_;
+void processAsyncEquipmentResults();
+std::vector<std::string> resolveEquipmentTexturePaths(uint64_t guid,
+const std::array<uint32_t, 19>& displayInfoIds,
+const std::array<uint8_t, 19>& inventoryTypes) const;
+// Deferred NPC texture setup — async DBC lookups + BLP pre-decode to avoid main-thread stalls
+struct DeferredNpcComposite {
+uint32_t modelId;
+uint32_t displayId;
+// Skin compositing (type-1 slots)
+std::string basePath; // CharSections skin base texture
+std::vector<std::string> overlayPaths; // face + underwear overlays
+std::vector<std::pair<int, std::string>> regionLayers; // equipment region overlays
+std::vector<uint32_t> skinTextureSlots; // model texture slots needing skin composite
+bool hasComposite = false; // needs compositing (overlays or equipment regions)
+bool hasSimpleSkin = false; // just base skin, no compositing needed
+// Baked skin (type-1 slots)
+std::string bakedSkinPath; // baked texture path (if available)
+bool hasBakedSkin = false; // baked skin resolved successfully
+// Hair (type-6 slots)
+std::vector<uint32_t> hairTextureSlots; // model texture slots needing hair texture
+std::string hairTexturePath; // resolved hair texture path
+bool useBakedForHair = false; // bald NPC: use baked skin for type-6
+};
+struct PreparedNpcComposite {
+DeferredNpcComposite info;
+std::unordered_map<std::string, pipeline::BLPImage> predecodedTextures;
+};
+struct AsyncNpcCompositeLoad {
+std::future<PreparedNpcComposite> future;
+};
+std::vector<AsyncNpcCompositeLoad> asyncNpcCompositeLoads_;
+void processAsyncNpcCompositeResults();
 // Cache base player model geometry by (raceId, genderId)
 std::unordered_map<uint32_t, uint32_t> playerModelCache_; // key=(race<<8)|gender → modelId
 struct PlayerTextureSlots { int skin = -1; int hair = -1; int underwear = -1; };
@@ -302,6 +383,24 @@ private:
 };
 std::vector<PendingGameObjectSpawn> pendingGameObjectSpawns_;
 void processGameObjectSpawnQueue();
+// Async WMO loading for game objects (file I/O + parse on background thread)
+struct PreparedGameObjectWMO {
+uint64_t guid;
+uint32_t entry;
+uint32_t displayId;
+float x, y, z, orientation;
+std::shared_ptr<pipeline::WMOModel> wmoModel;
+std::unordered_map<std::string, pipeline::BLPImage> predecodedTextures; // decoded on bg thread
+bool valid = false;
+bool isWmo = false;
+std::string modelPath;
+};
+struct AsyncGameObjectLoad {
+std::future<PreparedGameObjectWMO> future;
+};
+std::vector<AsyncGameObjectLoad> asyncGameObjectLoads_;
+void processAsyncGameObjectResults();
 struct PendingTransportDoodadBatch {
 uint64_t guid = 0;
 uint32_t modelId = 0;
@@ -321,6 +420,23 @@ private:
 // Quest marker billboard sprites (above NPCs)
 void loadQuestMarkerModels(); // Now loads BLP textures
 void updateQuestMarkers(); // Updates billboard positions
+// Background world preloader — warms AssetManager file cache for the
+// expected world before the user clicks Enter World.
+struct WorldPreload {
+uint32_t mapId = 0;
+std::string mapName;
+int centerTileX = 0;
+int centerTileY = 0;
+std::atomic<bool> cancel{false};
+std::vector<std::thread> workers;
+};
+std::unique_ptr<WorldPreload> worldPreload_;
+void startWorldPreload(uint32_t mapId, const std::string& mapName, float serverX, float serverY);
+void cancelWorldPreload();
+void saveLastWorldInfo(uint32_t mapId, const std::string& mapName, float serverX, float serverY);
+struct LastWorldInfo { uint32_t mapId = 0; std::string mapName; float x = 0, y = 0; bool valid = false; };
+LastWorldInfo loadLastWorldInfo() const;
 };
 } // namespace core

@@ -1,6 +1,7 @@
 #pragma once
 #include "pipeline/m2_loader.hpp"
+#include "pipeline/blp_loader.hpp"
 #include <vulkan/vulkan.h>
 #include <vk_mem_alloc.h>
 #include <glm/glm.hpp>
@@ -11,6 +12,7 @@
 #include <string>
 #include <utility>
 #include <future>
+#include <deque>
 namespace wowee {
 namespace pipeline { class AssetManager; }
@@ -114,7 +116,11 @@ public:
 void setShadowMap(VkTexture*, const glm::mat4&) {}
 void clearShadowMap() {}
+// Pre-decoded BLP cache: set before calling loadModel() to skip main-thread BLP decode
+void setPredecodedBLPCache(std::unordered_map<std::string, pipeline::BLPImage>* cache) { predecodedBLPCache_ = cache; }
 private:
+std::unordered_map<std::string, pipeline::BLPImage>* predecodedBLPCache_ = nullptr;
 // GPU representation of M2 model
 struct M2ModelGPU {
 VkBuffer vertexBuffer = VK_NULL_HANDLE;
@@ -180,6 +186,7 @@ private:
 // Bone update throttling (skip frames for distant characters)
 uint32_t boneUpdateCounter = 0;
+const M2ModelGPU* cachedModel = nullptr; // Avoid per-frame hash lookups
 // Per-instance bone SSBO (double-buffered per frame)
 VkBuffer boneBuffer[2] = {};
@@ -254,7 +261,14 @@ private:
 VkDescriptorPool materialDescPools_[2] = {VK_NULL_HANDLE, VK_NULL_HANDLE};
 VkDescriptorPool boneDescPool_ = VK_NULL_HANDLE;
 uint32_t lastMaterialPoolResetFrame_ = 0xFFFFFFFFu;
-std::vector<std::pair<VkBuffer, VmaAllocation>> transientMaterialUbos_[2];
+// Material UBO ring buffer — pre-allocated per frame slot, sub-allocated each draw
+VkBuffer materialRingBuffer_[2] = {VK_NULL_HANDLE, VK_NULL_HANDLE};
+VmaAllocation materialRingAlloc_[2] = {VK_NULL_HANDLE, VK_NULL_HANDLE};
+void* materialRingMapped_[2] = {nullptr, nullptr};
+uint32_t materialRingOffset_[2] = {0, 0};
+uint32_t materialUboAlignment_ = 256; // minUniformBufferOffsetAlignment
+static constexpr uint32_t MATERIAL_RING_CAPACITY = 4096;
 // Texture cache
 struct TextureCacheEntry {
@@ -265,6 +279,7 @@ private:
 uint64_t lastUse = 0;
 bool hasAlpha = false;
 bool colorKeyBlack = false;
+bool normalMapPending = false; // deferred normal map generation
 };
 std::unordered_map<std::string, TextureCacheEntry> textureCache;
 std::unordered_map<VkTexture*, bool> textureHasAlphaByPtr_;
@@ -289,6 +304,17 @@ private:
 std::unique_ptr<VkTexture> generateNormalHeightMap(
 const uint8_t* pixels, uint32_t width, uint32_t height, float& outVariance);
+// Deferred normal map generation — avoids stalling loadModel
+struct PendingNormalMap {
+std::string cacheKey;
+std::vector<uint8_t> pixels; // RGBA pixel data
+uint32_t width, height;
+};
+std::deque<PendingNormalMap> pendingNormalMaps_;
+public:
+void processPendingNormalMaps(int budget = 2);
+private:
 // Normal mapping / POM settings
 bool normalMappingEnabled_ = true;
 float normalMapStrength_ = 0.8f;

@@ -1,6 +1,7 @@
 #pragma once
 #include "pipeline/m2_loader.hpp"
+#include "pipeline/blp_loader.hpp"
 #include <vulkan/vulkan.h>
 #include <vk_mem_alloc.h>
 #include <glm/glm.hpp>
@@ -188,6 +189,7 @@ struct M2Instance {
 bool skipCollision = false; // WMO interior doodads — skip player wall collision
 float cachedBoundRadius = 0.0f;
 float portalSpinAngle = 0.0f; // Accumulated spin angle for portal rotation
+const M2ModelGPU* cachedModel = nullptr; // Avoid per-frame hash lookups
 // Frame-skip optimization (update distant animations less frequently)
 uint8_t frameSkipCounter = 0;
@@ -328,6 +330,10 @@ public:
 std::vector<glm::vec3> getWaterVegetationPositions(const glm::vec3& camPos, float maxDist) const;
+// Pre-decoded BLP cache: set by terrain manager before calling loadModel()
+// so loadTexture() can skip the expensive assetManager->loadTexture() call.
+void setPredecodedBLPCache(std::unordered_map<std::string, pipeline::BLPImage>* cache) { predecodedBLPCache_ = cache; }
 private:
 bool initialized_ = false;
 bool insideInterior = false;
@@ -389,12 +395,33 @@ private:
 std::unordered_map<uint32_t, M2ModelGPU> models;
 std::vector<M2Instance> instances;
+// O(1) dedup: key = (modelId, quantized x, quantized y, quantized z) → instanceId
+struct DedupKey {
+uint32_t modelId;
+int32_t qx, qy, qz; // position quantized to 0.1 units
+bool operator==(const DedupKey& o) const {
+return modelId == o.modelId && qx == o.qx && qy == o.qy && qz == o.qz;
+}
+};
+struct DedupHash {
+size_t operator()(const DedupKey& k) const {
+size_t h = std::hash<uint32_t>()(k.modelId);
+h ^= std::hash<int32_t>()(k.qx) * 2654435761u;
+h ^= std::hash<int32_t>()(k.qy) * 40503u;
+h ^= std::hash<int32_t>()(k.qz) * 12289u;
+return h;
+}
+};
+std::unordered_map<DedupKey, uint32_t, DedupHash> instanceDedupMap_;
 uint32_t nextInstanceId = 1;
 uint32_t lastDrawCallCount = 0;
 size_t modelCacheLimit_ = 6000;
 uint32_t modelLimitRejectWarnings_ = 0;
 VkTexture* loadTexture(const std::string& path, uint32_t texFlags = 0);
+std::unordered_map<std::string, pipeline::BLPImage>* predecodedBLPCache_ = nullptr;
 struct TextureCacheEntry {
 std::unique_ptr<VkTexture> texture;
 size_t approxBytes = 0;

@@ -121,6 +121,12 @@ struct PendingTile {
 // Pre-loaded terrain texture BLP data (loaded on background thread to avoid
 // blocking file I/O on the main thread during finalizeTile)
 std::unordered_map<std::string, pipeline::BLPImage> preloadedTextures;
+// Pre-decoded M2 model textures (decoded on background thread)
+std::unordered_map<std::string, pipeline::BLPImage> preloadedM2Textures;
+// Pre-decoded WMO textures (decoded on background thread)
+std::unordered_map<std::string, pipeline::BLPImage> preloadedWMOTextures;
 };
 /**
@@ -150,6 +156,11 @@ struct FinalizingTile {
 size_t wmoModelIndex = 0; // Next WMO model to upload
 size_t wmoDoodadIndex = 0; // Next WMO doodad to upload
+// Incremental terrain upload state (splits TERRAIN phase across frames)
+bool terrainPreloaded = false; // True after preloaded textures uploaded
+int terrainChunkNext = 0; // Next chunk index to upload (0-255, row-major)
+bool terrainMeshDone = false; // True when all chunks uploaded
 // Accumulated results (built up across phases)
 std::vector<uint32_t> m2InstanceIds;
 std::vector<uint32_t> wmoInstanceIds;
@@ -376,6 +387,11 @@ private:
 std::unordered_set<std::string> missingAdtWarnings_;
 std::mutex missingAdtWarningsMutex_;
+// Thread-safe set of M2 model IDs already uploaded to GPU
+// (checked by workers to skip redundant file I/O + parsing)
+std::unordered_set<uint32_t> uploadedM2Ids_;
+std::mutex uploadedM2IdsMutex_;
 // Dedup set for doodad placements across tile boundaries
 std::unordered_set<uint32_t> placedDoodadIds;

@@ -86,6 +86,13 @@ public:
 const std::vector<std::string>& texturePaths,
 int tileX = -1, int tileY = -1);
+/// Upload a batch of terrain chunks incrementally. Returns true when all chunks done.
+/// chunkIndex is updated to the next chunk to process (0-255 row-major).
+bool loadTerrainIncremental(const pipeline::TerrainMesh& mesh,
+const std::vector<std::string>& texturePaths,
+int tileX, int tileY,
+int& chunkIndex, int maxChunksPerCall = 16);
 void removeTile(int tileX, int tileY);
 void uploadPreloadedTextures(const std::unordered_map<std::string, pipeline::BLPImage>& textures);
@@ -120,6 +127,7 @@ public:
 int getRenderedChunkCount() const { return renderedChunks; }
 int getCulledChunkCount() const { return culledChunks; }
 int getTriangleCount() const;
+VkContext* getVkContext() const { return vkCtx; }
 private:
 TerrainChunkGPU uploadChunk(const pipeline::ChunkMesh& chunk);

@@ -1,5 +1,6 @@
 #pragma once
+#include "rendering/vk_utils.hpp"
 #include <vulkan/vulkan.h>
 #include <vk_mem_alloc.h>
 #include <VkBootstrap.h>
@@ -46,6 +47,16 @@ public:
 // Immediate submit for one-off GPU work (descriptor pool creation, etc.)
 void immediateSubmit(std::function<void(VkCommandBuffer cmd)>&& function);
+// Batch upload mode: records multiple upload commands into a single
+// command buffer, then submits with ONE fence wait instead of one per upload.
+void beginUploadBatch();
+void endUploadBatch(); // Async: submits but does NOT wait for fence
+void endUploadBatchSync(); // Sync: submits and waits (for load screens)
+bool isInUploadBatch() const { return inUploadBatch_; }
+void deferStagingCleanup(AllocatedBuffer staging);
+void pollUploadBatches(); // Check completed async uploads, free staging buffers
+void waitAllUploads(); // Block until all in-flight uploads complete
 // Accessors
 VkInstance getInstance() const { return instance; }
 VkPhysicalDevice getPhysicalDevice() const { return physicalDevice; }
@@ -143,6 +154,20 @@ private:
 VkCommandPool immCommandPool = VK_NULL_HANDLE;
 VkFence immFence = VK_NULL_HANDLE;
+// Batch upload state (nesting-safe via depth counter)
+int uploadBatchDepth_ = 0;
+bool inUploadBatch_ = false;
+VkCommandBuffer batchCmd_ = VK_NULL_HANDLE;
+std::vector<AllocatedBuffer> batchStagingBuffers_;
+// Async upload: in-flight batches awaiting GPU completion
+struct InFlightBatch {
+VkFence fence = VK_NULL_HANDLE;
+VkCommandBuffer cmd = VK_NULL_HANDLE;
+std::vector<AllocatedBuffer> stagingBuffers;
+};
+std::vector<InFlightBatch> inFlightBatches_;
 // Depth buffer (shared across all framebuffers)
 VkImage depthImage = VK_NULL_HANDLE;
 VkImageView depthImageView = VK_NULL_HANDLE;

@ -1,5 +1,6 @@
#pragma once #pragma once
#include "pipeline/blp_loader.hpp"
#include <vulkan/vulkan.h> #include <vulkan/vulkan.h>
#include <vk_mem_alloc.h> #include <vk_mem_alloc.h>
#include <glm/glm.hpp> #include <glm/glm.hpp>
@ -325,6 +326,12 @@ public:
// Pre-compute floor cache for all loaded WMO instances // Pre-compute floor cache for all loaded WMO instances
void precomputeFloorCache(); void precomputeFloorCache();
// Pre-decoded BLP cache: set before calling loadModel() to skip main-thread BLP decode
void setPredecodedBLPCache(std::unordered_map<std::string, pipeline::BLPImage>* cache) { predecodedBLPCache_ = cache; }
// Defer normal/height map generation during streaming to avoid CPU stalls
void setDeferNormalMaps(bool defer) { deferNormalMaps_ = defer; }
private: private:
// WMO material UBO — matches WMOMaterial in wmo.frag.glsl // WMO material UBO — matches WMOMaterial in wmo.frag.glsl
struct WMOMaterialUBO { struct WMOMaterialUBO {
@ -558,6 +565,7 @@ private:
* Load a texture from path * Load a texture from path
*/ */
VkTexture* loadTexture(const std::string& path); VkTexture* loadTexture(const std::string& path);
std::unordered_map<std::string, pipeline::BLPImage>* predecodedBLPCache_ = nullptr;
/**
* Generate normal+height map from diffuse RGBA8 pixels
@ -670,6 +678,7 @@ private:
// Normal mapping / POM settings
bool normalMappingEnabled_ = true; // on by default
bool deferNormalMaps_ = false; // skip normal map gen during streaming
float normalMapStrength_ = 0.8f; // 0.0 = flat, 1.0 = full, 2.0 = exaggerated
bool pomEnabled_ = true; // on by default
int pomQuality_ = 1; // 0=Low(16), 1=Medium(32), 2=High(64)

File diff suppressed because it is too large
@ -541,7 +541,13 @@ void GameHandler::update(float deltaTime) {
// Update socket (processes incoming data and triggers callbacks)
if (socket) {
auto socketStart = std::chrono::steady_clock::now();
socket->update();
float socketMs = std::chrono::duration<float, std::milli>(
std::chrono::steady_clock::now() - socketStart).count();
if (socketMs > 3.0f) {
LOG_WARNING("SLOW socket->update: ", socketMs, "ms");
}
}
// Detect server-side disconnect (socket closed during update)

@ -197,6 +197,29 @@ bool CharacterRenderer::initialize(VkContext* ctx, VkDescriptorSetLayout perFram
vkCreateDescriptorPool(device, &ci, nullptr, &boneDescPool_);
}
// --- Material UBO ring buffers (one per frame slot) ---
{
VkPhysicalDeviceProperties props;
vkGetPhysicalDeviceProperties(ctx->getPhysicalDevice(), &props);
materialUboAlignment_ = static_cast<uint32_t>(props.limits.minUniformBufferOffsetAlignment);
if (materialUboAlignment_ < 1) materialUboAlignment_ = 1;
// Round up UBO size to alignment
uint32_t alignedUboSize = (sizeof(CharMaterialUBO) + materialUboAlignment_ - 1) & ~(materialUboAlignment_ - 1);
uint32_t ringSize = alignedUboSize * MATERIAL_RING_CAPACITY;
for (int i = 0; i < 2; i++) {
VkBufferCreateInfo bci{VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO};
bci.size = ringSize;
bci.usage = VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT;
VmaAllocationCreateInfo aci{};
aci.usage = VMA_MEMORY_USAGE_CPU_TO_GPU;
aci.flags = VMA_ALLOCATION_CREATE_HOST_ACCESS_SEQUENTIAL_WRITE_BIT | VMA_ALLOCATION_CREATE_MAPPED_BIT;
VmaAllocationInfo allocInfo{};
vmaCreateBuffer(ctx->getAllocator(), &bci, &aci,
&materialRingBuffer_[i], &materialRingAlloc_[i], &allocInfo);
materialRingMapped_[i] = allocInfo.pMappedData;
}
}
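The `alignedUboSize` computation above rounds the UBO size up to `minUniformBufferOffsetAlignment`, which Vulkan guarantees is a power of two — that is what makes the add-and-mask trick valid. As a standalone sketch:

```cpp
#include <cassert>
#include <cstdint>

// Round size up to the next multiple of alignment.
// Precondition: alignment is a power of two (true for Vulkan's
// minUniformBufferOffsetAlignment limit).
static uint32_t alignUp(uint32_t size, uint32_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}
```

Every sub-allocation offset produced with this stride then satisfies the alignment requirement for `VkDescriptorBufferInfo::offset`.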
// --- Pipeline layout ---
// set 0 = perFrame, set 1 = material, set 2 = bones
// Push constant: mat4 model = 64 bytes
@ -352,14 +375,15 @@ void CharacterRenderer::shutdown() {
if (pipelineLayout_) { vkDestroyPipelineLayout(device, pipelineLayout_, nullptr); pipelineLayout_ = VK_NULL_HANDLE; }
// Destroy material ring buffers
for (int i = 0; i < 2; i++) {
if (materialRingBuffer_[i]) {
vmaDestroyBuffer(alloc, materialRingBuffer_[i], materialRingAlloc_[i]);
materialRingBuffer_[i] = VK_NULL_HANDLE;
materialRingAlloc_[i] = VK_NULL_HANDLE;
materialRingMapped_[i] = nullptr;
}
materialRingOffset_[i] = 0;
}
// Destroy descriptor pools and layouts
@ -391,7 +415,6 @@ void CharacterRenderer::clear() {
vkDeviceWaitIdle(vkCtx_->getDevice());
VkDevice device = vkCtx_->getDevice();
// Destroy GPU resources for all models
for (auto& pair : models) {
@ -441,14 +464,9 @@ void CharacterRenderer::clear() {
models.clear();
instances.clear();
// Reset material ring buffer offsets (buffers persist, just reset write position)
for (int i = 0; i < 2; i++) {
materialRingOffset_[i] = 0;
}
// Reset descriptor pools (don't destroy — reuse for new allocations)
@ -607,7 +625,18 @@ VkTexture* CharacterRenderer::loadTexture(const std::string& path) {
return whiteTexture_.get();
}
// Check pre-decoded BLP cache first (populated by background threads)
pipeline::BLPImage blpImage;
if (predecodedBLPCache_) {
auto pit = predecodedBLPCache_->find(key);
if (pit != predecodedBLPCache_->end()) {
blpImage = std::move(pit->second);
predecodedBLPCache_->erase(pit);
}
}
if (!blpImage.isValid()) {
blpImage = assetManager->loadTexture(key);
}
if (!blpImage.isValid()) {
// Return white fallback but don't cache the failure — allow retry
// on next character load in case the asset becomes available.
@ -658,13 +687,16 @@ VkTexture* CharacterRenderer::loadTexture(const std::string& path) {
e.hasAlpha = hasAlpha;
e.colorKeyBlack = colorKeyBlackHint;
// Defer normal/height map generation to avoid stalling loadModel.
// Normal maps are generated in processPendingNormalMaps() at a per-frame budget.
if (blpImage.width >= 32 && blpImage.height >= 32) {
PendingNormalMap pending;
pending.cacheKey = key;
pending.pixels.assign(blpImage.data.begin(), blpImage.data.end());
pending.width = blpImage.width;
pending.height = blpImage.height;
pendingNormalMaps_.push_back(std::move(pending));
e.normalMapPending = true;
}
textureCacheBytes_ += e.approxBytes;
@ -676,6 +708,34 @@ VkTexture* CharacterRenderer::loadTexture(const std::string& path) {
return texPtr;
}
void CharacterRenderer::processPendingNormalMaps(int budget) {
if (pendingNormalMaps_.empty() || !vkCtx_) return;
int processed = 0;
while (!pendingNormalMaps_.empty() && processed < budget) {
auto pending = std::move(pendingNormalMaps_.front());
pendingNormalMaps_.pop_front();
auto it = textureCache.find(pending.cacheKey);
if (it == textureCache.end()) continue; // texture was evicted
float nhVariance = 0.0f;
vkCtx_->beginUploadBatch();
auto nhMap = generateNormalHeightMap(pending.pixels.data(),
pending.width, pending.height, nhVariance);
vkCtx_->endUploadBatch();
if (nhMap) {
it->second.heightMapVariance = nhVariance;
it->second.approxBytes += approxTextureBytesWithMips(pending.width, pending.height);
textureCacheBytes_ += approxTextureBytesWithMips(pending.width, pending.height);
it->second.normalHeightMap = std::move(nhMap);
}
it->second.normalMapPending = false;
processed++;
}
}
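The predecoded-cache lookups in the composite paths below all normalize keys the same way (forward slashes to backslashes, then lowercase) before probing the map. That repeated logic could be factored into a helper; `normalizeTextureKey` is a hypothetical name for a sketch of it:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Canonicalize a texture path into a cache key: MPQ-style backslash
// separators, lowercase. Both the background decoder and the renderer
// must agree on this form or cache lookups silently miss.
static std::string normalizeTextureKey(std::string key) {
    std::replace(key.begin(), key.end(), '/', '\\');
    std::transform(key.begin(), key.end(), key.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return key;
}
```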
// Alpha-blend overlay onto composite at (dstX, dstY)
static void blitOverlay(std::vector<uint8_t>& composite, int compW, int compH,
const pipeline::BLPImage& overlay, int dstX, int dstY) {
@ -807,7 +867,19 @@ VkTexture* CharacterRenderer::compositeTextures(const std::vector<std::string>&
}
// Load base layer
pipeline::BLPImage base;
if (predecodedBLPCache_) {
std::string key = layerPaths[0];
std::replace(key.begin(), key.end(), '/', '\\');
std::transform(key.begin(), key.end(), key.begin(),
[](unsigned char c) { return static_cast<char>(std::tolower(c)); });
auto pit = predecodedBLPCache_->find(key);
if (pit != predecodedBLPCache_->end()) {
base = std::move(pit->second);
predecodedBLPCache_->erase(pit);
}
}
if (!base.isValid()) base = assetManager->loadTexture(layerPaths[0]);
if (!base.isValid()) {
core::Logger::getInstance().warning("Composite: failed to load base layer: ", layerPaths[0]);
return whiteTexture_.get();
@ -848,7 +920,19 @@ VkTexture* CharacterRenderer::compositeTextures(const std::vector<std::string>&
for (size_t layer = 1; layer < layerPaths.size(); layer++) {
if (layerPaths[layer].empty()) continue;
pipeline::BLPImage overlay;
if (predecodedBLPCache_) {
std::string key = layerPaths[layer];
std::replace(key.begin(), key.end(), '/', '\\');
std::transform(key.begin(), key.end(), key.begin(),
[](unsigned char c) { return static_cast<char>(std::tolower(c)); });
auto pit = predecodedBLPCache_->find(key);
if (pit != predecodedBLPCache_->end()) {
overlay = std::move(pit->second);
predecodedBLPCache_->erase(pit);
}
}
if (!overlay.isValid()) overlay = assetManager->loadTexture(layerPaths[layer]);
if (!overlay.isValid()) {
core::Logger::getInstance().warning("Composite: FAILED to load overlay: ", layerPaths[layer]);
continue;
@ -1025,7 +1109,19 @@ VkTexture* CharacterRenderer::compositeWithRegions(const std::string& basePath,
return whiteTexture_.get();
}
pipeline::BLPImage base;
if (predecodedBLPCache_) {
std::string key = basePath;
std::replace(key.begin(), key.end(), '/', '\\');
std::transform(key.begin(), key.end(), key.begin(),
[](unsigned char c) { return static_cast<char>(std::tolower(c)); });
auto pit = predecodedBLPCache_->find(key);
if (pit != predecodedBLPCache_->end()) {
base = std::move(pit->second);
predecodedBLPCache_->erase(pit);
}
}
if (!base.isValid()) base = assetManager->loadTexture(basePath);
if (!base.isValid()) {
return whiteTexture_.get();
}
@ -1064,7 +1160,19 @@ VkTexture* CharacterRenderer::compositeWithRegions(const std::string& basePath,
bool upscaled = (base.width == 256 && base.height == 256 && width == 512);
for (const auto& ul : baseLayers) {
if (ul.empty()) continue;
pipeline::BLPImage overlay;
if (predecodedBLPCache_) {
std::string key = ul;
std::replace(key.begin(), key.end(), '/', '\\');
std::transform(key.begin(), key.end(), key.begin(),
[](unsigned char c) { return static_cast<char>(std::tolower(c)); });
auto pit = predecodedBLPCache_->find(key);
if (pit != predecodedBLPCache_->end()) {
overlay = std::move(pit->second);
predecodedBLPCache_->erase(pit);
}
}
if (!overlay.isValid()) overlay = assetManager->loadTexture(ul);
if (!overlay.isValid()) continue;
if (overlay.width == width && overlay.height == height) {
@ -1142,7 +1250,19 @@ VkTexture* CharacterRenderer::compositeWithRegions(const std::string& basePath,
int regionIdx = rl.first;
if (regionIdx < 0 || regionIdx >= 8) continue;
pipeline::BLPImage overlay;
if (predecodedBLPCache_) {
std::string key = rl.second;
std::replace(key.begin(), key.end(), '/', '\\');
std::transform(key.begin(), key.end(), key.begin(),
[](unsigned char c) { return static_cast<char>(std::tolower(c)); });
auto pit = predecodedBLPCache_->find(key);
if (pit != predecodedBLPCache_->end()) {
overlay = std::move(pit->second);
predecodedBLPCache_->erase(pit);
}
}
if (!overlay.isValid()) overlay = assetManager->loadTexture(rl.second);
if (!overlay.isValid()) {
core::Logger::getInstance().warning("compositeWithRegions: failed to load ", rl.second);
continue;
@ -1247,6 +1367,10 @@ bool CharacterRenderer::loadModel(const pipeline::M2Model& model, uint32_t id) {
M2ModelGPU gpuModel;
gpuModel.data = model;
// Batch all GPU uploads (VB, IB, textures) into a single command buffer
// submission with one fence wait, instead of one fence wait per upload.
vkCtx_->beginUploadBatch();
// Setup GPU buffers
setupModelBuffers(gpuModel);
@ -1259,6 +1383,8 @@ bool CharacterRenderer::loadModel(const pipeline::M2Model& model, uint32_t id) {
gpuModel.textureIds.push_back(texPtr);
}
vkCtx_->endUploadBatch();
models[id] = std::move(gpuModel);
core::Logger::getInstance().debug("Loaded M2 model ", id, " (", model.vertices.size(),
@ -1388,8 +1514,9 @@ uint32_t CharacterRenderer::createInstance(uint32_t modelId, const glm::vec3& po
instance.scale = scale;
// Initialize bone matrices to identity
auto& gpuRef = models[modelId];
instance.boneMatrices.resize(std::max(static_cast<size_t>(1), gpuRef.data.bones.size()), glm::mat4(1.0f));
instance.cachedModel = &gpuRef;
uint32_t id = instance.id;
instances[id] = std::move(instance);
@ -1448,8 +1575,14 @@ void CharacterRenderer::update(float deltaTime, const glm::vec3& cameraPos) {
const float animUpdateRadius = static_cast<float>(envSizeOrDefault("WOWEE_CHAR_ANIM_RADIUS", 120));
const float animUpdateRadiusSq = animUpdateRadius * animUpdateRadius;
// Single pass: fade-in, movement, and animation bone collection
std::vector<std::reference_wrapper<CharacterInstance>> toUpdate;
toUpdate.reserve(instances.size());
for (auto& pair : instances) {
auto& inst = pair.second;
// Update fade-in opacity
if (inst.fadeInDuration > 0.0f && inst.opacity < 1.0f) {
inst.fadeInTime += deltaTime;
inst.opacity = std::min(1.0f, inst.fadeInTime / inst.fadeInDuration);
@ -1457,10 +1590,8 @@ void CharacterRenderer::update(float deltaTime, const glm::vec3& cameraPos) {
inst.fadeInDuration = 0.0f;
}
}
// Interpolate creature movement
if (inst.isMoving) {
inst.moveElapsed += deltaTime;
float t = inst.moveElapsed / inst.moveDuration;
@ -1469,36 +1600,26 @@ void CharacterRenderer::update(float deltaTime, const glm::vec3& cameraPos) {
inst.isMoving = false;
// Return to idle when movement completes
if (inst.currentAnimationId == 4 || inst.currentAnimationId == 5) {
playAnimation(pair.first, 0, true);
}
} else {
inst.position = glm::mix(inst.moveStart, inst.moveEnd, t);
}
}
// Skip weapon instances for animation — their transforms are set by parent bones
if (inst.hasOverrideModelMatrix) continue;
float distSq = glm::distance2(inst.position, cameraPos);
if (distSq >= animUpdateRadiusSq) continue;
// Always advance animation time (cheap)
if (inst.cachedModel && !inst.cachedModel->data.sequences.empty()) {
if (inst.currentSequenceIndex < 0) {
inst.currentSequenceIndex = 0;
inst.currentAnimationId = inst.cachedModel->data.sequences[0].id;
}
const auto& seq = inst.cachedModel->data.sequences[inst.currentSequenceIndex];
inst.animationTime += deltaTime * 1000.0f;
if (seq.duration > 0 && inst.animationTime >= static_cast<float>(seq.duration)) {
if (inst.animationLoop) {
@ -1509,10 +1630,11 @@ void CharacterRenderer::update(float deltaTime, const glm::vec3& cameraPos) {
}
}
// Distance-tiered bone throttling: near=every frame, mid=every 4th, far=every 8th
uint32_t boneInterval = 1;
if (distSq > 40.0f * 40.0f) boneInterval = 8;
else if (distSq > 20.0f * 20.0f) boneInterval = 4;
else if (distSq > 10.0f * 10.0f) boneInterval = 2;
inst.boneUpdateCounter++;
bool needsBones = (inst.boneUpdateCounter >= boneInterval) || inst.boneMatrices.empty();
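The tiering above reads naturally as a pure function of squared camera distance. A sketch with a hypothetical helper name, mirroring the thresholds in this commit (40m/20m/10m, intervals 8/4/2/1):

```cpp
#include <cassert>
#include <cstdint>

// How many frames may elapse between bone recomputations for an instance
// at squared distance distSq from the camera. Squared distance avoids a
// sqrt per instance per frame.
static uint32_t boneUpdateInterval(float distSq) {
    if (distSq > 40.0f * 40.0f) return 8;  // far: every 8th frame
    if (distSq > 20.0f * 20.0f) return 4;  // mid: every 4th frame
    if (distSq > 10.0f * 10.0f) return 2;  // near-mid: every other frame
    return 1;                              // near: every frame
}
```

The per-instance `boneUpdateCounter` then only needs to be compared against this interval, so throttled instances stay phase-shifted rather than all recomputing on the same frame.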
@ -1527,7 +1649,7 @@ void CharacterRenderer::update(float deltaTime, const glm::vec3& cameraPos) {
// Thread bone matrix computation in chunks
if (updatedCount >= 8 && numAnimThreads_ > 1) {
static const size_t minAnimWorkPerThread = std::max<size_t>(
8, envSizeOrDefault("WOWEE_CHAR_ANIM_WORK_PER_THREAD", 16));
const size_t maxUsefulThreads = std::max<size_t>(
1, (updatedCount + minAnimWorkPerThread - 1) / minAnimWorkPerThread);
const size_t numThreads = std::min(static_cast<size_t>(numAnimThreads_), maxUsefulThreads);
@ -1596,11 +1718,8 @@ void CharacterRenderer::update(float deltaTime, const glm::vec3& cameraPos) {
}
void CharacterRenderer::updateAnimation(CharacterInstance& instance, float deltaTime) {
if (!instance.cachedModel) return;
const auto& model = instance.cachedModel->data;
if (model.sequences.empty()) {
return;
@ -1713,7 +1832,8 @@ glm::quat CharacterRenderer::interpolateQuat(const pipeline::M2AnimationTrack& t
// --- Bone transform calculation ---
void CharacterRenderer::calculateBoneMatrices(CharacterInstance& instance) {
if (!instance.cachedModel) return;
auto& model = instance.cachedModel->data;
if (model.bones.empty()) {
return;
@ -1722,8 +1842,6 @@ void CharacterRenderer::calculateBoneMatrices(CharacterInstance& instance) {
size_t numBones = model.bones.size();
instance.boneMatrices.resize(numBones);
for (size_t i = 0; i < numBones; i++) {
const auto& bone = model.bones[i];
@ -1731,19 +1849,6 @@ void CharacterRenderer::calculateBoneMatrices(CharacterInstance& instance) {
// At rest this is identity, so no separate bind pose is needed
glm::mat4 localTransform = getBoneTransform(bone, instance.animationTime, instance.currentSequenceIndex);
// Compose with parent
if (bone.parentBone >= 0 && static_cast<size_t>(bone.parentBone) < numBones) {
instance.boneMatrices[i] = instance.boneMatrices[bone.parentBone] * localTransform;
@ -1751,12 +1856,6 @@ void CharacterRenderer::calculateBoneMatrices(CharacterInstance& instance) {
instance.boneMatrices[i] = localTransform;
}
}
}
glm::mat4 CharacterRenderer::getBoneTransform(const pipeline::M2Bone& bone, float time, int sequenceIndex) {
@ -1791,22 +1890,19 @@ void CharacterRenderer::render(VkCommandBuffer cmd, VkDescriptorSet perFrameSet,
uint32_t frameIndex = vkCtx_->getCurrentFrame();
uint32_t frameSlot = frameIndex % 2u;
// Reset material ring buffer and descriptor pool once per frame slot.
// beginFrame() waits on this slot's fence before recording.
if (lastMaterialPoolResetFrame_ != frameIndex) {
materialRingOffset_[frameSlot] = 0;
if (materialDescPools_[frameSlot]) {
vkResetDescriptorPool(vkCtx_->getDevice(), materialDescPools_[frameSlot], 0);
}
lastMaterialPoolResetFrame_ = frameIndex;
}
// Pre-compute aligned UBO stride for ring buffer sub-allocation
const uint32_t uboStride = (sizeof(CharMaterialUBO) + materialUboAlignment_ - 1) & ~(materialUboAlignment_ - 1);
const uint32_t ringCapacityBytes = uboStride * MATERIAL_RING_CAPACITY;
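Stripped of the Vulkan details, the per-frame sub-allocation built on the stride and capacity computed above is just a bump allocator that is reset once per frame slot. A minimal model of it, with hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the material UBO ring: each request bumps the write cursor by one
// aligned stride; when the ring is exhausted the caller skips the draw rather
// than overwrite UBO data the GPU may still be reading from a prior draw.
struct MaterialRing {
    uint32_t offset = 0;  // next free byte in this frame slot's buffer
    uint32_t stride;      // aligned sizeof(CharMaterialUBO)
    uint32_t capacity;    // stride * ring capacity

    // Returns the byte offset for this allocation, or -1 if the ring is full.
    int64_t alloc() {
        if (offset + stride > capacity) return -1;
        uint32_t at = offset;
        offset += stride;
        return at;
    }
    // Safe only once the frame slot's fence has been waited on.
    void resetForNewFrame() { offset = 0; }
};
```

Because buffers persist across frames, the steady-state cost per draw is one `memcpy` plus a descriptor write, instead of a `vmaCreateBuffer`/`vmaDestroyBuffer` pair.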
// Bind per-frame descriptor set (set 0) -- shared across all draws
vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS,
pipelineLayout_, 0, 1, &perFrameSet, 0, nullptr);
@ -1838,9 +1934,8 @@ void CharacterRenderer::render(VkCommandBuffer cmd, VkDescriptorSet perFrameSet,
}
}
if (!instance.cachedModel) continue;
const auto& gpuModel = *instance.cachedModel;
// Skip models without GPU buffers
if (!gpuModel.vertexBuffer) continue;
@ -2176,27 +2271,18 @@ void CharacterRenderer::render(VkCommandBuffer cmd, VkDescriptorSet perFrameSet,
matData.heightMapVariance = batchHeightVariance;
matData.normalMapStrength = normalMapStrength_;
// Sub-allocate material UBO from ring buffer
uint32_t matOffset = materialRingOffset_[frameSlot];
if (matOffset + uboStride > ringCapacityBytes) continue; // ring exhausted
memcpy(static_cast<char*>(materialRingMapped_[frameSlot]) + matOffset, &matData, sizeof(CharMaterialUBO));
materialRingOffset_[frameSlot] = matOffset + uboStride;
// Write descriptor set: binding 0 = texture, binding 1 = material UBO, binding 2 = normal/height map
VkTexture* bindTex = (texPtr && texPtr->isValid()) ? texPtr : whiteTexture_.get();
VkDescriptorImageInfo imgInfo = bindTex->descriptorInfo();
VkDescriptorBufferInfo bufInfo{};
bufInfo.buffer = materialRingBuffer_[frameSlot];
bufInfo.offset = matOffset;
bufInfo.range = sizeof(CharMaterialUBO);
VkDescriptorImageInfo nhImgInfo = normalMap->descriptorInfo();
@ -2229,8 +2315,6 @@ void CharacterRenderer::render(VkCommandBuffer cmd, VkDescriptorSet perFrameSet,
pipelineLayout_, 1, 1, &materialSet, 0, nullptr);
vkCmdDrawIndexed(cmd, batch.indexCount, 1, batch.indexStart, 0, 0);
}
} else {
// Draw entire model with first texture
@ -2271,24 +2355,16 @@ void CharacterRenderer::render(VkCommandBuffer cmd, VkDescriptorSet perFrameSet,
matData.heightMapVariance = 0.0f;
matData.normalMapStrength = normalMapStrength_;
// Sub-allocate material UBO from ring buffer
uint32_t matOffset2 = materialRingOffset_[frameSlot];
if (matOffset2 + uboStride > ringCapacityBytes) continue; // ring exhausted
memcpy(static_cast<char*>(materialRingMapped_[frameSlot]) + matOffset2, &matData, sizeof(CharMaterialUBO));
materialRingOffset_[frameSlot] = matOffset2 + uboStride;
VkDescriptorImageInfo imgInfo = texPtr->descriptorInfo();
VkDescriptorBufferInfo bufInfo{};
bufInfo.buffer = materialRingBuffer_[frameSlot];
bufInfo.offset = matOffset2;
bufInfo.range = sizeof(CharMaterialUBO);
VkDescriptorImageInfo nhImgInfo2 = flatNormalTexture_->descriptorInfo();
@ -2320,8 +2396,6 @@ void CharacterRenderer::render(VkCommandBuffer cmd, VkDescriptorSet perFrameSet,
pipelineLayout_, 1, 1, &materialSet, 0, nullptr);
vkCmdDrawIndexed(cmd, gpuModel.indexCount, 1, 0, 0, 0);
}
}
}
@ -2513,9 +2587,8 @@ void CharacterRenderer::renderShadow(VkCommandBuffer cmd, const glm::mat4& light
glm::vec3 diff = inst.position - shadowCenter;
if (glm::dot(diff, diff) > shadowRadiusSq) continue;
if (!inst.cachedModel) continue;
const M2ModelGPU& gpuModel = *inst.cachedModel;
if (!gpuModel.vertexBuffer) continue;
glm::mat4 modelMat = inst.hasOverrideModelMatrix

@ -678,6 +678,7 @@ void M2Renderer::shutdown() {
instances.clear();
spatialGrid.clear();
instanceIndexById.clear();
instanceDedupMap_.clear();
// Delete cached textures // Delete cached textures
textureCache.clear(); textureCache.clear();
@@ -1184,6 +1185,10 @@ bool M2Renderer::loadModel(const pipeline::M2Model& model, uint32_t modelId) {
         }
     }
+    // Batch all GPU uploads (VB, IB, textures) into a single command buffer
+    // submission with one fence wait, instead of one fence wait per upload.
+    vkCtx_->beginUploadBatch();
     if (hasGeometry) {
         // Create VBO with interleaved vertex data
         // Format: position (3), normal (3), texcoord0 (2), texcoord1 (2), boneWeights (4), boneIndices (4 as float)
@@ -1535,6 +1540,8 @@ bool M2Renderer::loadModel(const pipeline::M2Model& model, uint32_t modelId) {
         }
     }
+    vkCtx_->endUploadBatch();
     // Allocate Vulkan descriptor sets and UBOs for each batch
     for (auto& bgpu : gpuModel.batches) {
         // Create combined UBO for M2Params (binding 1) + M2Material (binding 2)
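The `beginUploadBatch()`/`endUploadBatch()` pairing above amortizes submit-and-wait cost across many staging copies. A conceptual sketch of that pattern with the Vulkan handles stubbed out (`UploadBatcher` and its methods are illustrative, not the engine's actual `VkContext` API):

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Copies recorded inside a batch share one queue submit + one fence wait;
// outside a batch, each copy submits (and waits) on its own, which is the
// per-upload stall the diff removes.
class UploadBatcher {
public:
    void beginBatch() { batching_ = true; }
    void recordCopy(std::function<void()> copy) {
        pending_.push_back(std::move(copy));
        if (!batching_) flush();   // legacy path: one submit per upload
    }
    void endBatch() { batching_ = false; flush(); }
    int submitCount() const { return submits_; }
private:
    void flush() {
        if (pending_.empty()) return;
        for (auto& c : pending_) c();  // "record" all copies into one cmd buffer
        ++submits_;                    // stands in for vkQueueSubmit + vkWaitForFences
        pending_.clear();
    }
    bool batching_ = false;
    std::vector<std::function<void()>> pending_;
    int submits_ = 0;
};
```

With 15-20 textures plus vertex/index buffers per model, collapsing N fence waits into one is what the commit message's upload-stall numbers come from.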
@@ -1613,17 +1620,16 @@ uint32_t M2Renderer::createInstance(uint32_t modelId, const glm::vec3& position,
     }
     const auto& mdlRef = modelIt->second;
-    // Ground clutter is procedurally scattered and high-count; avoid O(N) dedup
-    // scans that can hitch when new tiles stream in.
+    // Deduplicate: skip if same model already at nearly the same position.
+    // Uses hash map for O(1) lookup instead of O(N) scan.
     if (!mdlRef.isGroundDetail) {
-        // Deduplicate: skip if same model already at nearly the same position
-        for (const auto& existing : instances) {
-            if (existing.modelId == modelId) {
-                glm::vec3 d = existing.position - position;
-                if (glm::dot(d, d) < 0.01f) {
-                    return existing.id;
-                }
-            }
-        }
+        DedupKey dk{modelId,
+                    static_cast<int32_t>(std::round(position.x * 10.0f)),
+                    static_cast<int32_t>(std::round(position.y * 10.0f)),
+                    static_cast<int32_t>(std::round(position.z * 10.0f))};
+        auto dit = instanceDedupMap_.find(dk);
+        if (dit != instanceDedupMap_.end()) {
+            return dit->second;
+        }
     }
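The quantized key above rounds each coordinate to a 0.1 world-unit grid, so two placements of the same model within roughly half a grid cell collapse to the same key. A self-contained sketch (the hash combiner here is an assumption; the real `DedupKey` hash lives in the renderer header):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <unordered_map>

// Key = model ID + position quantized to a 0.1-unit grid, matching the old
// O(N) scan's distance-squared threshold of 0.01 (i.e. 0.1 units).
struct DedupKey {
    uint32_t modelId;
    int32_t qx, qy, qz;
    bool operator==(const DedupKey& o) const {
        return modelId == o.modelId && qx == o.qx && qy == o.qy && qz == o.qz;
    }
};
struct DedupKeyHash {   // illustrative combiner, not the engine's
    std::size_t operator()(const DedupKey& k) const {
        std::size_t h = k.modelId;
        h = h * 31 + static_cast<std::size_t>(k.qx);
        h = h * 31 + static_cast<std::size_t>(k.qy);
        h = h * 31 + static_cast<std::size_t>(k.qz);
        return h;
    }
};
DedupKey makeKey(uint32_t modelId, float x, float y, float z) {
    return {modelId,
            static_cast<int32_t>(std::round(x * 10.0f)),
            static_cast<int32_t>(std::round(y * 10.0f)),
            static_cast<int32_t>(std::round(z * 10.0f))};
}
```

The trade-off versus the old scan: two instances straddling a grid boundary can map to different keys even when closer than 0.1 units, which is acceptable for dedup but worth knowing.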
@@ -1651,6 +1657,7 @@ uint32_t M2Renderer::createInstance(uint32_t modelId, const glm::vec3& position,
     instance.cachedIsInvisibleTrap = mdlRef.isInvisibleTrap;
     instance.cachedIsInstancePortal = mdlRef.isInstancePortal;
     instance.cachedIsValid = mdlRef.isValid();
+    instance.cachedModel = &mdlRef;
     // Initialize animation: play first sequence (usually Stand/Idle)
     const auto& mdl = mdlRef;
@@ -1662,6 +1669,15 @@ uint32_t M2Renderer::createInstance(uint32_t modelId, const glm::vec3& position,
         instance.variationTimer = 3000.0f + static_cast<float>(rand() % 8000);
     }
+    // Register in dedup map before pushing (uses original position, not ground-adjusted)
+    if (!mdlRef.isGroundDetail) {
+        DedupKey dk{modelId,
+                    static_cast<int32_t>(std::round(position.x * 10.0f)),
+                    static_cast<int32_t>(std::round(position.y * 10.0f)),
+                    static_cast<int32_t>(std::round(position.z * 10.0f))};
+        instanceDedupMap_[dk] = instance.id;
+    }
     instances.push_back(instance);
     size_t idx = instances.size() - 1;
     // Track special instances for fast-path iteration
@@ -1700,13 +1716,15 @@ uint32_t M2Renderer::createInstanceWithMatrix(uint32_t modelId, const glm::mat4&
         return 0;
     }
-    // Deduplicate: skip if same model already at nearly the same position
-    for (const auto& existing : instances) {
-        if (existing.modelId == modelId) {
-            glm::vec3 d = existing.position - position;
-            if (glm::dot(d, d) < 0.01f) {
-                return existing.id;
-            }
+    // Deduplicate: O(1) hash lookup
+    {
+        DedupKey dk{modelId,
+                    static_cast<int32_t>(std::round(position.x * 10.0f)),
+                    static_cast<int32_t>(std::round(position.y * 10.0f)),
+                    static_cast<int32_t>(std::round(position.z * 10.0f))};
+        auto dit = instanceDedupMap_.find(dk);
+        if (dit != instanceDedupMap_.end()) {
+            return dit->second;
         }
     }
@@ -1731,6 +1749,7 @@ uint32_t M2Renderer::createInstanceWithMatrix(uint32_t modelId, const glm::mat4&
     instance.cachedIsGroundDetail = mdl2.isGroundDetail;
     instance.cachedIsInvisibleTrap = mdl2.isInvisibleTrap;
     instance.cachedIsValid = mdl2.isValid();
+    instance.cachedModel = &mdl2;
     // Initialize animation
     if (mdl2.hasAnimation && !mdl2.disableAnimation && !mdl2.sequences.empty()) {
@@ -1743,6 +1762,15 @@ uint32_t M2Renderer::createInstanceWithMatrix(uint32_t modelId, const glm::mat4&
         instance.animTime = static_cast<float>(rand()) / RAND_MAX * 10000.0f;
     }
+    // Register in dedup map
+    {
+        DedupKey dk{modelId,
+                    static_cast<int32_t>(std::round(position.x * 10.0f)),
+                    static_cast<int32_t>(std::round(position.y * 10.0f)),
+                    static_cast<int32_t>(std::round(position.z * 10.0f))};
+        instanceDedupMap_[dk] = instance.id;
+    }
     instances.push_back(instance);
     size_t idx = instances.size() - 1;
     if (mdl2.isSmoke) {
@@ -2000,9 +2028,8 @@ void M2Renderer::update(float deltaTime, const glm::vec3& cameraPos, const glm::
         instance.animTime += dtMs * (instance.animSpeed - 1.0f);
         // For animation looping/variation, we need the actual model data.
-        auto it = models.find(instance.modelId);
-        if (it == models.end()) continue;
-        const M2ModelGPU& model = it->second;
+        if (!instance.cachedModel) continue;
+        const M2ModelGPU& model = *instance.cachedModel;
         // Validate sequence index
         if (instance.currentSequenceIndex < 0 ||
@@ -2058,6 +2085,14 @@ void M2Renderer::update(float deltaTime, const glm::vec3& cameraPos, const glm::
         float paddedRadius = std::max(cullRadius * 1.5f, cullRadius + 3.0f);
         if (cullRadius > 0.0f && !updateFrustum.intersectsSphere(instance.position, paddedRadius)) continue;
+        // Distance-based frame skipping: update distant bones less frequently
+        uint32_t boneInterval = 1;
+        if (distSq > 200.0f * 200.0f) boneInterval = 8;
+        else if (distSq > 100.0f * 100.0f) boneInterval = 4;
+        else if (distSq > 50.0f * 50.0f) boneInterval = 2;
+        instance.frameSkipCounter++;
+        if ((instance.frameSkipCounter % boneInterval) != 0) continue;
         boneWorkIndices_.push_back(idx);
     }
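The frame-skipping logic above is small enough to isolate. A sketch with the diff's thresholds (50/100/200 world units); `shouldUpdateBones` is an illustrative wrapper, not a function in the renderer:

```cpp
#include <cassert>
#include <cstdint>

// Squared camera distance picks a bone-update interval; a per-instance
// counter then decides whether this frame recomputes the skeleton.
uint32_t boneUpdateInterval(float distSq) {
    if (distSq > 200.0f * 200.0f) return 8;
    if (distSq > 100.0f * 100.0f) return 4;
    if (distSq > 50.0f * 50.0f) return 2;
    return 1;
}
bool shouldUpdateBones(uint32_t& frameSkipCounter, float distSq) {
    ++frameSkipCounter;
    return (frameSkipCounter % boneUpdateInterval(distSq)) == 0;
}
```

Because the counter is per-instance and free-running, instances at the same distance naturally desynchronize rather than all paying their bone update on the same frame.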
@@ -2071,9 +2106,8 @@ void M2Renderer::update(float deltaTime, const glm::vec3& cameraPos, const glm::
         for (size_t i : boneWorkIndices_) {
             if (i >= instances.size()) continue;
             auto& inst = instances[i];
-            auto mdlIt = models.find(inst.modelId);
-            if (mdlIt == models.end()) continue;
-            computeBoneMatrices(mdlIt->second, inst);
+            if (!inst.cachedModel) continue;
+            computeBoneMatrices(*inst.cachedModel, inst);
         }
     } else {
         // Parallel — dispatch across worker threads
@@ -2086,9 +2120,8 @@ void M2Renderer::update(float deltaTime, const glm::vec3& cameraPos, const glm::
             for (size_t i : boneWorkIndices_) {
                 if (i >= instances.size()) continue;
                 auto& inst = instances[i];
-                auto mdlIt = models.find(inst.modelId);
-                if (mdlIt == models.end()) continue;
-                computeBoneMatrices(mdlIt->second, inst);
+                if (!inst.cachedModel) continue;
+                computeBoneMatrices(*inst.cachedModel, inst);
             }
         } else {
             const size_t chunkSize = animCount / numThreads;
@@ -2109,9 +2142,8 @@ void M2Renderer::update(float deltaTime, const glm::vec3& cameraPos, const glm::
                     size_t idx = boneWorkIndices_[j];
                     if (idx >= instances.size()) continue;
                     auto& inst = instances[idx];
-                    auto mdlIt = models.find(inst.modelId);
-                    if (mdlIt == models.end()) continue;
-                    computeBoneMatrices(mdlIt->second, inst);
+                    if (!inst.cachedModel) continue;
+                    computeBoneMatrices(*inst.cachedModel, inst);
                 }
             }));
             start = end;
@@ -2133,9 +2165,8 @@ void M2Renderer::update(float deltaTime, const glm::vec3& cameraPos, const glm::
         glm::vec3 toCam = instance.position - cachedCamPos_;
         float distSq = glm::dot(toCam, toCam);
         if (distSq > cachedMaxRenderDistSq_) continue;
-        auto mdlIt = models.find(instance.modelId);
-        if (mdlIt == models.end()) continue;
-        emitParticles(instance, mdlIt->second, deltaTime);
+        if (!instance.cachedModel) continue;
+        emitParticles(instance, *instance.cachedModel, deltaTime);
         updateParticles(instance, deltaTime);
     }
@@ -2839,9 +2870,8 @@ void M2Renderer::renderShadow(VkCommandBuffer cmd, const glm::mat4& lightSpaceMa
         glm::vec3 diff = instance.position - shadowCenter;
         if (glm::dot(diff, diff) > shadowRadiusSq) continue;
-        auto modelIt = models.find(instance.modelId);
-        if (modelIt == models.end()) continue;
-        const M2ModelGPU& model = modelIt->second;
+        if (!instance.cachedModel) continue;
+        const M2ModelGPU& model = *instance.cachedModel;
         // Filter: only draw foliage models in foliage pass, non-foliage in non-foliage pass
         if (model.shadowWindFoliage != foliagePass) continue;
@@ -2947,8 +2977,7 @@ std::vector<glm::vec3> M2Renderer::getWaterVegetationPositions(const glm::vec3&
     std::vector<glm::vec3> result;
     float maxDistSq = maxDist * maxDist;
     for (const auto& inst : instances) {
-        auto it = models.find(inst.modelId);
-        if (it == models.end() || !it->second.isWaterVegetation) continue;
+        if (!inst.cachedModel || !inst.cachedModel->isWaterVegetation) continue;
         glm::vec3 diff = inst.position - camPos;
         if (glm::dot(diff, diff) <= maxDistSq) {
             result.push_back(inst.position);
@@ -3059,9 +3088,8 @@ void M2Renderer::emitParticles(M2Instance& inst, const M2ModelGPU& gpu, float dt
     }

 void M2Renderer::updateParticles(M2Instance& inst, float dt) {
-    auto it = models.find(inst.modelId);
-    if (it == models.end()) return;
-    const auto& gpu = it->second;
+    if (!inst.cachedModel) return;
+    const auto& gpu = *inst.cachedModel;
     for (size_t i = 0; i < inst.particles.size(); ) {
         auto& p = inst.particles[i];
@@ -3136,9 +3164,8 @@ void M2Renderer::renderM2Particles(VkCommandBuffer cmd, VkDescriptorSet perFrame
     for (auto& inst : instances) {
         if (inst.particles.empty()) continue;
-        auto it = models.find(inst.modelId);
-        if (it == models.end()) continue;
-        const auto& gpu = it->second;
+        if (!inst.cachedModel) continue;
+        const auto& gpu = *inst.cachedModel;
         for (const auto& p : inst.particles) {
             if (p.emitterIndex < 0 || p.emitterIndex >= static_cast<int>(gpu.particleEmitters.size())) continue;
@@ -3477,6 +3504,7 @@ void M2Renderer::clear() {
     instances.clear();
     spatialGrid.clear();
     instanceIndexById.clear();
+    instanceDedupMap_.clear();
     smokeParticles.clear();
     smokeInstanceIndices_.clear();
     portalInstanceIndices_.clear();
@@ -3513,6 +3541,7 @@ M2Renderer::GridCell M2Renderer::toCell(const glm::vec3& p) const {
 void M2Renderer::rebuildSpatialIndex() {
     spatialGrid.clear();
     instanceIndexById.clear();
+    instanceDedupMap_.clear();
     instanceIndexById.reserve(instances.size());
     smokeInstanceIndices_.clear();
     portalInstanceIndices_.clear();
@@ -3521,9 +3550,22 @@ void M2Renderer::rebuildSpatialIndex() {
     particleInstanceIndices_.clear();
     for (size_t i = 0; i < instances.size(); i++) {
-        const auto& inst = instances[i];
+        auto& inst = instances[i];
         instanceIndexById[inst.id] = i;
+        // Re-cache model pointer (may have changed after model map modifications)
+        auto mdlIt = models.find(inst.modelId);
+        inst.cachedModel = (mdlIt != models.end()) ? &mdlIt->second : nullptr;
+        // Rebuild dedup map (skip ground detail)
+        if (!inst.cachedIsGroundDetail) {
+            DedupKey dk{inst.modelId,
+                        static_cast<int32_t>(std::round(inst.position.x * 10.0f)),
+                        static_cast<int32_t>(std::round(inst.position.y * 10.0f)),
+                        static_cast<int32_t>(std::round(inst.position.z * 10.0f))};
+            instanceDedupMap_[dk] = inst.id;
+        }
         if (inst.cachedIsSmoke) {
             smokeInstanceIndices_.push_back(i);
         }
@@ -3647,8 +3689,18 @@ VkTexture* M2Renderer::loadTexture(const std::string& path, uint32_t texFlags) {
                      containsToken(key, "campfire") ||
                      containsToken(key, "bonfire");
-    // Load BLP texture
-    pipeline::BLPImage blp = assetManager->loadTexture(key);
+    // Check pre-decoded BLP cache first (populated by background worker threads)
+    pipeline::BLPImage blp;
+    if (predecodedBLPCache_) {
+        auto pit = predecodedBLPCache_->find(key);
+        if (pit != predecodedBLPCache_->end()) {
+            blp = std::move(pit->second);
+            predecodedBLPCache_->erase(pit);
+        }
+    }
+    if (!blp.isValid()) {
+        blp = assetManager->loadTexture(key);
+    }
     if (!blp.isValid()) {
         // Return white fallback but don't cache the failure — MPQ reads can
         // fail transiently during streaming; allow retry on next model load.
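The cache check above follows a consume-then-fallback shape: take the pre-decoded image if a background worker produced one, otherwise decode synchronously. A sketch with `pipeline::BLPImage` reduced to a stub (`acquireTexture` and `Image` are illustrative names):

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for pipeline::BLPImage.
struct Image {
    int width = 0;
    bool isValid() const { return width > 0; }
};
using BlpCache = std::unordered_map<std::string, Image>;

Image acquireTexture(const std::string& key, BlpCache* predecoded,
                     int& syncDecodes) {
    if (predecoded) {
        auto it = predecoded->find(key);
        if (it != predecoded->end()) {
            Image img = std::move(it->second);
            predecoded->erase(it);  // consume: decoded pixels are large
            return img;
        }
    }
    ++syncDecodes;                  // fallback: decode on the calling thread
    return Image{256};
}
```

Erasing on hit keeps the cache from holding decoded pixel data past its single use; a miss simply pays the old synchronous cost, so the path degrades gracefully when the workers fall behind.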
@@ -3714,9 +3766,8 @@
 uint32_t M2Renderer::getTotalTriangleCount() const {
     uint32_t total = 0;
     for (const auto& instance : instances) {
-        auto it = models.find(instance.modelId);
-        if (it != models.end()) {
-            total += it->second.indexCount / 3;
+        if (instance.cachedModel) {
+            total += instance.cachedModel->indexCount / 3;
         }
     }
     return total;
@@ -3738,11 +3789,10 @@ std::optional<float> M2Renderer::getFloorHeight(float glX, float glY, float glZ,
             continue;
         }
-        auto it = models.find(instance.modelId);
-        if (it == models.end()) continue;
+        if (!instance.cachedModel) continue;
         if (instance.scale <= 0.001f) continue;
-        const M2ModelGPU& model = it->second;
+        const M2ModelGPU& model = *instance.cachedModel;
         if (model.collisionNoBlock || model.isInvisibleTrap || model.isSpellEffect) continue;
         if (instance.skipCollision) continue;
@@ -3894,10 +3944,9 @@ bool M2Renderer::checkCollision(const glm::vec3& from, const glm::vec3& to,
         if (from.z > instance.worldBoundsMax.z + 2.5f && adjustedPos.z > instance.worldBoundsMax.z + 2.5f) continue;
         if (from.z + 2.5f < instance.worldBoundsMin.z && adjustedPos.z + 2.5f < instance.worldBoundsMin.z) continue;
-        auto it = models.find(instance.modelId);
-        if (it == models.end()) continue;
-        const M2ModelGPU& model = it->second;
+        if (!instance.cachedModel) continue;
+        const M2ModelGPU& model = *instance.cachedModel;
         if (model.collisionNoBlock || model.isInvisibleTrap || model.isSpellEffect) continue;
         if (instance.skipCollision) continue;
         if (instance.scale <= 0.001f) continue;
@@ -4135,10 +4184,9 @@ float M2Renderer::raycastBoundingBoxes(const glm::vec3& origin, const glm::vec3&
             continue;
         }
-        auto it = models.find(instance.modelId);
-        if (it == models.end()) continue;
-        const M2ModelGPU& model = it->second;
+        if (!instance.cachedModel) continue;
+        const M2ModelGPU& model = *instance.cachedModel;
         if (model.collisionNoBlock || model.isInvisibleTrap || model.isSpellEffect) continue;
         glm::vec3 localMin, localMax;
         getTightCollisionBounds(model, localMin, localMax);


@@ -2434,6 +2434,9 @@ void Renderer::update(float deltaTime) {
     cameraController->update(deltaTime);
     auto cameraEnd = std::chrono::steady_clock::now();
     lastCameraUpdateMs = std::chrono::duration<double, std::milli>(cameraEnd - cameraStart).count();
+    if (lastCameraUpdateMs > 3.0) {
+        LOG_WARNING("SLOW cameraController->update: ", lastCameraUpdateMs, "ms");
+    }
     // Update 3D audio listener position/orientation to match camera
     if (camera) {
@@ -2527,7 +2530,13 @@ void Renderer::update(float deltaTime) {
     // Update terrain streaming
     if (terrainManager && camera) {
+        auto terrStart = std::chrono::steady_clock::now();
         terrainManager->update(*camera, deltaTime);
+        float terrMs = std::chrono::duration<float, std::milli>(
+            std::chrono::steady_clock::now() - terrStart).count();
+        if (terrMs > 5.0f) {
+            LOG_WARNING("SLOW terrainManager->update: ", terrMs, "ms");
+        }
     }
     // Update sky system (skybox time, star twinkle, clouds, celestial moon phases)
@@ -2579,7 +2588,14 @@ void Renderer::update(float deltaTime) {
     // Update character animations
     if (characterRenderer && camera) {
+        auto charAnimStart = std::chrono::steady_clock::now();
         characterRenderer->update(deltaTime, camera->getPosition());
+        float charAnimMs = std::chrono::duration<float, std::milli>(
+            std::chrono::steady_clock::now() - charAnimStart).count();
+        if (charAnimMs > 5.0f) {
+            LOG_WARNING("SLOW characterRenderer->update: ", charAnimMs, "ms (",
+                        characterRenderer->getInstanceCount(), " instances)");
+        }
     }
     // Update AudioEngine (cleanup finished sounds, etc.)
@@ -2766,8 +2782,15 @@ void Renderer::update(float deltaTime) {
     // Update M2 doodad animations (pass camera for frustum-culling bone computation)
     if (m2Renderer && camera) {
+        auto m2Start = std::chrono::steady_clock::now();
         m2Renderer->update(deltaTime, camera->getPosition(),
                            camera->getProjectionMatrix() * camera->getViewMatrix());
+        float m2Ms = std::chrono::duration<float, std::milli>(
+            std::chrono::steady_clock::now() - m2Start).count();
+        if (m2Ms > 3.0f) {
+            LOG_WARNING("SLOW m2Renderer->update: ", m2Ms, "ms (",
+                        m2Renderer->getInstanceCount(), " instances)");
+        }
     }
     // Helper: play zone music, dispatching local files (file: prefix) vs MPQ paths
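The time-then-warn pattern repeated above can be factored into a scoped guard, so each subsystem call only names itself and its budget. A sketch with the log sink simplified to a string vector (`SlowScope` is an illustrative helper, not something the renderer defines):

```cpp
#include <chrono>
#include <string>
#include <utility>
#include <vector>

// RAII timer: measures the enclosing scope and records a warning only when
// the elapsed time crosses that subsystem's budget, mirroring the 3ms/5ms
// thresholds used in the diff.
class SlowScope {
public:
    SlowScope(std::string name, double budgetMs, std::vector<std::string>& log)
        : name_(std::move(name)), budgetMs_(budgetMs), log_(log),
          start_(std::chrono::steady_clock::now()) {}
    ~SlowScope() {
        double ms = std::chrono::duration<double, std::milli>(
                        std::chrono::steady_clock::now() - start_).count();
        if (ms > budgetMs_) log_.push_back("SLOW " + name_);
    }
private:
    std::string name_;
    double budgetMs_;
    std::vector<std::string>& log_;
    std::chrono::steady_clock::time_point start_;
};
```

Usage would be `{ SlowScope s("terrainManager->update", 5.0, log); terrainManager->update(...); }`, keeping the hot path free of branches when the budget is met.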


@@ -1,5 +1,6 @@
 #include "rendering/terrain_manager.hpp"
 #include "rendering/terrain_renderer.hpp"
+#include "rendering/vk_context.hpp"
 #include "rendering/water_renderer.hpp"
 #include "rendering/m2_renderer.hpp"
 #include "rendering/wmo_renderer.hpp"
@@ -53,12 +54,12 @@ int computeTerrainWorkerCount() {
     unsigned hc = std::thread::hardware_concurrency();
     if (hc > 0) {
-        // Terrain streaming should leave CPU room for render/update threads.
-        const unsigned availableCores = (hc > 1u) ? (hc - 1u) : 1u;
-        const unsigned targetWorkers = std::max(2u, availableCores / 2u);
+        // Use most cores for loading — leave 1-2 for render/update threads.
+        const unsigned reserved = (hc >= 8u) ? 2u : 1u;
+        const unsigned targetWorkers = std::max(4u, hc - reserved);
         return static_cast<int>(targetWorkers);
     }
-    return 2; // Fallback
+    return 4; // Fallback
 }

 bool decodeLayerAlpha(const pipeline::MapChunk& chunk, size_t layerIdx, std::vector<uint8_t>& outAlpha) {
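The revised worker-count policy above is easy to state as a pure function: reserve one or two cores for the render/update threads, give the rest to streaming, with a floor of four workers. A sketch (`terrainWorkerCount` is an illustrative name for the same policy as `computeTerrainWorkerCount`):

```cpp
#include <algorithm>
#include <cassert>

// hardware_concurrency() may legitimately return 0 ("unknown"), hence the
// explicit fallback. On small core counts hc - reserved can drop below the
// floor; std::max guards it.
int terrainWorkerCount(unsigned hardwareConcurrency) {
    if (hardwareConcurrency == 0) return 4;  // unknown -> fallback
    const unsigned reserved = (hardwareConcurrency >= 8u) ? 2u : 1u;
    return static_cast<int>(std::max(4u, hardwareConcurrency - reserved));
}
```

Note the floor means a 4-core machine still runs 4 streaming workers alongside the render thread; the diff accepts that oversubscription in exchange for faster loads.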
@@ -230,9 +231,14 @@ bool TerrainManager::loadTile(int x, int y) {
         return false;
     }
+    VkContext* vkCtx = terrainRenderer ? terrainRenderer->getVkContext() : nullptr;
+    if (vkCtx) vkCtx->beginUploadBatch();
     FinalizingTile ft;
     ft.pending = std::move(pending);
     while (!advanceFinalization(ft)) {}
+    if (vkCtx) vkCtx->endUploadBatchSync(); // Sync — caller expects tile ready
     return true;
 }
@@ -372,6 +378,15 @@ std::shared_ptr<PendingTile> TerrainManager::prepareTile(int x, int y) {
                         int& skippedSkinNotFound) -> bool {
         if (preparedModelIds.find(modelId) != preparedModelIds.end()) return true;
+        // Skip file I/O + parsing for models already uploaded to GPU from previous tiles
+        {
+            std::lock_guard<std::mutex> lock(uploadedM2IdsMutex_);
+            if (uploadedM2Ids_.count(modelId)) {
+                preparedModelIds.insert(modelId);
+                return true;
+            }
+        }
         std::vector<uint8_t> m2Data = assetManager->readFile(m2Path);
         if (m2Data.empty()) {
             skippedFileNotFound++;
@@ -397,6 +412,20 @@ std::shared_ptr<PendingTile> TerrainManager::prepareTile(int x, int y) {
             return false;
         }
+        // Pre-decode M2 model textures on background thread
+        for (const auto& tex : m2Model.textures) {
+            if (tex.filename.empty()) continue;
+            std::string texKey = tex.filename;
+            std::replace(texKey.begin(), texKey.end(), '/', '\\');
+            std::transform(texKey.begin(), texKey.end(), texKey.begin(),
+                           [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
+            if (pending->preloadedM2Textures.find(texKey) != pending->preloadedM2Textures.end()) continue;
+            auto blp = assetManager->loadTexture(texKey);
+            if (blp.isValid()) {
+                pending->preloadedM2Textures[texKey] = std::move(blp);
+            }
+        }
         PendingTile::M2Ready ready;
         ready.modelId = modelId;
         ready.model = std::move(m2Model);
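The key normalization in the pre-decode loop above (and repeated for doodads and WMOs below) matters because the same texture must hash to the same cache key regardless of how a model spells its path. Isolated as a helper (`normalizeTexKey` is an illustrative name; the diff inlines this logic):

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// MPQ archive paths are case-insensitive and backslash-separated, so cache
// keys are slash-normalized and lowercased before use.
std::string normalizeTexKey(std::string key) {
    std::replace(key.begin(), key.end(), '/', '\\');
    std::transform(key.begin(), key.end(), key.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return key;
}
```

The `unsigned char` cast in the lambda avoids undefined behavior when `std::tolower` receives a negative `char` from high-bit bytes in malformed paths.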
@@ -551,10 +580,20 @@ std::shared_ptr<PendingTile> TerrainManager::prepareTile(int x, int y) {
             }
             uint32_t doodadModelId = static_cast<uint32_t>(std::hash<std::string>{}(m2Path));
+            // Skip file I/O if model already uploaded from a previous tile
+            bool modelAlreadyUploaded = false;
+            {
+                std::lock_guard<std::mutex> lock(uploadedM2IdsMutex_);
+                modelAlreadyUploaded = uploadedM2Ids_.count(doodadModelId) > 0;
+            }
+            pipeline::M2Model m2Model;
+            if (!modelAlreadyUploaded) {
                 std::vector<uint8_t> m2Data = assetManager->readFile(m2Path);
                 if (m2Data.empty()) continue;
-                pipeline::M2Model m2Model = pipeline::M2Loader::load(m2Data);
+                m2Model = pipeline::M2Loader::load(m2Data);
                 if (m2Model.name.empty()) {
                     m2Model.name = m2Path;
                 }
@@ -565,6 +604,21 @@ std::shared_ptr<PendingTile> TerrainManager::prepareTile(int x, int y) {
                 }
                 if (!m2Model.isValid()) continue;
+                // Pre-decode doodad M2 textures on background thread
+                for (const auto& tex : m2Model.textures) {
+                    if (tex.filename.empty()) continue;
+                    std::string texKey = tex.filename;
+                    std::replace(texKey.begin(), texKey.end(), '/', '\\');
+                    std::transform(texKey.begin(), texKey.end(), texKey.begin(),
+                                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
+                    if (pending->preloadedM2Textures.find(texKey) != pending->preloadedM2Textures.end()) continue;
+                    auto blp = assetManager->loadTexture(texKey);
+                    if (blp.isValid()) {
+                        pending->preloadedM2Textures[texKey] = std::move(blp);
+                    }
+                }
+            }
             // Build doodad's local transform (WoW coordinates)
             // WMO doodads use quaternion rotation
             glm::quat fixedRotation(doodad.rotation.w, doodad.rotation.x, doodad.rotation.y, doodad.rotation.z);
@@ -633,6 +687,32 @@ std::shared_ptr<PendingTile> TerrainManager::prepareTile(int x, int y) {
             }
         }
+        // Pre-decode WMO textures on background thread
+        for (const auto& texPath : wmoModel.textures) {
+            if (texPath.empty()) continue;
+            std::string texKey = texPath;
+            // Truncate at NUL (WMO paths can have stray bytes)
+            size_t nul = texKey.find('\0');
+            if (nul != std::string::npos) texKey.resize(nul);
+            std::replace(texKey.begin(), texKey.end(), '/', '\\');
+            std::transform(texKey.begin(), texKey.end(), texKey.begin(),
+                           [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
+            if (texKey.empty()) continue;
+            if (pending->preloadedWMOTextures.find(texKey) != pending->preloadedWMOTextures.end()) continue;
+            // Try .blp variant
+            std::string blpKey = texKey;
+            if (blpKey.size() >= 4) {
+                std::string ext = blpKey.substr(blpKey.size() - 4);
+                if (ext == ".tga" || ext == ".dds") {
+                    blpKey = blpKey.substr(0, blpKey.size() - 4) + ".blp";
+                }
+            }
+            auto blp = assetManager->loadTexture(blpKey);
+            if (blp.isValid()) {
+                pending->preloadedWMOTextures[blpKey] = std::move(blp);
+            }
+        }
         PendingTile::WMOReady ready;
         // Cache WMO model uploads by path; placement dedup uses uniqueId separately.
         ready.modelId = static_cast<uint32_t>(std::hash<std::string>{}(wmoPath));
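The WMO loop above adds two quirks on top of plain normalization: truncating at a stray NUL byte and remapping `.tga`/`.dds` references to the `.blp` asset that actually ships. A sketch of that key fix-up, assuming the input is already lowercased (`toBlpKey` is an illustrative name; the diff inlines this):

```cpp
#include <cassert>
#include <string>

// WMO material paths can carry trailing garbage after an embedded NUL, and
// some reference .tga/.dds names even though the archive stores .blp.
std::string toBlpKey(std::string key) {
    size_t nul = key.find('\0');
    if (nul != std::string::npos) key.resize(nul);
    if (key.size() >= 4) {
        std::string ext = key.substr(key.size() - 4);
        if (ext == ".tga" || ext == ".dds")
            key = key.substr(0, key.size() - 4) + ".blp";
    }
    return key;
}
```

Note the cache in the diff is keyed by `blpKey` after remapping, so the renderer-side lookup must perform the same remap to hit the entry.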
@@ -695,15 +775,20 @@ bool TerrainManager::advanceFinalization(FinalizingTile& ft) {
         return true;
     }
+        // Upload pre-loaded textures (once)
+        if (!ft.terrainPreloaded) {
         LOG_DEBUG("Finalizing tile [", x, ",", y, "] (incremental)");
-        // Upload pre-loaded textures
         if (!pending->preloadedTextures.empty()) {
             terrainRenderer->uploadPreloadedTextures(pending->preloadedTextures);
         }
+            ft.terrainPreloaded = true;
+            // Yield after preload to give the time budget a chance to interrupt
+            return false;
+        }

-        // Upload terrain mesh to GPU
-        if (!terrainRenderer->loadTerrain(pending->mesh, pending->terrain.textures, x, y)) {
+        // Upload terrain chunks incrementally (32 per call to spread across frames)
+        if (!ft.terrainMeshDone) {
+        if (pending->mesh.validChunkCount == 0) {
             LOG_ERROR("Failed to upload terrain to GPU for tile [", x, ",", y, "]");
             failedTiles[coord] = true;
             {
@@ -713,9 +798,16 @@ bool TerrainManager::advanceFinalization(FinalizingTile& ft) {
             ft.phase = FinalizationPhase::DONE;
             return true;
         }
+        bool allDone = terrainRenderer->loadTerrainIncremental(
+            pending->mesh, pending->terrain.textures, x, y,
+            ft.terrainChunkNext, 32);
+        if (!allDone) {
+            return false; // More chunks remain — yield to time budget
+        }
+        ft.terrainMeshDone = true;
+        }

-        // Load water immediately after terrain (same frame) — water is now
-        // deduplicated to ~1-2 merged surfaces per tile, so this is fast.
+        // Load water after all terrain chunks are uploaded
         if (waterRenderer) {
             size_t beforeSurfaces = waterRenderer->getSurfaceCount();
             waterRenderer->loadFromTerrain(pending->terrain, true, x, y);
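The structure of `advanceFinalization()` — do a bounded slice of work, return `false` to yield, advance a phase flag when a stage completes — can be sketched with the GPU work replaced by counters (`FinalizingTile` here is a simplified stand-in for the real struct, and the budget numbers are illustrative):

```cpp
#include <algorithm>
#include <cassert>

// Simplified finalization state: phase 0 streams terrain chunks, phase 1
// uploads models, phase 2 is done.
struct FinalizingTile {
    int phase = 0;
    int chunksDone = 0, chunksTotal = 0;
    int modelsDone = 0, modelsTotal = 0;
};

// Each call does at most `budget` units of work, returning false to yield so
// a per-frame time budget can interrupt between slices. Returns true only
// when the tile is fully finalized.
bool advanceFinalization(FinalizingTile& ft, int budget) {
    if (ft.phase == 0) {
        int step = std::min(budget, ft.chunksTotal - ft.chunksDone);
        ft.chunksDone += step;
        if (ft.chunksDone < ft.chunksTotal) return false;  // yield mid-phase
        ft.phase = 1;
        return false;                                      // yield between phases
    }
    if (ft.phase == 1) {
        int step = std::min(budget, ft.modelsTotal - ft.modelsDone);
        ft.modelsDone += step;
        if (ft.modelsDone < ft.modelsTotal) return false;
        ft.phase = 2;
    }
    return true;
}
```

The blocking path in `loadTile()` simply spins the same state machine to completion (`while (!advanceFinalization(ft)) {}`), so both the incremental and synchronous cases share one implementation.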
@@ -738,13 +830,24 @@ bool TerrainManager::advanceFinalization(FinalizingTile& ft) {
     }
     case FinalizationPhase::M2_MODELS: {
-        // Upload ONE M2 model per call
+        // Upload multiple M2 models per call (batched GPU uploads)
         if (m2Renderer && ft.m2ModelIndex < pending->m2Models.size()) {
+            // Set pre-decoded BLP cache so loadTexture() skips main-thread BLP decode
+            m2Renderer->setPredecodedBLPCache(&pending->preloadedM2Textures);
+            constexpr size_t kModelsPerStep = 4;
+            size_t uploaded = 0;
+            while (ft.m2ModelIndex < pending->m2Models.size() && uploaded < kModelsPerStep) {
                 auto& m2Ready = pending->m2Models[ft.m2ModelIndex];
                 if (m2Renderer->loadModel(m2Ready.model, m2Ready.modelId)) {
                     ft.uploadedM2ModelIds.insert(m2Ready.modelId);
+                    // Track uploaded model IDs so background threads can skip re-reading
+                    std::lock_guard<std::mutex> lock(uploadedM2IdsMutex_);
+                    uploadedM2Ids_.insert(m2Ready.modelId);
                 }
                 ft.m2ModelIndex++;
+                uploaded++;
+            }
+            m2Renderer->setPredecodedBLPCache(nullptr);
             // Stay in this phase until all models uploaded
             if (ft.m2ModelIndex < pending->m2Models.size()) {
                 return false;
@@ -786,23 +889,29 @@ bool TerrainManager::advanceFinalization(FinalizingTile& ft) {
     }
     case FinalizationPhase::WMO_MODELS: {
-        // Upload ONE WMO model per call
+        // Upload multiple WMO models per call (batched GPU uploads)
         if (wmoRenderer && assetManager) {
             wmoRenderer->initialize(nullptr, VK_NULL_HANDLE, assetManager);
+            // Set pre-decoded BLP cache and defer normal maps during streaming
+            wmoRenderer->setPredecodedBLPCache(&pending->preloadedWMOTextures);
+            wmoRenderer->setDeferNormalMaps(true);
-            if (ft.wmoModelIndex < pending->wmoModels.size()) {
+            constexpr size_t kWmosPerStep = 1;
+            size_t uploaded = 0;
+            while (ft.wmoModelIndex < pending->wmoModels.size() && uploaded < kWmosPerStep) {
                 auto& wmoReady = pending->wmoModels[ft.wmoModelIndex];
-                // Deduplicate
                 if (wmoReady.uniqueId != 0 && placedWmoIds.count(wmoReady.uniqueId)) {
                     ft.wmoModelIndex++;
-                    if (ft.wmoModelIndex < pending->wmoModels.size()) return false;
                 } else {
                     wmoRenderer->loadModel(wmoReady.model, wmoReady.modelId);
                     ft.wmoModelIndex++;
+                    uploaded++;
+                }
+            }
+            wmoRenderer->setDeferNormalMaps(false);
+            wmoRenderer->setPredecodedBLPCache(nullptr);
             if (ft.wmoModelIndex < pending->wmoModels.size()) return false;
         }
-            }
-        }
         ft.phase = FinalizationPhase::WMO_INSTANCES;
         return false;
     }
@@ -862,10 +971,18 @@ bool TerrainManager::advanceFinalization(FinalizingTile& ft) {
         }
         case FinalizationPhase::WMO_DOODADS: {
-            // Upload ONE WMO doodad M2 per call
+            // Upload multiple WMO doodad M2s per call (batched GPU uploads)
             if (m2Renderer && ft.wmoDoodadIndex < pending->wmoDoodads.size()) {
+                // Set pre-decoded BLP cache for doodad M2 textures
+                m2Renderer->setPredecodedBLPCache(&pending->preloadedM2Textures);
+                constexpr size_t kDoodadsPerStep = 4;
+                size_t uploaded = 0;
+                while (ft.wmoDoodadIndex < pending->wmoDoodads.size() && uploaded < kDoodadsPerStep) {
                 auto& doodad = pending->wmoDoodads[ft.wmoDoodadIndex];
-                m2Renderer->loadModel(doodad.model, doodad.modelId);
+                if (m2Renderer->loadModel(doodad.model, doodad.modelId)) {
+                    std::lock_guard<std::mutex> lock(uploadedM2IdsMutex_);
+                    uploadedM2Ids_.insert(doodad.modelId);
+                }
                 uint32_t wmoDoodadInstId = m2Renderer->createInstanceWithMatrix(
                     doodad.modelId, doodad.modelMatrix, doodad.worldPosition);
                 if (wmoDoodadInstId) {
@@ -873,6 +990,9 @@ bool TerrainManager::advanceFinalization(FinalizingTile& ft) {
                     ft.m2InstanceIds.push_back(wmoDoodadInstId);
                 }
                 ft.wmoDoodadIndex++;
+                uploaded++;
+                }
+                m2Renderer->setPredecodedBLPCache(nullptr);
                 if (ft.wmoDoodadIndex < pending->wmoDoodads.size()) return false;
             }
             ft.phase = FinalizationPhase::WATER;
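The hunks above guard `uploadedM2Ids_` with `uploadedM2IdsMutex_` so background loader threads can check which models the main thread has already uploaded. A minimal sketch of that mutex-guarded registry, with hypothetical names standing in for the project's actual members:

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_set>

// Illustrative sketch of the "already uploaded" ID set shared between the
// main thread and background loaders. Class and method names are assumptions,
// not the project's API.
class UploadedIdRegistry {
public:
    // Main thread: record a successful GPU upload.
    void markUploaded(uint32_t id) {
        std::lock_guard<std::mutex> lock(mutex_);
        ids_.insert(id);
    }
    // Background threads: skip re-reading models that are already on the GPU.
    bool isUploaded(uint32_t id) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return ids_.count(id) != 0;
    }
    // unloadAll()/softReset(): a fresh map starts with an empty registry.
    void clear() {
        std::lock_guard<std::mutex> lock(mutex_);
        ids_.clear();
    }
private:
    mutable std::mutex mutex_;
    std::unordered_set<uint32_t> ids_;
};
```

The lock is held only for the set operation itself, so contention with loader threads stays negligible.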
@@ -1030,11 +1150,6 @@ void TerrainManager::workerLoop() {
 }
 
 void TerrainManager::processReadyTiles() {
-    // Process tiles with time budget to avoid frame spikes
-    // Taxi mode gets a slightly larger budget to avoid visible late-pop terrain/models.
-    const float timeBudgetMs = taxiStreamingMode_ ? 8.0f : 5.0f;
-    auto startTime = std::chrono::high_resolution_clock::now();
     // Move newly ready tiles into the finalizing deque.
     // Keep them in pendingTiles so streamTiles() won't re-enqueue them.
     {
@@ -1050,21 +1165,32 @@ void TerrainManager::processReadyTiles() {
         }
     }
 
-    // Drive incremental finalization within time budget
-    while (!finalizingTiles_.empty()) {
+    VkContext* vkCtx = terrainRenderer ? terrainRenderer->getVkContext() : nullptr;
+
+    // Reclaim completed async uploads from previous frames (non-blocking)
+    if (vkCtx) vkCtx->pollUploadBatches();
+
+    // Nothing to finalize — done.
+    if (finalizingTiles_.empty()) return;
+
+    // Async upload batch: record GPU copies into a command buffer, submit with
+    // a fence, but DON'T wait. The fence is polled on subsequent frames.
+    // This eliminates the main-thread stall from vkWaitForFences entirely.
+    const int maxSteps = taxiStreamingMode_ ? 8 : 2;
+    int steps = 0;
+    if (vkCtx) vkCtx->beginUploadBatch();
+    while (!finalizingTiles_.empty() && steps < maxSteps) {
         auto& ft = finalizingTiles_.front();
         bool done = advanceFinalization(ft);
         if (done) {
             finalizingTiles_.pop_front();
         }
-
-        auto now = std::chrono::high_resolution_clock::now();
-        float elapsedMs = std::chrono::duration<float, std::milli>(now - startTime).count();
-        if (elapsedMs >= timeBudgetMs) {
-            break;
-        }
+        steps++;
     }
+    if (vkCtx) vkCtx->endUploadBatch(); // Async — submits but doesn't wait
 }
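The poll-don't-wait pattern in `processReadyTiles()` / `pollUploadBatches()` can be sketched without any Vulkan: each submitted batch carries a fence-like query, and a per-frame sweep reclaims whatever finished, never blocking. The callables below are stand-ins (assumptions) for `vkGetFenceStatus` and for freeing staging buffers, the command buffer, and the fence:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Sketch of the in-flight batch list. fenceSignaled models vkGetFenceStatus;
// releaseResources models staging-buffer/command-buffer/fence cleanup.
struct InFlightBatch {
    std::function<bool()> fenceSignaled;
    std::function<void()> releaseResources;
};

class UploadBatchQueue {
public:
    void submit(InFlightBatch batch) { inFlight_.push_back(std::move(batch)); }

    // Non-blocking sweep, called once per frame before recording new work.
    void poll() {
        for (auto it = inFlight_.begin(); it != inFlight_.end();) {
            if (it->fenceSignaled()) {
                it->releaseResources();       // GPU finished — free resources
                it = inFlight_.erase(it);
            } else {
                ++it;                          // still running — check next frame
            }
        }
    }

    std::size_t pendingCount() const { return inFlight_.size(); }

private:
    std::vector<InFlightBatch> inFlight_;
};
```

The key property is that an unfinished batch costs only one fence-status query per frame; the former `vkWaitForFences` stall becomes GPU-side latency that overlaps with rendering.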
 void TerrainManager::processAllReadyTiles() {
@@ -1082,12 +1208,19 @@ void TerrainManager::processAllReadyTiles() {
         }
     }
 
+    // Batch all GPU uploads across all tiles into a single submission
+    VkContext* vkCtx = terrainRenderer ? terrainRenderer->getVkContext() : nullptr;
+    if (vkCtx) vkCtx->beginUploadBatch();
     // Finalize all tiles completely (no time budget — used for loading screens)
     while (!finalizingTiles_.empty()) {
         auto& ft = finalizingTiles_.front();
         while (!advanceFinalization(ft)) {}
         finalizingTiles_.pop_front();
     }
+    if (vkCtx) vkCtx->endUploadBatchSync(); // Sync — load screen needs data ready
 }
 void TerrainManager::processOneReadyTile() {
@@ -1106,9 +1239,14 @@ void TerrainManager::processOneReadyTile() {
     }
 
     // Finalize ONE tile completely, then return so caller can update the screen
     if (!finalizingTiles_.empty()) {
+        VkContext* vkCtx = terrainRenderer ? terrainRenderer->getVkContext() : nullptr;
+        if (vkCtx) vkCtx->beginUploadBatch();
         auto& ft = finalizingTiles_.front();
         while (!advanceFinalization(ft)) {}
         finalizingTiles_.pop_front();
+        if (vkCtx) vkCtx->endUploadBatchSync(); // Sync — load screen needs data ready
     }
 }
@@ -1328,6 +1466,10 @@ void TerrainManager::unloadAll() {
     finalizingTiles_.clear();
     placedDoodadIds.clear();
     placedWmoIds.clear();
+    {
+        std::lock_guard<std::mutex> lock(uploadedM2IdsMutex_);
+        uploadedM2Ids_.clear();
+    }
 
     LOG_INFO("Unloading all terrain tiles");
     loadedTiles.clear();
@@ -1376,6 +1518,10 @@ void TerrainManager::softReset() {
     finalizingTiles_.clear();
     placedDoodadIds.clear();
    placedWmoIds.clear();
+    {
+        std::lock_guard<std::mutex> lock(uploadedM2IdsMutex_);
+        uploadedM2Ids_.clear();
+    }
     // Clear tile cache — keys are (x,y) without map name, so stale entries from
     // a different map with overlapping coordinates would produce wrong geometry.

View file

@@ -326,6 +326,8 @@ bool TerrainRenderer::loadTerrain(const pipeline::TerrainMesh& mesh,
     }
 
     LOG_DEBUG("Loading terrain mesh: ", mesh.validChunkCount, " chunks");
+    vkCtx->beginUploadBatch();
     for (int y = 0; y < 16; y++) {
         for (int x = 0; x < 16; x++) {
             const auto& chunk = mesh.getChunk(x, y);
@@ -405,10 +407,102 @@ bool TerrainRenderer::loadTerrain(const pipeline::TerrainMesh& mesh,
         }
     }
+    vkCtx->endUploadBatch();
 
     LOG_DEBUG("Loaded ", chunks.size(), " terrain chunks to GPU");
     return !chunks.empty();
 }
+bool TerrainRenderer::loadTerrainIncremental(const pipeline::TerrainMesh& mesh,
+                                             const std::vector<std::string>& texturePaths,
+                                             int tileX, int tileY,
+                                             int& chunkIndex, int maxChunksPerCall) {
+    // Batch all GPU uploads (VBs, IBs, textures) into a single command buffer
+    // submission with one fence wait, instead of one per buffer/texture.
+    vkCtx->beginUploadBatch();
+    int uploaded = 0;
+    while (chunkIndex < 256 && uploaded < maxChunksPerCall) {
+        int cy = chunkIndex / 16;
+        int cx = chunkIndex % 16;
+        chunkIndex++;
+        const auto& chunk = mesh.getChunk(cx, cy);
+        if (!chunk.isValid()) continue;
+
+        TerrainChunkGPU gpuChunk = uploadChunk(chunk);
+        if (!gpuChunk.isValid()) continue;
+        calculateBoundingSphere(gpuChunk, chunk);
+
+        if (!chunk.layers.empty()) {
+            uint32_t baseTexId = chunk.layers[0].textureId;
+            if (baseTexId < texturePaths.size()) {
+                gpuChunk.baseTexture = loadTexture(texturePaths[baseTexId]);
+            } else {
+                gpuChunk.baseTexture = whiteTexture.get();
+            }
+            for (size_t i = 1; i < chunk.layers.size() && i < 4; i++) {
+                const auto& layer = chunk.layers[i];
+                int li = static_cast<int>(i) - 1;
+                VkTexture* layerTex = whiteTexture.get();
+                if (layer.textureId < texturePaths.size()) {
+                    layerTex = loadTexture(texturePaths[layer.textureId]);
+                }
+                gpuChunk.layerTextures[li] = layerTex;
+                VkTexture* alphaTex = opaqueAlphaTexture.get();
+                if (!layer.alphaData.empty()) {
+                    alphaTex = createAlphaTexture(layer.alphaData);
+                }
+                gpuChunk.alphaTextures[li] = alphaTex;
+                gpuChunk.layerCount = static_cast<int>(i);
+            }
+        } else {
+            gpuChunk.baseTexture = whiteTexture.get();
+        }
+
+        gpuChunk.tileX = tileX;
+        gpuChunk.tileY = tileY;
+
+        TerrainParamsUBO params{};
+        params.layerCount = gpuChunk.layerCount;
+        params.hasLayer1 = gpuChunk.layerCount >= 1 ? 1 : 0;
+        params.hasLayer2 = gpuChunk.layerCount >= 2 ? 1 : 0;
+        params.hasLayer3 = gpuChunk.layerCount >= 3 ? 1 : 0;
+
+        VkBufferCreateInfo bufCI{};
+        bufCI.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
+        bufCI.size = sizeof(TerrainParamsUBO);
+        bufCI.usage = VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT;
+        VmaAllocationCreateInfo allocCI{};
+        allocCI.usage = VMA_MEMORY_USAGE_CPU_TO_GPU;
+        allocCI.flags = VMA_ALLOCATION_CREATE_MAPPED_BIT;
+        VmaAllocationInfo mapInfo{};
+        vmaCreateBuffer(vkCtx->getAllocator(), &bufCI, &allocCI,
+                        &gpuChunk.paramsUBO, &gpuChunk.paramsAlloc, &mapInfo);
+        if (mapInfo.pMappedData) {
+            std::memcpy(mapInfo.pMappedData, &params, sizeof(params));
+        }
+
+        gpuChunk.materialSet = allocateMaterialSet();
+        if (gpuChunk.materialSet) {
+            writeMaterialDescriptors(gpuChunk.materialSet, gpuChunk);
+        }
+
+        chunks.push_back(std::move(gpuChunk));
+        uploaded++;
+    }
+    vkCtx->endUploadBatch();
+
+    return chunkIndex >= 256;
+}
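The resumable-cursor shape of `loadTerrainIncremental()` — caller owns the index, the function consumes at most `maxChunksPerCall` of budget, invalid chunks advance the cursor without costing budget, and the return value says whether the whole range is done — generalizes beyond terrain. A minimal sketch with hypothetical names (`loadIncremental` is illustrative, not the project's API; negative items model invalid chunks, and the output vector models GPU uploads):

```cpp
#include <cassert>
#include <vector>

// Resumable incremental processing: call once per frame until it returns true.
bool loadIncremental(const std::vector<int>& items, int& cursor, int maxPerCall,
                     std::vector<int>& out) {
    int uploaded = 0;
    while (cursor < static_cast<int>(items.size()) && uploaded < maxPerCall) {
        int item = items[cursor++];
        if (item < 0) continue;        // invalid chunk: skip, costs no budget
        out.push_back(item);           // stand-in for the actual GPU upload
        uploaded++;
    }
    return cursor >= static_cast<int>(items.size()); // whole range processed?
}
```

Because the cursor lives in the caller, the work can be spread over as many frames as needed while each frame pays a bounded cost.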
 TerrainChunkGPU TerrainRenderer::uploadChunk(const pipeline::ChunkMesh& chunk) {
     TerrainChunkGPU gpuChunk;
@@ -496,6 +590,9 @@ void TerrainRenderer::uploadPreloadedTextures(
                        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
         return key;
     };
+
+    // Batch all texture uploads into a single command buffer submission
+    vkCtx->beginUploadBatch();
     for (const auto& [path, blp] : textures) {
         std::string key = normalizeKey(path);
         if (textureCache.find(key) != textureCache.end()) continue;
@@ -515,6 +612,8 @@ void TerrainRenderer::uploadPreloadedTextures(
         textureCacheBytes_ += e.approxBytes;
         textureCache[key] = std::move(e);
     }
+    vkCtx->endUploadBatch();
 }
 
 VkTexture* TerrainRenderer::createAlphaTexture(const std::vector<uint8_t>& alphaData) {
VkTexture* TerrainRenderer::createAlphaTexture(const std::vector<uint8_t>& alphaData) { VkTexture* TerrainRenderer::createAlphaTexture(const std::vector<uint8_t>& alphaData) {

View file

@@ -67,6 +67,14 @@ void VkContext::shutdown() {
         frame = {};
     }
 
+    // Clean up any in-flight async upload batches (device already idle)
+    for (auto& batch : inFlightBatches_) {
+        // Staging buffers: skip destroy — allocator is about to be torn down
+        vkDestroyFence(device, batch.fence, nullptr);
+        // Command buffer freed when pool is destroyed below
+    }
+    inFlightBatches_.clear();
+
     if (immFence) { vkDestroyFence(device, immFence, nullptr); immFence = VK_NULL_HANDLE; }
     if (immCommandPool) { vkDestroyCommandPool(device, immCommandPool, nullptr); immCommandPool = VK_NULL_HANDLE; }
@@ -1423,10 +1431,121 @@ void VkContext::endSingleTimeCommands(VkCommandBuffer cmd) {
 }
 
 void VkContext::immediateSubmit(std::function<void(VkCommandBuffer cmd)>&& function) {
+    if (inUploadBatch_) {
+        // Record into the batch command buffer — no submit, no fence wait
+        function(batchCmd_);
+        return;
+    }
     VkCommandBuffer cmd = beginSingleTimeCommands();
     function(cmd);
     endSingleTimeCommands(cmd);
 }
+
+void VkContext::beginUploadBatch() {
+    uploadBatchDepth_++;
+    if (inUploadBatch_) return; // already in a batch (nested call)
+    inUploadBatch_ = true;
+    batchCmd_ = beginSingleTimeCommands();
+}
+
+void VkContext::endUploadBatch() {
+    if (uploadBatchDepth_ <= 0) return;
+    uploadBatchDepth_--;
+    if (uploadBatchDepth_ > 0) return; // still inside an outer batch
+    inUploadBatch_ = false;
+
+    if (batchStagingBuffers_.empty()) {
+        // No GPU copies were recorded — skip the submit entirely.
+        vkEndCommandBuffer(batchCmd_);
+        vkFreeCommandBuffers(device, immCommandPool, 1, &batchCmd_);
+        batchCmd_ = VK_NULL_HANDLE;
+        return;
+    }
+
+    // Submit commands with a NEW fence — don't wait, let GPU work in parallel.
+    vkEndCommandBuffer(batchCmd_);
+
+    VkFenceCreateInfo fenceInfo{};
+    fenceInfo.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;
+    VkFence fence = VK_NULL_HANDLE;
+    vkCreateFence(device, &fenceInfo, nullptr, &fence);
+
+    VkSubmitInfo submitInfo{};
+    submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
+    submitInfo.commandBufferCount = 1;
+    submitInfo.pCommandBuffers = &batchCmd_;
+    vkQueueSubmit(graphicsQueue, 1, &submitInfo, fence);
+
+    // Stash everything for later cleanup when fence signals
+    InFlightBatch batch;
+    batch.fence = fence;
+    batch.cmd = batchCmd_;
+    batch.stagingBuffers = std::move(batchStagingBuffers_);
+    inFlightBatches_.push_back(std::move(batch));
+
+    batchCmd_ = VK_NULL_HANDLE;
+    batchStagingBuffers_.clear();
+}
+
+void VkContext::endUploadBatchSync() {
+    if (uploadBatchDepth_ <= 0) return;
+    uploadBatchDepth_--;
+    if (uploadBatchDepth_ > 0) return;
+    inUploadBatch_ = false;
+
+    if (batchStagingBuffers_.empty()) {
+        vkEndCommandBuffer(batchCmd_);
+        vkFreeCommandBuffers(device, immCommandPool, 1, &batchCmd_);
+        batchCmd_ = VK_NULL_HANDLE;
+        return;
+    }
+
+    // Synchronous path for load screens — submit and wait
+    endSingleTimeCommands(batchCmd_);
+    batchCmd_ = VK_NULL_HANDLE;
+    for (auto& staging : batchStagingBuffers_) {
+        destroyBuffer(allocator, staging);
+    }
+    batchStagingBuffers_.clear();
+}
+
+void VkContext::pollUploadBatches() {
+    if (inFlightBatches_.empty()) return;
+    for (auto it = inFlightBatches_.begin(); it != inFlightBatches_.end(); ) {
+        VkResult result = vkGetFenceStatus(device, it->fence);
+        if (result == VK_SUCCESS) {
+            // GPU finished — free resources
+            for (auto& staging : it->stagingBuffers) {
+                destroyBuffer(allocator, staging);
+            }
+            vkFreeCommandBuffers(device, immCommandPool, 1, &it->cmd);
+            vkDestroyFence(device, it->fence, nullptr);
+            it = inFlightBatches_.erase(it);
+        } else {
+            ++it;
+        }
+    }
+}
+
+void VkContext::waitAllUploads() {
+    for (auto& batch : inFlightBatches_) {
+        vkWaitForFences(device, 1, &batch.fence, VK_TRUE, UINT64_MAX);
+        for (auto& staging : batch.stagingBuffers) {
+            destroyBuffer(allocator, staging);
+        }
+        vkFreeCommandBuffers(device, immCommandPool, 1, &batch.cmd);
+        vkDestroyFence(device, batch.fence, nullptr);
+    }
+    inFlightBatches_.clear();
+}
+
+void VkContext::deferStagingCleanup(AllocatedBuffer staging) {
+    batchStagingBuffers_.push_back(staging);
+}
 
 } // namespace rendering
 } // namespace wowee
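`beginUploadBatch()`/`endUploadBatch()` are re-entrant via a depth counter, so a renderer can open a batch inside a batch already opened by `TerrainManager`, and only the outermost `end` actually submits. A sketch of that counting behavior in isolation, with recording and the submit modeled as plain strings (the class and names here are illustrative, not the engine's):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Re-entrant batch scope: only the outermost begin starts recording, and
// only the matching outermost end performs the (simulated) submit.
class BatchScope {
public:
    void begin() {
        if (depth_++ == 0) recording_ = true;   // outermost begin
    }
    void record(const std::string& op) {
        if (recording_) ops_.push_back(op);
    }
    // Returns true only when this end closed the outermost scope.
    bool end() {
        if (depth_ <= 0) return false;          // unmatched end: ignore
        if (--depth_ > 0) return false;         // still inside an outer batch
        recording_ = false;
        submitted_ += ops_.size();              // stand-in for one queue submit
        ops_.clear();
        return true;
    }
    std::size_t submittedOps() const { return submitted_; }

private:
    int depth_ = 0;
    bool recording_ = false;
    std::vector<std::string> ops_;
    std::size_t submitted_ = 0;
};
```

This is why `WMORenderer::loadModel()` can call its own begin/end pair unconditionally: when `TerrainManager` already holds a batch open, the inner pair collapses into the outer submission instead of producing a second fence.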

View file

@@ -96,7 +96,11 @@ bool VkTexture::upload(VkContext& ctx, const uint8_t* pixels, uint32_t width, ui
         generateMipmaps(ctx, format, width, height);
     }
 
-    destroyBuffer(ctx.getAllocator(), staging);
+    if (ctx.isInUploadBatch()) {
+        ctx.deferStagingCleanup(staging);
+    } else {
+        destroyBuffer(ctx.getAllocator(), staging);
+    }
     return true;
 }
 
@@ -162,7 +166,11 @@ bool VkTexture::uploadMips(VkContext& ctx, const uint8_t* const* mipData,
                    VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT);
     });
 
-    destroyBuffer(ctx.getAllocator(), staging);
+    if (ctx.isInUploadBatch()) {
+        ctx.deferStagingCleanup(staging);
+    } else {
+        destroyBuffer(ctx.getAllocator(), staging);
+    }
     return true;
 }

View file

@@ -198,8 +198,12 @@ AllocatedBuffer uploadBuffer(VkContext& ctx, const void* data, VkDeviceSize size
         vkCmdCopyBuffer(cmd, staging.buffer, gpuBuffer.buffer, 1, &copyRegion);
     });
 
-    // Destroy staging buffer
-    destroyBuffer(ctx.getAllocator(), staging);
+    // Destroy staging buffer (deferred if in batch mode)
+    if (ctx.isInUploadBatch()) {
+        ctx.deferStagingCleanup(staging);
+    } else {
+        destroyBuffer(ctx.getAllocator(), staging);
+    }
 
     return gpuBuffer;
 }

View file

@@ -419,6 +419,10 @@ bool WMORenderer::loadModel(const pipeline::WMOModel& model, uint32_t id) {
     core::Logger::getInstance().debug("  WMO bounds: min=(", model.boundingBoxMin.x, ", ", model.boundingBoxMin.y, ", ", model.boundingBoxMin.z,
                                       ") max=(", model.boundingBoxMax.x, ", ", model.boundingBoxMax.y, ", ", model.boundingBoxMax.z, ")");
 
+    // Batch all GPU uploads (textures, VBs, IBs) into a single command buffer
+    // submission with one fence wait, instead of one per upload.
+    vkCtx_->beginUploadBatch();
+
     // Load textures for this model
     core::Logger::getInstance().debug("  WMO has ", model.textures.size(), " texture paths, ", model.materials.size(), " materials");
     if (assetManager && !model.textures.empty()) {
@@ -720,6 +724,8 @@ bool WMORenderer::loadModel(const pipeline::WMOModel& model, uint32_t id) {
         groupRes.allUntextured = !anyTextured && !groupRes.mergedBatches.empty();
     }
 
+    vkCtx_->endUploadBatch();
+
     // Copy portal data for visibility culling
     modelData.portalVertices = model.portalVertices;
     for (const auto& portal : model.portals) {
@@ -2319,8 +2325,21 @@ VkTexture* WMORenderer::loadTexture(const std::string& path) {
     const auto& attemptedCandidates = uniqueCandidates;
 
-    // Try loading all candidates until one succeeds
+    // Check pre-decoded BLP cache first (populated by background worker threads)
     pipeline::BLPImage blp;
     std::string resolvedKey;
+    if (predecodedBLPCache_) {
+        for (const auto& c : uniqueCandidates) {
+            auto pit = predecodedBLPCache_->find(c);
+            if (pit != predecodedBLPCache_->end()) {
+                blp = std::move(pit->second);
+                predecodedBLPCache_->erase(pit);
+                resolvedKey = c;
+                break;
+            }
+        }
+    }
+    if (!blp.isValid()) {
     for (const auto& c : attemptedCandidates) {
         blp = assetManager->loadTexture(c);
         if (blp.isValid()) {
@@ -2328,6 +2347,7 @@ VkTexture* WMORenderer::loadTexture(const std::string& path) {
             break;
         }
     }
+    }
 
     if (!blp.isValid()) {
         if (loggedTextureLoadFails_.insert(key).second) {
             core::Logger::getInstance().warning("WMO: Failed to load texture: ", path);
@@ -2363,10 +2383,10 @@ VkTexture* WMORenderer::loadTexture(const std::string& path) {
     texture->createSampler(vkCtx_->getDevice(), VK_FILTER_LINEAR, VK_FILTER_LINEAR,
                            VK_SAMPLER_ADDRESS_MODE_REPEAT);
 
-    // Generate normal+height map from diffuse pixels
+    // Generate normal+height map from diffuse pixels (skip during streaming to avoid CPU stalls)
     float nhVariance = 0.0f;
     std::unique_ptr<VkTexture> nhMap;
-    if (normalMappingEnabled_ || pomEnabled_) {
+    if ((normalMappingEnabled_ || pomEnabled_) && !deferNormalMaps_) {
         nhMap = generateNormalHeightMap(blp.data.data(), blp.width, blp.height, nhVariance);
         if (nhMap) {
             approxBytes *= 2; // account for normal map in budget
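The cache-then-fallback flow in `loadTexture()` above — background threads pre-decode images keyed by candidate path, the main thread moves a hit out of the map (consuming the entry so its memory is released), and only a miss pays the synchronous decode — can be sketched independently of BLP or the asset manager. `Image` and `decodeSync` below are stand-ins (assumptions) for `pipeline::BLPImage` and `assetManager->loadTexture`:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for a decoded image; valid() mirrors BLPImage::isValid().
struct Image {
    std::vector<unsigned char> data;
    bool valid() const { return !data.empty(); }
};

// Check the pre-decoded cache first; fall back to synchronous decoding.
Image acquire(std::unordered_map<std::string, Image>* cache,
              const std::vector<std::string>& candidates,
              Image (*decodeSync)(const std::string&)) {
    Image img;
    if (cache) {
        for (const auto& c : candidates) {
            auto it = cache->find(c);
            if (it != cache->end()) {
                img = std::move(it->second); // take ownership of decoded pixels
                cache->erase(it);            // one-shot entry: consumed on use
                return img;
            }
        }
    }
    for (const auto& c : candidates) {       // miss: decode on the main thread
        img = decodeSync(c);
        if (img.valid()) return img;
    }
    return img;
}
```

Erasing on hit keeps the cache from retaining decoded pixel data after the GPU copy has its own staging buffer, which matters when a tile carries dozens of textures.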