Merge pull request #54 from ldmonster/fix/memory-pressure-and-hardening

[fix] Memory, Threading & Network Hardening
This commit is contained in:
Kelsi Rae Davis 2026-04-06 13:45:31 -07:00 committed by GitHub
commit 20e016798f
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
13 changed files with 333 additions and 25 deletions

docs/threading.md

@ -0,0 +1,156 @@
# Threading Model
This document describes the threading architecture of WoWee, the synchronisation
primitives that protect shared state, and the conventions that new code must
follow.
---
## Thread Inventory
| # | Name / Role | Created At | Lifetime |
|----|------------------------|-------------------------------------------------|-------------------------------|
| 1 | **Main thread** | `Application::run()` (`main.cpp`) | Entire session |
| 2 | **Async network pump** | `WorldSocket::connectAsync()` (`world_socket.cpp`) | Connect → disconnect |
| 3 | **Terrain workers** | `TerrainManager::startWorkers()` (`terrain_manager.cpp`) | Map load → map unload |
| 4 | **Watchdog** | `Application::startWatchdog()` (`application.cpp`) | After first frame → shutdown |
| 5 | **Fire-and-forget** | `std::async` / `std::thread(...).detach()` (various) | Task-scoped (bone anim, normal-map gen, warden crypto, world preload, entity model loading) |
### Thread Responsibilities
* **Main thread** — SDL event pumping, game logic (entity update, camera, UI),
GPU resource upload/finalization, render command recording, Vulkan present.
* **Network pump** — `recv()` loop, header decryption, packet parsing. Pushes
parsed packets into `pendingPacketCallbacks_` (locked by `callbackMutex_`).
The main thread drains this queue via `dispatchQueuedPackets()`.
* **Terrain workers** — background ADT/WMO/M2 file I/O, mesh decoding, texture
decompression. Workers push completed `PendingTile` objects into `readyQueue`
(locked by `queueMutex`). The main thread finalizes (GPU upload) via
`processReadyTiles()`.
* **Watchdog** — periodic frame-stall detection. Reads `watchdogHeartbeatMs`
(atomic) and optionally requests a Vulkan device reset via
`watchdogRequestRelease` (atomic).
* **Fire-and-forget** — short-lived tasks. Each captures only the data it
needs or uses a dedicated result channel (e.g. `std::future`,
`completedNormalMaps_` with `normalMapResultsMutex_`).
---
## Shared State Map
### Legend
| Annotation | Meaning |
|-------------------------|---------|
| `THREAD-SAFE: <mutex>` | Protected by the named mutex/atomic. |
| `MAIN-THREAD-ONLY` | Accessed exclusively by the main thread. No lock needed. |
### Asset Manager (`include/pipeline/asset_manager.hpp`)
| Variable | Guard | Notes |
|-------------------------|------------------|-------|
| `fileCache` | `cacheMutex` (shared_mutex) | `shared_lock` for reads, `lock_guard` for writes/eviction |
| `dbcCache` | `cacheMutex` | Same mutex as fileCache |
| `fileCacheTotalBytes` | `cacheMutex` | Written under exclusive lock only |
| `fileCacheAccessCounter`| `cacheMutex` | Written under exclusive lock only |
| `fileCacheHits` | `std::atomic` | Incremented after releasing cacheMutex |
| `fileCacheMisses` | `std::atomic` | Incremented after releasing cacheMutex |
### Audio Engine (`src/audio/audio_engine.cpp`)
| Variable | Guard | Notes |
|------------------------|---------------------------|-------|
| `gDecodedWavCache` | `gDecodedWavCacheMutex` (shared_mutex) | `shared_lock` for cache hits, `lock_guard` for miss+eviction. Double-check after decoding. |
### World Socket (`include/network/world_socket.hpp`)
| Variable | Guard | Notes |
|---------------------------|------------------|-------|
| `sockfd`, `connected`, `encryptionEnabled`, `receiveBuffer`, `receiveReadOffset_`, `headerBytesDecrypted`, cipher state, `recentPacketHistory_` | `ioMutex_` | Consistent `lock_guard` in `send()` and `pumpNetworkIO()` |
| `pendingPacketCallbacks_` | `callbackMutex_` | Pump thread produces, main thread consumes in `dispatchQueuedPackets()` |
| `asyncPumpStop_`, `asyncPumpRunning_` | `std::atomic<bool>` | Memory-order acquire/release |
| `packetCallback` | *implicit* | Set once before `connectAsync()` starts the pump thread |
### Terrain Manager (`include/rendering/terrain_manager.hpp`)
| Variable | Guard | Notes |
|-----------------------|--------------------------|-------|
| `loadQueue`, `readyQueue`, `pendingTiles` | `queueMutex` + `queueCV` | Workers wait; main signals on enqueue/finalize |
| `tileCache_`, `tileCacheLru_`, `tileCacheBytes_` | `tileCacheMutex_` | Read/write by both main and workers |
| `uploadedM2Ids_` | `uploadedM2IdsMutex_` | Workers check, main inserts on finalize |
| `preparedWmoUniqueIds_`| `preparedWmoUniqueIdsMutex_` | Workers only |
| `missingAdtWarnings_` | `missingAdtWarningsMutex_` | Workers only |
| `workerRunning` | `std::atomic<bool>` | — |
| `placedDoodadIds`, `placedWmoIds`, `loadedTiles`, `failedTiles` | MAIN-THREAD-ONLY | Only touched in processReadyTiles / unloadDistantTiles |
### Entity Manager (`include/game/entity.hpp`)
| Variable | Guard | Notes |
|------------|------------------|-------|
| `entities` | MAIN-THREAD-ONLY | All mutations via `dispatchQueuedPackets()` on main thread |
### Character Renderer (`include/rendering/character_renderer.hpp`)
| Variable | Guard | Notes |
|------------------------|---------------------------|-------|
| `completedNormalMaps_` | `normalMapResultsMutex_` | Detached threads push, main thread drains |
| `pendingNormalMapCount_`| `std::atomic<int>` | acq_rel ordering |
### Logger (`include/core/logger.hpp`)
| Variable | Guard | Notes |
|-------------|-----------------|-------|
| `minLevel_` | `std::atomic<int>` | Fast path check in `shouldLog()` |
| `fileStream`, `lastMessage_`, suppression state | `mutex` | Locked in `log()` |
### Application (`src/core/application.cpp`)
| Variable | Guard | Notes |
|-------------------------|--------------------|-------|
| `watchdogHeartbeatMs` | `std::atomic<int64_t>` | Main stores, watchdog loads |
| `watchdogRequestRelease`| `std::atomic<bool>` | Watchdog stores, main exchanges |
| `watchdogRunning` | `std::atomic<bool>` | — |
---
## Conventions for New Code
1. **Prefer `std::shared_mutex`** for read-heavy caches. Use `std::shared_lock`
for lookups and `std::lock_guard<std::shared_mutex>` for mutations.
2. **Annotate shared state** at the declaration site with either
`// THREAD-SAFE: protected by <mutex_name>` or `// MAIN-THREAD-ONLY`.
3. **Keep lock scope minimal.** Copy data under the lock, then process outside.
4. **Avoid detaching threads** when possible. Prefer `std::async` with a
`std::future` stored on the owning object so shutdown can wait for completion.
5. **Use `std::atomic` for counters and flags** that are read/written without
other invariants (e.g. cache hit stats, boolean run flags).
6. **No lock-order inversions.** Current order (outermost first):
   `ioMutex_` → `callbackMutex_` → `queueMutex` → `cacheMutex`.
7. **ThreadSanitizer** — run periodically with `-fsanitize=thread` to catch
regressions:
```bash
cmake -DCMAKE_CXX_FLAGS="-fsanitize=thread" .. && make -j$(nproc)
```
---
## Known Limitations
* `EntityManager::entities` relies on the convention that all entity mutations
happen on the main thread through `dispatchQueuedPackets()`. There is no
compile-time enforcement. If a future change introduces direct entity
modification from the network pump thread, a mutex must be added.
* `packetCallback` in `WorldSocket` is set once before `connectAsync()` and
never modified afterwards. This is safe in practice but not formally
synchronized — do not change the callback after `connectAsync()`.
* `fileCacheMisses` is declared as `std::atomic<size_t>` for consistency but is
currently never incremented; the actual miss count must be inferred from
`fileCacheAccessCounter - fileCacheHits`.


@ -48,6 +48,10 @@ public:
// Get session key (K) - used for encryption
std::vector<uint8_t> getSessionKey() const;
// Securely erase stored plaintext credentials from memory.
// Called automatically at the end of feed() once the SRP values are computed.
void clearCredentials();
private:
// WoW-specific SRP multiplier (k = 3)
static constexpr uint32_t K_VALUE = 3;


@ -34,10 +34,17 @@ public:
size_t getRecommendedCacheBudget() const;
/**
* Check if system is under memory pressure
* Check if system is under memory pressure (< 10% RAM available)
*/
bool isMemoryPressure() const;
/**
* Check if system is under severe memory pressure (< 15% RAM available).
* At this level, background loading should pause entirely until memory
* is freed; continuing to allocate risks OOM-killing other applications.
*/
bool isSevereMemoryPressure() const;
private:
MemoryMonitor() = default;
size_t totalRAM_ = 0;


@ -361,6 +361,8 @@ public:
}
private:
// MAIN-THREAD-ONLY: all entity map mutations happen via dispatchQueuedPackets()
// which runs on the main thread. Do NOT access from the async network pump thread.
std::unordered_map<uint64_t, std::shared_ptr<Entity>> entities;
};


@ -94,15 +94,18 @@ private:
void recordRecentPacket(bool outbound, uint16_t opcode, uint16_t payloadLen);
void dumpRecentPacketHistoryLocked(const char* reason, size_t bufferedBytes);
socket_t sockfd = INVALID_SOCK;
bool connected = false;
bool encryptionEnabled = false;
socket_t sockfd = INVALID_SOCK; // THREAD-SAFE: protected by ioMutex_
bool connected = false; // THREAD-SAFE: protected by ioMutex_
bool encryptionEnabled = false; // THREAD-SAFE: protected by ioMutex_
bool useVanillaCrypt = false; // true = XOR cipher, false = RC4
bool useAsyncPump_ = true;
std::thread asyncPumpThread_;
std::atomic<bool> asyncPumpStop_{false};
std::atomic<bool> asyncPumpRunning_{false};
std::atomic<bool> asyncPumpStop_{false}; // THREAD-SAFE: atomic
std::atomic<bool> asyncPumpRunning_{false}; // THREAD-SAFE: atomic
// Guards sockfd, connected, encryptionEnabled, receiveBuffer, cipher state,
// headerBytesDecrypted, and recentPacketHistory_.
mutable std::mutex ioMutex_;
// Guards pendingPacketCallbacks_ (asyncPumpThread_ produces, main thread consumes).
mutable std::mutex callbackMutex_;
// WotLK RC4 ciphers for header encryption/decryption
@ -112,11 +115,12 @@ private:
// Vanilla/TBC XOR+addition cipher
auth::VanillaCrypt vanillaCrypt;
// Receive buffer
// THREAD-SAFE: protected by ioMutex_
std::vector<uint8_t> receiveBuffer;
size_t receiveReadOffset_ = 0;
// Optional reused packet queue (feature-gated) to reduce per-update allocations.
std::vector<Packet> parsedPacketsScratch_;
// THREAD-SAFE: protected by callbackMutex_.
// Parsed packets waiting for callback dispatch; drained with a strict per-update budget.
std::deque<Packet> pendingPacketCallbacks_;


@ -4,6 +4,7 @@
#include "pipeline/dbc_loader.hpp"
#include "pipeline/asset_manifest.hpp"
#include "pipeline/loose_file_reader.hpp"
#include <atomic>
#include <memory>
#include <string>
#include <vector>
@ -166,7 +167,11 @@ private:
*/
std::string resolveFile(const std::string& normalizedPath) const;
// Guards fileCache, dbcCache, fileCacheTotalBytes, fileCacheAccessCounter, and
// fileCacheBudget. Shared lock for read-only cache lookups (readFile cache hit,
// loadDBC cache hit); exclusive lock for inserts and eviction.
mutable std::shared_mutex cacheMutex;
// THREAD-SAFE: protected by cacheMutex (exclusive lock for writes).
std::unordered_map<std::string, std::shared_ptr<DBCFile>> dbcCache;
// File cache (LRU, dynamic budget based on system RAM)
@ -174,11 +179,14 @@ private:
std::vector<uint8_t> data;
uint64_t lastAccessTime;
};
// THREAD-SAFE: protected by cacheMutex (shared_mutex — shared_lock for reads,
// exclusive lock_guard for writes/eviction).
mutable std::unordered_map<std::string, CachedFile> fileCache;
mutable size_t fileCacheTotalBytes = 0;
mutable uint64_t fileCacheAccessCounter = 0;
mutable size_t fileCacheHits = 0;
mutable size_t fileCacheMisses = 0;
// THREAD-SAFE: atomic — incremented from any thread after releasing cacheMutex.
mutable std::atomic<size_t> fileCacheHits{0};
mutable std::atomic<size_t> fileCacheMisses{0};
mutable size_t fileCacheBudget = 1024 * 1024 * 1024; // Dynamic, starts at 1GB
void setupFileCacheBudget();


@ -362,10 +362,18 @@ private:
// Background loading worker pool
std::vector<std::thread> workerThreads;
int workerCount = 0;
// THREAD-SAFE: guards loadQueue, readyQueue, and pendingTiles.
// Workers wait on queueCV; main thread signals when new tiles are enqueued
// or when readyQueue drains below maxReadyQueueSize_.
std::mutex queueMutex;
std::condition_variable queueCV;
std::deque<TileCoord> loadQueue;
std::queue<std::shared_ptr<PendingTile>> readyQueue;
std::deque<TileCoord> loadQueue; // THREAD-SAFE: protected by queueMutex
std::queue<std::shared_ptr<PendingTile>> readyQueue; // THREAD-SAFE: protected by queueMutex
// Maximum number of prepared-but-not-finalized tiles in readyQueue.
// Each prepared tile can hold 100-500 MB of decoded textures in RAM.
// Workers sleep when this limit is reached, letting the main thread
// finalize (GPU-upload + free) before more tiles are prepared.
static constexpr size_t maxReadyQueueSize_ = 3;
// In-RAM tile cache (LRU) to avoid re-reading from disk
struct CachedTile {
@ -373,6 +381,7 @@ private:
size_t bytes = 0;
std::list<TileCoord>::iterator lruIt;
};
// THREAD-SAFE: protected by tileCacheMutex_.
std::unordered_map<TileCoord, CachedTile, TileCoord::Hash> tileCache_;
std::list<TileCoord> tileCacheLru_;
size_t tileCacheBytes_ = 0;
@ -386,8 +395,8 @@ private:
std::atomic<bool> workerRunning{false};
// Track tiles currently queued or being processed to avoid duplicates
std::unordered_map<TileCoord, bool, TileCoord::Hash> pendingTiles;
std::unordered_set<std::string> missingAdtWarnings_;
std::unordered_map<TileCoord, bool, TileCoord::Hash> pendingTiles; // THREAD-SAFE: protected by queueMutex
std::unordered_set<std::string> missingAdtWarnings_; // THREAD-SAFE: protected by missingAdtWarningsMutex_
std::mutex missingAdtWarningsMutex_;
// Thread-safe set of M2 model IDs already uploaded to GPU
@ -400,10 +409,11 @@ private:
std::unordered_set<uint32_t> preparedWmoUniqueIds_;
std::mutex preparedWmoUniqueIdsMutex_;
// Dedup set for doodad placements across tile boundaries
// MAIN-THREAD-ONLY: checked and modified in processReadyTiles() and unloadDistantTiles(),
// both of which run exclusively on the main thread.
std::unordered_set<uint32_t> placedDoodadIds;
// Dedup set for WMO placements across tile boundaries (prevents rendering Stormwind 16x)
// MAIN-THREAD-ONLY: same contract as placedDoodadIds.
std::unordered_set<uint32_t> placedWmoIds;
// Tiles currently being incrementally finalized across frames


@ -10,6 +10,7 @@
#include <cstdlib>
#include <iterator>
#include <memory>
#include <shared_mutex>
#include <unordered_map>
namespace wowee {
@ -26,6 +27,10 @@ struct DecodedWavCacheEntry {
};
static std::unordered_map<uint64_t, DecodedWavCacheEntry> gDecodedWavCache;
// Protects gDecodedWavCache — shared_lock for reads, unique_lock for writes.
// Required because playSound2D() can be called from multiple threads
// (main thread, async loaders, animation callbacks).
static std::shared_mutex gDecodedWavCacheMutex;
static uint64_t makeWavCacheKey(const std::vector<uint8_t>& wavData) {
// FNV-1a over the first 256 bytes + last 256 bytes + total size.
@ -53,9 +58,14 @@ static bool decodeWavCached(const std::vector<uint8_t>& wavData, DecodedWavCache
if (wavData.empty()) return false;
const uint64_t key = makeWavCacheKey(wavData);
if (auto it = gDecodedWavCache.find(key); it != gDecodedWavCache.end()) {
out = it->second;
return true;
// Fast path: shared (read) lock for cache hits — allows concurrent lookups.
{
std::shared_lock<std::shared_mutex> readLock(gDecodedWavCacheMutex);
if (auto it = gDecodedWavCache.find(key); it != gDecodedWavCache.end()) {
out = it->second;
return true;
}
}
ma_decoder decoder;
@ -102,13 +112,22 @@ static bool decodeWavCached(const std::vector<uint8_t>& wavData, DecodedWavCache
// Evict oldest half when cache grows too large. 256 entries ≈ 50-100 MB of decoded
// PCM data depending on file lengths; halving keeps memory bounded while retaining
// recently-heard sounds (footsteps, UI clicks, combat hits) for instant replay.
constexpr size_t kMaxCachedSounds = 256;
if (gDecodedWavCache.size() >= kMaxCachedSounds) {
auto it = gDecodedWavCache.begin();
std::advance(it, gDecodedWavCache.size() / 2);
gDecodedWavCache.erase(gDecodedWavCache.begin(), it);
// Exclusive (write) lock — only one thread can evict + insert.
{
std::lock_guard<std::shared_mutex> writeLock(gDecodedWavCacheMutex);
// Re-check in case another thread inserted while we were decoding.
if (auto it = gDecodedWavCache.find(key); it != gDecodedWavCache.end()) {
out = it->second;
return true;
}
constexpr size_t kMaxCachedSounds = 256;
if (gDecodedWavCache.size() >= kMaxCachedSounds) {
auto it = gDecodedWavCache.begin();
std::advance(it, gDecodedWavCache.size() / 2);
gDecodedWavCache.erase(gDecodedWavCache.begin(), it);
}
gDecodedWavCache.emplace(key, entry);
}
gDecodedWavCache.emplace(key, entry);
out = entry;
return true;
}


@ -68,6 +68,26 @@ void AuthHandler::disconnect() {
socket->disconnect();
socket.reset();
}
// Scrub sensitive material when tearing down the auth session.
if (!password.empty()) {
volatile char* p = const_cast<volatile char*>(password.data());
for (size_t i = 0; i < password.size(); ++i)
p[i] = '\0';
password.clear();
password.shrink_to_fit();
}
if (!sessionKey.empty()) {
volatile uint8_t* k = const_cast<volatile uint8_t*>(sessionKey.data());
for (size_t i = 0; i < sessionKey.size(); ++i)
k[i] = 0;
sessionKey.clear();
sessionKey.shrink_to_fit();
}
if (srp) {
srp->clearCredentials();
}
setState(AuthState::DISCONNECTED);
LOG_INFO("Disconnected from auth server");
}
@ -354,6 +374,16 @@ void AuthHandler::handleLogonProofResponse(network::Packet& packet) {
sessionKey = srp->getSessionKey();
setState(AuthState::AUTHENTICATED);
// Plaintext password is no longer needed — zero-fill and release it so it
// doesn't sit in process memory for the rest of the session.
if (!password.empty()) {
volatile char* p = const_cast<volatile char*>(password.data());
for (size_t i = 0; i < password.size(); ++i)
p[i] = '\0';
password.clear();
password.shrink_to_fit();
}
LOG_INFO("========================================");
LOG_INFO(" AUTHENTICATION SUCCESSFUL!");
LOG_INFO("========================================");


@ -96,6 +96,10 @@ void SRP::feed(const std::vector<uint8_t>& B_bytes,
// 5. Compute proofs (M1, M2)
computeProofs(stored_username);
// Credentials are no longer needed — zero and release them so they don't
// linger in process memory longer than necessary.
clearCredentials();
// Log key values for debugging auth issues
auto hexStr = [](const std::vector<uint8_t>& v, size_t maxBytes = 8) -> std::string {
std::ostringstream ss;
@ -314,5 +318,26 @@ std::vector<uint8_t> SRP::getSessionKey() const {
return K;
}
void SRP::clearCredentials() {
// Overwrite plaintext password bytes before releasing storage so that a
// heap dump / core file doesn't leak the user's credentials. This is
// not a guarantee against a privileged attacker with live memory access,
// but it removes the most common exposure vector.
if (!stored_password.empty()) {
volatile char* p = const_cast<volatile char*>(stored_password.data());
for (size_t i = 0; i < stored_password.size(); ++i)
p[i] = '\0';
stored_password.clear();
stored_password.shrink_to_fit();
}
if (!stored_auth_hash.empty()) {
volatile uint8_t* h = const_cast<volatile uint8_t*>(stored_auth_hash.data());
for (size_t i = 0; i < stored_auth_hash.size(); ++i)
h[i] = 0;
stored_auth_hash.clear();
stored_auth_hash.shrink_to_fit();
}
}
} // namespace auth
} // namespace wowee


@ -126,5 +126,12 @@ bool MemoryMonitor::isMemoryPressure() const {
return available < (totalRAM_ * 10 / 100);
}
bool MemoryMonitor::isSevereMemoryPressure() const {
size_t available = getAvailableRAM();
// Severe pressure if < 15% RAM available — background workers should
// pause entirely to avoid OOM-killing other applications.
return available < (totalRAM_ * 15 / 100);
}
} // namespace core
} // namespace wowee


@ -571,7 +571,21 @@ void WorldSocket::pumpNetworkIO() {
}
receiveBuffer.insert(receiveBuffer.end(), buffer, buffer + receivedSize);
} else {
receiveBuffer.insert(receiveBuffer.end(), buffer, buffer + received);
// Non-fast path: same overflow pre-check as fast path to prevent
// unbounded buffer growth before the post-check below.
size_t liveBytes = bufferedBytes();
if (liveBytes > kMaxReceiveBufferBytes || receivedSize > (kMaxReceiveBufferBytes - liveBytes)) {
compactReceiveBuffer();
liveBytes = bufferedBytes();
}
if (liveBytes > kMaxReceiveBufferBytes || receivedSize > (kMaxReceiveBufferBytes - liveBytes)) {
LOG_ERROR("World socket receive buffer would overflow (buffered=", liveBytes,
" incoming=", receivedSize, " max=", kMaxReceiveBufferBytes,
"). Disconnecting to recover framing.");
closeSocketNoJoin();
return;
}
receiveBuffer.insert(receiveBuffer.end(), buffer, buffer + receivedSize);
}
if (bufferedBytes() > kMaxReceiveBufferBytes) {
LOG_ERROR("World socket receive buffer overflow (", bufferedBytes(),


@ -1212,6 +1212,28 @@ void TerrainManager::workerLoop() {
break;
}
// --- Memory-aware throttling ---
// Back-pressure: if the ready queue is deep (finalization can't
// keep up), or the system is running low on RAM, sleep instead
// of pulling more tiles. Each prepared tile can hold hundreds
// of MB of decoded textures; limiting concurrency here prevents
// WoWee from consuming all system memory during world load.
const auto& memMon = core::MemoryMonitor::getInstance();
if (memMon.isSevereMemoryPressure()) {
// Severe pressure — don't pull ANY work until main thread
// finalizes tiles and frees decoded texture data.
lock.unlock();
std::this_thread::sleep_for(std::chrono::milliseconds(200));
continue;
}
if (readyQueue.size() >= maxReadyQueueSize_ || memMon.isMemoryPressure()) {
// Moderate pressure or ready queue is backing up — sleep briefly
// to let the main thread catch up with finalization.
lock.unlock();
std::this_thread::sleep_for(std::chrono::milliseconds(50));
continue;
}
if (!loadQueue.empty()) {
coord = loadQueue.front();
loadQueue.pop_front();