tools/game: fix dbc_to_csv false-positive string detection + clear DBC cache on expansion switch

dbc_to_csv: The string-column auto-detector would mark integer fields (e.g.
RaceID=1, SexID=0, BaseSection=0-4) as string columns whenever their small
values were valid string-block offsets that happened to land inside longer
strings.  Fix by requiring that an offset point to a string *boundary* (offset
0 or immediately after a null byte) rather than to any in-range position. This
eliminates false positives from integer fields whose values accidentally alias
path substrings.  Affected CSVs (CharSections, ItemDisplayInfo for Classic/TBC)
can now be regenerated correctly.

game_handler: clearDBCCache() is already called by application.cpp before
resetDbcCaches(), but also add it inside resetDbcCaches() as a defensive
measure so that future callers of resetDbcCaches() alone also flush stale
expansion-specific DBC data (CharSections, ItemDisplayInfo, etc.).
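
A minimal sketch of the defensive-reset idea described above. Only the names
clearDBCCache() and resetDbcCaches() come from this commit; the class, its two
maps, loadForExpansion() and hasStaleData() are hypothetical stand-ins, and the
real code may keep these caches in different objects.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the real handler. Both caches live in one class
// here purely for illustration (an assumption, not the project's layout).
class GameHandlerSketch {
public:
    // Pretend to load expansion-specific data into both caches.
    void loadForExpansion(const std::string& expansion) {
        rawDbcCache_["CharSections"] = expansion;
        derivedLookups_["CharSections"] = expansion;
    }

    // Drops the raw, expansion-specific DBC data.
    void clearDBCCache() { rawDbcCache_.clear(); }

    // Drops derived lookups. Calling clearDBCCache() first means a caller
    // that only invokes resetDbcCaches() still flushes the raw data.
    void resetDbcCaches() {
        clearDBCCache();
        derivedLookups_.clear();
    }

    // True if anything from a previous expansion is still cached.
    bool hasStaleData() const {
        return !rawDbcCache_.empty() || !derivedLookups_.empty();
    }

private:
    std::unordered_map<std::string, std::string> rawDbcCache_;
    std::unordered_map<std::string, std::string> derivedLookups_;
};
```

Clearing an already-empty cache is harmless, so the existing call order in
application.cpp keeps working unchanged under this pattern.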
Kelsi 2026-03-10 03:27:30 -07:00
parent 29ca9809b1
commit 4a213d8da8
2 changed files with 36 additions and 3 deletions


@@ -41,9 +41,31 @@ std::vector<uint8_t> readFileBytes(const std::string& path) {
     return buf;
 }
 
-// Check whether offset points to a plausible string in the string block.
-bool isValidStringOffset(const std::vector<uint8_t>& stringBlock, uint32_t offset) {
+// Precompute the set of valid string-boundary offsets in the string block.
+// An offset is a valid boundary if it is 0 or immediately follows a null byte.
+// This prevents small integer values (e.g. RaceID=1, 2, 3) from being falsely
+// detected as string offsets just because they land in the middle of a longer
+// string that starts at a lower offset.
+std::set<uint32_t> computeStringBoundaries(const std::vector<uint8_t>& stringBlock) {
+    std::set<uint32_t> boundaries;
+    if (stringBlock.empty()) return boundaries;
+    boundaries.insert(0); // offset 0 is always a valid start
+    for (size_t i = 0; i + 1 < stringBlock.size(); ++i) {
+        if (stringBlock[i] == 0) {
+            boundaries.insert(static_cast<uint32_t>(i + 1));
+        }
+    }
+    return boundaries;
+}
+
+// Check whether offset points to a valid string-boundary position in the block
+// and that the string there is printable and null-terminated.
+bool isValidStringOffset(const std::vector<uint8_t>& stringBlock,
+                         const std::set<uint32_t>& boundaries,
+                         uint32_t offset) {
     if (offset >= stringBlock.size()) return false;
+    // Must start at a string boundary (offset 0 or right after a null byte).
+    if (!boundaries.count(offset)) return false;
     // Must be null-terminated within the block and contain only printable/whitespace bytes.
     for (size_t i = offset; i < stringBlock.size(); ++i) {
         uint8_t c = stringBlock[i];
@@ -75,6 +97,10 @@ std::set<uint32_t> detectStringColumns(const DBCFile& dbc,
     // If no string block (or trivial size), no string columns.
     if (stringBlock.size() <= 1) return stringCols;
 
+    // Precompute valid string-start boundaries to avoid false positives from
+    // integer fields whose small values accidentally land inside longer strings.
+    auto boundaries = computeStringBoundaries(stringBlock);
+
     for (uint32_t col = 0; col < fieldCount; ++col) {
         bool allZeroOrValid = true;
         bool hasNonZero = false;
@@ -83,7 +109,7 @@ std::set<uint32_t> detectStringColumns(const DBCFile& dbc,
             uint32_t val = dbc.getUInt32(row, col);
             if (val == 0) continue;
             hasNonZero = true;
-            if (!isValidStringOffset(stringBlock, val)) {
+            if (!isValidStringOffset(stringBlock, boundaries, val)) {
                 allZeroOrValid = false;
                 break;
             }
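
The false positive described in the commit message can be reproduced on a tiny
hand-built string block. The sketch below is illustrative only: makeBlock(),
looseCheck(), strictCheck() and stringBoundaries() are hypothetical names, the
printable-byte test (0x20 to 0x7E) is an assumption because the real loop body
is cut off in the diff, and the two paths are made-up sample data.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Build a DBC-style string block: a leading null byte (the empty string),
// then each string followed by its null terminator.
std::vector<uint8_t> makeBlock(const std::vector<std::string>& strings) {
    std::vector<uint8_t> block{0};
    for (const std::string& s : strings) {
        block.insert(block.end(), s.begin(), s.end());
        block.push_back(0);
    }
    return block;
}

// Offsets where a string starts: 0, or the byte right after a null byte.
std::set<uint32_t> stringBoundaries(const std::vector<uint8_t>& block) {
    std::set<uint32_t> out;
    if (block.empty()) return out;
    out.insert(0);
    for (size_t i = 0; i + 1 < block.size(); ++i) {
        if (block[i] == 0) out.insert(static_cast<uint32_t>(i + 1));
    }
    return out;
}

// Approximation of the pre-fix check: in range, printable ASCII, and
// null-terminated. Any offset inside a string passes this.
bool looseCheck(const std::vector<uint8_t>& block, uint32_t offset) {
    if (offset >= block.size()) return false;
    for (size_t i = offset; i < block.size(); ++i) {
        if (block[i] == 0) return true;
        if (block[i] < 0x20 || block[i] > 0x7E) return false;
    }
    return false;
}

// Post-fix check: the offset must also be the start of a string.
bool strictCheck(const std::vector<uint8_t>& block,
                 const std::set<uint32_t>& boundaries,
                 uint32_t offset) {
    return boundaries.count(offset) != 0 && looseCheck(block, offset);
}
```

For makeBlock({"Character\\Human\\Male.blp", "Item\\Cape.blp"}) the block is 40
bytes long and the boundaries are 0, 1 and 26. An integer value of 3 lands
inside the first path, so looseCheck accepts it while strictCheck rejects it;
26, the start of the second path, passes both. A value of 1 is still a boundary
here because byte 0 is null, so rejecting an integer column relies on the
detector requiring every non-zero value in that column to pass.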