This repository was archived by the owner on Mar 16, 2026. It is now read-only.

NES database lookup #604

@yo1dog


I don't know much about the NES in general, but I had a thought. AFAIK, dumping an NES cart is difficult because it requires knowing the PRG/CHR sizes beforehand, hence the need to manually source and input these values from places like http://nes.dnsabr.com. Cart dumper automates this by storing a database of hashes of globally known seekable sections of the PRG ROM. However, this does not always work, because those sections can be identical between different games, which causes conflicts.

My mind immediately jumps to a B-tree-index-style strategy: progressively scan and hash the PRG and CHR ROMs, using the current match set to extend the seekable area and narrow down the possibilities.

Let's pretend we have a database of the entire NES library which contains CRCs of the first 16k, 128k, and 512k of each PRG ROM (where the ROM is large enough):

Name     PRG ROM size  16k CRC   128k CRC  512k CRC
Mario    16k           9f115a9e  -         -
Zelda    128k          a3b3e36e  53b88a7a  -
Contra   128k          a3b3e36e  b0c8c11e  -
Kirby    512k          a3b3e36e  2864f7cc  4f3ad289
Tetris   512k          a3b3e36e  2864f7cc  d2dce641
Gradius  512k          dd611655  f5525447  707c2b2f
Frogger  512k          4bf93c55  3b2dc183  12c70afd
PacMan   512k          4bf93c55  3b2dc183  8c7869e6

Now we are dumping a cart. We know the minimum size is 16k, so it is safe to read and hash the first 16k. Doing so produces a3b3e36e, which matches Zelda, Contra, Kirby, and Tetris. Of those four, the smallest is 128k, so it is safe to continue reading and hashing up to 128k. That produces 2864f7cc, which matches Kirby and Tetris. Both games are 512k, so we continue reading and hashing up to 512k and produce d2dce641, which matches Tetris.
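To make the walkthrough concrete, here is a minimal Python sketch of the narrowing loop against a flat table. Everything in it is hypothetical and only for illustration: `build_table`, `identify`, and the `read_prefix(n)` callback (which stands in for "read the first n bytes of PRG ROM from the cart") are names I made up, and it assumes the table stores one CRC32 per known ROM size rather than the fixed 16k/128k/512k columns.

```python
import zlib

def crc_hex(data):
    """CRC32 of data as 8 lowercase hex digits."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")

def build_table(roms):
    """Map name -> (size, {prefix length: CRC}), one CRC per known ROM size."""
    sizes = sorted({len(data) for data in roms.values()})
    return {
        name: (len(data), {n: crc_hex(data[:n]) for n in sizes if n <= len(data)})
        for name, data in roms.items()
    }

def identify(read_prefix, table):
    """Return the sorted list of games that still match the cart.

    read_prefix(n) is a hypothetical callback returning the first n bytes
    of the cart's PRG ROM. One name means a unique match, an empty list
    means an unknown cart, several names mean an unresolved conflict.
    """
    if not table:
        return []
    candidates = set(table)
    # The smallest known ROM size is always safe to read.
    safe = min(size for size, _ in table.values())
    while True:
        crc = crc_hex(read_prefix(safe))
        candidates = {n for n in candidates if table[n][1].get(safe) == crc}
        if len(candidates) <= 1:
            return sorted(candidates)
        smallest = min(table[n][0] for n in candidates)
        if smallest == safe:
            # A candidate ends here, so reading further is not safe.
            return sorted(candidates)
        safe = smallest
```

With a real dumper, `read_prefix` would talk to the hardware. For testing, something like `lambda n: rom_bytes[:n]` over a known dump is enough.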

Theoretically, this would resolve all ambiguity except in the (very rare) cases where one game's PRG ROM and CHR ROM both begin with the entirety of another game's PRG ROM and CHR ROM.

The database could be shrunk drastically to only the hashes required to resolve conflicts. For example, there is no reason to store the 128k and 512k hashes for Gradius, because its 16k hash is unique. The same goes for the 128k hashes for Frogger and PacMan, as they provide no disambiguation. In fact, rather than storing a flat lookup table, you could store an index-tree-like structure that contains disambiguation instructions:

Read 16k
├ 9f115a9e: Mario
├ a3b3e36e: Read 128k
│ ├ 53b88a7a: Zelda
│ ├ b0c8c11e: Contra
│ └ 2864f7cc: Read 512k
│   ├ 4f3ad289: Kirby
│   └ d2dce641: Tetris
├ dd611655: Gradius
└ 4bf93c55: Read 512k
  ├ 12c70afd: Frogger
  └ 8c7869e6: PacMan
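As a rough sketch of how such a tree could be derived from the flat table and then walked, assuming nested dicts of the form `{"read": n, "branches": {...}}` (a representation I picked for illustration, not what the attached index uses): `build_tree` and `lookup` are hypothetical names, and `hash_prefix(n)` stands in for "read and CRC the first n bytes of the cart". The hashes are the made-up ones from the example table.

```python
KIB = 1024

# Flat example table: name -> (PRG size, {bytes hashed: CRC}).
TABLE = {
    "Mario":   (16 * KIB,  {16 * KIB: "9f115a9e"}),
    "Zelda":   (128 * KIB, {16 * KIB: "a3b3e36e", 128 * KIB: "53b88a7a"}),
    "Contra":  (128 * KIB, {16 * KIB: "a3b3e36e", 128 * KIB: "b0c8c11e"}),
    "Kirby":   (512 * KIB, {16 * KIB: "a3b3e36e", 128 * KIB: "2864f7cc",
                            512 * KIB: "4f3ad289"}),
    "Tetris":  (512 * KIB, {16 * KIB: "a3b3e36e", 128 * KIB: "2864f7cc",
                            512 * KIB: "d2dce641"}),
    "Gradius": (512 * KIB, {16 * KIB: "dd611655", 128 * KIB: "f5525447",
                            512 * KIB: "707c2b2f"}),
    "Frogger": (512 * KIB, {16 * KIB: "4bf93c55", 128 * KIB: "3b2dc183",
                            512 * KIB: "12c70afd"}),
    "PacMan":  (512 * KIB, {16 * KIB: "4bf93c55", 128 * KIB: "3b2dc183",
                            512 * KIB: "8c7869e6"}),
}

def build_tree(table, names=None, read=None):
    """Keep only the hashes needed to tell the given games apart."""
    names = sorted(table) if names is None else names
    if read is None:
        read = min(table[n][0] for n in names)
    groups = {}
    for n in names:
        groups.setdefault(table[n][1][read], []).append(n)
    branches = {}
    for crc, group in groups.items():
        smallest = min(table[n][0] for n in group)
        if len(group) == 1:
            branches[crc] = group[0]      # unique: stop reading here
        elif smallest == read:
            branches[crc] = group         # conflict that cannot be resolved
        else:
            branches[crc] = build_tree(table, group, smallest)
    return {"read": read, "branches": branches}

def lookup(tree, hash_prefix):
    """Follow the tree; returns a name, a list of names, or None."""
    node = tree
    while isinstance(node, dict):
        node = node["branches"].get(hash_prefix(node["read"]))
    return node
```

Run on the example table, this skips the 128k step for Frogger and PacMan and stores nothing past 16k for Gradius, matching the tree above.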

I tested this theory on a headerless No-Intro ROM set using NES2.0 DB. It was able to index and distinguish all but 21 of the 3,560 ROMs. The Virtual Console and cassette dumps can be excluded, which brings the number down to 9, only 3 of which are "standard" games:

9FFE2F55 PRG:65536 CHR:131072
  ├─ 9FFE2F55 Sky Shark (USA) - PRG:65536 CHR:131072
  └─ 4AF742FA Sky Shark (USA) (Rev 1) - PRG:131072 CHR:131072
E41220D8 PRG:262144 CHR:0
  ├─ E41220D8 Assimilate (USA) (RetroUSB) (Aftermarket) (Homebrew) - PRG:262144 CHR:0
  └─ 7145F667 Assimilate (USA) (RetroUSB) (Aftermarket) (Homebrew) (Alt) - PRG:524288 CHR:0
CD8233EF PRG:16384 CHR:8192
  ├─ 2F55BE88 Lunar Ball (Japan) - PRG:16384 CHR:8192
  ├─ 80CBCACB Golden Game 100-in-1 (Asia) (En) (Pirate) - PRG:1048576 CHR:0
  ├─ 6175B9A0 Golden Game 150-in-1 (Asia) (En) (Pirate) - PRG:2097152 CHR:0
  ├─ 46A1AE7B Golden Game 210-in-1 (Asia) (En) (Pirate) - PRG:2097152 CHR:0
  └─ 4E5668A9 Golden Game 260-in-1 (Asia) (En) (Pirate) - PRG:3145728 CHR:0

20F98977 PRG:16384 CHR:16384
  ├─ 20F98977 City Connection (Japan) - PRG:16384 CHR:16384
  └─ D20775DA City Connection (Japan) (Virtual Console, Switch Online) - PRG:32768 CHR:16384
0F05FF0A PRG:32768 CHR:8192
  ├─ 0F05FF0A Seicross (Japan) (Rev 1) - PRG:32768 CHR:8192
  └─ 3413E33B Seicross (Japan) (Virtual Console) - PRG:32768 CHR:16384
E37A39AB PRG:131072 CHR:65536
  ├─ E37A39AB Yoshi's Cookie (Europe) - PRG:131072 CHR:65536
  └─ CAA76927 Yoshi's Cookie (Europe) (Virtual Console) - PRG:131072 CHR:131072
A2623BC1 PRG:131072 CHR:131072
  ├─ A2623BC1 Nantettatte!! Baseball (Japan) - PRG:131072 CHR:131072
  ├─ 6C039D11 Nantettatte!! Baseball + Nantettatte!! Baseball - Ko-Game Cassette - '91 Kaimaku Hen (Japan) - PRG:147456 CHR:131072
  └─ A5275B36 Nantettatte!! Baseball + Nantettatte!! Baseball - Ko-Game Cassette - OB All Star Hen (Japan) - PRG:147456 CHR:131072
ADFAD6B6 PRG:131072 CHR:0
  ├─ ADFAD6B6 Karaoke Studio (Japan) - PRG:131072 CHR:0
  ├─ 4B6EF399 Karaoke Studio Senyou Cassette - Top Hit 20 Vol. 1 (Japan) - PRG:262144 CHR:0
  └─ 50F3E338 Karaoke Studio Senyou Cassette - Top Hit 20 Vol. 2 (Japan) - PRG:262144 CHR:0

All of these are instances in which the original/parent ROM is included in its entirety at the start of the child ROM.
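A check for this kind of conflict could be sketched roughly as follows. It treats each ROM as a single headerless byte string, which is a simplifying assumption for the sketch. `prefix_conflicts` is a hypothetical helper name, and the pairwise comparison is quadratic, so it is only meant to illustrate the condition, not to scale to a full set.

```python
def prefix_conflicts(roms):
    """Return (parent, child) pairs where child starts with all of parent.

    roms maps a name to the headerless ROM bytes. Equal-length ROMs are
    never reported, since neither can strictly contain the other.
    """
    # Sort by length so a parent always comes before its children.
    ordered = sorted(roms.items(), key=lambda item: (len(item[1]), item[0]))
    pairs = []
    for i, (parent_name, parent) in enumerate(ordered):
        for child_name, child in ordered[i + 1:]:
            if len(child) > len(parent) and child.startswith(parent):
                pairs.append((parent_name, child_name))
    return pairs
```

Any pair this returns cannot be separated by hashing prefixes alone, because every safe read of the parent is also a valid read of the child.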

I attached the generated index in JSON. Right now it's just a map of partial CRC32 to full CRC32, but it could instead map to game name, PRG ROM size, mapper, etc.

nesIndex2.json.txt

Thoughts?
