
tbc: store tx byte location in tx index, bump DB to v6 #1052

Open
marcopeereboom wants to merge 1 commit into marco/lazy-block from marco/tx-offset


@marcopeereboom (Contributor)

Summary

Store the tx byte location (TxLoc: offset + length within the raw block) in the value of the tx index 't' entry, which was previously nil. This makes locating a tx inside its block O(1): the reader jumps directly to the tx's bytes in the raw block, with no scanning, no SHA256 hashing, and no full block deserialization.
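A minimal sketch of what jumping directly to a tx's bytes looks like, assuming only that a location is a (start, length) pair into the raw block. The txLoc type mirrors the TxStart/TxLen fields of btcd's wire.TxLoc but is redeclared so the example is stdlib-only; txBytes is a hypothetical helper, not a function from this PR.

```go
package main

import "fmt"

// txLoc mirrors the TxStart/TxLen fields of btcd's wire.TxLoc; it is
// redeclared here so the sketch has no external dependencies.
type txLoc struct {
	TxStart int // byte offset of the tx within the raw block
	TxLen   int // serialized length of the tx in bytes
}

// txBytes returns the serialized tx that loc points at inside rawBlock
// without looking at any other tx. The bounds check guards against a stale
// or corrupt index entry. The result aliases rawBlock, so copy it if the
// block buffer may be reused.
func txBytes(rawBlock []byte, loc txLoc) ([]byte, error) {
	end := loc.TxStart + loc.TxLen
	if loc.TxStart < 0 || loc.TxLen <= 0 || end > len(rawBlock) {
		return nil, fmt.Errorf("tx location %d+%d out of range for %d-byte block",
			loc.TxStart, loc.TxLen, len(rawBlock))
	}
	return rawBlock[loc.TxStart:end], nil
}
```

The returned bytes can then be handed to a single-tx deserializer (for example btcd's wire.MsgTx.Deserialize) instead of decoding the whole block.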

Depends on #1051 (lazy block reader).

Problem

Every tx lookup via BlockHashByTxId is followed by a full block deserialization to find the tx. The 't' entry value was nil, wasted space that could have carried the byte offset. A CPU profile shows 60% of the time spent in SHA256 hashing, because FindTx hashes every tx in the block until it finds a match.
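For contrast, a sketch of the legacy scan the profile points at, assuming txs are given as already-serialized byte slices. It counts double-SHA256 computations to show that the cost grows with the tx's position in the block. findTxByScan is hypothetical, and it hashes the full serialization, whereas real txids for segwit txs hash the witness-stripped form.

```go
package main

import "crypto/sha256"

// doubleSHA256 computes SHA256(SHA256(b)), the hash Bitcoin uses for txids
// (in internal byte order, not reversed for display).
func doubleSHA256(b []byte) [32]byte {
	first := sha256.Sum256(b)
	return sha256.Sum256(first[:])
}

// findTxByScan models the legacy lookup: hash each serialized tx in block
// order until one matches want. It returns the matching index (or -1) and
// how many double-SHA256 computations were needed.
func findTxByScan(txs [][]byte, want [32]byte) (index, hashes int) {
	for i, tx := range txs {
		hashes++
		if doubleSHA256(tx) == want {
			return i, hashes
		}
	}
	return -1, hashes
}
```

A miss, or a tx near the end of a block with thousands of txs, costs thousands of hashes; a stored location replaces all of them with a slice.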

Solution

  • BlockHashByTxId signature changed to return (*chainhash.Hash, wire.TxLoc, error) — all callers updated
  • processTxs now calls block.TxLoc() and stores offset+length via NewTxMappingWithLoc
  • BlockTxUpdate uses stack-allocated reusable buffers instead of slicing loop variables (addresses a potential data-integrity issue documented in #1050, "tbc: tx index intermittently loses entries during IBD")
  • All consumers wired to use TxLoc when available, with legacy fallback for nil values:
    • TxById (RPC) — deserializes only the target tx
    • txOutFromOutPoint (UTXO unwind) — deserializes only the target tx
    • handleBlockHashByTxIdRequest (RPC) — hash only, ignores TxLoc
    • hemictl — hash only, ignores TxLoc
  • DB version 5 → 6, upgrade wipes tx index for rebuild with TxLoc values
  • Errors from block.TxLoc() are logged at Errorf and the indexer falls back to nil values
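One possible encoding for the 't' value, sketched under an assumption: the PR does not spell out the byte layout written by NewTxMappingWithLoc, so the 8-byte big-endian offset/length format below is hypothetical. It shows the legacy fallback listed above: a nil value decodes to the zero location with ok == false.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// txLoc mirrors the TxStart/TxLen fields of btcd's wire.TxLoc.
type txLoc struct {
	TxStart int
	TxLen   int
}

// encodeTxLoc packs a location into an 8-byte value (assumed layout:
// big-endian uint32 offset followed by big-endian uint32 length).
func encodeTxLoc(loc txLoc) []byte {
	v := make([]byte, 8)
	binary.BigEndian.PutUint32(v[0:4], uint32(loc.TxStart))
	binary.BigEndian.PutUint32(v[4:8], uint32(loc.TxLen))
	return v
}

// decodeTxLoc reverses encodeTxLoc. A nil or empty value is a legacy (v5)
// entry: it yields the zero txLoc and ok == false so the caller can fall
// back to deserializing the whole block.
func decodeTxLoc(v []byte) (loc txLoc, ok bool, err error) {
	if len(v) == 0 {
		return txLoc{}, false, nil
	}
	if len(v) != 8 {
		return txLoc{}, false, errors.New("invalid tx location value length")
	}
	loc.TxStart = int(binary.BigEndian.Uint32(v[0:4]))
	loc.TxLen = int(binary.BigEndian.Uint32(v[4:8]))
	return loc, true, nil
}
```

uint32 fields are ample here because a serialized Bitcoin block is only a few megabytes.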

Testing

  • TestDbUpgradeV6 — seeds v5 DB, runs upgrade, verifies index wiped and version bumped
  • TestTxLocRoundTrip — stores TxLoc, reads back, verifies offset+length match
  • TestTxLocOffsetCorrectness — stores raw block, uses offset to extract tx bytes, deserializes, verifies txid and output values match
  • TestTxLocLegacyNilValue — nil-value entry returns zero TxLoc gracefully

Impact

With TxLoc, each cache miss costs one LevelDB read (tx index), one LevelDB or cache read (raw block), and parsing a single tx (~200 bytes). There is no full block deserialization and no SHA256 scanning, which eliminates the need for parallel lookup strategies.

Related

Files changed

  • database/tbcd/database.go: BlockHashByTxId returns TxLoc, NewTxMappingWithLoc
  • database/tbcd/level/level.go: implementation, stack buffers, DB v6
  • database/tbcd/level/level_test.go: 4 new tests
  • database/tbcd/level/upgrade.go: v6() wipes tx index
  • service/tbc/txindex.go: stores TxLoc
  • service/tbc/tbc.go: TxById uses TxLoc
  • service/tbc/utxoindex.go: txOutFromOutPoint uses TxLoc
  • service/tbc/rpc.go: caller updated
  • service/tbc/cpfp_test.go: stub updated
  • service/tbc/tbc_test.go: version expectations updated
  • cmd/hemictl/hemictl.go: caller updated

@marcopeereboom requested a review from a team as a code owner on May 29, 2026 07:10
Store TxLoc (offset + length within raw block) in the t entry value
instead of nil. This allows callers to jump directly to a tx's bytes
in the raw block without scanning — O(1) instead of O(txs_in_block).

BlockHashByTxId now returns (*chainhash.Hash, wire.TxLoc, error).
All callers updated. No separate method needed — callers that only
need the hash use bh, _, err := BlockHashByTxId(...).

processTxs calls block.TxLoc() and stores the location via
NewTxMappingWithLoc. Errors from TxLoc() are logged at Errorf
and the indexer falls back to nil values (legacy format).

BlockTxUpdate uses stack-allocated reusable buffers instead of
slicing loop variables. The previous code sliced the range variable
and passed the slice to leveldb.Batch.Put. appendRec copies
immediately, but the interaction between range variable reuse,
map deletion, and GC is not guaranteed safe. Stack buffers are
zero-alloc and independent per iteration.
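A sketch of the reusable-buffer pattern, assuming a hypothetical 't' key layout of 't' || txid || block hash and a batch type that, like goleveldb's Batch.Put, copies key and value on insert. The key is a fixed-size array overwritten in place on each iteration; whether it actually stays on the stack is up to Go's escape analysis. putTxEntries, batch and tKeyLen are illustrative names, not code from this PR.

```go
package main

// record is one queued write.
type record struct{ key, value []byte }

// batch stands in for goleveldb's leveldb.Batch: Put copies key and value
// into the batch, so callers are free to reuse their buffers afterwards.
type batch struct{ recs []record }

func (b *batch) Put(key, value []byte) {
	b.recs = append(b.recs, record{
		key:   append([]byte(nil), key...),
		value: append([]byte(nil), value...),
	})
}

// tKeyLen is the length of a hypothetical 't' key: 't' || txid || block hash.
const tKeyLen = 1 + 32 + 32

// putTxEntries writes one 't' entry per txid using a single fixed-size key
// array that is overwritten in place on every iteration, rather than
// slicing a range variable.
func putTxEntries(b *batch, blockHash [32]byte, txids [][32]byte, values [][]byte) {
	var key [tKeyLen]byte
	key[0] = 't'
	copy(key[33:], blockHash[:]) // constant for the whole block
	for i := range txids {
		copy(key[1:33], txids[i][:])
		b.Put(key[:], values[i])
	}
}
```

Reusing one buffer is only safe because Put copies; if a batch kept the caller's slice, every queued key would end up holding the last txid.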

DB version 5 -> 6. Upgrade path wipes the transactions index for
rebuild with TxLoc values. The index is fully derived from block data.
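The upgrade step can be sketched as below, with a hypothetical in-memory kv standing in for LevelDB. Besides wiping the index and bumping the version, the sketch also clears a recorded indexer tip (txIndexTip, an assumed field) so the indexer knows to rebuild from scratch; the PR text does not say how tbc tracks that position.

```go
package main

import "fmt"

// kv is a hypothetical in-memory stand-in for the tbcd LevelDB instance.
type kv struct {
	version    int
	txIndex    map[string][]byte // tx index entries
	txIndexTip string            // assumed: last block the tx indexer processed
}

// upgradeV6 drops every tx index entry and resets the indexer tip so the
// index is rebuilt from block data with TxLoc values, then records version 6.
// The wipe happens before the version bump, so an interrupted upgrade is
// simply run again on the next start.
func upgradeV6(db *kv) error {
	if db.version != 5 {
		return fmt.Errorf("upgrade v6: expected version 5, got %d", db.version)
	}
	db.txIndex = make(map[string][]byte)
	db.txIndexTip = ""
	db.version = 6
	return nil
}
```

The wipe itself is cheap; the real cost of the upgrade is the subsequent reindex, which scales with the number of blocks to reprocess.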

Ref: #1050

Review comment on the v6() upgrade function:

```go
	return l.MetadataPut(ctx, versionKey, v)
}

func (l *ldb) v6(ctx context.Context) error {
```

Contributor: How long does this upgrade take?
