
tbc: add lazy block reader for zero-copy per-tx access #1051

Open
marcopeereboom wants to merge 1 commit into main from marco/lazy-block


@marcopeereboom
Contributor

Summary

Add a lazyBlock type that wraps raw block bytes from the block cache with lazy per-tx access. Nothing is parsed until explicitly requested — individual tx access parses only that tx from its raw byte range.

Problem

BlockByHash calls btcutil.NewBlockFromBytes on every read, which eagerly deserializes the entire block into heap objects (MsgBlock, every MsgTx, every TxIn/TxOut, witness slices). For a 3 MB block with 5000 txs, callers that need one tx pay for all 5000. A CPU profile shows 50%+ GC pressure from the discarded deserialization objects.

Solution

  • BlockRawByHash added to DB interface: returns raw []byte from block cache/LevelDB without parsing
  • lazyBlock type in service/tbc/lazyblock.go:
    • Hash(): double-SHA256 of the 80-byte header, computed on demand
    • TxCount() / TxHash(i): scan tx boundaries from the raw bytes and compute the txid as double-SHA256 of the non-witness serialization (handles the segwit marker/flag and witness data correctly)
    • FindTx(txid): iterate TxHash(i) until a match is found
    • TxOutputValues(i): parse only one tx's output values from the raw bytes
    • FullBlock(): fallback to btcutil.NewBlockFromBytes
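For reference, the block hash that Hash() has to produce is Bitcoin's double SHA-256 of the 80-byte header, conventionally displayed with its bytes reversed. A stdlib-only sketch of that computation, checked against the mainnet genesis header; headerHash and genesisHeader are hypothetical helpers, not the PR's methods:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

const blockHeaderLen = 80

// headerHash returns the block id for a serialized block: double SHA-256
// of the first 80 bytes, byte-reversed into the usual display order.
func headerHash(raw []byte) (string, error) {
	if len(raw) < blockHeaderLen {
		return "", errors.New("raw block shorter than 80-byte header")
	}
	first := sha256.Sum256(raw[:blockHeaderLen])
	h := sha256.Sum256(first[:])
	for i, j := 0, len(h)-1; i < j; i, j = i+1, j-1 {
		h[i], h[j] = h[j], h[i] // reverse byte order for display
	}
	return hex.EncodeToString(h[:]), nil
}

// genesisHeader returns the 80-byte mainnet genesis block header.
func genesisHeader() []byte {
	hdr, err := hex.DecodeString("01000000" + // version
		strings.Repeat("00", 32) + // previous block hash
		"3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a" + // merkle root
		"29ab5f49" + "ffff001d" + "1dac2b7c") // time, bits, nonce
	if err != nil {
		panic(err)
	}
	return hdr
}

func main() {
	id, err := headerHash(genesisHeader())
	if err != nil {
		panic(err)
	}
	fmt.Println(id) // 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
}
```

Because only raw[:80] is touched, this stays cheap regardless of how many transactions follow the header.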

No btcutil fork. External to btcutil entirely. Existing BlockByHash callers unchanged.

Testing

  • 1217 lines of tests covering: boundary scanning, txid computation (legacy + segwit + witness), output value extraction, FindTx, edge cases (empty blocks, coinbase-only), full-block fallback
  • Tested against known mainnet and testnet blocks

Files changed

  • database/tbcd/database.go — add BlockRawByHash to interface
  • database/tbcd/level/level.go — implement BlockRawByHash
  • service/tbc/lazyblock.go — new file
  • service/tbc/lazyblock_test.go — new file
  • service/tbc/cpfp_test.go — stub updated

marcopeereboom requested a review from a team as a code owner on May 29, 2026 07:09
github-actions bot added the area: tbc and changelog: required labels on May 29, 2026
codecov bot commented on May 29, 2026

Codecov Report

❌ Patch coverage is 92.21311% with 19 lines in your changes missing coverage. Please review.

Files with missing lines          Patch %   Lines
database/tbcd/level/level.go      0.00%     19 Missing ⚠️


BlockByHash calls btcutil.NewBlockFromBytes on every read, eagerly
deserializing the entire block into heap objects. Callers needing
one tx pay for all txs. CPU profiles show 50% GC pressure from this.

Add lazyBlock type wrapping raw []byte from the block cache with
lazy per-tx access — no deserialization until a specific tx is
requested. Single-pass boundary scan finds tx offsets without
parsing. Per-tx txid computation handles both witness and non-witness
serialization. Per-tx output value extraction reads only the outputs
section.

Add BlockRawByHash to the DB interface — the cache-check + LevelDB
read path from BlockByHash without the NewBlockFromBytes call.

Existing BlockByHash callers are unchanged — this is an opt-in
parallel path for callers that need lightweight access.

100% test coverage on lazyblock.go. Every method cross-checked
against btcutil.NewBlockFromBytes output as oracle. Test blocks
include: genesis, segwit single/multi input, mixed segwit/non-segwit,
50-tx blocks, empty witness items, large inscription-style witness,
FullBlock byte-identical round-trip, and exhaustive error path
coverage for all truncation boundaries.
github-actions bot added the area: docs and changelog: done labels and removed the changelog: required label on May 29, 2026
This was referenced May 29, 2026
Comment on lines +1596 to +1620
	// get from cache
	var (
		eb  []byte
		err error
	)
	if l.cfg.blockCacheSize > 0 {
		eb, _ = l.blockCache.Get(hash)
	}

	// get from db
	if eb == nil {
		bDB := l.rawPool[level.BlocksDB]
		eb, err = bDB.Get(hash[:])
		if err != nil {
			if errors.Is(err, leveldb.ErrNotFound) {
				return nil, database.BlockNotFoundError{Hash: hash}
			}
			return nil, fmt.Errorf("block raw get: %w", err)
		}
		if l.cfg.blockCacheSize > 0 {
			l.blockCache.Put(hash, eb)
		}
	}

	return eb, nil
Contributor

nit: I think this could be tidier if the cache check returned early instead of always falling through to a nil check, e.g.:

Suggested change
	// get from cache
	if l.cfg.blockCacheSize > 0 {
		if eb, _ := l.blockCache.Get(hash); eb != nil {
			return eb, nil
		}
	}
	// get from db
	bDB := l.rawPool[level.BlocksDB]
	eb, err := bDB.Get(hash[:])
	if err != nil {
		if errors.Is(err, leveldb.ErrNotFound) {
			return nil, database.BlockNotFoundError{Hash: hash}
		}
		return nil, fmt.Errorf("block raw get: %w", err)
	}
	if l.cfg.blockCacheSize > 0 {
		l.blockCache.Put(hash, eb)
	}
	return eb, nil

Comment thread service/tbc/lazyblock.go
// must be a complete serialized Bitcoin block (header + transactions).
// No data is parsed until a method is called.
func newLazyBlock(raw []byte) *lazyBlock {
	return &lazyBlock{raw: raw}
Contributor

Consider validating the length of raw once here, so that a lazyBlock cannot be created around obviously invalid bytes. This would also remove the need for length checks in the receivers.

Related: https://github.com/hemilabs/heminetwork/pull/1051/changes#r3323667033

Comment thread service/tbc/lazyblock.go
return nil, nil, fmt.Errorf("scanTxBoundaries: tx %d input %d: script len: %w", t, i, err)
}
offset += n
if offset+int(scriptLen) > len(raw) {
Contributor

It is unsafe to cast scriptLen to an int, as readVarInt explicitly parses untrusted bytes as a uint64 and returns it as such.

With a valid block it would not be possible to trigger a panic, but it is still possible with invalid or corrupted raw input. I think it would be better to handle the value correctly than to make assumptions.

Same on lines 231, 261, and 264.

Suggested change
if scriptLen > uint64(len(raw) - offset) {

Related: https://github.com/hemilabs/heminetwork/pull/1051/changes#r3323667033

Comment thread service/tbc/lazyblock.go
Comment on lines +188 to +189
locs := make([]wire.TxLoc, 0, txCount)
witness := make([]bool, 0, txCount)
Contributor

While not possible with a valid Bitcoin block, txCount is a uint64 parsed from untrusted bytes. A corrupt value could cause an unbounded allocation of these slices, up to and including a panic.

Consider validating txCount before allocating.

Related: https://github.com/hemilabs/heminetwork/pull/1051/changes#r3323667033

Comment thread service/tbc/lazyblock.go
}
offset += n

values := make([]uint64, outputCount)
Contributor

While not exploitable with a valid Bitcoin block, outputCount is a uint64 parsed from untrusted bytes. A corrupt value could cause an unbounded allocation of this slice, up to and including a panic.

Consider validating outputCount before allocating.

Related: https://github.com/hemilabs/heminetwork/pull/1051/changes#r3323667033

Comment thread service/tbc/lazyblock.go
Comment on lines +61 to +67
	offsets, witness, err := scanTxBoundaries(lb.raw)
	if err != nil {
		return err
	}
	lb.txOffsets = offsets
	lb.txWitness = witness
	return nil
Contributor

Suggested change
var err error
lb.txOffsets, lb.txWitness, err = scanTxBoundaries(lb.raw)
return err

Comment thread service/tbc/lazyblock.go

// newLazyBlock wraps raw block bytes for lazy access. The raw slice
// must be a complete serialized Bitcoin block (header + transactions).
// No data is parsed until a method is called.
Contributor

Consider documenting expectations for raw input here.

The correctness of lazyBlock depends entirely on raw being guaranteed to be immutable and kept alive for the full lifetime of every lazyBlock. If the underlying byte slice is ever reused or recycled, the lazyBlock is silently corrupted and may later return errors, panic, or return incorrect data.

Additionally, if raw is a sub-slice of a larger byte slice, holding it pins the parent's entire backing array, not just the block's own len(raw) bytes. Preventing this would require copying raw into a new slice.

This also assumes raw is always a valid Bitcoin block, which is not guaranteed. If a lazyBlock is created with incorrect or corrupted data, the receivers below can error, panic, or return invalid data. If every call site is guaranteed to pass only valid data, this is probably okay; if not, the scanning needs hardening.

Suggested change
// No data is parsed until a method is called.
// The caller must guarantee that raw is not mutated, reused, or recycled
// for the lifetime of the returned lazyBlock. raw bytes are referenced,
// not copied. raw must be a complete, valid serialized Bitcoin block.
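If that lifetime guarantee cannot be made, the alternative mentioned above is to copy. A minimal sketch of the trade-off, assuming Go 1.20+ for bytes.Clone; newLazyBlockCopy is a hypothetical constructor, not part of the PR:

```go
package main

import (
	"bytes"
	"fmt"
)

type lazyBlock struct {
	raw []byte
}

// newLazyBlockCopy copies raw instead of referencing it. It costs one
// len(raw)-byte allocation, but later reuse of the caller's buffer can no
// longer corrupt the lazyBlock, and a sub-slice no longer keeps its
// (possibly much larger) parent array reachable.
func newLazyBlockCopy(raw []byte) *lazyBlock {
	return &lazyBlock{raw: bytes.Clone(raw)}
}

func main() {
	buf := make([]byte, 1<<20) // stand-in for a large, reused read buffer
	copy(buf, []byte{0x01, 0x02, 0x03})
	block := buf[:3]

	aliased := &lazyBlock{raw: block} // what newLazyBlock(block) does in the PR
	owned := newLazyBlockCopy(block)

	buf[0] = 0xff                             // the caller recycles its buffer
	fmt.Println(aliased.raw[0], owned.raw[0]) // 255 1

	// aliased still references the whole 1 MiB array; owned does not.
	fmt.Println(cap(aliased.raw) == len(buf), cap(owned.raw) < len(buf)) // true true
}
```

This gives up the zero-copy property the PR is after, so it only pays off if the block cache can hand out buffers that are later reused; otherwise documenting the lifetime contract, as suggested, is cheaper.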


Labels

area: docs (This is a change to documentation)
area: tbc (This is a change to TBC (Tiny Bitcoin))
changelog: done (This pull request includes an appropriate update to CHANGELOG.md.)

Projects

None yet


2 participants