tbc: add lazy block reader for zero-copy per-tx access by marcopeereboom · Pull Request #1051 · hemilabs/heminetwork

marcopeereboom · 2026-05-29T07:09:31Z

Summary

Add a lazyBlock type that wraps raw block bytes from the block cache with lazy per-tx access. Nothing is parsed until explicitly requested — individual tx access parses only that tx from its raw byte range.

Problem

BlockByHash calls btcutil.NewBlockFromBytes on every read, which eagerly deserializes the entire block into heap objects (MsgBlock, every MsgTx, every TxIn/TxOut, witness slices). For a 3 MB block with 5000 txs, callers that need one tx pay for all 5000. CPU profile shows 50%+ GC pressure from discarded deserialization objects.

Solution

- BlockRawByHash added to DB interface: returns raw []byte from block cache/LevelDB without parsing
- lazyBlock type in service/tbc/lazyblock.go:
  - Hash(): double SHA256 of the first 80 bytes (the block header), computed on demand
  - TxCount() / TxHash(i): scan tx boundaries from raw bytes, compute txid via double SHA256 of the non-witness serialization (handles segwit witness flag correctly)
  - FindTx(txid): iterate TxHash(i) until match
  - TxOutputValues(i): parse only one tx's output values from raw bytes
  - FullBlock(): fallback to btcutil.NewBlockFromBytes

No btcutil fork. External to btcutil entirely. Existing BlockByHash callers unchanged.

Testing

1217 lines of tests covering: boundary scanning, txid computation (legacy + segwit + witness), output value extraction, FindTx, edge cases (empty blocks, coinbase-only), full-block fallback
Tested against known mainnet and testnet blocks

Files changed

database/tbcd/database.go — add BlockRawByHash to interface
database/tbcd/level/level.go — implement BlockRawByHash
service/tbc/lazyblock.go — new file
service/tbc/lazyblock_test.go — new file
service/tbc/cpfp_test.go — stub updated

codecov · 2026-05-29T07:12:54Z

Codecov Report

❌ Patch coverage is 92.21311% with 19 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
database/tbcd/level/level.go	0.00%	19 Missing ⚠️


BlockByHash calls btcutil.NewBlockFromBytes on every read, eagerly deserializing the entire block into heap objects. Callers needing one tx pay for all txs. CPU profiles show 50% GC pressure from this.

Add lazyBlock type wrapping raw []byte from the block cache with lazy per-tx access: no deserialization until a specific tx is requested. Single-pass boundary scan finds tx offsets without parsing. Per-tx txid computation handles both witness and non-witness serialization. Per-tx output value extraction reads only the outputs section.

Add BlockRawByHash to the DB interface: the cache-check + LevelDB read path from BlockByHash without the NewBlockFromBytes call. Existing BlockByHash callers are unchanged; this is an opt-in parallel path for callers that need lightweight access.

100% test coverage on lazyblock.go. Every method cross-checked against btcutil.NewBlockFromBytes output as oracle. Test blocks include: genesis, segwit single/multi input, mixed segwit/non-segwit, 50-tx blocks, empty witness items, large inscription-style witness, FullBlock byte-identical round-trip, and exhaustive error path coverage for all truncation boundaries.
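
The commit message mentions handling both witness and non-witness serialization. As a rough sketch of how the two can be told apart from raw bytes (the helper name hasWitness is hypothetical, not taken from the PR), the segwit format places a 0x00 marker and a 0x01 flag directly after the 4-byte version, where a legacy transaction would have its input count.

```go
package main

import (
	"errors"
	"fmt"
)

// hasWitness (hypothetical helper) reports whether a serialized transaction
// uses the segwit serialization: version (4 bytes), marker 0x00, flag 0x01.
// A legacy transaction has its input count at byte 4, which is non-zero for
// any transaction with at least one input.
func hasWitness(tx []byte) (bool, error) {
	if len(tx) < 6 {
		return false, errors.New("tx too short to contain marker and flag")
	}
	return tx[4] == 0x00 && tx[5] == 0x01, nil
}

func main() {
	legacy := []byte{0x01, 0x00, 0x00, 0x00, 0x01, 0xaa} // version 1, 1 input, ...
	segwit := []byte{0x02, 0x00, 0x00, 0x00, 0x00, 0x01} // version 2, marker, flag

	l, _ := hasWitness(legacy)
	s, _ := hasWitness(segwit)
	fmt.Println(l, s) // false true
}
```

The txid of a segwit transaction is computed over the serialization with the marker, flag, and witness data left out, so a boundary scanner has to record this bit per transaction in addition to the byte range.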

joshuasing · 2026-05-29T08:54:52Z

+	// get from cache
+	var (
+		eb  []byte
+		err error
+	)
+	if l.cfg.blockCacheSize > 0 {
+		eb, _ = l.blockCache.Get(hash)
+	}
+
+	// get from db
+	if eb == nil {
+		bDB := l.rawPool[level.BlocksDB]
+		eb, err = bDB.Get(hash[:])
+		if err != nil {
+			if errors.Is(err, leveldb.ErrNotFound) {
+				return nil, database.BlockNotFoundError{Hash: hash}
+			}
+			return nil, fmt.Errorf("block raw get: %w", err)
+		}
+		if l.cfg.blockCacheSize > 0 {
+			l.blockCache.Put(hash, eb)
+		}
+	}
+
+	return eb, nil


nit: I think this could be made more tidy by making the first cache check a quick-return instead of always falling through to a nil check. e.g.

Suggested change

-	// get from cache
-	var (
-		eb  []byte
-		err error
-	)
-	if l.cfg.blockCacheSize > 0 {
-		eb, _ = l.blockCache.Get(hash)
-	}
-
-	// get from db
-	if eb == nil {
-		bDB := l.rawPool[level.BlocksDB]
-		eb, err = bDB.Get(hash[:])
-		if err != nil {
-			if errors.Is(err, leveldb.ErrNotFound) {
-				return nil, database.BlockNotFoundError{Hash: hash}
-			}
-			return nil, fmt.Errorf("block raw get: %w", err)
-		}
-		if l.cfg.blockCacheSize > 0 {
-			l.blockCache.Put(hash, eb)
-		}
-	}
-
-	return eb, nil
+	// get from cache
+	if l.cfg.blockCacheSize > 0 {
+		if eb, _ := l.blockCache.Get(hash); eb != nil {
+			return eb, nil
+		}
+	}
+
+	// get from db
+	bDB := l.rawPool[level.BlocksDB]
+	eb, err := bDB.Get(hash[:])
+	if err != nil {
+		if errors.Is(err, leveldb.ErrNotFound) {
+			return nil, database.BlockNotFoundError{Hash: hash}
+		}
+		return nil, fmt.Errorf("block raw get: %w", err)
+	}
+	if l.cfg.blockCacheSize > 0 {
+		l.blockCache.Put(hash, eb)
+	}
+
+	return eb, nil

joshuasing · 2026-05-29T09:03:22Z

+// must be a complete serialized Bitcoin block (header + transactions).
+// No data is parsed until a method is called.
+func newLazyBlock(raw []byte) *lazyBlock {
+	return &lazyBlock{raw: raw}


Consider validating the length of raw once here, to prevent a lazyBlock from being created wrapping obviously incorrect raw bytes. This also removes the need to check the length in receivers.

Related: https://github.com/hemilabs/heminetwork/pull/1051/changes#r3323667033
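
One way to read this suggestion, sketched under assumptions: the constructor returns an error when raw cannot possibly hold a block. The name newLazyBlockChecked, the changed signature, and the minBlockSize constant (80-byte header plus at least one byte of tx-count varint) are hypothetical and not part of the PR.

```go
package main

import "fmt"

// minBlockSize is an assumed lower bound: an 80-byte header followed by at
// least one byte for the transaction-count varint.
const minBlockSize = 80 + 1

// lazyBlock mirrors only the field needed for this sketch.
type lazyBlock struct {
	raw []byte
}

// newLazyBlockChecked (hypothetical variant of newLazyBlock) rejects input
// that is too short to be a block, so receivers can rely on the header
// being present without re-checking the length.
func newLazyBlockChecked(raw []byte) (*lazyBlock, error) {
	if len(raw) < minBlockSize {
		return nil, fmt.Errorf("raw block too short: %d bytes, need at least %d",
			len(raw), minBlockSize)
	}
	return &lazyBlock{raw: raw}, nil
}

func main() {
	_, err := newLazyBlockChecked(make([]byte, 80))
	fmt.Println(err)

	lb, err := newLazyBlockChecked(make([]byte, 81))
	fmt.Println(len(lb.raw), err)
}
```

This only guards the length; it does not validate the contents, so the scanning code still needs its own bounds checks.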

joshuasing · 2026-05-29T09:54:13Z

+				return nil, nil, fmt.Errorf("scanTxBoundaries: tx %d input %d: script len: %w", t, i, err)
+			}
+			offset += n
+			if offset+int(scriptLen) > len(raw) {


It is unsafe to cast scriptLen to an int, as readVarInt explicitly parses untrusted bytes as a uint64 and returns it as such.

With a valid block it would not be possible to trigger a panic, but it is still possible with invalid or corrupted raw input. I think it would be better to handle the value correctly than to make assumptions.

Same on lines 231, 261, and 264.

Suggested change

-			if offset+int(scriptLen) > len(raw) {
+			if scriptLen > uint64(len(raw) - offset) {

Related: https://github.com/hemilabs/heminetwork/pull/1051/changes#r3323667033

joshuasing · 2026-05-29T09:57:06Z

+	locs := make([]wire.TxLoc, 0, txCount)
+	witness := make([]bool, 0, txCount)


While not possible with a valid Bitcoin block, txCount is a uint64 parsed from untrusted bytes. This could potentially allow unbounded allocation of these slices, up to triggering a panic.

Consider validating txCount before allocating.

Related: https://github.com/hemilabs/heminetwork/pull/1051/changes#r3323667033

joshuasing · 2026-05-29T09:59:11Z

+	}
+	offset += n
+
+	values := make([]uint64, outputCount)


While not exploitable with a valid Bitcoin block, outputCount is a uint64 parsed from untrusted bytes. This could potentially allow unbounded allocation of these slices, up to triggering a panic.

Consider validating outputCount before allocating.

Related: https://github.com/hemilabs/heminetwork/pull/1051/changes#r3323667033
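
For illustration, a self-contained sketch of output value extraction that rejects an implausible outputCount before allocating. The names readVarInt and outputValues are stand-ins written for this example (the PR's own readVarInt is not shown on this page), and minTxOutSize assumes each output takes at least 9 bytes: an 8-byte value plus a 1-byte script length.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errShort = errors.New("unexpected end of data")

// minTxOutSize is an assumed lower bound per output: value (8) + script length (1).
const minTxOutSize = 9

// readVarInt decodes a Bitcoin CompactSize integer from the start of b and
// returns the value and the number of bytes consumed.
func readVarInt(b []byte) (uint64, int, error) {
	if len(b) < 1 {
		return 0, 0, errShort
	}
	switch b[0] {
	case 0xfd:
		if len(b) < 3 {
			return 0, 0, errShort
		}
		return uint64(binary.LittleEndian.Uint16(b[1:3])), 3, nil
	case 0xfe:
		if len(b) < 5 {
			return 0, 0, errShort
		}
		return uint64(binary.LittleEndian.Uint32(b[1:5])), 5, nil
	case 0xff:
		if len(b) < 9 {
			return 0, 0, errShort
		}
		return binary.LittleEndian.Uint64(b[1:9]), 9, nil
	default:
		return uint64(b[0]), 1, nil
	}
}

// outputValues parses an outputs section that starts at the output-count
// varint and returns each output's value in satoshis.
func outputValues(b []byte) ([]uint64, error) {
	count, n, err := readVarInt(b)
	if err != nil {
		return nil, err
	}
	b = b[n:]
	// Reject counts the remaining bytes cannot hold before allocating.
	if count > uint64(len(b)/minTxOutSize) {
		return nil, fmt.Errorf("output count %d too large for %d bytes", count, len(b))
	}
	values := make([]uint64, 0, count)
	for i := uint64(0); i < count; i++ {
		if len(b) < 8 {
			return nil, errShort
		}
		values = append(values, binary.LittleEndian.Uint64(b[:8]))
		b = b[8:]
		scriptLen, m, err := readVarInt(b)
		if err != nil {
			return nil, err
		}
		b = b[m:]
		if scriptLen > uint64(len(b)) {
			return nil, errShort
		}
		b = b[scriptLen:] // skip the script without copying it
	}
	return values, nil
}

func main() {
	outs := []byte{
		0x02,                                           // two outputs
		0x00, 0xf2, 0x05, 0x2a, 0x01, 0x00, 0x00, 0x00, // 5000000000 satoshis
		0x01, 0x51, // 1-byte script
		0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 1 satoshi
		0x00, // empty script
	}
	fmt.Println(outputValues(outs)) // [5000000000 1] <nil>
}
```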

joshuasing · 2026-05-29T10:03:40Z

+	offsets, witness, err := scanTxBoundaries(lb.raw)
+	if err != nil {
+		return err
+	}
+	lb.txOffsets = offsets
+	lb.txWitness = witness
+	return nil


Suggested change

-	offsets, witness, err := scanTxBoundaries(lb.raw)
-	if err != nil {
-		return err
-	}
-	lb.txOffsets = offsets
-	lb.txWitness = witness
-	return nil
+	var err error
+	lb.txOffsets, lb.txWitness, err = scanTxBoundaries(lb.raw)
+	return err

joshuasing · 2026-05-29T10:16:54Z

+
+// newLazyBlock wraps raw block bytes for lazy access. The raw slice
+// must be a complete serialized Bitcoin block (header + transactions).
+// No data is parsed until a method is called.


Consider documenting expectations for raw input here.

The correctness of lazyBlock is entirely dependent on raw being guaranteed as immutable and kept alive for the full lifetime of every lazyBlock. If the underlying raw byte slice is ever reused or recycled, the entire lazyBlock would be silently corrupted and later cause errors, panics or incorrect data to be returned.

Additionally, if raw is a sub-slice of a larger byte slice, holding it pins the entire parent's backing array, not just the block's own len(raw) bytes. The raw slice would have to be copied into a new slice to prevent this.

This also assumes raw is always a valid Bitcoin block, which is not guaranteed. If created with incorrect or corrupted data, receivers below can error, panic or return invalid data. If all instances of a lazyBlock being created are guaranteed to only input valid data, this is probably okay; if not, the scanning needs hardening.

Suggested change

-// No data is parsed until a method is called.
+// No data is parsed until a method is called.
+// The caller must guarantee that raw is not mutated, reused, or recycled
+// for the lifetime of the returned lazyBlock. raw bytes are referenced,
+// not copied. raw must be a complete, valid serialized Bitcoin block.
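
The aliasing concern can be demonstrated with a short sketch. newLazyBlock here matches the shape quoted in this review, while newLazyBlockCopy is a hypothetical copying variant added only for comparison.

```go
package main

import "fmt"

// lazyBlock mirrors only the field needed for this sketch.
type lazyBlock struct {
	raw []byte
}

// newLazyBlock references raw without copying, as in the PR.
func newLazyBlock(raw []byte) *lazyBlock {
	return &lazyBlock{raw: raw}
}

// newLazyBlockCopy (hypothetical) takes a private copy, so later writes to
// the caller's buffer are not visible and the parent array is not pinned.
func newLazyBlockCopy(raw []byte) *lazyBlock {
	return &lazyBlock{raw: append([]byte(nil), raw...)}
}

func main() {
	buf := []byte{1, 2, 3, 4, 5, 6, 7, 8} // stands in for a reused buffer

	shared := newLazyBlock(buf[2:6])
	owned := newLazyBlockCopy(buf[2:6])

	buf[2] = 0xff // the buffer is recycled for something else

	fmt.Println(shared.raw[0], owned.raw[0]) // 255 3
}
```

Copying trades the zero-copy benefit for safety: for a multi-megabyte block it reintroduces one allocation per lazyBlock, which is why documenting the ownership contract, as suggested above, is the lighter option when every caller can honor it.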

marcopeereboom requested a review from a team as a code owner May 29, 2026 07:09

github-actions Bot added the "area: tbc" and "changelog: required" labels May 29, 2026

marcopeereboom mentioned this pull request May 29, 2026

tbc: store tx byte location in tx index, bump DB to v6 #1052

Open

marcopeereboom force-pushed the marco/lazy-block branch from 9f1a394 to 99d768f Compare May 29, 2026 07:15

github-actions Bot added the "area: docs" and "changelog: done" labels and removed the "changelog: required" label May 29, 2026

This was referenced May 29, 2026

tbc: add ordinal indexer #1024

Closed

tbc: add ordinal indexer #1053

Open

joshuasing requested changes May 29, 2026

marcopeereboom wants to merge 1 commit into main from marco/lazy-block
