andreirusanescu/P2P-Validator

gossipsub-score-sim

Mini take-home: implement an application-level message validator + per-peer scoring/quarantine on top of Rust libp2p Gossipsub.

Time budget: We expect this to take 3–4 hours. Please don't spend more than 6. Bonus tasks are explicitly optional — skip them entirely if you're at the time limit.


About this project

This project simulates a peer-to-peer gossip network where nodes exchange messages over libp2p's Gossipsub protocol. Some peers are honest; others are attackers sending junk, oversized payloads, or well-formed floods at high frequency.

Your job is to build the defense layer:

  1. Message validation — decode and inspect every inbound message, deciding Accept, Reject, or Ignore before it propagates further.
  2. Rate limiting — per-peer token buckets that throttle floods without dropping legitimate traffic.
  3. Deduplication — a bounded cache that prevents the same message from being processed twice while staying within a fixed memory budget.
  4. Peer scoring + quarantine — track each peer's behavior over time. Penalize bad actors, reward honest ones, and quarantine peers whose score drops below a threshold so they can no longer pollute the network.

A successful implementation keeps honest delivery above 90% while rejecting over 95% of spam — under adversarial conditions, with bounded memory and CPU.


Why this matters: real-world context

The validation, scoring, and quarantine patterns in this exercise are not academic — they are the same mechanisms running in production across major decentralized networks. Here are concrete systems where this knowledge applies directly:

Ethereum consensus layer (beacon chain). Ethereum's beacon chain uses Gossipsub v1.1 with application-level peer scoring to propagate attestations, blocks, and sync committee messages. Invalid attestations are penalized, peers that flood are pruned from the mesh, and scoring parameters directly affect chain finality and fork-choice safety. The tradeoffs you reason about here — penalty asymmetry, decay rates, quarantine thresholds — are the same decisions Ethereum client teams (Prysm, Lighthouse, Teku, Lodestar) make and tune in production.

Filecoin. Filecoin uses Gossipsub to propagate block headers and deal messages across storage miners. Peer scoring prevents eclipse attacks where an adversary surrounds a target node with malicious peers to control what it sees.

FROST threshold signing coordination. The topic name in this sim (frost-sim/coordination/1) is not accidental — threshold signature protocols like FROST require reliable broadcast among signers. A compromised gossip layer can prevent threshold ceremonies from completing or trick signers into signing conflicting messages. The validation pipeline you build here is the first line of defense for signing coordination.

Cross-chain bridges and DeFi relayers. Cross-chain bridges often use p2p gossip for validator-to-validator coordination. A single unscored peer flooding garbage can delay bridge finality or cause validators to miss signing windows, directly impacting bridge liveness and user funds.

In all of these systems, getting peer scoring wrong has real consequences: too aggressive and you partition the network; too lenient and spam overwhelms honest traffic. This exercise puts you in that design space.


Prerequisites

  • Rust toolchain: 1.75+ (edition 2021). Install via rustup:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  • OS: Linux or macOS. Windows works under WSL2 but is not tested.
  • No external services required. The simulation runs entirely in-process over localhost TCP.

Verify your setup:

rustc --version   # should print 1.75.0 or newer
cargo --version

How the skeleton works

This starter repo configures Gossipsub in manual validation mode:

  1. Inbound messages are not automatically forwarded.
  2. Your node must call report_message_validation_result(message_id, propagation_source, acceptance) to Accept / Reject / Ignore each message.

That's the main hook you'll use for validation + scoring. The simulation spawns N peers in-process with a random mesh topology, publishes honest and spam traffic, and collects per-node metrics at the end.


What you need to build

Core objectives

1. Validator (src/validator.rs)

  • Decode messages (bincode/serde — already wired up in the skeleton).
  • Enforce max size (already stubbed).
  • Add at least 2 more validation rules of your choosing (examples: sequence range checks, payload content checks, replay/dedupe detection, etc.).
  • Return: Accept / Reject / Ignore.
  • NOTE: The WireMessage::Control variant currently accepts anything with no validation. Is this safe? Address this in your implementation and writeup.

2. Rate limiting + backpressure

  • Per-peer publish rate limiting (token bucket is fine).
  • Bounded queues / bounded memory for dedupe sets.
  • Demonstrate that spam nodes don't blow up memory/CPU. The sim includes three spam modes: junk bytes, oversize payloads, and well-formed floods (valid format, valid size, high frequency). Your rate limiter must handle all three.

3. Peer scoring + quarantine

  • Maintain an application score per peer (can be separate from Gossipsub's internal score).
  • Penalize Rejects, reward valid/first-seen messages, penalize floods.
  • Quarantine behavior when a peer's score drops below a threshold (e.g., Ignore their messages, disconnect, stop dialing, or prune).
  • The score_delta values in the skeleton (−5, −2, +0.1) are placeholders — tune or redesign the scoring model as you see fit. We're interested in your reasoning.

Scoring methodology:

| Event | Delta | Rationale |
|---|---|---|
| Valid first-seen message | +1.0 | Reward honest participation |
| Duplicate (dedupe hit) | 0.0 (Ignore, no penalty) | Dupes are normal in gossip meshes |
| Decode error | -10.0 | Only malicious/buggy peers send garbage |
| Oversize | -10.0 | Same — no honest peer should exceed the limit |
| Empty payload | -5.0 | Less severe but still invalid |
| Rate-limited (flood) | -3.0 per excess msg | Graduated — occasional bursts aren't fatal |
| Control (unvalidated) | -1.0 | Mild skepticism until real validation is added |

Design principles:

  • Decay toward zero: apply score *= 0.95 each second so penalized peers can recover.
  • Quarantine threshold: score < -50 triggers Ignore for all messages from that peer.
  • Asymmetric magnitudes: penalties are much larger than rewards (+1 vs -10), so one bad message requires ~10 good ones to recover. Prevents "be good then attack" strategies.
  • Bounded score range: clamp to [-100, +100] so long-running honest peers don't accumulate infinite credit that shields future bad behavior.
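The decay, clamp, and asymmetry principles above can be sketched as a minimal score model. All names here (PeerScore, apply_delta, the constants) are illustrative assumptions, not the skeleton's API:

```rust
// Illustrative peer-score model following the design principles above.
// Constants mirror the numbers in this section, not the CLI defaults.
const SCORE_FLOOR: f64 = -100.0;
const SCORE_CEILING: f64 = 100.0;
const DECAY_PER_SEC: f64 = 0.95;
const QUARANTINE_THRESHOLD: f64 = -50.0;

struct PeerScore {
    score: f64,
}

impl PeerScore {
    fn new() -> Self {
        Self { score: 0.0 }
    }

    // Apply an event delta, clamped to the bounded range so long-running
    // honest peers can't bank unlimited credit.
    fn apply_delta(&mut self, delta: f64) {
        self.score = (self.score + delta).clamp(SCORE_FLOOR, SCORE_CEILING);
    }

    // Called once per second: decay toward zero so penalized peers recover.
    fn decay(&mut self) {
        self.score *= DECAY_PER_SEC;
    }

    fn is_quarantined(&self) -> bool {
        self.score < QUARANTINE_THRESHOLD
    }
}
```

Because penalties are 10x the reward, a peer that sends one decode error needs roughly ten valid first-seen messages (plus decay) to climb back to zero.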

4. Simulation + summary report

  • The included sim spawns N peers in-process with a random mesh topology (each node dials --dial-peers random others, not just a single bootstrap node).
  • Expand it so it produces a clear outcome and prints a summary report at the end.
  • The report should include at minimum: per-node accept/reject/ignore counts, per-node counts broken down by source peer, and overall honest delivery rate vs. spam rejection rate.

Pass / fail gates

Under the default CLI flags (--peers 10 --bad-peers 2 --duration-secs 20), a passing submission must demonstrate:

| Metric | Threshold |
|---|---|
| Honest message delivery rate | > 90% |
| Spam rejection rate (across all nodes) | > 95% |
| No unbounded growth in memory/maps | Manual review |
| Every inbound message gets exactly one Accept/Reject/Ignore report | Required |

Bonus tasks

  • Bonus A: Content-addressed message IDs with domain separation. The skeleton already hashes message bytes for the MessageId. Add domain separation (include topic hash, sender, and a version tag in the hash preimage). Write a short paragraph explaining tradeoffs (replay across topics, uniqueness guarantees, overhead).

  • Bonus B: Eclipse detection. Add an attacker mode where each bad node opens K connections to a single victim. Detect when >80% of a node's inbound messages originate from fewer than 3 distinct peers within a sliding window of the last 100 messages. Surface this in metrics and the summary report.

Writeup (200–500 words)

In a file called WRITEUP.md (template included), briefly cover:

  1. What tradeoffs you made in your scoring parameters and why.
  2. What you'd do differently with more time.
  3. One attack vector your implementation does not handle.

This is often more revealing than the code — show us you understand the limits of what you built.


Implementation walkthrough

Manual validation flow

  • Gossipsub is configured in manual validation mode, so inbound messages are not forwarded until the app reports Accept/Reject/Ignore.
  • src/p2p.rs receives messages, calls Validator::validate, and reports the result.
  • Message IDs are domain-separated hashes of topic + data (see src/behaviour.rs). Tradeoffs:
    • Prevents cross-topic collisions and accidental dedupe between unrelated topics.
    • Identical payloads on different topics no longer dedupe (by design).
    • All peers must use the same MessageId function or gossip compatibility breaks.
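A domain-separated ID preimage (as in Bonus A) can be sketched with std's hasher. A production implementation would use a cryptographic hash such as SHA-256; DefaultHasher just keeps this sketch std-only, and the function name and field order are assumptions:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Sketch of a domain-separated message ID. The preimage mixes a version
// tag, the topic, the sender, and the payload, so identical bytes on
// different topics (or from different senders) yield distinct IDs.
// Note: std's Hash impl for slices already length-prefixes each field,
// which prevents boundary-shift collisions between adjacent fields.
fn message_id(version: u8, topic: &str, sender: &[u8], data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    version.hash(&mut h);
    topic.as_bytes().hash(&mut h);
    sender.hash(&mut h);
    data.hash(&mut h);
    h.finish()
}
```

The tradeoff called out above falls out directly: the same payload published on two topics now produces two IDs, so cross-topic replay is blocked but cross-topic dedupe is impossible, and every peer must compute IDs identically.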

Validator rules (src/validator.rs)

  • Size bounds: reject messages larger than max_message_bytes or smaller than min_message_bytes.
  • Decode check: reject on bincode decode failure.
  • Payload rule: reject empty payloads for WireMessage::Good.
  • Control rules: allow kind in {0,1,2}, reject unknown kinds, and reject control payloads larger than 256 bytes.
  • Dedupe: ignore duplicates by MessageId within the configured TTL.
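The rules above can be sketched as a standalone decision function. The types here (WireMessage, Decision) are simplified stand-ins for the skeleton's codec types, and passing the decode result and dedupe flag as parameters is a simplification to keep the sketch self-contained:

```rust
// Illustrative validator mirroring the rules above.
#[derive(Debug, PartialEq)]
enum Decision { Accept, Reject, Ignore }

enum WireMessage {
    Good { payload: Vec<u8> },
    Control { kind: u8, payload: Vec<u8> },
}

const MIN_MESSAGE_BYTES: usize = 1;
const MAX_MESSAGE_BYTES: usize = 16_384;
const MAX_CONTROL_BYTES: usize = 256;

fn validate(raw_len: usize, decoded: Option<WireMessage>, is_duplicate: bool) -> Decision {
    // Size bounds first: cheap to check, independent of decoding.
    if raw_len < MIN_MESSAGE_BYTES || raw_len > MAX_MESSAGE_BYTES {
        return Decision::Reject;
    }
    // Dedupe: duplicates are normal in gossip, so Ignore rather than Reject.
    if is_duplicate {
        return Decision::Ignore;
    }
    match decoded {
        None => Decision::Reject, // bincode decode failure: garbage bytes
        Some(WireMessage::Good { payload }) if payload.is_empty() => Decision::Reject,
        Some(WireMessage::Good { .. }) => Decision::Accept,
        Some(WireMessage::Control { kind, payload }) => {
            // Control messages get real rules too (see the Core objectives
            // note): bounded kinds and a tight payload cap.
            if kind > 2 || payload.len() > MAX_CONTROL_BYTES {
                Decision::Reject
            } else {
                Decision::Accept
            }
        }
    }
}
```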

Rate limiting (per-peer token bucket)

  • Each peer has a token bucket in fixed-point millitokens.
  • Tokens refill at rate_milli_per_sec up to burst_milli.
  • Messages are ignored when the bucket is empty.
  • Buckets are bounded by max_peers with TTL cleanup.
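A per-peer bucket along these lines might look like the following sketch. Time is passed in explicitly (nanoseconds) rather than read from a clock so the logic is testable; the cost of 1000 millitokens per message and all names are assumptions:

```rust
// Per-peer token bucket in fixed-point millitokens, as described above.
struct TokenBucket {
    tokens_milli: u64,       // current tokens, in millitokens
    rate_milli_per_sec: u64, // refill rate
    burst_milli: u64,        // capacity cap
    last_refill_ns: u64,
}

impl TokenBucket {
    fn new(rate_milli_per_sec: u64, burst_milli: u64, now_ns: u64) -> Self {
        Self { tokens_milli: burst_milli, rate_milli_per_sec, burst_milli, last_refill_ns: now_ns }
    }

    // Refill proportionally to elapsed time, capped at the burst size.
    fn refill(&mut self, now_ns: u64) {
        let elapsed = now_ns.saturating_sub(self.last_refill_ns);
        let added = elapsed as u128 * self.rate_milli_per_sec as u128 / 1_000_000_000;
        self.tokens_milli = (self.tokens_milli as u128 + added).min(self.burst_milli as u128) as u64;
        self.last_refill_ns = now_ns;
    }

    // One message costs 1000 millitokens; returns false when the peer
    // should be rate-limited (message Ignored, score penalized).
    fn try_consume(&mut self, now_ns: u64) -> bool {
        self.refill(now_ns);
        if self.tokens_milli >= 1000 {
            self.tokens_milli -= 1000;
            true
        } else {
            false
        }
    }
}
```

Fixed-point millitokens avoid float drift in a long-running refill loop; the same arithmetic works for any rate the CLI supplies.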

Dedupe / backpressure

  • Dedupe uses a bounded HashMap + VecDeque cache with TTL and FIFO eviction.
  • This keeps memory bounded under spam.
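The HashMap + VecDeque pairing can be sketched as below. Names and the u64 stand-in for a MessageId are assumptions; the real cache keys on the gossipsub MessageId:

```rust
use std::collections::{HashMap, VecDeque};

// Bounded dedupe cache: HashMap for O(1) lookup, VecDeque for FIFO
// eviction. Capacity and TTL together keep memory bounded under spam.
struct DedupeCache {
    seen: HashMap<u64, u64>, // message id -> insert time (ns)
    order: VecDeque<u64>,    // insertion order, for FIFO eviction
    max_entries: usize,
    ttl_ns: u64,
}

impl DedupeCache {
    fn new(max_entries: usize, ttl_ns: u64) -> Self {
        Self { seen: HashMap::new(), order: VecDeque::new(), max_entries, ttl_ns }
    }

    // Returns true if `id` was already seen within the TTL; otherwise
    // records it, evicting the oldest entry when at capacity.
    fn check_and_insert(&mut self, id: u64, now_ns: u64) -> bool {
        match self.seen.get(&id) {
            Some(&t) if now_ns.saturating_sub(t) < self.ttl_ns => true, // dup
            Some(_) => {
                // Expired entry: refresh the timestamp; `order` already
                // holds this id, so don't push it again.
                self.seen.insert(id, now_ns);
                false
            }
            None => {
                if self.seen.len() >= self.max_entries {
                    if let Some(old) = self.order.pop_front() {
                        self.seen.remove(&old);
                    }
                }
                self.seen.insert(id, now_ns);
                self.order.push_back(id);
                false
            }
        }
    }
}
```

FIFO eviction is the simplest bounded policy; the "Possible improvements" section below notes that LRU semantics would keep hot IDs resident longer.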

App-level scoring and quarantine

  • Each peer has a score updated by Decision::score_delta.
  • Scores are clamped to [score_floor, score_ceiling] and evicted via TTL.
  • If score drops below quarantine_threshold, the peer is quarantined for quarantine_duration_ns and its messages are ignored.
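The quarantine bookkeeping described above might look like this sketch, using the CLI defaults from the flags table (threshold -10, clamp [-20, 20], 10-second duration). It tracks a single peer for brevity; the real table is keyed per PeerId, and all names are assumptions:

```rust
// Illustrative score table with duration-based quarantine.
struct PeerState {
    score: f64,
    quarantined_until_ns: u64,
}

struct ScoreTable {
    quarantine_threshold: f64,
    quarantine_duration_ns: u64,
    floor: f64,
    ceiling: f64,
    peer: PeerState, // one peer for brevity; really a map of PeerId -> PeerState
}

impl ScoreTable {
    // Apply a Decision::score_delta; crossing the threshold starts (or
    // extends) the quarantine window.
    fn apply(&mut self, delta: f64, now_ns: u64) {
        self.peer.score = (self.peer.score + delta).clamp(self.floor, self.ceiling);
        if self.peer.score < self.quarantine_threshold {
            self.peer.quarantined_until_ns = now_ns + self.quarantine_duration_ns;
        }
    }

    // Messages from a quarantined peer are Ignored until the window elapses.
    fn is_quarantined(&self, now_ns: u64) -> bool {
        now_ns < self.peer.quarantined_until_ns
    }
}
```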

Attacker mode + eclipse detection (bonus)

  • Attacker mode dials bad peers only into a chosen victim; good peers dial a separate bootstrap.
  • Each node tracks a sliding window of unique peers and reject rate, and logs a warning when unique peers drop below a threshold while rejects spike.
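The Bonus B heuristic (>80% of the last 100 inbound messages from fewer than 3 distinct peers) can be sketched over a sliding window. The u64 peer ids and all names are stand-ins, not the skeleton's types:

```rust
use std::collections::{HashMap, VecDeque};

// Sliding-window eclipse heuristic: over the last `window` inbound
// messages, flag when traffic is concentrated in too few sources.
struct EclipseDetector {
    window: usize,
    recent: VecDeque<u64>, // source peer ids for the last `window` messages
}

impl EclipseDetector {
    fn new(window: usize) -> Self {
        Self { window, recent: VecDeque::new() }
    }

    fn record(&mut self, peer: u64) {
        if self.recent.len() == self.window {
            self.recent.pop_front();
        }
        self.recent.push_back(peer);
    }

    // True when the window is full and the top 2 sources (i.e. fewer than
    // 3 distinct peers) account for more than 80% of its messages.
    fn is_eclipsed(&self) -> bool {
        if self.recent.len() < self.window {
            return false; // not enough samples yet
        }
        let mut counts: HashMap<u64, usize> = HashMap::new();
        for &p in &self.recent {
            *counts.entry(p).or_insert(0) += 1;
        }
        let mut sorted: Vec<usize> = counts.values().copied().collect();
        sorted.sort_unstable_by(|a, b| b.cmp(a));
        let top2: usize = sorted.iter().take(2).sum();
        // Integer arithmetic avoids float thresholds: top2 / window > 0.8.
        top2 * 100 > self.recent.len() * 80
    }
}
```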

How to run

# Full simulation with default flags:
RUST_LOG=info cargo run --release -- \
  --peers 10 \
  --bad-peers 2 \
  --duration-secs 20 \
  --publish-per-sec 5 \
  --spam-per-sec 50 \
  --max-message-bytes 16384 \
  --dial-peers 3

# Quick smoke test (fewer peers, shorter duration):
RUST_LOG=info cargo run --release -- \
  --peers 4 \
  --bad-peers 1 \
  --duration-secs 5

# See all available flags:
cargo run --release -- --help

CLI flags reference

| Flag | Default | Description |
|---|---|---|
| --peers | 8 | Total peers (includes bad peers) |
| --bad-peers | 2 | First N peers are attackers/spammers |
| --duration-secs | 20 | Simulation duration in seconds |
| --publish-per-sec | 5 | Honest publish rate per peer |
| --spam-per-sec | 50 | Bad publish rate per peer |
| --topic | frost-sim/coordination/1 | Gossipsub topic name |
| --seed | 1337 | RNG seed for reproducible runs |
| --min-message-bytes | 1 | Reject messages <= this size |
| --max-message-bytes | 16384 | Reject messages > this size |
| --dial-peers | 3 | How many random peers each node dials |
| --attacker-mode | false | Enable attacker mode (eclipse) |
| --victim-idx | 0 | Victim peer index for attacker mode |
| --max-peers | 1024 | Max tracked peers for rate limiting |
| --cleanup-interval | 1000000000 | Bucket cleanup interval (ns) |
| --bucket-ttl | 5000000000 | Bucket TTL (ns) |
| --rate-milli-per-sec | 5000 | Token refill rate (millitokens/sec) |
| --burst-milli | 10000 | Token bucket burst size (millitokens) |
| --dedupe-max-entries | 10000 | Dedupe cache max entries |
| --dedupe-ttl | 10000000000 | Dedupe TTL (ns) |
| --score-floor | -20.0 | Score min clamp |
| --score-ceiling | 20.0 | Score max clamp |
| --quarantine-threshold | -10.0 | Score threshold to quarantine |
| --quarantine-duration-ns | 10000000000 | Quarantine duration (ns) |
| --score-ttl-ns | 60000000000 | Score entry TTL (ns) |
| --max-score-peers | 1024 | Max tracked peers for scoring |
| --detect-window-ns | 5000000000 | Detection window (ns) |
| --detect-min-unique | 3 | Min unique peers before flag |
| --detect-reject-rate | 0.7 | Reject-rate threshold to flag |

Note: when passing negative floats, use = or quotes (e.g., --score-floor=-20.0 or --score-floor "-20.0"), otherwise clap treats them as flags.

Unhappy-path focused run (more attackers, higher spam rate):

RUST_LOG=info cargo run --release -- \
  --peers 10 \
  --bad-peers 4 \
  --duration-secs 20 \
  --publish-per-sec 50 \
  --spam-per-sec 200 \
  --min-message-bytes 1 \
  --max-message-bytes 1024

Attacker-mode example (eclipse detection):

RUST_LOG=info cargo run --release -- \
  --peers 10 \
  --bad-peers 4 \
  --attacker-mode true \
  --victim-idx 5 \
  --detect-min-unique 5 \
  --duration-secs 20 \
  --publish-per-sec 5 \
  --spam-per-sec 200 \
  --min-message-bytes 1 \
  --max-message-bytes 1024

Tests

# Run all tests (unit + proptest + integration):
cargo test

# Run only the golden-path integration test with output:
cargo test --test integration -- --nocapture

# Run only the proptest suite:
cargo test --test validator_prop

# Run only the inline unit tests in validator.rs:
cargo test validator::tests

The golden-path integration test (tests/integration.rs) spins up 3 honest nodes + 1 spammer for 5 seconds and asserts basic sanity: messages are flowing, some are accepted, some are rejected. Once you implement scoring + rate limiting, tighten the commented-out assertions to match the pass/fail gates.


Repo layout

gossipsub-score-sim/
  Cargo.toml
  README.md
  WRITEUP.md          ← fill this in (200–500 words)
  .gitignore
  src/
    lib.rs            ← crate root, re-exports all modules
    main.rs           ← binary entry point
    cli.rs            ← CLI argument definitions (clap)
    sim.rs            ← simulation orchestrator (topology, publishers, report)
    p2p.rs            ← per-node swarm loop (where validation happens)
    behaviour.rs      ← NetworkBehaviour wrapper (gossipsub config)
    codec.rs          ← WireMessage serde types + encode/decode
    validator.rs      ← message validation logic ← YOUR MAIN WORK HERE
    metrics.rs        ← per-node + per-peer counters
  tests/
    validator_prop.rs ← property-based tests for the validator
    integration.rs    ← golden-path end-to-end smoke test

Troubleshooting

cargo build fails with libp2p version errors: Make sure you're on Rust 1.75+. Run rustup update stable. If you still see issues, delete Cargo.lock and retry — the Cargo.toml specifies libp2p = "0.56" which resolves to the latest 0.56.x patch.

"Address already in use" errors: Each node listens on /ip4/127.0.0.1/tcp/0 (OS-assigned port). If you see bind failures, check for leftover processes from a previous run: pkill -f gossipsub-score-sim.

macOS: "Too many open files": The simulation opens many TCP connections. If you run with --peers 20+, you may hit the default file descriptor limit. Fix with: ulimit -n 4096 before running.

Tests hang or time out: The integration test runs for 5 seconds by design. If it hangs beyond ~15 seconds, there's likely a deadlock in your swarm event loop. Check that every code path in the tokio::select! loop either processes or drops the message — never blocks.

No messages accepted/rejected: Make sure nodes have time to establish mesh connections. The skeleton sleeps 1 second after dialing before publishing. If you reduce --duration-secs below 3, nodes may not have enough time to subscribe and graft.


Evaluation rubric

| Area | Weight | What we look for |
|---|---|---|
| Correctness | 30% | Every message gets exactly one verdict. Validation rules are sound. No panics under fuzz. |
| Resource bounding | 25% | Bounded dedupe sets, bounded per-peer state, no OOM under spam load. |
| Scoring & quarantine | 20% | Coherent model: penalties/rewards make sense, quarantine triggers at a reasonable threshold, and recovery is possible. |
| Simulation & report | 15% | Meets pass/fail gates. Report is clear and human-readable or machine-parseable. |
| Writeup & code quality | 10% | Clear reasoning about tradeoffs. Clean code. Good naming. |

Bonus tasks are additive — they can only help your score, never hurt it.


Possible improvements

  • Switch dedupe to LRU semantics or sliding TTL so hot IDs are less likely to be evicted.
  • Add periodic TTL sweeps for the dedupe cache so old entries expire even without re-seen traffic.
  • Replace the O(n) bucket eviction scan with a proper LRU (heap + linked list or a dedicated LRU cache crate).
  • Create a non-zero type for configuration variables (e.g., dedupe_max_entries, rate_milli_per_sec) to enable more compile-time checks instead of runtime short-circuits.

About

A message validator for a p2p network with honest and malicious peers: it rate-limits traffic and punishes bad peer behavior.
