Mini take-home: implement an application-level message validator + per-peer scoring/quarantine on top of Rust libp2p Gossipsub.
Time budget: We expect this to take 3–4 hours. Please don't spend more than 6. Bonus tasks are explicitly optional — skip them entirely if you're at the time limit.
This project simulates a peer-to-peer gossip network where nodes exchange messages over libp2p's Gossipsub protocol. Some peers are honest; others are attackers sending junk, oversized payloads, or well-formed floods at high frequency.
Your job is to build the defense layer:
- Message validation — decode and inspect every inbound message, deciding Accept, Reject, or Ignore before it propagates further.
- Rate limiting — per-peer token buckets that throttle floods without dropping legitimate traffic.
- Deduplication — a bounded cache that prevents the same message from being processed twice while staying within a fixed memory budget.
- Peer scoring + quarantine — track each peer's behavior over time. Penalize bad actors, reward honest ones, and quarantine peers whose score drops below a threshold so they can no longer pollute the network.
A successful implementation keeps honest delivery above 90% while rejecting over 95% of spam — under adversarial conditions, with bounded memory and CPU.
The validation, scoring, and quarantine patterns in this exercise are not academic — they are the same mechanisms running in production across major decentralized networks. Here are concrete systems where this knowledge applies directly:
Ethereum consensus layer (beacon chain). Ethereum's beacon chain uses Gossipsub v1.1 with application-level peer scoring to propagate attestations, blocks, and sync committee messages. Invalid attestations are penalized, peers that flood are pruned from the mesh, and scoring parameters directly affect chain finality and fork-choice safety. The tradeoffs you reason about here — penalty asymmetry, decay rates, quarantine thresholds — are the same decisions Ethereum client teams (Prysm, Lighthouse, Teku, Lodestar) make and tune in production.
Filecoin. Filecoin uses Gossipsub to propagate block headers and deal messages across storage miners. Peer scoring prevents eclipse attacks where an adversary surrounds a target node with malicious peers to control what it sees.
FROST threshold signing coordination. The topic name in this sim (`frost-sim/coordination/1`) is not accidental — threshold signature protocols like FROST require reliable broadcast among signers. A compromised gossip layer can prevent threshold ceremonies from completing or trick signers into signing conflicting messages. The validation pipeline you build here is the first line of defense for signing coordination.
Cross-chain bridges and DeFi relayers. Cross-chain bridges often use p2p gossip for validator-to-validator coordination. A single unscored peer flooding garbage can delay bridge finality or cause validators to miss signing windows, directly impacting bridge liveness and user funds.
In all of these systems, getting peer scoring wrong has real consequences: too aggressive and you partition the network; too lenient and spam overwhelms honest traffic. This exercise puts you in that design space.
- Rust toolchain: 1.75+ (edition 2021). Install via rustup:

```sh
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

- OS: Linux or macOS. Windows works under WSL2 but is not tested.
- No external services required. The simulation runs entirely in-process over localhost TCP.

Verify your setup:

```sh
rustc --version   # should print 1.75.0 or newer
cargo --version
```

This starter repo configures Gossipsub in manual validation mode:
- Inbound messages are not automatically forwarded.
- Your node must call `report_message_validation_result(message_id, propagation_source, acceptance)` to Accept / Reject / Ignore each message.
That's the main hook you'll use for validation + scoring. The simulation spawns N peers in-process with a random mesh topology, publishes honest and spam traffic, and collects per-node metrics at the end.
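For orientation, here is a minimal sketch of that hook. The `Decision` enum and the helper name are illustrative stand-ins, not the skeleton's exact types; the report call itself is the real Gossipsub API mentioned above.

```rust
use libp2p::gossipsub::{self, MessageAcceptance};
use libp2p::PeerId;

/// Illustrative verdict type; the skeleton defines its own.
enum Decision {
    Accept,
    Reject,
    Ignore,
}

/// Map an application-level verdict onto the Gossipsub report call.
/// In manual validation mode, every inbound message needs exactly one report.
fn report_verdict(
    gossipsub: &mut gossipsub::Behaviour,
    message_id: &gossipsub::MessageId,
    source: &PeerId,
    decision: Decision,
) {
    let acceptance = match decision {
        Decision::Accept => MessageAcceptance::Accept, // deliver + forward to mesh peers
        Decision::Reject => MessageAcceptance::Reject, // drop + penalize in gossipsub's internal score
        Decision::Ignore => MessageAcceptance::Ignore, // drop without penalty
    };
    let _ = gossipsub.report_message_validation_result(message_id, source, acceptance);
}
```

You would call something like this from the `gossipsub::Event::Message` arm of the swarm loop (which is where `src/p2p.rs` receives messages), passing the event's `message_id` and `propagation_source`.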
- Decode messages (bincode/serde — already wired up in the skeleton).
- Enforce max size (already stubbed).
- Add at least 2 more validation rules of your choosing (examples: sequence range checks, payload content checks, replay/dedupe detection, etc.).
- Return: Accept / Reject / Ignore.
- NOTE: The `WireMessage::Control` variant currently accepts anything with no validation. Is this safe? Address this in your implementation and writeup.
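As a shape to aim for, here is a sketch of a cheapest-check-first pipeline. The `WireMessage` layout is illustrative (the skeleton's `codec.rs` defines the real type), and the Control limits mirror the design notes later in this README.

```rust
use serde::Deserialize;

/// Illustrative wire format; see src/codec.rs for the real one.
#[derive(Deserialize)]
enum WireMessage {
    Good { seq: u64, payload: Vec<u8> },
    Control { kind: u8, payload: Vec<u8> },
}

enum Verdict {
    Accept,
    Reject,
    Ignore, // used by the dedupe / rate-limit paths, not shown here
}

fn validate(data: &[u8], min_bytes: usize, max_bytes: usize) -> Verdict {
    // 1. Size bounds first: O(1) and allocation-free, so oversize spam is
    //    rejected as cheaply as possible (per the flag table: <= min or > max).
    if data.len() <= min_bytes || data.len() > max_bytes {
        return Verdict::Reject;
    }
    // 2. Decode: junk-byte spam fails here.
    let msg: WireMessage = match bincode::deserialize(data) {
        Ok(m) => m,
        Err(_) => return Verdict::Reject,
    };
    // 3. Per-variant content rules.
    match msg {
        WireMessage::Good { payload, .. } if payload.is_empty() => Verdict::Reject,
        WireMessage::Good { .. } => Verdict::Accept,
        // Don't wave Control through unchecked -- see the NOTE above.
        WireMessage::Control { kind, payload } if kind <= 2 && payload.len() <= 256 => {
            Verdict::Accept
        }
        WireMessage::Control { .. } => Verdict::Reject,
    }
}
```

Ordering the checks cheapest-first matters under spam load: oversize and junk payloads are rejected before any allocation or deserialization work is done on them.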
- Per-peer publish rate limiting (token bucket is fine).
- Bounded queues / bounded memory for dedupe sets.
- Demonstrate that spam nodes don't blow up memory/CPU. The sim includes three spam modes: junk bytes, oversize payloads, and well-formed floods (valid format, valid size, high frequency). Your rate limiter must handle all three.
- Maintain an application score per peer (can be separate from Gossipsub's internal score).
- Penalize Rejects, reward valid/first-seen messages, penalize floods.
- Quarantine behavior when a peer's score drops below a threshold (e.g., Ignore their messages, disconnect, stop dialing, or prune).
- The `score_delta` values in the skeleton (−5, −2, +0.1) are placeholders — tune or redesign the scoring model as you see fit. We're interested in your reasoning.
Scoring methodology:
| Event | Delta | Rationale |
|---|---|---|
| Valid first-seen message | +1.0 | Reward honest participation |
| Duplicate (dedupe hit) | 0.0 (Ignore, no penalty) | Dupes are normal in gossip meshes |
| Decode error | -10.0 | Only malicious/buggy peers send garbage |
| Oversize | -10.0 | Same — no honest peer should exceed the limit |
| Empty payload | -5.0 | Less severe but still invalid |
| Rate-limited (flood) | -3.0 per excess msg | Graduated — occasional bursts aren't fatal |
| Control (unvalidated) | -1.0 | Mild skepticism until real validation is added |
Design principles:
- Decay toward zero: apply `score *= 0.95` each second so penalized peers can recover.
- Quarantine threshold: score < -50 triggers Ignore for all messages from that peer.
- Asymmetric magnitudes: penalties are much larger than rewards (+1 vs -10), so one bad message requires ~10 good ones to recover. Prevents "be good then attack" strategies.
- Bounded score range: clamp to [-100, +100] so long-running honest peers don't accumulate infinite credit that shields future bad behavior.
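A compact sketch of those four principles together. The constants follow the numbers above (the skeleton's CLI defaults use a tighter range, floor −20.0 and threshold −10.0); the struct and method names are illustrative.

```rust
/// Sketch of the scoring model described above: asymmetric deltas,
/// periodic decay, clamping, and a quarantine check.
struct PeerScore {
    value: f64,
}

impl PeerScore {
    const FLOOR: f64 = -100.0;
    const CEILING: f64 = 100.0;
    const QUARANTINE_BELOW: f64 = -50.0;
    const DECAY_PER_SEC: f64 = 0.95;

    /// Apply an event delta (e.g. +1.0 valid first-seen, -10.0 decode error),
    /// clamping so honest peers can't bank unlimited credit.
    fn apply(&mut self, delta: f64) {
        self.value = (self.value + delta).clamp(Self::FLOOR, Self::CEILING);
    }

    /// Call once per second: geometric decay toward zero lets penalized
    /// peers eventually recover.
    fn tick(&mut self) {
        self.value *= Self::DECAY_PER_SEC;
    }

    fn quarantined(&self) -> bool {
        self.value < Self::QUARANTINE_BELOW
    }
}
```

With these numbers, a peer that goes silent at −60 decays back above the −50 threshold in about 3.5 seconds (60 · 0.95^t < 50 at t ≈ 3.6), so quarantine is an escalating sanction rather than a permanent one, unless the peer keeps misbehaving.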
- The included sim spawns N peers in-process with a random mesh topology (each node dials `--dial-peers` random others, not just a single bootstrap node).
- Expand it so it produces a clear outcome and prints a summary report at the end.
- The report should include at minimum: per-node accept/reject/ignore counts, per-node counts broken down by source peer, and overall honest delivery rate vs. spam rejection rate.
Under the default CLI flags (--peers 10 --bad-peers 2 --duration-secs 20), a passing
submission must demonstrate:
| Metric | Threshold |
|---|---|
| Honest message delivery rate | > 90% |
| Spam rejection rate (across all nodes) | > 95% |
| No unbounded growth in memory/maps | Manual review |
| Every inbound message gets exactly one Accept/Reject/Ignore report | Required |
- Bonus A: Content-addressed message IDs with domain separation. The skeleton already hashes message bytes for the MessageId. Add domain separation (include topic hash, sender, and a version tag in the hash preimage). Write a short paragraph explaining tradeoffs (replay across topics, uniqueness guarantees, overhead). A sketch of one possible preimage layout follows this list.
- Bonus B: Eclipse detection. Add an attacker mode where each bad node opens K connections to a single victim. Detect when >80% of a node's inbound messages originate from fewer than 3 distinct peers within a sliding window of the last 100 messages. Surface this in metrics and the summary report.
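For Bonus A, one possible preimage layout, as a sketch assuming the `sha2` crate; the field set and version tag are illustrative choices, not the skeleton's.

```rust
use sha2::{Digest, Sha256};

/// Sketch of a domain-separated message ID (Bonus A). Length-prefixing each
/// field prevents ambiguity where ("ab", "c") and ("a", "bc") would otherwise
/// produce the same concatenation. All peers must agree on this exact layout,
/// or gossip compatibility breaks.
fn message_id(topic: &str, sender: &[u8], data: &[u8]) -> [u8; 32] {
    const VERSION: &[u8] = b"frost-sim/msg-id/v1"; // illustrative version tag
    let mut h = Sha256::new();
    for field in [VERSION, topic.as_bytes(), sender, data] {
        h.update((field.len() as u64).to_be_bytes());
        h.update(field);
    }
    h.finalize().into()
}
```

Including the topic in the preimage is what stops identical payloads from deduping (or replaying) across unrelated topics, and the version tag lets the layout change later without silently colliding with old IDs.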
In a file called WRITEUP.md (template included), briefly cover:
- What tradeoffs you made in your scoring parameters and why.
- What you'd do differently with more time.
- One attack vector your implementation does not handle.
This is often more revealing than the code — show us you understand the limits of what you built.
- Gossipsub is configured in manual validation mode, so inbound messages are not forwarded until the app reports Accept/Reject/Ignore.
- `src/p2p.rs` receives messages, calls `Validator::validate`, and reports the result.
- Message IDs are domain-separated hashes of topic + data (see `src/behaviour.rs`). Tradeoffs:
  - Prevents cross-topic collisions and accidental dedupe between unrelated topics.
  - Identical payloads on different topics no longer dedupe (by design).
  - All peers must use the same MessageId function or gossip compatibility breaks.
- Size bounds: reject messages larger than `max_message_bytes` or smaller than `min_message_bytes`.
- Decode check: reject on bincode decode failure.
- Payload rule: reject empty payloads for `WireMessage::Good`.
- Control rules: allow `kind` in {0,1,2}, reject unknown kinds, and reject control payloads larger than 256 bytes.
- Dedupe: ignore duplicates by `MessageId` within the configured TTL.
- Each peer has a token bucket in fixed-point millitokens.
- Tokens refill at `rate_milli_per_sec` up to `burst_milli`.
- Messages are ignored when the bucket is empty.
- Buckets are bounded by `max_peers` with TTL cleanup.
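A sketch of that bucket in fixed point. Field and method names are illustrative, the one-message cost of 1000 millitokens is an assumption, and timestamps are nanoseconds to match the CLI flags.

```rust
/// Sketch of a lazily refilled fixed-point token bucket. Millitokens keep the
/// math in integers; u128 intermediates avoid overflow in ns * rate products.
struct Bucket {
    tokens_milli: u64,
    last_refill_ns: u64,
}

impl Bucket {
    const COST_MILLI: u64 = 1000; // assumed cost: one message = 1 token

    /// Returns true if the message fits the budget, false if it should be
    /// ignored because the bucket is empty.
    fn try_consume(&mut self, now_ns: u64, rate_milli_per_sec: u64, burst_milli: u64) -> bool {
        let elapsed = now_ns.saturating_sub(self.last_refill_ns);
        let refill = (elapsed as u128 * rate_milli_per_sec as u128 / 1_000_000_000) as u64;
        if refill > 0 {
            // Only advance the clock when tokens were actually credited, so
            // rapid calls can't round the refill down to zero forever.
            self.tokens_milli = self.tokens_milli.saturating_add(refill).min(burst_milli);
            self.last_refill_ns = now_ns;
        }
        if self.tokens_milli >= Self::COST_MILLI {
            self.tokens_milli -= Self::COST_MILLI;
            true
        } else {
            false
        }
    }
}
```

Under the defaults (5000 millitokens/sec, 10000 burst) this allows a sustained 5 msg/s with bursts of 10, which covers `--publish-per-sec 5` exactly while throttling the bulk of `--spam-per-sec 50`.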
- Dedupe uses a bounded `HashMap` + `VecDeque` cache with TTL and FIFO eviction.
- This keeps memory bounded under spam.
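That cache can be as small as this sketch. Names are illustrative, and keys are raw id bytes for self-containment; the skeleton keys on `MessageId`.

```rust
use std::collections::{HashMap, VecDeque};

/// Sketch of a bounded dedupe cache: HashMap for O(1) lookups, VecDeque for
/// FIFO eviction. FIFO insertion order doubles as time order, so TTL expiry
/// only ever needs to look at the front.
struct DedupeCache {
    seen: HashMap<Vec<u8>, u64>,     // id -> insertion time (ns)
    order: VecDeque<(Vec<u8>, u64)>, // insertion order, oldest at the front
    max_entries: usize,
    ttl_ns: u64,
}

impl DedupeCache {
    /// Returns true if `id` was already present (a duplicate -> Ignore).
    fn check_and_insert(&mut self, id: &[u8], now_ns: u64) -> bool {
        // 1. Expire stale entries from the front.
        while let Some((_, t)) = self.order.front() {
            if now_ns.saturating_sub(*t) <= self.ttl_ns {
                break;
            }
            let (old, _) = self.order.pop_front().expect("front checked above");
            self.seen.remove(&old);
        }
        // 2. Duplicate check.
        if self.seen.contains_key(id) {
            return true;
        }
        // 3. Capacity eviction: memory stays bounded no matter the spam rate.
        while self.order.len() >= self.max_entries {
            let (old, _) = self.order.pop_front().expect("len checked above");
            self.seen.remove(&old);
        }
        self.seen.insert(id.to_vec(), now_ns);
        self.order.push_back((id.to_vec(), now_ns));
        false
    }
}
```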
- Each peer has a score updated by `Decision::score_delta`.
- Scores are clamped to `[score_floor, score_ceiling]` and evicted via TTL.
- If score drops below `quarantine_threshold`, the peer is quarantined for `quarantine_duration_ns` and its messages are ignored.
- Attacker mode dials bad peers only into a chosen victim; good peers dial a separate bootstrap.
- Each node tracks a sliding window of unique peers and reject rate, and logs a warning when unique peers drop below a threshold while rejects spike.
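A sketch of that window logic, with types simplified to `String` peer ids for self-containment (the skeleton uses `PeerId`); the thresholds correspond to the `--detect-*` flags below.

```rust
use std::collections::{HashSet, VecDeque};

/// Sketch of the sliding-window eclipse heuristic: flag when recent inbound
/// traffic is dominated by too few distinct peers while rejects spike.
struct EclipseDetector {
    window: VecDeque<(String, bool)>, // (source peer, was_rejected), newest last
    capacity: usize,                  // e.g. last 100 messages
    min_unique: usize,                // cf. --detect-min-unique
    max_reject_rate: f64,             // cf. --detect-reject-rate
}

impl EclipseDetector {
    /// Record one inbound message; returns true when the window looks eclipsed.
    fn observe(&mut self, peer: String, rejected: bool) -> bool {
        self.window.push_back((peer, rejected));
        if self.window.len() > self.capacity {
            self.window.pop_front();
        }
        if self.window.len() < self.capacity {
            return false; // not enough evidence yet
        }
        let unique: HashSet<&str> = self.window.iter().map(|(p, _)| p.as_str()).collect();
        let rejects = self.window.iter().filter(|(_, r)| *r).count();
        let reject_rate = rejects as f64 / self.window.len() as f64;
        unique.len() < self.min_unique && reject_rate > self.max_reject_rate
    }
}
```

Requiring both conditions at once keeps false positives down: a small honest mesh has few unique peers but a low reject rate, while a flood from a well-connected mesh has high rejects but many sources.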
```sh
# Full simulation with default flags:
RUST_LOG=info cargo run --release -- \
  --peers 10 \
  --bad-peers 2 \
  --duration-secs 20 \
  --publish-per-sec 5 \
  --spam-per-sec 50 \
  --max-message-bytes 16384 \
  --dial-peers 3

# Quick smoke test (fewer peers, shorter duration):
RUST_LOG=info cargo run --release -- \
  --peers 4 \
  --bad-peers 1 \
  --duration-secs 5

# See all available flags:
cargo run --release -- --help
```

| Flag | Default | Description |
|---|---|---|
| `--peers` | 8 | Total peers (includes bad peers) |
| `--bad-peers` | 2 | First N peers are attackers/spammers |
| `--duration-secs` | 20 | Simulation duration in seconds |
| `--publish-per-sec` | 5 | Honest publish rate per peer |
| `--spam-per-sec` | 50 | Bad publish rate per peer |
| `--topic` | frost-sim/coordination/1 | Gossipsub topic name |
| `--seed` | 1337 | RNG seed for reproducible runs |
| `--min-message-bytes` | 1 | Reject messages <= this size |
| `--max-message-bytes` | 16384 | Reject messages > this size |
| `--dial-peers` | 3 | How many random peers each node dials |
| `--attacker-mode` | false | Enable attacker mode (eclipse) |
| `--victim-idx` | 0 | Victim peer index for attacker mode |
| `--max-peers` | 1024 | Max tracked peers for rate limiting |
| `--cleanup-interval` | 1000000000 | Bucket cleanup interval (ns) |
| `--bucket-ttl` | 5000000000 | Bucket TTL (ns) |
| `--rate-milli-per-sec` | 5000 | Token refill rate (millitokens/sec) |
| `--burst-milli` | 10000 | Token bucket burst size (millitokens) |
| `--dedupe-max-entries` | 10000 | Dedupe cache max entries |
| `--dedupe-ttl` | 10000000000 | Dedupe TTL (ns) |
| `--score-floor` | -20.0 | Score min clamp |
| `--score-ceiling` | 20.0 | Score max clamp |
| `--quarantine-threshold` | -10.0 | Score threshold to quarantine |
| `--quarantine-duration-ns` | 10000000000 | Quarantine duration (ns) |
| `--score-ttl-ns` | 60000000000 | Score entry TTL (ns) |
| `--max-score-peers` | 1024 | Max tracked peers for scoring |
| `--detect-window-ns` | 5000000000 | Detection window (ns) |
| `--detect-min-unique` | 3 | Min unique peers before flag |
| `--detect-reject-rate` | 0.7 | Reject-rate threshold to flag |
Note: when passing negative floats, use `=` or quotes (e.g., `--score-floor=-20.0` or `--score-floor "-20.0"`); otherwise clap treats them as flags.
Unhappy-path focused run (more attackers, higher spam rate):

```sh
RUST_LOG=info cargo run --release -- \
  --peers 10 \
  --bad-peers 4 \
  --duration-secs 20 \
  --publish-per-sec 50 \
  --spam-per-sec 200 \
  --min-message-bytes 1 \
  --max-message-bytes 1024
```

Attacker-mode example (eclipse detection):
```sh
RUST_LOG=info cargo run --release -- \
  --peers 10 \
  --bad-peers 4 \
  --attacker-mode true \
  --victim-idx 5 \
  --detect-min-unique 5 \
  --duration-secs 20 \
  --publish-per-sec 5 \
  --spam-per-sec 200 \
  --min-message-bytes 1 \
  --max-message-bytes 1024
```

```sh
# Run all tests (unit + proptest + integration):
cargo test

# Run only the golden-path integration test with output:
cargo test --test integration -- --nocapture

# Run only the proptest suite:
cargo test --test validator_prop

# Run only the inline unit tests in validator.rs:
cargo test validator::tests
```

The golden-path integration test (`tests/integration.rs`) spins up 3 honest nodes +
1 spammer for 5 seconds and asserts basic sanity: messages are flowing, some are accepted,
some are rejected. Once you implement scoring + rate limiting, tighten the commented-out
assertions to match the pass/fail gates.
```
gossipsub-score-sim/
  Cargo.toml
  README.md
  WRITEUP.md         ← fill this in (200–500 words)
  .gitignore
  src/
    lib.rs           ← crate root, re-exports all modules
    main.rs          ← binary entry point
    cli.rs           ← CLI argument definitions (clap)
    sim.rs           ← simulation orchestrator (topology, publishers, report)
    p2p.rs           ← per-node swarm loop (where validation happens)
    behaviour.rs     ← NetworkBehaviour wrapper (gossipsub config)
    codec.rs         ← WireMessage serde types + encode/decode
    validator.rs     ← message validation logic ← YOUR MAIN WORK HERE
    metrics.rs       ← per-node + per-peer counters
  tests/
    validator_prop.rs  ← property-based tests for the validator
    integration.rs     ← golden-path end-to-end smoke test
```
`cargo build` fails with libp2p version errors:
Make sure you're on Rust 1.75+. Run `rustup update stable`. If you still see issues,
delete `Cargo.lock` and retry — the `Cargo.toml` specifies `libp2p = "0.56"`, which
resolves to the latest 0.56.x patch.
"Address already in use" errors:
Each node listens on `/ip4/127.0.0.1/tcp/0` (OS-assigned port). If you see bind
failures, check for leftover processes from a previous run: `pkill -f gossipsub-score-sim`.
macOS: "Too many open files":
The simulation opens many TCP connections. If you run with `--peers 20` or more, you may
hit the default file descriptor limit. Fix with `ulimit -n 4096` before running.
Tests hang or time out:
The integration test runs for 5 seconds by design. If it hangs beyond ~15 seconds,
there's likely a deadlock in your swarm event loop. Check that every code path in the
`tokio::select!` loop either processes or drops the message — never blocks.
No messages accepted/rejected:
Make sure nodes have time to establish mesh connections. The skeleton sleeps 1 second
after dialing before publishing. If you reduce `--duration-secs` below 3, nodes may not
have enough time to subscribe and graft.
| Area | Weight | What we look for |
|---|---|---|
| Correctness | 30% | Every message gets exactly one verdict. Validation rules are sound. No panics under fuzz. |
| Resource bounding | 25% | Bounded dedupe sets, bounded per-peer state, no OOM under spam load. |
| Scoring & quarantine | 20% | Coherent model: penalties/rewards make sense, quarantine triggers at a reasonable threshold, and recovery is possible. |
| Simulation & report | 15% | Meets pass/fail gates. Report is clear and human-readable or machine-parseable. |
| Writeup & code quality | 10% | Clear reasoning about tradeoffs. Clean code. Good naming. |
Bonus tasks are additive — they can only help your score, never hurt it.
- Switch dedupe to LRU semantics or sliding TTL so hot IDs are less likely to be evicted.
- Add periodic TTL sweeps for the dedupe cache so old entries expire even without re-seen traffic.
- Replace the O(n) bucket eviction scan with a proper LRU (heap + linked list or a dedicated LRU cache crate).
- Create a non-zero type for configuration variables (e.g., `dedupe_max_entries`, `rate_milli_per_sec`) to enable more compile-time checks instead of runtime short-circuits.