BARE reads findings produced by nuclei, nmap, or Shodan adapters and ranks Metasploit modules by semantic similarity. The full pipeline runs offline. A BERT encoder, tokenizer, and 3,904 pre-encoded module descriptions are compiled into a ~101 MB binary. No Python, no PyTorch, no network, no package manager. Built for air-gapped networks, SCIFs, and restricted endpoints where installing a 5 GB ML stack is not an option.
- Single Rust binary, ~101 MB, fully self-contained
- 3,904 Metasploit modules baked in at compile time via `include_bytes!`: 2,647 exploits, 1,257 auxiliary
- All-MiniLM-L6-v2 BERT encoder running natively in Rust, no Python runtime
- Cosine-similarity ranking with configurable top-N and minimum-score thresholds
- `--no-match-threshold` emits sentinel fields when no module description meaningfully resembles the finding
- Three input adapters: nuclei JSONL, nmap XML, Shodan JSONL
- Stable input and output JSON schemas (FORMAT.md, INPUT_FORMAT.md, OUTPUT_FORMAT.md)
- Parity validation against the Python `sentence-transformers` reference within f32 rounding error
- Stdout carries only the JSON output, so piping is safe
Pre-built binary from the releases page:
curl -LO https://github.com/nuclide-research/BARE/releases/latest/download/bare-linux-x86_64
curl -LO https://github.com/nuclide-research/BARE/releases/latest/download/bare-linux-x86_64.sha256
sha256sum -c bare-linux-x86_64.sha256
chmod +x bare-linux-x86_64

Build from source (Rust 1.70 or later):
git clone https://github.com/nuclide-research/BARE
cd BARE
curl -L -o assets/model.safetensors \
https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/model.safetensors
cargo build --release

The model weights (assets/model.safetensors, ~87 MB) are gitignored and must be fetched once before the first build. After that the binary is self-contained.
bare [OPTIONS] [INPUT_PATH]
--top <N> Top matches per finding (default: 3)
--min-score <FLOAT> Suppress matches below this cosine similarity
(default: 0.0). 0.5 high-confidence, 0.4 moderate, 0.3 loose
--no-match-threshold <FLOAT>
When top corpus score falls below this value,
clear matches and emit sentinel fields (default: 0.55)
--encode Read stdin, print L2-normalized 384-dim vector to stdout
--version Print version and exit
INPUT_PATH Path to findings.json, or - / omitted to read stdin

Status messages go to stderr. The JSON output document is the only thing on stdout.
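How the three scoring options interact can be sketched in a few lines of Python. This is an illustrative model, not BARE's Rust implementation: the function name `rank_modules`, the toy 3-dimensional vectors, the module paths, and the placeholder reason text are all hypothetical. The order of operations (no-match check on the raw top score first, then `--min-score` filtering, then `--top` truncation) is an assumption drawn from the option descriptions above.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_modules(finding_vec, corpus, top=3, min_score=0.0, no_match_threshold=0.55):
    """Rank (module_path, vector) pairs against one finding vector."""
    scored = sorted(
        ((cosine(finding_vec, vec), path) for path, vec in corpus),
        reverse=True,
    )
    top_score = scored[0][0] if scored else 0.0
    if top_score < no_match_threshold:
        # Assumed behaviour: clear matches and emit the sentinel fields.
        return {
            "matches": [],
            "no_high_confidence_match": True,
            "no_match_reason": "top score below threshold",  # placeholder text
            "top_score_seen": round(top_score, 4),
        }
    kept = [(score, path) for score, path in scored if score >= min_score][:top]
    return {
        "matches": [
            {
                "rank": i + 1,  # ranks are 1-based
                "module": path,
                "score": round(score, 4),
                "category": path.split("/", 1)[0],  # first path segment
            }
            for i, (score, path) in enumerate(kept)
        ]
    }

# Toy corpus with hypothetical module paths.
corpus = [
    ("exploits/multi/http/example_a", [1.0, 0.0, 0.0]),
    ("auxiliary/scanner/http/example_b", [0.0, 1.0, 0.0]),
    ("exploits/linux/http/example_c", [0.6, 0.8, 0.0]),
]
result = rank_modules([1.0, 0.0, 0.0], corpus, top=2)
```

Because the stored vectors are L2-normalized, a real implementation could skip the norm computation and use a plain dot product; the sketch keeps the full formula so it also works on unnormalized toy inputs.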
| Adapter | Input | Adapter command |
|---|---|---|
| nuclei | nuclei JSONL (`-j`) | `nuclei_to_bare.py` |
| nmap | nmap XML (`-oX`) | `nmap_to_bare.py` |
| shodan | Shodan JSONL bulk export | `shodan_to_bare.py` |
Each adapter converts scanner output to the findings.json input schema (version 1). Run nmap with -sV to maximize the description surface.
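As a rough illustration of what an adapter does, the sketch below maps nuclei JSONL records onto the findings.json schema. It is not the shipped `nuclei_to_bare.py`: the function name is hypothetical, and the field mapping (`template-id` to `id`, `info.name` to `title`, `matched-at` to `target`) is an assumption based on nuclei's usual JSONL layout. The fallback from a missing description to the title is likewise an assumption.

```python
import json

def nuclei_lines_to_findings(lines):
    """Convert an iterable of nuclei JSONL lines into a findings document."""
    findings = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the stream
        record = json.loads(line)
        info = record.get("info", {})
        title = info.get("name", "")
        # Assumed fallback: reuse the title when the template has no
        # description, because the schema requires a non-empty description.
        description = info.get("description") or title
        finding = {
            "id": record.get("template-id", ""),
            "title": title,
            "description": description.strip(),
        }
        # Optional fields are only emitted when nuclei provided them.
        if record.get("matched-at"):
            finding["target"] = record["matched-at"]
        if info.get("severity"):
            finding["severity"] = info["severity"]
        findings.append(finding)
    return {"version": 1, "source": "nuclei", "findings": findings}

sample = (
    '{"template-id": "CVE-2023-22527", '
    '"info": {"name": "Atlassian Confluence SSTI RCE", "severity": "critical"}, '
    '"matched-at": "https://example.com"}'
)
document = nuclei_lines_to_findings([sample])
```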
nuclei -u https://target.com -j | python adapters/nuclei/nuclei_to_bare.py | bare
nmap -sV -oX - target.com | python adapters/nmap/nmap_to_bare.py | bare --top 5
cat results.json | python adapters/shodan/shodan_to_bare.py | bare --min-score 0.4

Input (findings.json):
{
"version": 1,
"source": "nuclei",
"findings": [
{
"id": "CVE-2023-22527",
"title": "Atlassian Confluence SSTI RCE",
"description": "...",
"target": "https://example.com",
"severity": "critical",
"metadata": {}
}
]
}

target, severity, and metadata are optional. id, title, and description are required and must be non-empty.
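The required-field rules can be checked before invoking the binary. The following is a minimal validation sketch that assumes only the rules stated above. The function name `validate_findings` is hypothetical, treating whitespace-only strings as empty and requiring `version` to equal 1 are assumptions, and BARE's own validation may report errors differently.

```python
def validate_findings(doc):
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    if doc.get("version") != 1:
        problems.append("version must be 1")
    findings = doc.get("findings")
    if not isinstance(findings, list):
        return problems + ["findings must be an array"]
    for index, finding in enumerate(findings):
        for field in ("id", "title", "description"):
            value = finding.get(field)
            # Required fields must be strings with visible content.
            if not isinstance(value, str) or not value.strip():
                problems.append(
                    f"findings[{index}].{field} is required and must be non-empty"
                )
    return problems

good = {
    "version": 1,
    "source": "nuclei",
    "findings": [{"id": "CVE-2023-22527", "title": "t", "description": "d"}],
}
bad = {"version": 1, "findings": [{"id": "CVE-2023-22527", "title": "t"}]}
```

Calling `validate_findings(good)` returns an empty list, while `validate_findings(bad)` reports the missing description.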
Output per finding:
| Field | Type | Notes |
|---|---|---|
| `id` | string | echoed from input |
| `title` | string | echoed from input |
| `target` | string | echoed, omitted if absent |
| `severity` | string | echoed, omitted if absent |
| `matches` | array | ranked module matches |
| `no_high_confidence_match` | bool | set when top corpus score below threshold |
| `no_match_reason` | string | reason text when no match |
| `top_score_seen` | float | top raw score when no match |
Each entry in matches:
| Field | Notes |
|---|---|
| `rank` | 1-based |
| `module` | Metasploit module path |
| `score` | cosine similarity (0.0 to 1.0) |
| `category` | first path segment of the module name |
The top-level document also carries version, source, and a corpus object with size and sha256.
bare --top 3 findings.json

{
"version": 1,
"source": "bare",
"corpus": {
"size": 3904,
"sha256": "a3c1e..."
},
"findings": [
{
"id": "CVE-2023-22527",
"title": "Atlassian Confluence SSTI RCE",
"target": "https://example.com",
"severity": "critical",
"matches": [
{
"rank": 1,
"module": "exploits/multi/http/atlassian_confluence_rce_cve_2023_22527",
"score": 0.8322,
"category": "exploits"
},
{
"rank": 2,
"module": "exploits/multi/http/atlassian_confluence_rce_cve_2024_21683",
"score": 0.7472,
"category": "exploits"
}
]
}
]
}

When --no-match-threshold fires, matches is empty and three sentinel fields appear: no_high_confidence_match: true, no_match_reason, and top_score_seen.
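Downstream tooling can branch on the sentinel field rather than inspecting scores directly. The sketch below splits an output document into matched and unmatched findings. It relies only on the fields documented above; the function name `split_by_match` and the sample values (including the reason text and score) are hypothetical.

```python
def split_by_match(output_doc):
    """Partition findings into (matched, unmatched) lists of tuples."""
    matched, unmatched = [], []
    for finding in output_doc.get("findings", []):
        if finding.get("no_high_confidence_match"):
            # Sentinel case: keep the raw top score for later review.
            unmatched.append((finding["id"], finding.get("top_score_seen")))
        else:
            matches = finding.get("matches") or []
            best = matches[0]["module"] if matches else None
            matched.append((finding["id"], best))
    return matched, unmatched

# Hypothetical output document with one hit and one sentinel finding.
sample_output = {
    "version": 1,
    "findings": [
        {
            "id": "CVE-2023-22527",
            "title": "Atlassian Confluence SSTI RCE",
            "matches": [
                {
                    "rank": 1,
                    "module": "exploits/multi/http/atlassian_confluence_rce_cve_2023_22527",
                    "score": 0.8322,
                    "category": "exploits",
                }
            ],
        },
        {
            "id": "example-no-coverage",
            "title": "Example finding without module coverage",
            "matches": [],
            "no_high_confidence_match": True,
            "no_match_reason": "placeholder reason",
            "top_score_seen": 0.31,
        },
    ],
}
matched, unmatched = split_by_match(sample_output)
```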
The embedded corpus contains 3,904 Metasploit module descriptions: 2,647 exploits and 1,257 auxiliary. The corpus is baked in at compile time via include_bytes!. Rebuilding from a fresh Metasploit snapshot needs Python with sentence-transformers:
python fetch_modules.py # fetch module .rb files from GitHub
python serialize.py # encode to 384-dim vectors, write corpus.bin
cargo build --release # embed corpus.bin in the binary

The Rust encoder must match the Python sentence-transformers reference to within f32 / f64 rounding error. bare --encode reads stdin and prints a space-separated, L2-normalized 384-dimensional vector. CI compares this output element-wise against tools/encode_baseline.py with a tolerance of 1e-5 (observed differences are typically ~1e-7) and fails the build if any element exceeds it.
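The element-wise comparison can be sketched as follows. This is not the project's CI script: the function names are hypothetical, and the only details taken from the text are the space-separated output format, the 384 dimensions, the L2 normalization, and the 1e-5 tolerance. Checking the norm with the same tolerance is an added assumption.

```python
import math

def parse_vector(text, dim=384):
    """Parse space-separated floats, as printed by `bare --encode`."""
    values = [float(token) for token in text.split()]
    if len(values) != dim:
        raise ValueError(f"expected {dim} values, got {len(values)}")
    return values

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))

def check_parity(rust_vec, reference_vec, tol=1e-5):
    """True when rust_vec is unit length and agrees with the reference within tol."""
    norm = math.sqrt(sum(x * x for x in rust_vec))
    return abs(norm - 1.0) <= tol and max_abs_diff(rust_vec, reference_vec) <= tol

# Synthetic stand-in for real encoder output: a uniform unit vector.
unit = [1.0 / math.sqrt(384)] * 384
parsed = parse_vector(" ".join(repr(x) for x in unit))
ok = check_parity(parsed, unit)
```

In practice the two inputs would come from `bare --encode` and from the Python reference encoder run on the same text.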
BARE ranks modules by semantic similarity. A high score means the module description resembles the finding description. It does not confirm exploitability, check version numbers, or replace a manual triage step. Scores near the corpus floor (below ~0.55) mean no meaningful module coverage for the finding class, not a false negative. The --no-match-threshold flag makes that explicit.
- aimap — AI/ML infrastructure fingerprint scanner
- scanner — fast banner stage for population sweeps
- tiptoe — quiet, congestion-controlled scanner for sensitive targets
- menlohunt — zero-knowledge GCP perimeter scanner
- recongraph — typed provenance graph for multi-source recon
MIT or Apache 2.0, at your option (standard Rust dual license). The embedded model weights (sentence-transformers/all-MiniLM-L6-v2) are Apache 2.0. Metasploit module descriptions used to build the corpus are BSD 3-Clause (Rapid7).

Part of the NuClide toolchain. Contact: nuclide-research.com