BARE

Offline semantic ranker: scanner findings to Metasploit modules in a single Rust binary.

Features • Installation • Usage • Adapters • Corpus • Scope

BARE reads scanner findings from nuclei, nmap, or Shodan, converted to a common JSON schema by the bundled adapters, and ranks Metasploit modules by semantic similarity to each finding. The full pipeline runs offline: a BERT encoder, its tokenizer, and 3,904 pre-encoded module descriptions are compiled into a single ~101 MB binary. No Python, no PyTorch, no network, no package manager. Built for air-gapped networks, SCIFs, and restricted endpoints where installing a 5 GB ML stack is not an option.

Features

Single Rust binary, ~101 MB, fully self-contained
3,904 Metasploit modules baked in at compile time via include_bytes!: 2,647 exploits, 1,257 auxiliary
all-MiniLM-L6-v2 BERT encoder running natively in Rust, no Python runtime
Cosine-similarity ranking with configurable top-N and minimum-score thresholds
--no-match-threshold emits sentinel fields when no module description meaningfully resembles the finding
Three input adapters: nuclei JSONL, nmap XML, Shodan JSONL
Stable input and output JSON schemas (FORMAT.md, INPUT_FORMAT.md, OUTPUT_FORMAT.md)
Parity validation against Python sentence-transformers reference within f32 rounding error
Stdout carries only the JSON output, so piping is safe

Installation

Pre-built binary from the releases page:

curl -LO https://github.com/nuclide-research/BARE/releases/latest/download/bare-linux-x86_64
curl -LO https://github.com/nuclide-research/BARE/releases/latest/download/bare-linux-x86_64.sha256
sha256sum -c bare-linux-x86_64.sha256
chmod +x bare-linux-x86_64

Build from source (Rust 1.70 or later):

git clone https://github.com/nuclide-research/BARE
cd BARE
curl -L -o assets/model.safetensors \
  https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/model.safetensors
cargo build --release

The model weights (assets/model.safetensors, ~87 MB) are gitignored and must be fetched once before the first build. The build embeds them, so the resulting binary needs no external files at runtime.

Usage

bare [OPTIONS] [INPUT_PATH]

  --top <N>                   Top matches per finding (default: 3)
  --min-score <FLOAT>         Suppress matches below this cosine similarity
                              (default: 0.0). Suggested: 0.5 high-confidence, 0.4 moderate, 0.3 loose
  --no-match-threshold <FLOAT>
                              When top corpus score falls below this value,
                              clear matches and emit sentinel fields (default: 0.55)
  --encode                    Read stdin, print L2-normalized 384-dim vector to stdout
  --version                   Print version and exit

  INPUT_PATH                  Path to findings.json, or - / omitted to read stdin

Status messages go to stderr. The JSON output document is the only thing on stdout.
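The --top and --min-score options can be pictured with the minimal sketch below. It is not BARE's source. It assumes the finding and every module description are already 384-dim vectors, and that the corpus vectors are L2-normalized like the --encode output, so cosine similarity reduces to a dot product. It also assumes --min-score is inclusive (a score equal to it is kept). The names `Match`, `l2_normalize`, `cosine_unit`, and `rank` are hypothetical.

```rust
/// A ranked module match (hypothetical type mirroring the output fields).
#[derive(Debug, Clone, PartialEq)]
pub struct Match {
    pub rank: usize,
    pub module: String,
    pub score: f32,
}

/// Scale a vector to unit L2 length; a zero vector is left unchanged.
pub fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

/// For unit-length vectors, cosine similarity is just the dot product.
pub fn cosine_unit(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Score every corpus entry against the query, drop scores below `min_score`,
/// and return the `top` highest-scoring entries with 1-based ranks.
pub fn rank(
    query: &[f32],
    corpus: &[(String, Vec<f32>)],
    top: usize,
    min_score: f32,
) -> Vec<Match> {
    let mut scored: Vec<(&str, f32)> = corpus
        .iter()
        .map(|(name, vec)| (name.as_str(), cosine_unit(query, vec)))
        .filter(|&(_, s)| s >= min_score)
        .collect();
    // Sort descending by score; total_cmp gives a total order even with NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
        .into_iter()
        .take(top)
        .enumerate()
        .map(|(i, (name, score))| Match { rank: i + 1, module: name.to_string(), score })
        .collect()
}
```

Because the vectors are pre-normalized, scoring one finding is a single pass of dot products. At 3,904 modules × 384 dimensions that is about 1.5 million multiply-adds, small enough that a brute-force scan is a reasonable choice without an approximate nearest-neighbor index.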

Adapters

Adapter	Input	Adapter script
nuclei	nuclei JSONL (`-j`)	`adapters/nuclei/nuclei_to_bare.py`
nmap	nmap XML (`-oX`)	`adapters/nmap/nmap_to_bare.py`
shodan	Shodan JSONL bulk export	`adapters/shodan/shodan_to_bare.py`

Each adapter converts scanner output to the findings.json input schema (version 1). Run nmap with -sV (service and version detection); the extra product and version strings give the encoder more description text to match against.

nuclei -u https://target.com -j | python adapters/nuclei/nuclei_to_bare.py | bare
nmap -sV -oX - target.com | python adapters/nmap/nmap_to_bare.py | bare --top 5
cat results.json | python adapters/shodan/shodan_to_bare.py | bare --min-score 0.4

Input and output schema

Input (findings.json):

{
  "version": 1,
  "source": "nuclei",
  "findings": [
    {
      "id": "CVE-2023-22527",
      "title": "Atlassian Confluence SSTI RCE",
      "description": "...",
      "target": "https://example.com",
      "severity": "critical",
      "metadata": {}
    }
  ]
}

target, severity, and metadata are optional. id, title, and description are required and must be non-empty.
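A minimal sketch of the required-field rule, assuming the finding has already been deserialized into a plain struct. The `Finding` type and `validate` function are hypothetical, not BARE's code; JSON parsing and the free-form `metadata` object are left out to keep the sketch dependency-free.

```rust
/// One entry of findings.json (hypothetical mirror of the input schema).
pub struct Finding {
    pub id: String,
    pub title: String,
    pub description: String,
    pub target: Option<String>,
    pub severity: Option<String>,
}

/// Reject findings whose required fields are empty.
/// Optional fields (target, severity) are not checked.
pub fn validate(f: &Finding) -> Result<(), String> {
    for (name, value) in [("id", &f.id), ("title", &f.title), ("description", &f.description)] {
        if value.is_empty() {
            return Err(format!("finding field `{}` must be non-empty", name));
        }
    }
    Ok(())
}
```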

Output per finding:

Field	Type	Notes
`id`	string	echoed from input
`title`	string	echoed from input
`target`	string	echoed, omitted if absent
`severity`	string	echoed, omitted if absent
`matches`	array	ranked module matches
`no_high_confidence_match`	bool	true when the top corpus score falls below --no-match-threshold
`no_match_reason`	string	explanation; present only when the threshold fires
`top_score_seen`	float	best raw corpus score; present only when the threshold fires

Each entry in matches:

Field	Notes
`rank`	1-based
`module`	Metasploit module path
`score`	cosine similarity (0.0 to 1.0)
`category`	first path segment of the module name

The top-level document also carries version, source, and a corpus object with size and sha256.

Example

bare --top 3 findings.json

{
  "version": 1,
  "source": "bare",
  "corpus": {
    "size": 3904,
    "sha256": "a3c1e..."
  },
  "findings": [
    {
      "id": "CVE-2023-22527",
      "title": "Atlassian Confluence SSTI RCE",
      "target": "https://example.com",
      "severity": "critical",
      "matches": [
        {
          "rank": 1,
          "module": "exploits/multi/http/atlassian_confluence_rce_cve_2023_22527",
          "score": 0.8322,
          "category": "exploits"
        },
        {
          "rank": 2,
          "module": "exploits/multi/http/atlassian_confluence_rce_cve_2024_21683",
          "score": 0.7472,
          "category": "exploits"
        }
      ]
    }
  ]
}

When --no-match-threshold fires, matches is empty and three sentinel fields appear: no_high_confidence_match: true, no_match_reason, and top_score_seen.
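The threshold rule can be sketched as follows. The sketch assumes the comparison uses the single best raw corpus score, independent of --min-score filtering. The `FindingResult` type and `apply_no_match_threshold` function are hypothetical. The `Option` fields stand in for JSON keys that are omitted unless the threshold fires, and the reason string is a placeholder because the README does not give BARE's exact wording.

```rust
/// Per-finding result carrying the sentinel fields (hypothetical type).
#[derive(Debug, Default)]
pub struct FindingResult {
    pub matches: Vec<(String, f32)>,
    pub no_high_confidence_match: Option<bool>,
    pub no_match_reason: Option<String>,
    pub top_score_seen: Option<f32>,
}

/// If the best raw corpus score is below `threshold`, clear the matches and
/// set the three sentinel fields; otherwise leave the result untouched.
pub fn apply_no_match_threshold(result: &mut FindingResult, top_raw_score: f32, threshold: f32) {
    if top_raw_score < threshold {
        result.matches.clear();
        result.no_high_confidence_match = Some(true);
        // Placeholder text; BARE's actual reason string is not documented here.
        result.no_match_reason = Some(format!(
            "top score {:.4} below threshold {:.2}",
            top_raw_score, threshold
        ));
        result.top_score_seen = Some(top_raw_score);
    }
}
```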

Corpus

The embedded corpus contains 3,904 Metasploit module descriptions: 2,647 exploits and 1,257 auxiliary. The corpus is baked in at compile time via include_bytes!. Rebuilding from a fresh Metasploit snapshot needs Python with sentence-transformers:

python fetch_modules.py   # fetch module .rb files from GitHub
python serialize.py       # encode to 384-dim vectors, write corpus.bin
cargo build --release     # embed corpus.bin in the binary
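The layout of corpus.bin is not documented in this README. Purely as an illustration, assume it is a headerless run of little-endian f32 values, 384 per module; decoding would then look roughly like the sketch below. In a real build the bytes would come from something like `include_bytes!("../corpus.bin")`; the sketch takes a byte slice so it runs without the file. `DIM` and `decode_vectors` are hypothetical names, and module names (which the real corpus must also carry) are not modeled.

```rust
/// Embedding dimension of all-MiniLM-L6-v2.
pub const DIM: usize = 384;

/// Decode a flat little-endian f32 buffer into fixed-size vectors.
/// ASSUMED layout (not documented in the README): no header, DIM floats per module.
/// In a real build the input might be `include_bytes!("../corpus.bin")`.
pub fn decode_vectors(bytes: &[u8]) -> Result<Vec<Vec<f32>>, String> {
    let row_bytes = DIM * 4;
    if bytes.len() % row_bytes != 0 {
        return Err(format!("{} bytes is not a multiple of {}", bytes.len(), row_bytes));
    }
    Ok(bytes
        .chunks_exact(row_bytes)
        .map(|row| {
            row.chunks_exact(4)
                .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
                .collect()
        })
        .collect())
}
```

Under this assumed layout, 3,904 × 384 f32 values come to about 6 MB, a small share of the ~101 MB binary next to the ~87 MB model weights.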

Parity validation

The Rust encoder must match the Python sentence-transformers reference to within f32 / f64 rounding error. bare --encode reads stdin and prints a space-separated, L2-normalized 384-dimensional vector. CI compares this output element-wise against tools/encode_baseline.py with a tolerance of 1e-5 (observed differences are typically ~1e-7) and fails the build if any element exceeds it.
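The CI comparison amounts to parsing two space-separated vectors and checking the largest element-wise difference against 1e-5. A minimal sketch of that check; the function names are hypothetical and the real CI script may differ, for example in how it reports the failing index. The fold deliberately propagates NaN so a NaN element fails the check instead of being skipped.

```rust
/// Parse a space-separated line of floats, as printed by `bare --encode`.
pub fn parse_vector(line: &str) -> Result<Vec<f64>, std::num::ParseFloatError> {
    line.split_whitespace().map(str::parse::<f64>).collect()
}

/// Largest absolute element-wise difference; None if the lengths differ.
/// A NaN difference makes the result NaN, so it can never pass a `<=` test.
pub fn max_abs_diff(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    Some(
        a.iter()
            .zip(b)
            .map(|(x, y)| (x - y).abs())
            .fold(0.0_f64, |m, d| if d.is_nan() || d > m { d } else { m }),
    )
}

/// Parity holds when both lines parse, have equal length, and every element
/// differs by at most `tol`.
pub fn within_tolerance(rust_out: &str, python_out: &str, tol: f64) -> bool {
    match (parse_vector(rust_out), parse_vector(python_out)) {
        (Ok(a), Ok(b)) => max_abs_diff(&a, &b).map_or(false, |d| d <= tol),
        _ => false,
    }
}
```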

Scope

BARE ranks modules by semantic similarity. A high score means the module description resembles the finding description. It does not confirm exploitability, check version numbers, or replace a manual triage step. Scores near the corpus floor (below ~0.55) mean no meaningful module coverage for the finding class, not a false negative. The --no-match-threshold flag makes that explicit.

Our other projects

aimap — AI/ML infrastructure fingerprint scanner
scanner — fast banner stage for population sweeps
tiptoe — quiet, congestion-controlled scanner for sensitive targets
menlohunt — zero-knowledge GCP perimeter scanner
recongraph — typed provenance graph for multi-source recon

License

MIT or Apache 2.0, at your option (standard Rust dual license). The embedded model weights (sentence-transformers/all-MiniLM-L6-v2) are Apache 2.0. Metasploit module descriptions used to build the corpus are BSD 3-Clause (Rapid7). Part of the NuClide toolchain. Contact: nuclide-research.com
