recongraph

Seed-polymorphic reconnaissance engine: any seed in, a typed provenance graph out.

Features • Installation • Usage • Seed Types • Exposure Classes • Scope

recongraph accepts six seed types (IP, CIDR, Domain, ASN, CertFP, BannerString) and runs a fixed-point iteration of probes against them. Every finding becomes a typed node with a provenance chain back to the original seed. Passive sources run first. Active non-intrusive probes fire only where passive signal left a node ambiguous. The engine halts when the queue drains, the budget cap is hit, or an iteration adds zero new nodes. Once stable, every Service node receives an exposure label from a rule set that records which rule fired.
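The halting rules above are easier to see in code. Below is a minimal sketch of a passive-first fixed-point loop, not recongraph's Engine. Nodes are plain `(type, value)` tuples, and the probe signature, the `is_ambiguous` callback and the per-call costs (`passive_cost`, `active_cost`) are hypothetical. Only the 1000-unit probe-cost cap comes from the documented defaults.

```python
from collections import deque
from typing import Callable, Dict, List, Sequence, Tuple

Node = Tuple[str, str]                # (node type, value), e.g. ("domain", "example.com")
Probe = Callable[[Node], List[Node]]  # hypothetical probe signature: node in, new nodes out


def fixed_point(
    seeds: Sequence[Node],
    passive: Sequence[Probe],
    active: Sequence[Probe],
    is_ambiguous: Callable[[Node, List[Node]], bool],
    max_cost: int = 1000,   # mirrors the documented probe-cost default
    passive_cost: int = 1,  # assumed per-call costs, not recongraph's
    active_cost: int = 5,
) -> Tuple[Dict[Node, Tuple[Node, ...]], int]:
    """Expand seeds until the queue drains or the cost budget runs out.

    Returns every discovered node mapped to a provenance chain (seed first)
    and the total cost spent.
    """
    chains: Dict[Node, Tuple[Node, ...]] = {s: (s,) for s in seeds}
    queue = deque(seeds)
    spent = 0
    # Once no probe yields an unseen node, nothing new is enqueued, so the
    # "zero new nodes" stop condition coincides with "queue drained" here.
    while queue and spent < max_cost:
        node = queue.popleft()
        found: List[Node] = []
        for probe in passive:              # passive sources always run first
            if spent + passive_cost > max_cost:
                break
            spent += passive_cost
            found.extend(probe(node))
        if is_ambiguous(node, found):      # active probes only where passive left doubt
            for probe in active:
                if spent + active_cost > max_cost:
                    break
                spent += active_cost
                found.extend(probe(node))
        for new in found:
            if new not in chains:          # first path to a node becomes its chain
                chains[new] = chains[node] + (new,)
                queue.append(new)
    return chains, spent
```

This sketch keeps only the first chain that reached each node; the real graph stores a list of chains per node.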

The orchestration is probe-agnostic. The crt.sh certificate-transparency lookup is the only fully implemented real probe in the public tree; stubs in probes.py and probes_real/ mark the registration points for the rest.
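For orientation, here is what a passive crt.sh lookup can look like with only the standard library. This is an illustrative sketch, not the probe in the public tree: `crtsh_url`, `names_from_crtsh` and `fetch_names` are hypothetical names, and the parser assumes crt.sh's JSON output, where each entry's `name_value` field holds newline-separated DNS names.

```python
import json
from typing import Set
from urllib.parse import urlencode
from urllib.request import urlopen

CRTSH = "https://crt.sh/"


def crtsh_url(domain: str, include_subdomains: bool = True) -> str:
    """Build a crt.sh JSON query; '%' acts as a wildcard, so '%.example.com' matches subdomains."""
    q = "%." + domain if include_subdomains else domain
    return CRTSH + "?" + urlencode({"q": q, "output": "json"})


def names_from_crtsh(body: str, domain: str) -> Set[str]:
    """Collect in-scope DNS names from a crt.sh JSON body.

    Wildcard labels are reduced to their base name and out-of-scope names are dropped.
    """
    domain = domain.lower()
    names: Set[str] = set()
    for entry in json.loads(body):
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return names


def fetch_names(domain: str, timeout: float = 30.0) -> Set[str]:
    """Passive, read-only lookup: a single GET against crt.sh (needs network access)."""
    with urlopen(crtsh_url(domain), timeout=timeout) as resp:
        return names_from_crtsh(resp.read().decode("utf-8"), domain)
```

Splitting URL building and parsing from the network call keeps the parsing testable offline, in the spirit of the smoke test.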

Features

Six seed types: IP, CIDR, Domain, ASN, CertFP, BannerString
Typed provenance graph: every node carries the chain back to the seed that produced it
Fixed-point engine: passive saturation first, active probes gated on remaining budget
Hard budget caps: wallclock, probe cost, unique hosts, requests per /24, requests per ASN
Five exposure classes: public_intended, public_accidental, mgmt_exposed, legacy_drift, unknown
Drift detection between runs via DRIFT_FROM edges and a diff() method
Sandbox-MITM detection downgrades L7 conclusions to OPAQUE when intercepting environments are detected
Cloud range classifier for GCP, AWS, and Cloudflare, plus rDNS patterns for nine cloud providers
/24 and /20 neighborhood homogeneity sweep
Python 3.8 or later, standard library only, no external dependencies

Installation

git clone https://github.com/nuclide-research/recongraph
cd recongraph

Python 3.8 or later. Standard library only. All network I/O lives inside probe implementations. The engine, graph, budget, and classification logic are pure.

Usage

from recongraph import Engine, Seed, SeedType

engine = Engine()
graph = engine.run([Seed(SeedType.IP, "192.0.2.10")])

print(graph.to_json())

Smoke test, no network:

python smoke_test.py

Reference pipeline, clean network environment:

python upgraded_runs.py

Seed types

SeedType	Example value
`IP`	`192.0.2.10`
`CIDR`	`192.0.2.0/24`
`DOMAIN`	`example.com`
`ASN`	`AS15169`
`CERT_FP`	SHA-256 fingerprint of the DER-encoded certificate
`BANNER`	`Server: nginx/1.18`
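The engine takes the seed type explicitly, as in the usage example. If you want to route raw operator input, a hypothetical `infer_seed_type` helper could guess the type with the standard `ipaddress` and `re` modules. The precedence order and the regular expressions are assumptions; the returned strings match the SeedType names in the table.

```python
import ipaddress
import re

_ASN = re.compile(r"^AS\d+$", re.IGNORECASE)
_SHA256_HEX = re.compile(r"^[0-9a-f]{64}$", re.IGNORECASE)
_DOMAIN = re.compile(
    r"^(?=.{1,253}$)([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$", re.IGNORECASE
)


def infer_seed_type(raw: str) -> str:
    """Guess which SeedType name a raw string belongs to (hypothetical helper)."""
    value = raw.strip()
    try:
        ipaddress.ip_address(value)
        return "IP"
    except ValueError:
        pass
    if "/" in value:
        try:
            ipaddress.ip_network(value, strict=False)  # tolerate host bits, e.g. 192.0.2.10/24
            return "CIDR"
        except ValueError:
            pass
    if _ASN.match(value):
        return "ASN"
    if _SHA256_HEX.match(value.replace(":", "")):  # also accepts AB:CD:... notation
        return "CERT_FP"
    if _DOMAIN.match(value):
        return "DOMAIN"
    return "BANNER"  # anything else is treated as a banner string
```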

Node and edge types

Nodes: HOST, SERVICE, CERT, DOMAIN, NETBLOCK, ORG, ASN.

Edges: OBSERVED_ON, ISSUED_FOR, RESOLVES_TO, ANNOUNCED_BY, CO_HOSTED_WITH, SHARES_CERT_WITH, BELONGS_TO, DRIFT_FROM.
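A minimal sketch of how typed nodes and edges can carry provenance. The enum values follow the lowercase strings in the graph output shape; the `Node`, `Edge` and `link` definitions are hypothetical and much thinner than recongraph's. The assumption illustrated is that a child inherits every chain of the node that produced it, extended by that node's value, so a node reached from two seeds ends up with two chains.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class NodeType(Enum):
    HOST = "host"
    SERVICE = "service"
    CERT = "cert"
    DOMAIN = "domain"
    NETBLOCK = "netblock"
    ORG = "org"
    ASN = "asn"


class EdgeType(Enum):
    OBSERVED_ON = "observed_on"
    ISSUED_FOR = "issued_for"
    RESOLVES_TO = "resolves_to"
    ANNOUNCED_BY = "announced_by"
    CO_HOSTED_WITH = "co_hosted_with"
    SHARES_CERT_WITH = "shares_cert_with"
    BELONGS_TO = "belongs_to"
    DRIFT_FROM = "drift_from"


Chain = Tuple[str, ...]  # seed id first, then the values that led here (assumed layout)


@dataclass
class Node:
    type: NodeType
    value: str
    provenance: List[Chain] = field(default_factory=list)

    @property
    def key(self) -> Tuple[NodeType, str]:
        return (self.type, self.value)


@dataclass
class Edge:
    src: Tuple[NodeType, str]
    dst: Tuple[NodeType, str]
    type: EdgeType


def link(parent: Node, child: Node, etype: EdgeType) -> Edge:
    """Add a typed edge and extend each parent chain onto the child, without duplicates."""
    for chain in parent.provenance:
        extended = chain + (parent.value,)
        if extended not in child.provenance:
            child.provenance.append(extended)
    return Edge(parent.key, child.key, etype)
```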

Exposure classes

Every Service node receives one label after the graph stabilizes. Rules are evaluated in order and the first match wins; legacy_drift is checked before mgmt_exposed.

Class	Meaning
`public_intended`	http, https, dns, smtp, submission, imaps, pop3s
`public_accidental`	staging / dev / test subdomains, `.git`, `.env`, `/backup`, `/phpinfo`
`mgmt_exposed`	ssh, rdp, vnc, ipmi, mysql, postgres, mongodb, redis, elasticsearch, kubelet, etcd, docker-api, ldap
`legacy_drift`	finger, telnet, gopher, tftp, rsh, rlogin, chargen, qi
`unknown`	no rule matched
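First-match-wins classification can be sketched as an ordered list of rule functions, each returning `(class, reason)` or `None`. The protocol sets come from the table above; the dict-based service record (`name`, `host`, `paths` keys), the reason strings, and placing public_accidental ahead of public_intended are assumptions of this sketch.

```python
from typing import Callable, Dict, List, Optional, Tuple

Verdict = Optional[Tuple[str, str]]  # (exposure class, reason) or None
Service = Dict[str, object]          # assumed keys: "name", "host", "paths"

PUBLIC_INTENDED = {"http", "https", "dns", "smtp", "submission", "imaps", "pop3s"}
MGMT_EXPOSED = {"ssh", "rdp", "vnc", "ipmi", "mysql", "postgres", "mongodb", "redis",
                "elasticsearch", "kubelet", "etcd", "docker-api", "ldap"}
LEGACY_DRIFT = {"finger", "telnet", "gopher", "tftp", "rsh", "rlogin", "chargen", "qi"}
ACCIDENTAL_LABELS = {"staging", "dev", "test"}
ACCIDENTAL_PATHS = ("/.git", "/.env", "/backup", "/phpinfo")


def rule_legacy(svc: Service) -> Verdict:
    return ("legacy_drift", "legacy protocol") if svc.get("name") in LEGACY_DRIFT else None


def rule_mgmt(svc: Service) -> Verdict:
    return ("mgmt_exposed", "management service") if svc.get("name") in MGMT_EXPOSED else None


def rule_accidental(svc: Service) -> Verdict:
    host = str(svc.get("host", ""))
    if host.split(".")[0] in ACCIDENTAL_LABELS:
        return ("public_accidental", "non-production subdomain")
    if any(str(p).startswith(ACCIDENTAL_PATHS) for p in svc.get("paths", [])):
        return ("public_accidental", "sensitive path exposed")
    return None


def rule_intended(svc: Service) -> Verdict:
    return ("public_intended", "public protocol") if svc.get("name") in PUBLIC_INTENDED else None


# legacy_drift before mgmt_exposed is documented; the rest of the order is assumed.
RULES: List[Callable[[Service], Verdict]] = [rule_legacy, rule_mgmt, rule_accidental, rule_intended]


def classify(svc: Service, rules: List[Callable[[Service], Verdict]] = RULES) -> Tuple[str, str]:
    for rule in rules:
        verdict = rule(svc)
        if verdict is not None:
            return verdict  # first match wins; the reason records which rule fired
    return ("unknown", "no rule matched")
```

Returning the reason alongside the class is what lets a report say which rule fired, not just which label was assigned.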

Budget defaults

Cap	Default
Wallclock	300 s
Probe cost	1000 units
Unique hosts	500
Requests per /24	30
Requests per ASN	100
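A sketch of how the five caps might be enforced together, assuming every request is charged through a single `try_spend` call that refuses, and charges nothing, when any cap would be exceeded. `BudgetSketch` and its field names are hypothetical; its defaults mirror the table above, and the /24 bucket is computed with `ipaddress.IPv4Network(..., strict=False)`, so this version handles IPv4 only.

```python
import ipaddress
import time
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class BudgetSketch:
    # Defaults mirror the documented budget defaults; names are this sketch's own.
    wallclock_s: float = 300.0
    max_cost: int = 1000
    max_hosts: int = 500
    per_slash24: int = 30
    per_asn: int = 100
    started: float = field(default_factory=time.monotonic)
    cost: int = 0
    hosts: Set[str] = field(default_factory=set)
    by_slash24: Counter = field(default_factory=Counter)
    by_asn: Counter = field(default_factory=Counter)

    def try_spend(self, ip: str, cost: int, asn: Optional[str] = None) -> bool:
        """Charge one request if every cap allows it; otherwise charge nothing."""
        if time.monotonic() - self.started > self.wallclock_s:
            return False
        if self.cost + cost > self.max_cost:
            return False
        if ip not in self.hosts and len(self.hosts) >= self.max_hosts:
            return False
        # IPv4 only: the /24 this request lands in.
        net = str(ipaddress.IPv4Network(f"{ip}/24", strict=False))
        if self.by_slash24[net] >= self.per_slash24:
            return False
        if asn is not None and self.by_asn[asn] >= self.per_asn:
            return False
        self.cost += cost
        self.hosts.add(ip)
        self.by_slash24[net] += 1
        if asn is not None:
            self.by_asn[asn] += 1
        return True
```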

Graph output shape

{
  "created_at": 1717430400.0,
  "nodes": [
    {
      "type": "host",
      "value": "192.0.2.10",
      "attrs": {},
      "provenance": [["seed-id"]],
      "first_seen": 1717430400.0,
      "last_seen": 1717430401.2,
      "exposure": null,
      "id": "a1b2c3d4e5f60001"
    }
  ],
  "edges": [
    {
      "src": "a1b2c3d4e5f60001",
      "dst": "b2c3d4e5f6a70002",
      "type": "resolves_to",
      "attrs": {},
      "first_seen": 1717430401.0
    }
  ]
}
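Because the output is plain JSON, downstream tooling can walk it without importing recongraph. The helpers below are hypothetical. They assume `doc = json.loads(graph.to_json())` yields exactly the shape shown, and that the first element of each provenance chain is the seed id, as in the sample's `[["seed-id"]]`.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set


def seeds_by_node(doc: dict) -> Dict[str, Set[str]]:
    """Map each node id to the first element of each of its provenance chains."""
    return {n["id"]: {chain[0] for chain in n["provenance"] if chain} for n in doc["nodes"]}


def adjacency(doc: dict, edge_type: str) -> Dict[str, List[str]]:
    """src id -> dst ids for one edge type, e.g. "resolves_to"."""
    out: Dict[str, List[str]] = defaultdict(list)
    for edge in doc["edges"]:
        if edge["type"] == edge_type:
            out[edge["src"]].append(edge["dst"])
    return dict(out)


def exposure_counts(doc: dict) -> Dict[Optional[str], int]:
    """Count service nodes per exposure label (None means not yet classified)."""
    return dict(Counter(n["exposure"] for n in doc["nodes"] if n["type"] == "service"))
```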

Drift detection

Two runs can be compared. emit_drift_edges adds DRIFT_FROM edges where a node's attributes changed between runs, and diff returns the nodes and edges that were added or removed.

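import json
from recongraph import Graph  # assumed top-level export, like Engine and Seed
# old_json: a previous run's graph.to_json() output; current: the Graph returned by this run's engine.run(...)
snapshot = Graph.from_dict(json.loads(old_json))
current.emit_drift_edges(snapshot)  # adds DRIFT_FROM edges for changed nodes
diff = current.diff(snapshot)       # added and removed nodes and edges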
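The semantics of diff and emit_drift_edges can be illustrated on plain dictionaries. In the hypothetical `diff_runs` below, nodes are keyed by `(type, value)` and compared by their attrs; surviving nodes whose attrs changed are where a DRIFT_FROM edge would be emitted. This sketches the idea and is not the `Graph.diff` implementation.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]      # (node type, value) identifies a node across runs
Attrs = Dict[str, object]


def diff_runs(old: Dict[Key, Attrs], new: Dict[Key, Attrs]):
    """Return (added, removed, drift); drift pairs a surviving node with its changed attribute names."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    drift: List[Tuple[Key, List[str]]] = []
    for key in sorted(new.keys() & old.keys()):
        before, after = old[key], new[key]
        changed = sorted(a for a in before.keys() | after.keys() if before.get(a) != after.get(a))
        if changed:
            drift.append((key, changed))  # a DRIFT_FROM edge would go here
    return added, removed, drift
```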

Additional modules

Module	Purpose
`cloud_ranges.py`	Classifier over GCP, AWS, Cloudflare published range files. rDNS patterns for nine cloud providers. Weekly on-disk cache.
`l7_fingerprint.py`	Raw HTTP probe ladder, canonical error-page signature library with fail-closed matching, HTTP/2 cleartext detection.
`neighbors.py`	/24 and /20 homogeneity sweep with verdict classification: a highly homogeneous neighborhood indicates a shared edge pool, a highly heterogeneous one indicates a single tenant per IP.
`tenant_model.py`	`TenantModel` taxonomy, `IdentificationConfidence` levels, `EnvironmentalConstraints` for recording what the environment prevented observing.
`sandbox_detect.py`	Startup check that sends identical payloads to unrelated reference IPs. Identical response shapes mean the environment is intercepting traffic, and L7-derived tenant conclusions are downgraded to OPAQUE.
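The sandbox check can be approximated as follows: send the same payload to several unrelated reference IPs, reduce each raw response to a shape, and treat a single shared shape as interception. The shape definition (status line, sorted header names, body digest), the `min_refs` threshold and the `tenant_verdict` helper are assumptions of this sketch, not the module's API.

```python
import hashlib
from typing import Dict, Optional


def response_shape(raw: bytes) -> str:
    """Fingerprint a raw HTTP response by status line, header names, and body digest.

    Header values (Date, request ids) are ignored so that one interceptor answering
    several times still yields a single shape.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    status = lines[0].strip()
    names = sorted(line.split(b":", 1)[0].strip().lower() for line in lines[1:] if b":" in line)
    digest = hashlib.sha256(body).hexdigest().encode()
    return hashlib.sha256(status + b"|" + b",".join(names) + b"|" + digest).hexdigest()


def looks_intercepted(responses: Dict[str, Optional[bytes]], min_refs: int = 3) -> bool:
    """True when enough unrelated reference IPs answered and all answers share one shape."""
    answered = [r for r in responses.values() if r is not None]
    return len(answered) >= min_refs and len({response_shape(r) for r in answered}) == 1


def tenant_verdict(l7_verdict: str, intercepted: bool) -> str:
    """Downgrade an L7-derived conclusion when the environment is intercepting."""
    return "OPAQUE" if intercepted else l7_verdict
```

Ignoring header values keeps a per-response Date header from masking an interceptor; the more diverse the reference IPs, the less likely genuinely distinct servers are to share a shape by chance.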

upgraded_runs.py is the reference pipeline that wires everything together.

Adding a probe

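from recongraph import Seed, SeedType, Finding, ProbeMode, Probe, Node, NodeType

# budget is passed in by the engine; this passive probe does not use it.
def my_probe(seed: Seed, budget) -> Finding:
    # Defensive guard: `accepts` below limits dispatch to IP seeds, but an
    # empty Finding is a safe answer for anything else.
    if seed.type != SeedType.IP:
        return Finding(source="my-probe", mode=ProbeMode.PASSIVE, confidence=0.0)
    # Report the seed IP as a HOST node; provenance back to the seed is
    # tracked by the engine, not set here.
    return Finding(
        source="my-probe",
        mode=ProbeMode.PASSIVE,
        confidence=0.8,
        nodes=[Node(type=NodeType.HOST, value=seed.value, attrs={})],
        edges=[],
    )

# `registry` is the probe registry the Engine dispatches from; its construction
# is not shown here (the stubs in probes.py mark the registration points).
registry.register(Probe(
    name="my_probe",
    accepts=(SeedType.IP,),   # seed types this probe is dispatched for
    mode=ProbeMode.PASSIVE,
    fn=my_probe,
    cost=2,                   # charged against the probe-cost budget
))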

Adding an exposure rule

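# NodeType, ExposureClass, classify_graph and DEFAULT_RULES are recongraph names;
# import them from wherever your checkout exposes them.
def rule_my_thing(node, graph):
    # A rule returns (ExposureClass, reason) on a match, or None to defer to later rules.
    if node.type != NodeType.SERVICE:
        return None
    if some_condition(node):  # placeholder: substitute your own predicate
        return (ExposureClass.MGMT_EXPOSED, "my_reason")
    return None

# Prepend the custom rule so it is evaluated before the defaults (first match wins).
classify_graph(graph, rules=[rule_my_thing] + DEFAULT_RULES)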

Scope

recongraph is not a scanner. It does not sweep ranges, does not send exploit traffic, and does not emit findings it cannot explain. The one real probe (crt.sh) is passive and read-only. Every other probe in the default registry is a stub waiting for a real implementation. The orchestration runs regardless. What fires depends entirely on which probes have real implementations registered.

Our other projects

aimap — AI/ML infrastructure fingerprint scanner
scanner — fast banner stage for population sweeps
tiptoe — quiet, congestion-controlled scanner for sensitive targets
nu-recon — single-host passive recon, JSON report out
BARE — semantic exploit-module ranking over scanner findings

License

MIT. Part of the NuClide toolchain. Contact: nuclide-research.com
