Features • Installation • Usage • Seed Types • Exposure Classes • Scope
recongraph accepts six seed types (IP, CIDR, Domain, ASN, CertFP, BannerString) and runs a fixed-point iteration of probes against them. Every finding becomes a typed node with a provenance chain back to the original seed. Passive sources run first. Active non-intrusive probes fire only where passive signal left a node ambiguous. The engine halts when the queue drains, the budget cap is hit, or an iteration adds zero new nodes. Once stable, every Service node receives an exposure label from a rule set that records which rule fired.
The orchestration is probe-agnostic. The crt.sh certificate-transparency lookup is the only fully implemented real probe in the public tree. Stubs in probes.py and probes_real/ mark the registration points for the rest.
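The halting behaviour described above can be illustrated with a small, self-contained sketch. It is not recongraph's engine: `run_fixed_point`, the `(type, value)` node keys, and the two toy probes are hypothetical names chosen for illustration, and the passive-before-active ordering is left out to keep the loop short. It only shows the three stop conditions and how a provenance chain back to the seed can be carried along.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: a node is a (type, value) pair and a probe maps one
# node to the nodes it discovers. None of these names come from recongraph.
NodeKey = Tuple[str, str]
ProbeFn = Callable[[NodeKey], List[NodeKey]]


def run_fixed_point(seeds: List[NodeKey], probes: List[ProbeFn],
                    max_probe_calls: int = 100) -> Dict[NodeKey, List[NodeKey]]:
    """Expand seeds to a fixed point; return each node's provenance chain."""
    provenance: Dict[NodeKey, List[NodeKey]] = {s: [s] for s in seeds}
    queue = deque(seeds)
    calls = 0
    while queue:  # stop 1: the queue has drained
        added = 0
        for _ in range(len(queue)):  # one iteration covers the current frontier
            node = queue.popleft()
            for probe in probes:
                if calls >= max_probe_calls:  # stop 2: budget cap reached
                    return provenance
                calls += 1
                for found in probe(node):
                    if found not in provenance:
                        # Chain = chain of the node that produced it, plus itself.
                        provenance[found] = provenance[node] + [found]
                        queue.append(found)
                        added += 1
        # Stop 3: an iteration added nothing. In this sketch that also means the
        # queue is empty; the check is explicit to mirror the three conditions.
        if added == 0:
            break
    return provenance


def resolve(node: NodeKey) -> List[NodeKey]:
    table = {("domain", "example.com"): [("host", "192.0.2.10")]}
    return table.get(node, [])


def cert_lookup(node: NodeKey) -> List[NodeKey]:
    table = {("host", "192.0.2.10"): [("cert", "ab:cd"), ("domain", "example.com")]}
    return table.get(node, [])


result = run_fixed_point([("domain", "example.com")], [resolve, cert_lookup])
for node, chain in result.items():
    print(node, "<-", chain)
```

The cert lookup rediscovers the seed domain, but because that node already exists nothing is re-queued, which is what lets the loop converge.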
- Six seed types: IP, CIDR, Domain, ASN, CertFP, BannerString
- Typed provenance graph: every node carries the chain back to the seed that produced it
- Fixed-point engine: passive saturation first, active probes gated on remaining budget
- Hard budget caps: wallclock, probe cost, unique hosts, requests per /24, requests per ASN
- Five exposure classes: `public_intended`, `public_accidental`, `mgmt_exposed`, `legacy_drift`, `unknown`
- Drift detection between runs via `DRIFT_FROM` edges and a `diff()` method
- Sandbox-MITM detection downgrades L7 conclusions to OPAQUE when intercepting environments are detected
- Cloud range classifier for GCP, AWS, Cloudflare plus rDNS patterns for nine cloud providers
- /24 and /20 neighborhood homogeneity sweep
- Python 3.8 or later, standard library only, no external dependencies
```bash
git clone https://github.com/nuclide-research/recongraph
cd recongraph
```

Python 3.8 or later. Standard library only. All network I/O lives inside probe implementations. The engine, graph, budget, and classification logic are pure.
```python
from recongraph import Engine, Seed, SeedType

engine = Engine()
graph = engine.run([Seed(SeedType.IP, "192.0.2.10")])
print(graph.to_json())
```

Smoke test, no network:

```bash
python smoke_test.py
```

Reference pipeline, clean network environment:
```bash
python upgraded_runs.py
```

| SeedType | Example value |
|---|---|
| IP | 192.0.2.10 |
| CIDR | 192.0.2.0/24 |
| DOMAIN | example.com |
| ASN | AS15169 |
| CERT_FP | sha256 of DER |
| BANNER | Server: nginx/1.18 |

Nodes: HOST, SERVICE, CERT, DOMAIN, NETBLOCK, ORG, ASN.
Edges: OBSERVED_ON, ISSUED_FOR, RESOLVES_TO, ANNOUNCED_BY, CO_HOSTED_WITH, SHARES_CERT_WITH, BELONGS_TO, DRIFT_FROM.
Every Service node receives one label after the graph stabilizes. Rules are ordered, first match wins. legacy_drift fires before mgmt_exposed.
| Class | Meaning |
|---|---|
| public_intended | http, https, dns, smtp, submission, imaps, pop3s |
| public_accidental | staging / dev / test subdomains, .git, .env, /backup, /phpinfo |
| mgmt_exposed | ssh, rdp, vnc, ipmi, mysql, postgres, mongodb, redis, elasticsearch, kubelet, etcd, docker-api, ldap |
| legacy_drift | finger, telnet, gopher, tftp, rsh, rlogin, chargen, qi |
| unknown | no rule matched |

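The ordered, first-match-wins evaluation can be sketched as follows. This is an illustration, not the project's rule set: `by_service_name`, `classify`, the rule names, and the plain-dict service shape are all hypothetical, and only service-name matching is shown. The one ordering taken from the text is that `legacy_drift` is checked before `mgmt_exposed`; the position of the other rules is an assumption.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Verdict = Tuple[str, str]  # (exposure class, name of the rule that fired)
Rule = Callable[[Dict[str, str]], Optional[Verdict]]

# Subsets of the service names listed in the table above.
LEGACY = {"finger", "telnet", "gopher", "tftp", "rsh", "rlogin", "chargen"}
MGMT = {"ssh", "rdp", "vnc", "ipmi", "mysql", "postgres", "redis", "etcd", "ldap"}
PUBLIC = {"http", "https", "dns", "smtp", "submission", "imaps", "pop3s"}


def by_service_name(names: Set[str], exposure: str, rule_name: str) -> Rule:
    """Build a rule that fires when the service name is in `names` (hypothetical helper)."""
    def rule(service: Dict[str, str]) -> Optional[Verdict]:
        if service.get("name") in names:
            return (exposure, rule_name)
        return None
    return rule


# Order matters: legacy_drift is evaluated before mgmt_exposed.
RULES: List[Rule] = [
    by_service_name(LEGACY, "legacy_drift", "legacy_protocol"),
    by_service_name(MGMT, "mgmt_exposed", "mgmt_protocol"),
    by_service_name(PUBLIC, "public_intended", "public_protocol"),
]


def classify(service: Dict[str, str], rules: List[Rule]) -> Verdict:
    """Return the verdict of the first rule that matches, else 'unknown'."""
    for rule in rules:
        verdict = rule(service)
        if verdict is not None:
            return verdict
    return ("unknown", "no_rule_matched")


labels = {name: classify({"name": name}, RULES) for name in ("telnet", "ssh", "https", "x11")}
for name, (exposure, fired) in labels.items():
    print(f"{name}: {exposure} (rule: {fired})")
```

Returning the rule name alongside the class is what makes each label explainable after the fact.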
The budget caps and their defaults:

| Cap | Default |
|---|---|
| Wallclock | 300 s |
| Probe cost | 1000 units |
| Unique hosts | 500 |
| Requests per /24 | 30 |
| Requests per ASN | 100 |
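To show how caps like these might be enforced, here is a minimal sketch of a budget tracker. The `Budget` class and `try_spend` method are hypothetical and not recongraph's budget module; only the default values are taken from the table. The per-ASN cap is omitted because it needs an IP-to-ASN mapping, and the /24 grouping assumes IPv4 addresses.

```python
import ipaddress
import time
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Budget:
    """Hypothetical cap tracker; defaults mirror the table above."""
    max_wallclock_s: float = 300.0
    max_probe_cost: int = 1000
    max_unique_hosts: int = 500
    max_per_slash24: int = 30
    started: float = field(default_factory=time.monotonic)
    spent: int = 0
    hosts: set = field(default_factory=set)
    per_slash24: Counter = field(default_factory=Counter)

    def try_spend(self, ip: str, cost: int) -> bool:
        """Charge one request to `ip`; return False if any cap would be exceeded."""
        # strict=False masks the host bits, so 192.0.2.10 maps to 192.0.2.0/24.
        net = ipaddress.ip_network(f"{ip}/24", strict=False)
        if time.monotonic() - self.started > self.max_wallclock_s:
            return False
        if self.spent + cost > self.max_probe_cost:
            return False
        if ip not in self.hosts and len(self.hosts) >= self.max_unique_hosts:
            return False
        if self.per_slash24[net] >= self.max_per_slash24:
            return False
        # All caps hold: record the request.
        self.spent += cost
        self.hosts.add(ip)
        self.per_slash24[net] += 1
        return True


budget = Budget(max_per_slash24=2)
outcomes = [
    budget.try_spend("192.0.2.1", 1),
    budget.try_spend("192.0.2.2", 1),
    budget.try_spend("192.0.2.3", 1),     # third request into the same /24
    budget.try_spend("198.51.100.1", 1),  # a different /24 is unaffected
]
print(outcomes, budget.spent)
```

Checking every cap before recording anything keeps a refused request from partially consuming the budget.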
Example serialized graph:

```json
{
  "created_at": 1717430400.0,
  "nodes": [
    {
      "type": "host",
      "value": "192.0.2.10",
      "attrs": {},
      "provenance": [["seed-id"]],
      "first_seen": 1717430400.0,
      "last_seen": 1717430401.2,
      "exposure": null,
      "id": "a1b2c3d4e5f60001"
    }
  ],
  "edges": [
    {
      "src": "a1b2c3d4e5f60001",
      "dst": "b2c3d4e5f6a70002",
      "type": "resolves_to",
      "attrs": {},
      "first_seen": 1717430401.0
    }
  ]
}
```

Two runs can be compared: `emit_drift_edges` adds `DRIFT_FROM` edges where attributes changed, and `diff` returns the added and removed nodes and edges.
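As a rough illustration of what such a comparison involves, the sketch below diffs two graphs in the serialized shape shown above using plain set operations. `diff_graphs` is a hypothetical helper, not recongraph's `diff()`. It assumes nodes are identified by `(type, value)`, that node ids are stable across runs so edge triples can be compared directly, and the `ip:port` service values and short ids are invented for the example.

```python
from typing import Any, Dict


def diff_graphs(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, list]:
    """Compare two serialized graphs by node identity and edge triple."""
    # Index node attrs by (type, value) so they can be compared across runs.
    old_nodes = {(n["type"], n["value"]): n["attrs"] for n in old["nodes"]}
    new_nodes = {(n["type"], n["value"]): n["attrs"] for n in new["nodes"]}
    old_edges = {(e["src"], e["dst"], e["type"]) for e in old["edges"]}
    new_edges = {(e["src"], e["dst"], e["type"]) for e in new["edges"]}
    shared = old_nodes.keys() & new_nodes.keys()
    return {
        "added_nodes": sorted(new_nodes.keys() - old_nodes.keys()),
        "removed_nodes": sorted(old_nodes.keys() - new_nodes.keys()),
        "added_edges": sorted(new_edges - old_edges),
        "removed_edges": sorted(old_edges - new_edges),
        # Present in both runs with different attrs: candidates for a drift edge.
        "drifted_nodes": sorted(k for k in shared if old_nodes[k] != new_nodes[k]),
    }


old_run = {
    "nodes": [
        {"type": "host", "value": "192.0.2.10", "attrs": {}},
        {"type": "service", "value": "192.0.2.10:22", "attrs": {"banner": "OpenSSH_8.2"}},
    ],
    "edges": [{"src": "s1", "dst": "h1", "type": "observed_on"}],
}
new_run = {
    "nodes": [
        {"type": "host", "value": "192.0.2.10", "attrs": {}},
        {"type": "service", "value": "192.0.2.10:22", "attrs": {"banner": "OpenSSH_9.6"}},
        {"type": "service", "value": "192.0.2.10:23", "attrs": {}},
    ],
    "edges": [
        {"src": "s1", "dst": "h1", "type": "observed_on"},
        {"src": "s2", "dst": "h1", "type": "observed_on"},
    ],
}

changes = diff_graphs(old_run, new_run)
for kind, items in changes.items():
    print(kind, items)
```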
```python
import json

from recongraph import Graph  # assumed to be exported at package level

# old_json: JSON text saved from an earlier run; current: the Graph from this run.
snapshot = Graph.from_dict(json.loads(old_json))
current.emit_drift_edges(snapshot)
diff = current.diff(snapshot)
```

| Module | Purpose |
|---|---|
| cloud_ranges.py | Classifier over GCP, AWS, Cloudflare published range files. rDNS patterns for nine cloud providers. Weekly on-disk cache. |
| l7_fingerprint.py | Raw HTTP probe ladder, canonical error-page signature library with fail-closed matching, HTTP/2 cleartext detection. |
| neighbors.py | /24 and /20 homogeneity sweep with verdict classification (highly homogeneous means shared edge pool, highly heterogeneous means single tenant per IP). |
| tenant_model.py | TenantModel taxonomy, IdentificationConfidence levels, EnvironmentalConstraints for recording what the environment prevented observing. |
| sandbox_detect.py | Startup check: identical payloads to unrelated reference IPs. Identical response shapes mean the environment is intercepting. Downgrades L7-derived tenant conclusions to OPAQUE. |

upgraded_runs.py is the reference pipeline that wires everything together.
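The sandbox check described for sandbox_detect.py comes down to comparing response shapes from hosts that have no reason to answer alike. The sketch below shows that comparison only and performs no network I/O. `ResponseShape`, `looks_intercepted`, `tenant_conclusion`, the fields that make up a shape, and the three-reference minimum are all assumptions for illustration, not the module's actual interface.

```python
from typing import List, NamedTuple, Tuple


class ResponseShape(NamedTuple):
    """Coarse shape of one response; the fields chosen here are an assumption."""
    status: int
    header_names: Tuple[str, ...]
    body_length: int


def looks_intercepted(shapes: List[ResponseShape], min_references: int = 3) -> bool:
    """True when enough unrelated references all answered with one identical shape."""
    if len(shapes) < min_references:
        return False  # too few observations to conclude anything
    return len(set(shapes)) == 1


def tenant_conclusion(l7_verdict: str, intercepted: bool) -> str:
    """Downgrade an L7-derived verdict to OPAQUE when interception is suspected."""
    return "OPAQUE" if intercepted else l7_verdict


# Three unrelated hosts returning the same block page suggests a middlebox.
proxy_page = ResponseShape(403, ("content-length", "content-type"), 512)
intercepted_env = looks_intercepted([proxy_page] * 3)

# Unrelated hosts on a clean network normally answer differently.
clean_env = looks_intercepted([
    ResponseShape(200, ("content-type", "server"), 1256),
    ResponseShape(404, ("content-type",), 162),
    ResponseShape(301, ("location", "server"), 0),
])

print(intercepted_env, tenant_conclusion("shared_edge", intercepted_env))
print(clean_env, tenant_conclusion("shared_edge", clean_env))
```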
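A probe is a function that takes a seed and a budget and returns a `Finding`. It is registered as a `Probe`:

```python
from recongraph import Seed, SeedType, Finding, ProbeMode, Probe, Node, NodeType


def my_probe(seed: Seed, budget) -> Finding:
    # Return an empty, zero-confidence finding for seed types this probe ignores.
    if seed.type != SeedType.IP:
        return Finding(source="my-probe", mode=ProbeMode.PASSIVE, confidence=0)
    return Finding(
        source="my-probe",
        mode=ProbeMode.PASSIVE,
        confidence=0.8,
        nodes=[Node(type=NodeType.HOST, value=seed.value, attrs={})],
        edges=[],
    )


# `registry` is an existing probe registry instance, not defined in this snippet.
registry.register(Probe(
    name="my_probe",
    accepts=(SeedType.IP,),
    mode=ProbeMode.PASSIVE,
    fn=my_probe,
    cost=2,
))
```

A classification rule takes a node and the graph and returns an `(ExposureClass, reason)` tuple, or `None` when it does not apply:

```python
# ExposureClass, classify_graph, and DEFAULT_RULES are recongraph names whose
# import path the original snippet does not show; some_condition is a
# placeholder for your own predicate.
def rule_my_thing(node, graph):
    if node.type != NodeType.SERVICE:
        return None
    if some_condition(node):
        return (ExposureClass.MGMT_EXPOSED, "my_reason")
    return None


# Custom rules come first so they win over the defaults (first match wins).
classify_graph(graph, rules=[rule_my_thing] + DEFAULT_RULES)
```

recongraph is not a scanner. It does not sweep ranges, does not send exploit traffic, and does not emit findings it cannot explain. The one real probe (crt.sh) is passive and read-only. Every other probe in the default registry is a stub waiting for a real implementation. The orchestration runs regardless. What fires depends entirely on which probes have real implementations registered.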
- aimap — AI/ML infrastructure fingerprint scanner
- scanner — fast banner stage for population sweeps
- tiptoe — quiet, congestion-controlled scanner for sensitive targets
- nu-recon — single-host passive recon, JSON report out
- BARE — semantic exploit-module ranking over scanner findings
MIT. Part of the NuClide toolchain. Contact: nuclide-research.com