nuclide-research/recongraph

recongraph

Seed-polymorphic reconnaissance engine: any seed in, a typed provenance graph out.

Features | Installation | Usage | Seed Types | Exposure Classes | Scope


recongraph accepts six seed types (IP, CIDR, Domain, ASN, CertFP, BannerString) and runs a fixed-point iteration of probes against them. Every finding becomes a typed node with a provenance chain back to the original seed. Passive sources run first. Active non-intrusive probes fire only where passive signal left a node ambiguous. The engine halts when the queue drains, the budget cap is hit, or an iteration adds zero new nodes. Once stable, every Service node receives an exposure label from a rule set that records which rule fired.
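The three halting conditions can be illustrated with a small stand-alone sketch. It does not use recongraph's real classes: `run_fixed_point`, `toy_probe`, `LINKS`, and the one-unit-per-probe cost accounting are hypothetical names and simplifications, and the passive/active split is left out.

```python
from collections import deque

def run_fixed_point(seeds, probe, max_cost=10):
    """Toy loop: stop on an empty queue, a spent budget, or an iteration with no new nodes."""
    nodes = {}                                   # value -> provenance chain back to the seed
    queue = deque((s, [s]) for s in seeds)
    spent = 0
    reason = "queue_drained"
    while queue:
        added = 0
        # One iteration processes everything that was queued when it started.
        for _ in range(len(queue)):
            if spent >= max_cost:
                return nodes, "budget_exhausted"
            value, chain = queue.popleft()
            spent += 1                           # assumption: every probe call costs one unit
            if value not in nodes:
                nodes[value] = chain
                added += 1
            for found in probe(value):
                if found not in nodes:
                    queue.append((found, chain + [found]))
        if added == 0:
            reason = "no_new_nodes"
            break
    return nodes, reason

# Hypothetical probe: a fixed lookup table instead of real network I/O.
LINKS = {"example.com": ["192.0.2.10"], "192.0.2.10": ["example.com"]}

def toy_probe(value):
    return LINKS.get(value, [])

nodes, reason = run_fixed_point(["example.com"], toy_probe)
capped_nodes, capped_reason = run_fixed_point(["example.com"], toy_probe, max_cost=1)
```

In this toy run the host node ends up with the chain `["example.com", "192.0.2.10"]`, which is the kind of seed-to-finding provenance the engine description refers to.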

The orchestration is probe-agnostic. The crt.sh certificate-transparency lookup is the only fully implemented real probe in the public tree. Stubs in probes.py and probes_real/ mark the registration points for the rest.

Features

  • Six seed types: IP, CIDR, Domain, ASN, CertFP, BannerString
  • Typed provenance graph: every node carries the chain back to the seed that produced it
  • Fixed-point engine: passive saturation first, active probes gated on remaining budget
  • Hard budget caps: wallclock, probe cost, unique hosts, requests per /24, requests per ASN
  • Five exposure classes: public_intended, public_accidental, mgmt_exposed, legacy_drift, unknown
  • Drift detection between runs via DRIFT_FROM edges and a diff() method
  • Sandbox-MITM detection downgrades L7 conclusions to OPAQUE when intercepting environments are detected
  • Cloud range classifier for GCP, AWS, and Cloudflare, plus rDNS patterns for nine cloud providers
  • /24 and /20 neighborhood homogeneity sweep
  • Python 3.8 or later, standard library only, no external dependencies

Installation

git clone https://github.com/nuclide-research/recongraph
cd recongraph

Python 3.8 or later. Standard library only. All network I/O lives inside probe implementations. The engine, graph, budget, and classification logic are pure.

Usage

from recongraph import Engine, Seed, SeedType

engine = Engine()
graph = engine.run([Seed(SeedType.IP, "192.0.2.10")])

print(graph.to_json())

Smoke test, no network:

python smoke_test.py

Reference pipeline, clean network environment:

python upgraded_runs.py

Seed types

| SeedType | Example value |
|----------|---------------|
| IP | 192.0.2.10 |
| CIDR | 192.0.2.0/24 |
| DOMAIN | example.com |
| ASN | AS15169 |
| CERT_FP | SHA-256 of the DER-encoded certificate |
| BANNER | Server: nginx/1.18 |
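When seeds arrive as raw strings, a small helper can map each one to a SeedType name from the table before a Seed is built. The sketch below is an illustration only: `guess_seed_type` is a hypothetical function, not part of recongraph, and the domain pattern is deliberately loose.

```python
import ipaddress
import re

def guess_seed_type(value: str) -> str:
    """Return a SeedType name from the table above for a raw string."""
    try:
        ipaddress.ip_address(value)
        return "IP"
    except ValueError:
        pass
    if "/" in value:
        try:
            ipaddress.ip_network(value, strict=False)
            return "CIDR"
        except ValueError:
            pass
    if re.fullmatch(r"AS\d+", value, re.IGNORECASE):
        return "ASN"
    if re.fullmatch(r"[0-9a-fA-F]{64}", value):
        return "CERT_FP"          # 64 hex characters, the length of a SHA-256 digest
    if re.fullmatch(r"(?=.{1,253}$)([A-Za-z0-9-]{1,63}\.)+[A-Za-z]{2,63}", value):
        return "DOMAIN"
    return "BANNER"               # fallback: treat anything else as a banner string

guesses = [guess_seed_type(v) for v in
           ("192.0.2.10", "192.0.2.0/24", "example.com", "AS15169", "Server: nginx/1.18")]
```

Treating BANNER as the fallback is a design choice of this sketch; a stricter version could refuse unrecognized input instead.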

Node and edge types

Nodes: HOST, SERVICE, CERT, DOMAIN, NETBLOCK, ORG, ASN.

Edges: OBSERVED_ON, ISSUED_FOR, RESOLVES_TO, ANNOUNCED_BY, CO_HOSTED_WITH, SHARES_CERT_WITH, BELONGS_TO, DRIFT_FROM.

Exposure classes

Every Service node receives one label after the graph stabilizes. Rules are ordered and the first match wins. The legacy_drift rule fires before mgmt_exposed.

| Class | Meaning |
|-------|---------|
| public_intended | http, https, dns, smtp, submission, imaps, pop3s |
| public_accidental | staging / dev / test subdomains, .git, .env, /backup, /phpinfo |
| mgmt_exposed | ssh, rdp, vnc, ipmi, mysql, postgres, mongodb, redis, elasticsearch, kubelet, etcd, docker-api, ldap |
| legacy_drift | finger, telnet, gopher, tftp, rsh, rlogin, chargen, qi |
| unknown | no rule matched |
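First-match-wins evaluation, with the firing rule recorded next to the label, can be sketched as follows. The service names are copied from the table; `make_rule`, `RULES`, and `classify` are hypothetical names. Only the legacy_drift-before-mgmt_exposed order comes from the text above. The position of public_intended is an assumption, and public_accidental is omitted because it depends on hostnames and paths rather than service names.

```python
LEGACY = {"finger", "telnet", "gopher", "tftp", "rsh", "rlogin", "chargen", "qi"}
MGMT = {"ssh", "rdp", "vnc", "ipmi", "mysql", "postgres", "mongodb", "redis",
        "elasticsearch", "kubelet", "etcd", "docker-api", "ldap"}
PUBLIC = {"http", "https", "dns", "smtp", "submission", "imaps", "pop3s"}

def make_rule(label, names):
    """Build a rule that returns `label` for known service names, else None."""
    def rule(service):
        return label if service in names else None
    rule.__name__ = f"rule_{label}"
    return rule

# Ordered list: legacy_drift is checked before mgmt_exposed, as stated above.
RULES = [
    make_rule("legacy_drift", LEGACY),
    make_rule("mgmt_exposed", MGMT),
    make_rule("public_intended", PUBLIC),   # assumed position
]

def classify(service):
    """Return (label, name of the rule that fired); the first match wins."""
    for rule in RULES:
        label = rule(service)
        if label is not None:
            return label, rule.__name__
    return "unknown", None

results = {name: classify(name) for name in ("telnet", "redis", "https", "x11")}
```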

Budget defaults

| Cap | Default |
|-----|---------|
| Wallclock | 300 s |
| Probe cost | 1000 units |
| Unique hosts | 500 |
| Requests per /24 | 30 |
| Requests per ASN | 100 |
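To show how the five caps might interact, here is a minimal stand-alone sketch. `ToyBudget` and its `allow` method are hypothetical and are not recongraph's budget class; only the default values are taken from the table. The /24 grouping assumes IPv4 addresses.

```python
import ipaddress
import time
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class ToyBudget:
    wallclock_s: float = 300.0
    probe_cost: int = 1000
    unique_hosts: int = 500
    per_slash24: int = 30
    per_asn: int = 100
    started: float = field(default_factory=time.monotonic)
    spent: int = 0
    hosts: set = field(default_factory=set)
    by_slash24: Counter = field(default_factory=Counter)
    by_asn: Counter = field(default_factory=Counter)

    def allow(self, ip: str, asn: str, cost: int = 1) -> bool:
        """Charge one request if every cap still has room; otherwise refuse it."""
        block = ipaddress.ip_network(f"{ip}/24", strict=False)   # enclosing /24
        if time.monotonic() - self.started > self.wallclock_s:
            return False
        if self.spent + cost > self.probe_cost:
            return False
        if ip not in self.hosts and len(self.hosts) >= self.unique_hosts:
            return False
        if self.by_slash24[block] >= self.per_slash24:
            return False
        if self.by_asn[asn] >= self.per_asn:
            return False
        # All caps passed: record the request against each counter.
        self.spent += cost
        self.hosts.add(ip)
        self.by_slash24[block] += 1
        self.by_asn[asn] += 1
        return True

budget = ToyBudget(per_slash24=2)      # lowered cap so the example trips quickly
decisions = [budget.allow(ip, "AS64500") for ip in
             ("192.0.2.1", "192.0.2.2", "192.0.2.3", "198.51.100.1")]
```

The third request is refused because it lands in the same /24 as the first two, while the fourth, in a different /24, is still allowed.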

Graph output shape

{
  "created_at": 1717430400.0,
  "nodes": [
    {
      "type": "host",
      "value": "192.0.2.10",
      "attrs": {},
      "provenance": [["seed-id"]],
      "first_seen": 1717430400.0,
      "last_seen": 1717430401.2,
      "exposure": null,
      "id": "a1b2c3d4e5f60001"
    }
  ],
  "edges": [
    {
      "src": "a1b2c3d4e5f60001",
      "dst": "b2c3d4e5f6a70002",
      "type": "resolves_to",
      "attrs": {},
      "first_seen": 1717430401.0
    }
  ]
}
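Because edges refer to nodes by id, a consumer of this JSON usually builds an id index first. The sketch below assumes only the field names shown in the output shape; `SAMPLE` is an invented document with short ids, and `describe_edges` is a hypothetical helper.

```python
import json

# Invented sample that follows the output shape; ids are shortened for readability.
SAMPLE = """
{
  "created_at": 1717430400.0,
  "nodes": [
    {"type": "domain", "value": "example.com", "attrs": {}, "provenance": [["seed-id"]],
     "first_seen": 1717430400.0, "last_seen": 1717430400.0, "exposure": null, "id": "n1"},
    {"type": "host", "value": "192.0.2.10", "attrs": {}, "provenance": [["seed-id"]],
     "first_seen": 1717430400.0, "last_seen": 1717430401.2, "exposure": null, "id": "n2"}
  ],
  "edges": [
    {"src": "n1", "dst": "n2", "type": "resolves_to", "attrs": {}, "first_seen": 1717430401.0}
  ]
}
"""

def describe_edges(doc):
    """Resolve edge endpoints to node values: (src value, edge type, dst value)."""
    by_id = {node["id"]: node for node in doc["nodes"]}
    triples = []
    for edge in doc["edges"]:
        src = by_id.get(edge["src"])
        dst = by_id.get(edge["dst"])
        if src is None or dst is None:
            continue                      # skip edges whose endpoints are not in this document
        triples.append((src["value"], edge["type"], dst["value"]))
    return triples

triples = describe_edges(json.loads(SAMPLE))
```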

Drift detection

Two runs can be compared. emit_drift_edges adds DRIFT_FROM edges where attributes changed. diff returns added and removed nodes and edges.

import json

# old_json is the JSON saved from an earlier run; current is the Graph from this run.
snapshot = Graph.from_dict(json.loads(old_json))
current.emit_drift_edges(snapshot)
diff = current.diff(snapshot)
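The comparison itself can be pictured as set arithmetic over two JSON documents. This is a sketch under assumptions, not the library's diff: `toy_diff` is hypothetical, nodes are keyed by (type, value), edges by (src, dst, type), and the `os` attribute in the sample data is invented.

```python
def toy_diff(old, new):
    """Compare two graph dicts: added/removed nodes and edges, plus nodes whose attrs changed."""
    def node_keys(doc):
        return {(n["type"], n["value"]) for n in doc["nodes"]}

    def edge_keys(doc):
        return {(e["src"], e["dst"], e["type"]) for e in doc["edges"]}

    old_attrs = {(n["type"], n["value"]): n["attrs"] for n in old["nodes"]}
    changed = sorted(
        (n["type"], n["value"])
        for n in new["nodes"]
        if (n["type"], n["value"]) in old_attrs
        and old_attrs[(n["type"], n["value"])] != n["attrs"]
    )
    return {
        "added_nodes": sorted(node_keys(new) - node_keys(old)),
        "removed_nodes": sorted(node_keys(old) - node_keys(new)),
        "added_edges": sorted(edge_keys(new) - edge_keys(old)),
        "removed_edges": sorted(edge_keys(old) - edge_keys(new)),
        "changed_nodes": changed,      # candidates for a DRIFT_FROM edge
    }

old_run = {"nodes": [{"type": "host", "value": "192.0.2.10", "attrs": {"os": "linux"}},
                     {"type": "service", "value": "192.0.2.10:23", "attrs": {}}],
           "edges": []}
new_run = {"nodes": [{"type": "host", "value": "192.0.2.10", "attrs": {"os": "bsd"}},
                     {"type": "service", "value": "192.0.2.10:443", "attrs": {}}],
           "edges": [{"src": "a", "dst": "b", "type": "observed_on"}]}
result = toy_diff(old_run, new_run)
```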

Additional modules

| Module | Purpose |
|--------|---------|
| cloud_ranges.py | Classifier over GCP, AWS, Cloudflare published range files. rDNS patterns for nine cloud providers. Weekly on-disk cache. |
| l7_fingerprint.py | Raw HTTP probe ladder, canonical error-page signature library with fail-closed matching, HTTP/2 cleartext detection. |
| neighbors.py | /24 and /20 homogeneity sweep with verdict classification (highly homogeneous means shared edge pool, highly heterogeneous means single tenant per IP). |
| tenant_model.py | TenantModel taxonomy, IdentificationConfidence levels, EnvironmentalConstraints for recording what the environment prevented observing. |
| sandbox_detect.py | Startup check: identical payloads to unrelated reference IPs. Identical response shapes mean the environment is intercepting. Downgrades L7-derived tenant conclusions to OPAQUE. |

upgraded_runs.py is the reference pipeline that wires everything together.
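The idea behind the sandbox check, that unrelated hosts should not answer an identical payload with identical response shapes, can be sketched without any network access. `response_shape` and `looks_intercepted` are hypothetical helpers, and the definition of "shape" here (status, header names, body length) is an assumption rather than what sandbox_detect.py actually compares.

```python
def response_shape(status, headers, body):
    """Reduce a response to a comparable shape: status, sorted header names, body length."""
    return (status, tuple(sorted(name.lower() for name in headers)), len(body))

def looks_intercepted(responses):
    """True when at least two responses were collected and all share one shape."""
    shapes = {response_shape(*response) for response in responses}
    return len(responses) >= 2 and len(shapes) == 1

# Invented responses from two unrelated reference hosts.
same = [(403, {"Server": "x", "Content-Length": "12"}, b"blocked page"),
        (403, {"content-length": "12", "server": "y"}, b"blocked page")]
different = [(200, {"Server": "x"}, b"hello"),
             (404, {"Server": "y", "Via": "z"}, b"not found")]

# When interception is suspected, L7-derived conclusions are downgraded.
verdict_same = "OPAQUE" if looks_intercepted(same) else "trust_l7"
verdict_different = "OPAQUE" if looks_intercepted(different) else "trust_l7"
```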

Adding a probe

from recongraph import Seed, SeedType, Finding, ProbeMode, Probe, Node, NodeType

def my_probe(seed: Seed, budget) -> Finding:
    # Seeds this probe cannot handle get an empty, zero-confidence finding.
    if seed.type != SeedType.IP:
        return Finding(source="my-probe", mode=ProbeMode.PASSIVE, confidence=0.0)
    # Emit one HOST node for the seed; a real probe would fill in attrs and edges.
    return Finding(
        source="my-probe",
        mode=ProbeMode.PASSIVE,
        confidence=0.8,
        nodes=[Node(type=NodeType.HOST, value=seed.value, attrs={})],
        edges=[],
    )

# registry is the probe registry used by your Engine; it is not defined in this snippet.
registry.register(Probe(
    name="my_probe",
    accepts=(SeedType.IP,),
    mode=ProbeMode.PASSIVE,
    fn=my_probe,
    cost=2,
))

Adding an exposure rule

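# NodeType, ExposureClass, classify_graph and DEFAULT_RULES are recongraph names;
# some_condition is a placeholder for your own predicate.
def rule_my_thing(node, graph):
    # Only Service nodes receive exposure labels.
    if node.type != NodeType.SERVICE:
        return None
    if some_condition(node):
        return (ExposureClass.MGMT_EXPOSED, "my_reason")
    return None

# Prepend the custom rule so it is evaluated before the defaults (first match wins).
classify_graph(graph, rules=[rule_my_thing] + DEFAULT_RULES)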

Scope

recongraph is not a scanner. It does not sweep ranges, does not send exploit traffic, and does not emit findings it cannot explain. The one real probe (crt.sh) is passive and read-only. Every other probe in the default registry is a stub waiting for a real implementation. The orchestration runs regardless. What fires depends entirely on which probes have real implementations registered.

Our other projects

  • aimap — AI/ML infrastructure fingerprint scanner
  • scanner — fast banner stage for population sweeps
  • tiptoe — quiet, congestion-controlled scanner for sensitive targets
  • nu-recon — single-host passive recon, JSON report out
  • BARE — semantic exploit-module ranking over scanner findings

License

MIT. Part of the NuClide toolchain. Contact: nuclide-research.com
