MoriartyPuth/NETH

NETH (នេត្រ): The Digital Watchful Eye

An automated defensive gateway that protects the Khmer community from digital financial fraud. NETH intercepts, parses, and classifies three threat vectors:

Demo

▶️ Watch the demo: scanning a scam QR, a phishing message, and the Telegram bot.

#   Threat                                                   Engine
01  KHQR payload tampering / identity-routing mismatch       khqr_core + bakong_verify
02  Physical QR sticker overlays on merchant placards        vision_overlay
03  Localized Khmer-language phishing / social engineering   nlp_khmer

Every check returns one of three risk levels: ✅ Safe (0), ⚠️ Suspicious (1), ⛔ Blocked (2).

Architecture

           ┌───────────────────────────────────────────────┐
           │  Front ends:  Web UI  ·  Telegram bot  ·  API │
           └───────────────────────┬───────────────────────┘
                                   │
                        ┌──────────▼──────────┐
                        │  scoring.NethGateway│  ← max-severity aggregation
                        └──────────┬──────────┘
         ┌──────────────┬──────────┼───────────┬───────────────┐
         ▼              ▼          ▼           ▼               ▼
   vision_overlay   khqr_core  bakong_verify  nlp_khmer    (your model)
   multi-QR /       TLV walker acct exists /  Khmer phishing
   extract          + CRC-16   txn verify     heuristic/XLM-R
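
The max-severity aggregation in the diagram can be sketched as follows. This is a minimal illustration, not the actual scoring.NethGateway code: the Risk, Signal, and aggregate names are hypothetical, and returning Safe when no engine reports anything is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum


class Risk(IntEnum):
    """The three risk levels, ordered so that max() picks the worst."""
    SAFE = 0
    SUSPICIOUS = 1
    BLOCKED = 2


@dataclass(frozen=True)
class Signal:
    """One engine's finding (hypothetical shape)."""
    engine: str   # e.g. "khqr_core"
    risk: Risk
    reason: str


def aggregate(signals: list[Signal]) -> Risk:
    """Return the most severe risk reported by any engine.

    Assumption: an empty signal list yields SAFE.
    """
    return max((s.risk for s in signals), default=Risk.SAFE)
```

With this rule a single Blocked signal decides the verdict, no matter how many other engines report Safe.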

Why identity, not checksum, is the core defense

A real overlay scam uses a structurally perfect KHQR pointing at the attacker's own account, with a valid CRC and valid TLV. So khqr_core treats CRC/TLV as a validity pre-filter, and the decisive check is identity_match: an offline cross-field check comparing the displayed name (Tag 59) against the account routing (Tag 29/30 bank code). If a QR labeled "ABA" routes to an ACLEDA account, that's the classic overlay pattern → blocked.
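
A rough sketch of the validity pre-filter, assuming the standard EMVCo merchant-presented layout (two-digit tag, two-digit length, value) and the CRC-16/CCITT-FALSE checksum stored in Tag 63. The function names are hypothetical and are not the khqr_core API; the sketch also counts lengths in characters and does not descend into nested templates such as Tag 29/30.

```python
def crc16_ccitt_false(data: bytes) -> int:
    """CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def walk_tlv(payload: str) -> dict[str, str]:
    """Split a top-level payload into {tag: value}; raise ValueError if malformed."""
    fields: dict[str, str] = {}
    pos = 0
    while pos < len(payload):
        if pos + 4 > len(payload):
            raise ValueError("truncated TLV header")
        tag = payload[pos:pos + 2]
        length = int(payload[pos + 2:pos + 4])  # non-digits raise ValueError
        value = payload[pos + 4:pos + 4 + length]
        if len(value) != length:
            raise ValueError(f"tag {tag}: value shorter than declared length")
        fields[tag] = value
        pos += 4 + length
    return fields


def crc_is_valid(payload: str) -> bool:
    """Check the trailing Tag 63 value against a CRC over everything before it."""
    body, claimed = payload[:-4], payload[-4:]
    if not body.endswith("6304"):
        return False
    return f"{crc16_ccitt_false(body.encode('utf-8')):04X}" == claimed.upper()
```

Passing both checks only shows the payload is well formed. An attacker's own QR passes them too, which is why the identity check carries the decision.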

Note: the public Bakong API does not resolve an account id to a holder name (privacy/anti-enumeration), so identity defense is the cross-field routing check above, not a name lookup. bakong_verify is reserved for account-existence / transaction verification once a token is configured.
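
The cross-field check could look roughly like the sketch below. It assumes the Tag 59 name and the routing code from Tag 29/30 have already been extracted. The Identity enum, check_identity, and the CODE_* values are hypothetical placeholders; real codes would come from data/bank_codes.yaml.

```python
import re
from enum import Enum


class Identity(Enum):
    MATCH = "match"
    MISMATCH = "mismatch"      # the overlay pattern; maps to Blocked
    UNVERIFIED = "unverified"  # "couldn't verify routing", not "safe"


# Placeholder routing codes for illustration only.
BANK_CODES = {"ABA": "CODE_ABA", "ACLEDA": "CODE_ACLEDA"}


def check_identity(display_name: str, routing_code: str) -> Identity:
    """Compare the bank brand named in Tag 59 with the routing code."""
    words = set(re.findall(r"[A-Z]+", display_name.upper()))
    claimed = words & set(BANK_CODES)
    # Unknown brand, ambiguous name, or unknown code: make no claim.
    if len(claimed) != 1 or routing_code not in BANK_CODES.values():
        return Identity.UNVERIFIED
    (brand,) = claimed
    if BANK_CODES[brand] == routing_code:
        return Identity.MATCH
    return Identity.MISMATCH
```

Matching on whole words rather than substrings is a deliberate choice here, so that a name that merely contains the letters "ABA" is not treated as a claim to be ABA.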

Quick start

pip install -r requirements.txt        # core deps
uvicorn neth.api:app --reload          # then open http://127.0.0.1:8000/
pytest -q                              # run the offline smoke tests

The gateway is useful immediately with no model download or API key: the NLP engine ships a working Khmer heuristic baseline, and KHQR/vision run offline.

Telegram bot

Use the bot (for everyone)

  1. Open the bot in Telegram: @neth_watch_bot
  2. Press Start.
  3. Send it either:
    • 📷 a photo of a KHQR to check it, or
    • 📝 a forwarded message or link you're unsure about.
  4. It replies in Khmer with ✅ Safe / ⚠️ Suspicious / ⛔ Blocked and the reason.

Always confirm the recipient's name in your banking app before paying. NETH is an advisory aid, not a guarantee.

Run your own bot

  1. In Telegram, message @BotFather → /newbot → pick a name + username → copy the token it gives you.
  2. Provide the token to NETH in any one of these ways (never commit it to git):
    • File: create .telegram_token in the project root containing just the token, or
    • Env var: setx NETH_TELEGRAM_TOKEN "<token>" (Windows; then open a new terminal), or
    • Argument: python -m neth.bot <token>
  3. Install and run:
    pip install -r requirements.txt
    python -m neth.bot
    When you see Telegram bot running…, message your bot and press Start.
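
The three token sources in step 2 could be resolved along these lines. This is a sketch only: pick_token and load_token are hypothetical names, and the precedence (argument, then environment variable, then file) is an assumption rather than documented behaviour of neth.bot.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional, Sequence


def pick_token(
    argv: Sequence[str],
    env: Mapping[str, str],
    file_text: Optional[str],
) -> Optional[str]:
    """Return the first non-empty token; the order used here is assumed."""
    candidates = [
        argv[1] if len(argv) > 1 else None,   # python -m neth.bot <token>
        env.get("NETH_TELEGRAM_TOKEN"),       # environment variable
        file_text,                            # contents of .telegram_token
    ]
    for candidate in candidates:
        if candidate and candidate.strip():
            return candidate.strip()
    return None


def load_token(root: Path = Path(".")) -> Optional[str]:
    """Read .telegram_token from the project root if present, then pick."""
    token_file = root / ".telegram_token"
    file_text = (
        token_file.read_text(encoding="utf-8") if token_file.is_file() else None
    )
    return pick_token(sys.argv, os.environ, file_text)
```

Stripping whitespace matters for the file case, since editors often append a trailing newline.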

Notes:

  • Only one instance may poll a token at a time: don't run it locally and on a server with the same token (Telegram returns a "Conflict" error).
  • The token is a secret. .telegram_token, .env, and *.token are git-ignored.
  • For always-on hosting (so the bot runs without your PC), see DEPLOY.md.

Optional integrations (env vars)

Variable              Enables
NETH_BAKONG_TOKEN     Bakong account-existence / transaction verification (bakong_verify)
NETH_BAKONG_BASE      override Bakong API base URL
NETH_NLP_MODEL        path to a fine-tuned XLM-RoBERTa Khmer classifier (else heuristic)
NETH_URL_ONLINE       1 to enable shortener expansion + URL threat feeds
NETH_URLHAUS_KEY      URLhaus (abuse.ch) auth key for known-malicious-URL lookups
NETH_GSB_KEY          Google Safe Browsing API key
NETH_TELEGRAM_TOKEN   run the Telegram bot: python -m neth.bot

URL reputation

url_reputation.py scores links in layers. Offline (always on): correct canonical bank-domain matching (e.g. ABA = ababank.com, not aba.com), brand-off-domain lookalikes, punycode/homoglyph hosts, IP-literal hosts, @ userinfo tricks, and shortener detection. Online (opt-in via NETH_URL_ONLINE=1): shortener expansion plus URLhaus / Google Safe Browsing feeds; a feed hit is decisive. Offline gives a usable prior; the feeds make accuracy measurable.
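
A few of the brand-agnostic offline checks can be sketched with the standard library alone. The offline_flags function, its flag names, and the short SHORTENERS set are hypothetical and do not reproduce url_reputation.py.

```python
import ipaddress
from urllib.parse import urlsplit

# Illustrative subset; a real list would be much longer.
SHORTENERS = {"bit.ly", "t.co", "tinyurl.com"}


def offline_flags(url: str) -> list[str]:
    """Return the names of the offline heuristics that fire for this URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    flags: list[str] = []

    # "https://bank.com@evil.example/" puts the bank name in userinfo.
    if parts.username is not None:
        flags.append("userinfo")

    # A raw IP address instead of a domain name.
    try:
        ipaddress.ip_address(host)
        flags.append("ip_literal")
    except ValueError:
        pass

    # Non-ASCII or punycode labels can hide homoglyph lookalikes.
    if not host.isascii():
        flags.append("non_ascii_host")
    if any(label.startswith("xn--") for label in host.split(".")):
        flags.append("punycode")

    if host in SHORTENERS:
        flags.append("shortener")
    return flags
```

Each flag is only a hint. How the flags are weighted into a risk level is left out of this sketch.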

Bank coverage. The brand-lookalike list lives in data/bank_domains.yaml (~30 Cambodian + international brands) and is loaded at runtime, so adding a bank means editing YAML, with no code change. Brand-agnostic checks (feeds, IP/punycode/@/shortener/TLD) protect every bank, listed or not; the lookalike rule only covers listed brands. ⚠️ A wrong domain in the YAML flags the real bank, so verify before adding.
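
The lookalike rule for listed brands might work like this sketch. The in-code BRANDS table stands in for data/bank_domains.yaml, whose actual schema is not shown here. Only the ABA entry comes from the text above; examplebank.com is made up, and the substring keyword match is a deliberately crude assumption.

```python
from typing import Optional
from urllib.parse import urlsplit

# canonical domain -> keywords that suggest the brand (hypothetical schema)
BRANDS = {
    "ababank.com": ["ababank"],
    "examplebank.com": ["examplebank"],  # made-up entry for illustration
}


def lookalike_of(url: str) -> Optional[str]:
    """Return the canonical domain being imitated, or None."""
    host = (urlsplit(url).hostname or "").lower()
    for canonical, keywords in BRANDS.items():
        on_canonical = host == canonical or host.endswith("." + canonical)
        if on_canonical:
            continue  # the real bank, or one of its subdomains
        if any(keyword in host for keyword in keywords):
            return canonical
    return None
```

The sketch also shows why a wrong YAML entry is dangerous: if the canonical domain is mistyped, the real site no longer matches it and gets flagged as its own lookalike.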

API

GET  /health
POST /api/analyze/text   {"text": "..."}        → verdict
POST /api/analyze/khqr   {"payload": "000201…"} → verdict
POST /api/analyze/image  multipart file=<photo> → verdict
POST /api/feedback       {input_type,input_excerpt,predicted_score,correct_label,note}
GET  /api/feedback/stats → counts + scam-missed-as-safe
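
A minimal client sketch for the text endpoint, using only the standard library. It assumes the server from Quick start is running at http://127.0.0.1:8000; build_text_request and analyze_text are hypothetical helper names, and the shape of the returned verdict is not assumed beyond being JSON.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:8000"  # local dev server from Quick start


def build_text_request(text: str, base_url: str = BASE_URL) -> Request:
    """Build the POST /api/analyze/text request without sending it."""
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return Request(
        f"{base_url}/api/analyze/text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze_text(text: str) -> dict:
    """Send the request and decode the JSON verdict (needs a running server)."""
    with urlopen(build_text_request(text), timeout=10) as response:
        return json.load(response)
```

Encoding with ensure_ascii=False keeps Khmer text readable in the request body instead of escaping every character.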

Responses are Khmer-first: every verdict carries summary_km and each signal a reason_km, with English kept alongside for logs. Inputs are size-capped and the URL fetcher is SSRF-guarded (blocks internal/metadata IPs).
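
The address test at the heart of an SSRF guard can be sketched with the ipaddress module. is_blocked_address is a hypothetical name, not the project's function. A complete guard would also need to resolve hostnames first and re-check every redirect target, which this sketch leaves out.

```python
import ipaddress


def is_blocked_address(ip_text: str) -> bool:
    """True for addresses a server-side fetcher should refuse to contact."""
    ip = ipaddress.ip_address(ip_text)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it cannot bypass the checks.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return (
        ip.is_private        # 10/8, 172.16/12, 192.168/16, ...
        or ip.is_loopback    # 127/8, ::1
        or ip.is_link_local  # 169.254/16, including cloud metadata endpoints
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )
```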

Benchmarking accuracy

python scripts/fetch_eval_data.py --n 500   # download URLhaus + Tranco -> eval_data/
python bench_urls.py --phish eval_data/phish.txt --benign eval_data/benign.txt --sweep
python bench_gateway.py                      # whole-gateway: text + KHQR modalities

bench_urls.py measures the URL engine; bench_gateway.py measures the full pipeline across text and KHQR. Bundled samples are tiny (they validate logic, not a real-world score); use fetch_eval_data.py for a meaningful number.
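
For orientation, a benchmark of this kind typically reports precision and recall over labeled samples. The helper below is a generic sketch of that arithmetic, not code from bench_urls.py; the evaluate name and the result keys are hypothetical.

```python
def evaluate(is_scam: list[bool], flagged: list[bool]) -> dict:
    """Compute precision, recall, and the count of scams passed as safe."""
    if len(is_scam) != len(flagged):
        raise ValueError("label and prediction lists differ in length")
    pairs = list(zip(is_scam, flagged))
    tp = sum(1 for truth, pred in pairs if truth and pred)
    fp = sum(1 for truth, pred in pairs if not truth and pred)
    fn = sum(1 for truth, pred in pairs if truth and not pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall, "missed_scams": fn}
```

For a safety tool the missed_scams count is usually the figure to watch, since a scam reported as safe costs more than a false alarm.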

Feedback loop

Users can report wrong verdicts (web buttons / POST /api/feedback). Corrections are stored in data/feedback.db (SQLite, git-ignored) as truncated excerpts, not full payloads. Export for training with FeedbackStore.export_jsonl(). This is how NETH gathers ground truth and the labeled corpus to train the Khmer NLP.
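
A store of this shape can be sketched with sqlite3. The column names follow the /api/feedback fields, but the FeedbackSketch class, its schema, the 200-character cap, and the "predicted 0, corrected above 0" definition of a missed scam are all assumptions, not the real FeedbackStore.

```python
import json
import sqlite3

MAX_EXCERPT = 200  # assumed truncation length
COLUMNS = ["input_type", "input_excerpt", "predicted_score", "correct_label", "note"]


class FeedbackSketch:
    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS feedback ("
            "input_type TEXT, input_excerpt TEXT, "
            "predicted_score INTEGER, correct_label INTEGER, note TEXT)"
        )

    def add(self, input_type: str, text: str, predicted_score: int,
            correct_label: int, note: str = "") -> None:
        """Store a correction, keeping only a truncated excerpt of the input."""
        self.db.execute(
            "INSERT INTO feedback VALUES (?, ?, ?, ?, ?)",
            (input_type, text[:MAX_EXCERPT], predicted_score, correct_label, note),
        )
        self.db.commit()

    def missed_scams(self) -> int:
        """Count inputs predicted Safe (0) that users corrected to a higher level."""
        row = self.db.execute(
            "SELECT COUNT(*) FROM feedback "
            "WHERE predicted_score = 0 AND correct_label > 0"
        ).fetchone()
        return row[0]

    def export_jsonl(self) -> str:
        """Serialize all rows as one JSON object per line."""
        rows = self.db.execute(
            f"SELECT {', '.join(COLUMNS)} FROM feedback ORDER BY rowid"
        )
        return "\n".join(
            json.dumps(dict(zip(COLUMNS, row)), ensure_ascii=False) for row in rows
        )
```

Truncating before the insert means the full payload never reaches disk, which matches the privacy intent described above.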

Project layout

neth/
├── khqr_core.py      EMVCo/KHQR TLV parser + CRC-16 (validity pre-filter)
├── identity_match.py offline Tag 59 ↔ Tag 29/30 routing mismatch (overlay defense)
├── bakong_verify.py  account-existence / transaction verification (needs token)
├── nlp_khmer.py      Khmer phishing detector (heuristic + optional transformer)
├── url_reputation.py layered URL scoring (SSRF-guarded) + threat feeds
├── vision_overlay.py QR extraction + multi-QR overlay detection
├── scoring.py        signal aggregation → final verdict
├── i18n.py           Khmer localization of verdicts
├── feedback.py       SQLite feedback/ground-truth store
├── api.py            FastAPI server (JSON API + web UI)
├── bot.py            Telegram front end
└── web/              static UI (index.html, style.css, app.js)
bench_urls.py · bench_gateway.py · scripts/fetch_eval_data.py   benchmarking
data/bank_domains.yaml · data/bank_codes.yaml                   editable brand data
tests/test_engines.py                                           20 offline tests

Limitations (read this)

NETH is an advisory aid, not an authority. It reduces risk on the common scams; it does not guarantee a QR or link is safe. Always confirm the recipient name in your banking app before paying. Known gaps:

  • Identity/routing check has limited coverage. It only flags a name↔bank mismatch for banks whose codes are in data/bank_codes.yaml (currently 4: ABA, ACLEDA, Canadia, Wing). For any other bank it says "couldn't verify routing", not "safe." It also cannot detect a scammer who pastes a QR for their own account at the same bank as the real merchant.
  • No account-name verification. The public Bakong API does not expose account → holder-name lookup, so NETH cannot confirm who an account belongs to. Only your banking app can.
  • Khmer NLP is an unvalidated heuristic. It is a keyword/URL model with no measured accuracy; it can be evaded by rewording and will both miss scams and raise false alarms. Treat its verdict as a weak hint until a model is trained on real labeled data.
  • Overlay (photo) detection is weak. It flags multiple QR codes in a frame, but misses the common case where a sticker fully covers the original (one QR).
  • URL accuracy is unproven at scale. The benchmark passes on a tiny bundled sample; no real-world precision/recall figure exists yet (see bench_*).
  • Threat feeds are opt-in and rate-limited. Without NETH_URL_ONLINE=1 and API keys, URL scoring is heuristic-only.
  • Not a substitute for vigilance. A clever, well-localized scam with a valid QR and clean link can pass every check.

Roadmap

  • Train the XLM-RoBERTa Khmer phishing classifier on a labeled local dataset
  • Train a YOLOv8 sticker-boundary model to augment vision_overlay.detect()
  • Known-bad URL/domain feed for nlp_khmer (URLhaus + Google Safe Browsing)
  • Per-merchant known-good QR reference store for overlay comparison

Open-source community edition. NETH assists detection; always verify the recipient name before paying.

About

👁️ NETH (នេត្រ): The Digital Watchful Eye. A Khmer-first gateway that detects KHQR payment scams (QR overlay / routing mismatch) and phishing. Scan a QR photo or paste a message → Safe / Suspicious / Blocked, via Telegram bot or web.
