Privacy-first, intent-driven search engine with zero-trust architecture.
IntentForge is a high-performance discovery platform that combines BM25 keyword matching with ONNX-powered semantic search to deliver intent-aligned results. Built with privacy at its core: no tracking, no data harvesting, no corporate dependencies.
- Hybrid Search: Combines BM25 keyword matching with semantic vector search (ONNX embeddings)
- Intent Classification: Detects Navigational, Informational, Transactional, and Exploratory queries with adaptive semantic ratios
- Multi-tier Caching: L1 in-memory cache + Redis for sub-millisecond repeated queries
- Meta-search Aggregation: Simultaneously queries 9+ providers (Brave, Google, Wikipedia, Arxiv, Reddit, GitHub, Hacker News, Medium, DuckDuckGo)
- Zero-bandwidth indexing: Extracts context from HTML metadata (alt, title, captions, surrounding text) without downloading images
- Perceptual hashing: dHash (64-bit) for deduplication
- Visual fingerprints: ThumbHash for compact (~20-30 byte) thumbnail representation
- Quality gates: Filters tracking pixels, SVGs, and 1x1 placeholders
- Multi-source discovery: YouTube Unified, Piped, Invidious, Dailymotion, Vimeo, Internet Archive
- Intent-first scoring: Ranks by intent match, relevance, and quality signals
- Privacy-friendly frontends: Prefers Piped/Invidious over direct YouTube API
- Anti-detection: TLS fingerprinting, randomized timing, viewport spoofing to bypass AI-generated content detection
- Tor integration: Optional route through Tor for anonymity
- No tracking: No cookies, no user profiling, no data retention
- Cloudflare bypass: Automated CAPTCHA solving for accessibility
- Trafilatura integration: Boilerplate removal, readable text extraction
- Multi-format support: Articles, documentation, technical content
- RSS firehose: 80+ curated sources for continuous content discovery
- Cross-encoder reranking: ms-marco-MiniLM-L6-v2 for precision reordering
- Adaptive semantic ratio: Query-type-aware blend of keyword vs semantic search
- Self-improvement: Automatic gap analysis triggers background crawling for weak results
- Common Crawl integration: Massive URL discovery from web archives
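
The adaptive semantic ratio can be read as a per-intent weight that blends a normalized BM25 score with an embedding cosine similarity. The Rust sketch below is illustrative only. The `Intent` variants mirror the four classes listed above, but the keyword rules in `classify`, the ratio values, and the min-max normalization of BM25 scores are assumptions, not IntentForge's actual classifier or weights.

```rust
/// The four query intents named in the feature list.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Intent {
    Navigational,
    Informational,
    Transactional,
    Exploratory,
}

/// Hypothetical keyword heuristic standing in for the real intent classifier.
fn classify(query: &str) -> Intent {
    let q = query.to_lowercase();
    if q.contains(".com") || q.contains("login") || q.ends_with(" site") {
        Intent::Navigational
    } else if ["buy", "price", "download", "coupon"].iter().any(|w| q.contains(w)) {
        Intent::Transactional
    } else if q.starts_with("how ") || q.starts_with("what ") || q.starts_with("why ") {
        Intent::Informational
    } else {
        Intent::Exploratory
    }
}

/// Assumed share of the final score given to semantic similarity (0.0 = pure BM25).
fn semantic_ratio(intent: Intent) -> f32 {
    match intent {
        Intent::Navigational => 0.2, // exact names matter: lean on keywords
        Intent::Transactional => 0.4,
        Intent::Informational => 0.6,
        Intent::Exploratory => 0.8, // loose phrasing: lean on embeddings
    }
}

/// Min-max normalize BM25 scores into [0, 1] so they are comparable to cosine similarity.
fn normalize(scores: &[f32]) -> Vec<f32> {
    let min = scores.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let range = max - min;
    scores
        .iter()
        .map(|s| if range > 0.0 { (s - min) / range } else { 1.0 })
        .collect()
}

/// Blend per document: (1 - ratio) * bm25_normalized + ratio * cosine.
fn hybrid_scores(bm25: &[f32], cosine: &[f32], ratio: f32) -> Vec<f32> {
    normalize(bm25)
        .iter()
        .zip(cosine)
        .map(|(k, s)| (1.0 - ratio) * k + ratio * s)
        .collect()
}

fn main() {
    let query = "how does dhash work";
    let intent = classify(query);
    let ratio = semantic_ratio(intent);
    let scores = hybrid_scores(&[12.0, 3.0, 7.5], &[0.41, 0.88, 0.63], ratio);
    println!("{intent:?} ratio={ratio} scores={scores:?}");
}
```

Under these assumed weights, a navigational query such as `github.com login` is ranked mostly on keyword matches (ratio 0.2), while a loosely phrased exploratory query leans on embeddings (ratio 0.8).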
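
The 64-bit dHash used for deduplication compares each pixel with its right-hand neighbour on a 9×8 grayscale thumbnail: 8 comparisons per row × 8 rows = 64 bits, and near-duplicate images end up a small Hamming distance apart. This sketch assumes decoding, grayscale conversion, and resizing to 9×8 have already happened (typically via an image library, omitted here). The `DUP_THRESHOLD` of 10 bits is an assumed cut-off, not a documented IntentForge setting.

```rust
/// Width and height of the downscaled grayscale grid dHash operates on.
const W: usize = 9;
const H: usize = 8;

/// Assumed near-duplicate threshold (differing bits out of 64).
const DUP_THRESHOLD: u32 = 10;

/// 64-bit difference hash of a 9x8 grayscale buffer (row-major).
/// A bit is 1 when a pixel is brighter than its right-hand neighbour.
fn dhash(gray: &[u8; W * H]) -> u64 {
    let mut hash = 0u64;
    for row in 0..H {
        for col in 0..W - 1 {
            let left = gray[row * W + col];
            let right = gray[row * W + col + 1];
            hash = (hash << 1) | u64::from(left > right);
        }
    }
    hash
}

/// Number of differing bits between two hashes.
fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

fn is_near_duplicate(a: u64, b: u64) -> bool {
    hamming(a, b) <= DUP_THRESHOLD
}

fn main() {
    // Gradient that darkens left to right: every comparison is "left > right".
    let mut img = [0u8; W * H];
    for row in 0..H {
        for col in 0..W {
            img[row * W + col] = 255 - (col as u8) * 20;
        }
    }
    let mut tweaked = img;
    tweaked[0] = 0; // one changed pixel, e.g. from re-encoding
    let (a, b) = (dhash(&img), dhash(&tweaked));
    println!("{a:016x} {b:016x} distance={} dup={}", hamming(a, b), is_near_duplicate(a, b));
}
```

In `main`, changing a single pixel flips exactly one bit, so the two thumbnails are reported as near-duplicates with a distance of 1.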
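
The image quality gate can be pictured as a cheap predicate over `<img>` metadata, evaluated before any hashing. Only the three rejection classes (tracking pixels, SVGs, 1x1 placeholders) come from the feature list. The `ImageCandidate` struct, the example tracker domains, and the 16-pixel `MIN_EDGE` are hypothetical.

```rust
/// Metadata scraped from an <img> tag; field names are hypothetical.
struct ImageCandidate<'a> {
    src: &'a str,
    width: Option<u32>,
    height: Option<u32>,
}

/// Illustrative tracker domains; a real deployment would use a maintained blocklist.
const TRACKER_DOMAINS: &[&str] = &["doubleclick.net", "google-analytics.com", "scorecardresearch.com"];

/// Assumed minimum edge length (pixels) for an image worth indexing.
const MIN_EDGE: u32 = 16;

/// Host part of a URL, e.g. "stats.g.doubleclick.net" (a port is kept if present).
fn host_of(url: &str) -> &str {
    let rest = url.split_once("://").map_or(url, |(_, r)| r);
    rest.split(|c: char| c == '/' || c == '?' || c == '#').next().unwrap_or("")
}

fn passes_quality_gate(img: &ImageCandidate) -> bool {
    let src = img.src.to_ascii_lowercase();
    // Ignore query string and fragment when checking the file extension.
    let path = src.split(|c: char| c == '?' || c == '#').next().unwrap_or("");
    let is_svg = path.ends_with(".svg") || src.starts_with("data:image/svg");
    let host = host_of(&src);
    let is_tracker = TRACKER_DOMAINS
        .iter()
        .any(|d| host == *d || host.ends_with(&format!(".{d}")));
    // 1x1 placeholders (and other tiny images) when dimensions are declared.
    let too_small = matches!(
        (img.width, img.height),
        (Some(w), Some(h)) if w < MIN_EDGE || h < MIN_EDGE
    );
    !(is_svg || is_tracker || too_small)
}

fn main() {
    let candidates = [
        ImageCandidate { src: "https://example.org/photo.jpg", width: Some(800), height: Some(600) },
        ImageCandidate { src: "https://example.org/logo.svg?v=2", width: None, height: None },
        ImageCandidate { src: "https://example.org/spacer.gif", width: Some(1), height: Some(1) },
        ImageCandidate { src: "https://stats.g.doubleclick.net/p.gif", width: None, height: None },
    ];
    for c in &candidates {
        println!("{:<40} keep={}", c.src, passes_quality_gate(c));
    }
}
```

Matching trackers on the host rather than on raw substrings avoids rejecting innocent paths such as `soundtrack.png`.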
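
Meta-search aggregation amounts to fanning one query out to every provider concurrently and merging the ranked lists. In this sketch the providers are hypothetical stubs, `std::thread::scope` (Rust 1.63+) stands in for whatever async runtime the Rust meta-search service actually uses, and the merge strategy (round-robin interleaving by rank, deduplicated on a lowercased URL without a trailing slash) is an assumption for illustration.

```rust
use std::collections::HashSet;
use std::thread;

/// One ranked result from a provider.
#[derive(Debug, Clone)]
struct Hit {
    provider: &'static str,
    url: String,
    title: String,
}

/// A provider turns a query into a ranked list of hits (real ones would make HTTP calls).
type Provider = fn(&str) -> Vec<Hit>;

/// Query every provider on its own thread, then interleave the ranked lists
/// round-robin (rank 0 of each provider, then rank 1, ...), keeping only the
/// first hit seen for each normalized URL.
fn meta_search(query: &str, providers: &[Provider]) -> Vec<Hit> {
    let lists: Vec<Vec<Hit>> = thread::scope(|s| {
        let handles: Vec<_> = providers
            .iter()
            .map(|p| s.spawn(move || p(query)))
            .collect();
        // A provider that panics contributes an empty list instead of failing the search.
        handles.into_iter().map(|h| h.join().unwrap_or_default()).collect()
    });

    let mut seen = HashSet::new();
    let mut merged = Vec::new();
    let depth = lists.iter().map(Vec::len).max().unwrap_or(0);
    for rank in 0..depth {
        for list in &lists {
            if let Some(hit) = list.get(rank) {
                let key = hit.url.trim_end_matches('/').to_lowercase();
                if seen.insert(key) {
                    merged.push(hit.clone());
                }
            }
        }
    }
    merged
}

fn make_hit(provider: &'static str, url: &str, query: &str) -> Hit {
    Hit { provider, url: url.to_string(), title: format!("{query} ({provider})") }
}

// Hypothetical stub providers standing in for Wikipedia and GitHub clients.
fn fake_wikipedia(q: &str) -> Vec<Hit> {
    vec![
        make_hit("wikipedia", "https://en.wikipedia.org/wiki/Rust_(programming_language)", q),
        make_hit("wikipedia", "https://example.org/shared/", q),
    ]
}

fn fake_github(q: &str) -> Vec<Hit> {
    vec![
        make_hit("github", "https://github.com/rust-lang/rust", q),
        make_hit("github", "https://example.org/shared", q),
    ]
}

fn main() {
    let providers: [Provider; 2] = [fake_wikipedia, fake_github];
    for h in meta_search("rust", &providers) {
        println!("{:<10} {} | {}", h.provider, h.url, h.title);
    }
}
```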

Architecture (layers, top to bottom):
- IntentForge Core (Rust): Intent Classifier, Ranking Engine, Discovery (Firehose), Anti-Detection + Cloudflare
- Meilisearch Index (Hybrid)
- Query Layer (Python + FastAPI): ONNX Embed (all-MiniLM), Cross-Encoder Reranker, Multi-tier Cache (L1 + Redis)
- Meta-Search API (Rust): Brave · Google · Wikipedia · Arxiv · Reddit · GitHub · HN
- Video Services (Node.js): YouTube Unified · Piped · Invidious
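
The Query Layer pairs hybrid retrieval with a cross-encoder reranker. A common way to structure that is retrieve-then-rerank: sort candidates by the first-stage (hybrid) score, run the expensive pairwise model only on the top k, and reorder that head. In this sketch the ms-marco-MiniLM-L6-v2 call is replaced by a caller-supplied scoring function, and `Candidate`, `rerank`, `top_k`, and the toy `overlap` scorer are hypothetical names.

```rust
/// A first-stage hit from hybrid retrieval (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    id: u32,
    text: String,
    first_stage_score: f32,
}

/// Retrieve-then-rerank: order by first-stage score, re-score only the top `top_k`
/// with a pairwise (query, document) scorer, and put that reranked head first.
/// Candidates beyond `top_k` keep their first-stage order.
fn rerank<F>(query: &str, mut candidates: Vec<Candidate>, top_k: usize, score_pair: F) -> Vec<Candidate>
where
    F: Fn(&str, &str) -> f32,
{
    candidates.sort_by(|a, b| b.first_stage_score.total_cmp(&a.first_stage_score));
    let k = top_k.min(candidates.len());
    let mut head: Vec<(f32, Candidate)> = candidates
        .drain(..k)
        .map(|c| (score_pair(query, c.text.as_str()), c))
        .collect();
    // Stable sort: ties keep their first-stage order.
    head.sort_by(|a, b| b.0.total_cmp(&a.0));
    head.into_iter().map(|(_, c)| c).chain(candidates).collect()
}

/// Toy stand-in for a cross-encoder: number of query terms found in the document.
fn overlap(query: &str, text: &str) -> f32 {
    let text = text.to_lowercase();
    let query = query.to_lowercase();
    query.split_whitespace().filter(|t| text.contains(*t)).count() as f32
}

fn main() {
    let candidates = vec![
        Candidate { id: 1, text: "Rust ownership guide".into(), first_stage_score: 0.9 },
        Candidate { id: 2, text: "Borrow checker and ownership in Rust explained".into(), first_stage_score: 0.7 },
        Candidate { id: 3, text: "Unrelated cooking blog".into(), first_stage_score: 0.5 },
    ];
    for c in rerank("rust borrow checker ownership", candidates, 2, overlap) {
        println!("{} {}", c.id, c.text);
    }
}
```

Limiting the pairwise model to the top k keeps latency bounded, since a cross-encoder must run once per (query, document) pair rather than reusing precomputed embeddings.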

| Endpoint | Method | Description |
|---|---|---|
| `/search` | GET | Hybrid intent-first search |
| `/news` | GET | News aggregation with geo-targeting |
| `/images` | GET | Semantic image discovery |
| `/videos` | GET | Intent-weighted video search |
| `/meta` | GET | Meta-search across all providers |
| `/crawl` | GET | Single-URL content extraction |
| `/crawl/batch` | POST | Batch URL extraction |
| `/health` | GET | Service health check |
| `/metrics` | GET | Prometheus metrics |

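The endpoints above are plain HTTP routes. As an illustration of calling them, this sketch builds percent-encoded GET URLs. The table does not document query parameters, so the `q` parameter name is an assumption, the base URL assumes the API server started with `cargo run --release` on port 9100, and `endpoint_url` is a hypothetical helper.

```rust
/// Percent-encode a query-string component per RFC 3986 (unreserved characters pass through).
fn percent_encode(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for b in value.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => out.push(b as char),
            _ => out.push_str(&format!("%{b:02X}")),
        }
    }
    out
}

/// Build a GET URL for one endpoint; parameter names are caller-supplied assumptions.
fn endpoint_url(base: &str, path: &str, params: &[(&str, &str)]) -> String {
    let query: Vec<String> = params
        .iter()
        .map(|(k, v)| format!("{}={}", percent_encode(k), percent_encode(v)))
        .collect();
    if query.is_empty() {
        format!("{base}{path}")
    } else {
        format!("{base}{path}?{}", query.join("&"))
    }
}

fn main() {
    let base = "http://localhost:9100"; // assumed local API server
    println!("{}", endpoint_url(base, "/search", &[("q", "rust borrow checker")]));
    println!("{}", endpoint_url(base, "/health", &[]));
}
```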
| Component | Technology |
|---|---|
| Core Engine | Rust (Edition 2021) |
| Semantic Search | ONNX Runtime (all-MiniLM-L6-v2) |
| Cross-Encoder | ms-marco-MiniLM-L6-v2 |
| Search Index | Meilisearch |
| Cache | Redis (Tier-2) + In-memory LRU (Tier-1) |
| Query Layer | Python + FastAPI |
| Meta-Search | Rust (9+ providers) |
| Video Discovery | Node.js (YouTube Unified) |
| Content Extraction | Trafilatura (Python) |
| Orchestration | Docker Compose |
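
The Cache row describes two tiers: an in-memory LRU (Tier-1) in front of Redis (Tier-2). A read-through lookup checks Tier-1, then Tier-2, and only on a double miss runs the search and backfills both tiers. To keep the sketch runnable without a Redis server, Tier-2 is a hypothetical `RemoteCache` trait with an in-process `FakeRedis`; TTLs and serialization are omitted, and the recency list costs O(n) per access.

```rust
use std::collections::{HashMap, VecDeque};

/// Tier-2 interface; a real implementation would wrap a Redis client.
trait RemoteCache {
    fn get(&mut self, key: &str) -> Option<String>;
    fn set(&mut self, key: &str, value: &str);
}

/// In-process fake so the example runs without a Redis server.
#[derive(Default)]
struct FakeRedis {
    map: HashMap<String, String>,
    hits: usize,
}

impl RemoteCache for FakeRedis {
    fn get(&mut self, key: &str) -> Option<String> {
        let value = self.map.get(key).cloned();
        if value.is_some() {
            self.hits += 1;
        }
        value
    }
    fn set(&mut self, key: &str, value: &str) {
        self.map.insert(key.to_string(), value.to_string());
    }
}

/// Tier-1: bounded LRU map.
struct Lru {
    cap: usize,
    map: HashMap<String, String>,
    order: VecDeque<String>, // front = least recently used
}

impl Lru {
    fn new(cap: usize) -> Self {
        assert!(cap > 0, "capacity must be positive");
        Self { cap, map: HashMap::new(), order: VecDeque::new() }
    }
    /// Mark `key` as most recently used.
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            self.order.remove(pos);
        }
        self.order.push_back(key.to_string());
    }
    fn get(&mut self, key: &str) -> Option<String> {
        let value = self.map.get(key).cloned()?;
        self.touch(key);
        Some(value)
    }
    fn put(&mut self, key: &str, value: String) {
        // Evict the least recently used entry when inserting a new key into a full cache.
        if !self.map.contains_key(key) && self.map.len() == self.cap {
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        self.map.insert(key.to_string(), value);
        self.touch(key);
    }
}

/// Read-through lookup: Tier-1, then Tier-2, then compute and backfill both tiers.
fn cached_search<R: RemoteCache>(
    l1: &mut Lru,
    l2: &mut R,
    query: &str,
    compute: impl Fn(&str) -> String,
) -> String {
    if let Some(v) = l1.get(query) {
        return v;
    }
    if let Some(v) = l2.get(query) {
        l1.put(query, v.clone());
        return v;
    }
    let v = compute(query);
    l2.set(query, &v);
    l1.put(query, v.clone());
    v
}

fn main() {
    let mut l1 = Lru::new(2);
    let mut l2 = FakeRedis::default();
    for q in ["rust", "onnx", "rust", "redis", "onnx"] {
        let results = cached_search(&mut l1, &mut l2, q, |q| format!("results for {q}"));
        println!("{q}: {results}");
    }
    println!("redis hits: {}", l2.hits);
}
```

With a Tier-1 capacity of 2, the query sequence in `main` evicts `onnx` from Tier-1, so its second lookup is served by the Tier-2 fake (`redis hits: 1`).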

- Zero data retention: No search logs, no analytics, no cookies
- No corporate dependencies: All sources are open or privacy-respecting
- Encrypted transport: HTTPS everywhere, Tor optional
- No AI content detection fingerprinting: Built-in bypass
- Open source: Full transparency on all components

```bash
# Full release build
cargo build --release
# Run API server (port 9100)
cargo run --release
# Docker full stack
docker-compose -f docker-compose.dev.yml up -d --build
```

IntentForge v2 is operational and under active development. The core search, image indexing, and video discovery systems are functional. Ongoing work focuses on latency optimization, expanded source coverage, and enhanced personalization.
See docs/ROADMAP.md for planned improvements.