oxiverse-ecosystem/intentforge


IntentForge v2

Privacy-first, intent-driven search engine with zero-trust architecture.

IntentForge is a high-performance discovery platform that combines BM25 keyword matching with ONNX-powered semantic search to deliver intent-aligned results. Built with privacy at its core: no tracking, no data harvesting, no corporate dependencies.


Current Capabilities

🔍 Core Search

  • Hybrid Search: Combines BM25 keyword matching with semantic vector search (ONNX embeddings)
  • Intent Classification: Detects Navigational, Informational, Transactional, and Exploratory queries with adaptive semantic ratios
  • Multi-tier Caching: L1 in-memory cache + Redis for sub-millisecond repeated queries
  • Meta-search Aggregation: Simultaneously queries 9+ providers (Brave, Google, Wikipedia, Arxiv, Reddit, GitHub, Hacker News, Medium, DuckDuckGo)
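The following is a rough illustration of how a hybrid score with an intent-adaptive semantic ratio could be computed. It is a minimal Python sketch, not IntentForge's implementation. `classify_intent` is a toy keyword heuristic, and both the per-intent values in `SEMANTIC_RATIO` and the min-max normalization are assumptions. BM25 scores and embedding vectors are passed in rather than computed.

```python
import math

# Hypothetical ratios: 0.0 = pure BM25, 1.0 = pure semantic similarity.
SEMANTIC_RATIO = {
    "navigational": 0.2,
    "transactional": 0.4,
    "informational": 0.6,
    "exploratory": 0.8,
}


def classify_intent(query: str) -> str:
    """Toy keyword heuristic standing in for a real intent classifier."""
    q = query.lower()
    if any(tok in q for tok in ("login", ".com", "homepage", "official site")):
        return "navigational"
    if any(tok in q for tok in ("buy", "price", "download", "cheap")):
        return "transactional"
    if q.startswith(("how ", "what ", "why ", "who ", "when ")):
        return "informational"
    return "exploratory"


def min_max(scores: dict) -> dict:
    """Rescale scores to [0, 1] so BM25 and cosine values are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_rank(query, bm25_scores, query_vec, doc_vecs):
    """Blend normalized BM25 and cosine scores using the intent's ratio."""
    ratio = SEMANTIC_RATIO[classify_intent(query)]
    keyword = min_max(bm25_scores)
    semantic = min_max({d: cosine(query_vec, v) for d, v in doc_vecs.items()})
    fused = {
        d: (1 - ratio) * keyword.get(d, 0.0) + ratio * semantic.get(d, 0.0)
        for d in set(keyword) | set(semantic)
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Both score sets are rescaled before blending because raw BM25 scores are unbounded while cosine similarity lies in [-1, 1]. Without that step the keyword side would tend to dominate regardless of the chosen ratio.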

🖼️ Image Search

  • Zero-bandwidth indexing: Extracts context from HTML metadata (alt, title, captions, surrounding text) without downloading images
  • Perceptual hashing: dHash (64-bit) for deduplication
  • Visual fingerprints: ThumbHash for compact (~20-30 byte) thumbnail representation
  • Quality gates: Filters tracking pixels, SVGs, and 1x1 placeholders
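The deduplication and quality-gate bullets can be illustrated with a small sketch. `dhash` assumes the image has already been decoded, converted to grayscale and resized to 9×8 pixels; how those pixels are obtained is outside the scope of this sketch. `passes_quality_gate` works only on tag metadata. `MIN_SIDE` and the 10-bit near-duplicate threshold are hypothetical values. The "brighter than right-hand neighbour" bit convention is one common dHash variant.

```python
from typing import List, Optional
from urllib.parse import urlparse

MIN_SIDE = 16  # hypothetical: anything smaller is treated as a pixel/placeholder


def passes_quality_gate(src: str, width: Optional[int], height: Optional[int]) -> bool:
    """Reject SVGs and tiny images using only metadata from the HTML tag."""
    if src.startswith("data:image/svg") or urlparse(src).path.lower().endswith(".svg"):
        return False
    for side in (width, height):
        if side is not None and side < MIN_SIDE:
            return False
    return True


def dhash(gray: List[List[int]]) -> int:
    """64-bit difference hash of an 8-row x 9-column grayscale grid.

    Each bit records whether a pixel is brighter than its right-hand
    neighbour: 8 comparisons per row x 8 rows = 64 bits.
    """
    if len(gray) != 8 or any(len(row) != 9 for row in gray):
        raise ValueError("expected 8 rows of 9 pixels")
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: int, b: int, max_distance: int = 10) -> bool:
    """Treat hashes within max_distance differing bits as the same image."""
    return hamming(a, b) <= max_distance
```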

📹 Video Search

  • Multi-source discovery: YouTube Unified, Piped, Invidious, Dailymotion, Vimeo, Internet Archive
  • Intent-first scoring: Ranks by intent match, relevance, and quality signals
  • Privacy-friendly frontends: Prefers Piped/Invidious over direct YouTube API
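Intent-first video scoring could be approximated as a weighted sum of the three named signals. A small bonus is added so that a Piped or Invidious copy of a video outranks the direct YouTube copy. The weights, the bonus, and the assumption that copies of the same video share a `video_id` across frontends are all hypothetical; the README names the signals but not how they are combined.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical weights for the three signals named in the README.
WEIGHTS = {"intent": 0.5, "relevance": 0.3, "quality": 0.2}
# Hypothetical tie-breaker that encodes the Piped/Invidious preference.
FRONTEND_BONUS = {"piped": 0.05, "invidious": 0.05}


@dataclass
class VideoCandidate:
    video_id: str
    title: str
    source: str          # e.g. "piped", "invidious", "youtube", "vimeo"
    intent_match: float  # all three signals assumed pre-normalized to [0, 1]
    relevance: float
    quality: float


def score(v: VideoCandidate) -> float:
    """Weighted signal sum plus a frontend preference bonus."""
    base = (
        WEIGHTS["intent"] * v.intent_match
        + WEIGHTS["relevance"] * v.relevance
        + WEIGHTS["quality"] * v.quality
    )
    return base + FRONTEND_BONUS.get(v.source, 0.0)


def rank_videos(candidates: List[VideoCandidate]) -> List[VideoCandidate]:
    """Keep the best-scoring copy of each video, then sort by score."""
    best: Dict[str, VideoCandidate] = {}
    for v in candidates:
        current = best.get(v.video_id)
        if current is None or score(v) > score(current):
            best[v.video_id] = v
    return sorted(best.values(), key=score, reverse=True)
```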

🛑 Privacy & Security

  • Anti-detection: TLS fingerprinting, randomized timing, viewport spoofing to bypass AI-generated content detection
  • Tor integration: Optional route through Tor for anonymity
  • No tracking: No cookies, no user profiling, no data retention
  • Cloudflare bypass: Automated CAPTCHA solving for accessibility

📚 Content Extraction

  • Trafilatura integration: Boilerplate removal, readable text extraction
  • Multi-format support: Articles, documentation, technical content
  • RSS firehose: 80+ curated sources for continuous content discovery
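The firehose's polling step might look roughly like this. The sketch parses an RSS 2.0 feed and keeps only links that have not been seen before, so they can be queued for extraction (for example with Trafilatura's `fetch_url` and `extract`). The function names are hypothetical, Atom feeds are not handled, and network fetching is left out so the parsing logic stands alone.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class FeedItem:
    title: str
    link: str


def parse_rss(xml_text: str) -> List[FeedItem]:
    """Extract title/link pairs from an RSS 2.0 document (Atom not handled)."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link:  # items without a link cannot be crawled, so skip them
            items.append(FeedItem((item.findtext("title") or "").strip(), link))
    return items


def new_links(items: List[FeedItem], seen: Set[str]) -> List[str]:
    """Return links not seen in earlier polls and remember them."""
    fresh = []
    for item in items:
        if item.link not in seen:
            seen.add(item.link)
            fresh.append(item.link)
    return fresh
```

For feeds from untrusted sources, `defusedxml.ElementTree.fromstring` is a drop-in alternative that guards against malicious XML such as entity-expansion attacks.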

🚀 Performance

  • Cross-encoder reranking: ms-marco-MiniLM-L6-v2 for precision reordering
  • Adaptive semantic ratio: Query-type-aware blend of keyword vs semantic search
  • Self-improvement: Automatic gap analysis triggers background crawling for weak results
  • Common Crawl integration: Massive URL discovery from web archives
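Cross-encoder reranking and the gap-analysis trigger can be sketched together. Candidates are reranked with a batch scorer, and the query is queued for background crawling when too few results clear a threshold. The threshold values, `needs_backfill`, and the in-memory `deque` used as a crawl queue are hypothetical. `overlap_scorer` is a toy stand-in so the sketch runs without a model. With the sentence-transformers `CrossEncoder` class, `lambda q, docs: model.predict([(q, d) for d in docs])` should be usable as the scorer. The thresholds would then need re-tuning, because cross-encoder outputs are not necessarily in [0, 1].

```python
from collections import deque
from typing import Callable, List, Sequence, Tuple

# A batch scorer maps (query, docs) to one relevance score per doc.
BatchScorer = Callable[[str, Sequence[str]], Sequence[float]]


def rerank(query: str, docs: Sequence[str], scorer: BatchScorer,
           top_k: int = 10) -> List[Tuple[str, float]]:
    """Score all (query, doc) pairs in one batch and keep the top_k."""
    scores = scorer(query, docs)
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


def needs_backfill(results, min_score: float = 0.5, min_good: int = 3) -> bool:
    """Hypothetical gap test: too few results clear the score threshold."""
    return sum(1 for _, s in results if s >= min_score) < min_good


def search_with_gap_analysis(query, docs, scorer, crawl_queue: deque, **gap_kwargs):
    """Rerank, then enqueue the query for crawling if the results look weak."""
    results = rerank(query, docs, scorer)
    if needs_backfill(results, **gap_kwargs):
        crawl_queue.append(query)  # a background crawler would drain this queue
    return results


def overlap_scorer(query: str, docs: Sequence[str]) -> List[float]:
    """Toy stand-in for a cross-encoder: fraction of query words in the doc."""
    q = set(query.lower().split())
    return [len(q & set(d.lower().split())) / len(q) if q else 0.0 for d in docs]
```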

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                     IntentForge Core (Rust)                     │
├──────────────┬──────────────┬───────────────┬───────────────────┤
│ Intent       │ Ranking      │ Discovery     │ Anti-Detection    │
│ Classifier   │ Engine       │ (Firehose)    │ + Cloudflare      │
├──────────────┴──────────────┴───────────────┴───────────────────┤
│                   Meilisearch Index (Hybrid)                    │
├─────────────────────────────────────────────────────────────────┤
│                 Query Layer (Python + FastAPI)                  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐   │
│  │  ONNX Embed  │  │ Cross-Encoder│  │  Multi-tier Cache    │   │
│  │ (all-MiniLM) │  │  Reranker    │  │  L1 + Redis          │   │
│  └──────────────┘  └──────────────┘  └──────────────────────┘   │
├─────────────────────────────────────────────────────────────────┤
│                     Meta-Search API (Rust)                      │
│    Brave · Google · Wikipedia · Arxiv · Reddit · GitHub · HN    │
├─────────────────────────────────────────────────────────────────┤
│                    Video Services (Node.js)                     │
│               YouTube Unified · Piped · Invidious               │
└─────────────────────────────────────────────────────────────────┘

API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /search | GET | Hybrid intent-first search |
| /news | GET | News aggregation with geo-targeting |
| /images | GET | Semantic image discovery |
| /videos | GET | Intent-weighted video search |
| /meta | GET | Meta-search across all providers |
| /crawl | GET | Single URL content extraction |
| /crawl/batch | POST | Batch URL extraction |
| /health | GET | Service health check |
| /metrics | GET | Prometheus metrics |
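Below is a minimal client sketch for the endpoints above, using only the Python standard library. The base URL assumes the API server from Getting Started is listening locally on port 9100. The query parameter names (`q`, `region`) and the `{"urls": [...]}` request body are hypothetical, since the README does not document request schemas.

```python
import json
import urllib.request
from typing import Any, List
from urllib.parse import urlencode

BASE_URL = "http://localhost:9100"  # port from "Run API server (port 9100)"


def build_url(endpoint: str, **params: Any) -> str:
    """Join an endpoint path with URL-encoded query parameters (None dropped)."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}{endpoint}" + (f"?{query}" if query else "")


def get_json(endpoint: str, timeout: float = 5.0, **params: Any) -> Any:
    """GET an endpoint and decode its JSON body."""
    with urllib.request.urlopen(build_url(endpoint, **params), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def post_crawl_batch(urls: List[str], timeout: float = 30.0) -> Any:
    """POST a batch of URLs to /crawl/batch (hypothetical body shape)."""
    body = json.dumps({"urls": urls}).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/crawl/batch",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, `get_json("/search", q="rust async runtime")` would issue the request. `/metrics` is not a good fit for `get_json`, because Prometheus metrics are served as plain text rather than JSON.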

Stack

| Component | Technology |
| --- | --- |
| Core Engine | Rust (Edition 2021) |
| Semantic Search | ONNX Runtime (all-MiniLM-L6-v2) |
| Cross-Encoder | ms-marco-MiniLM-L6-v2 |
| Search Index | Meilisearch |
| Cache | Redis (Tier-2) + In-memory LRU (Tier-1) |
| Query Layer | Python + FastAPI |
| Meta-Search | Rust (9+ providers) |
| Video Discovery | Node.js (YouTube Unified) |
| Content Extraction | Trafilatura (Python) |
| Orchestration | Docker Compose |
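The in-memory LRU plus Redis arrangement can be sketched as a read-through, two-tier cache. A small in-process LRU sits in front of a shared store, and tier-2 hits are promoted into L1. `DictTier2` is an in-memory stand-in for Redis, and `cache_key` is a hypothetical key scheme. A redis-py client created with `decode_responses=True` exposes compatible `get`/`set` methods and could likely be swapped in. Expiry (TTL) and invalidation are omitted.

```python
from collections import OrderedDict
from typing import Dict, Optional


class DictTier2:
    """In-memory stand-in for Redis so the sketch runs without a server."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


class TwoTierCache:
    """L1: bounded in-process LRU. Tier 2: shared store (Redis in the README)."""

    def __init__(self, tier2, l1_capacity: int = 1024) -> None:
        self._l1: "OrderedDict[str, str]" = OrderedDict()
        self._capacity = l1_capacity
        self._tier2 = tier2

    def _l1_put(self, key: str, value: str) -> None:
        self._l1[key] = value
        self._l1.move_to_end(key)
        if len(self._l1) > self._capacity:
            self._l1.popitem(last=False)  # evict the least recently used entry

    def get(self, key: str) -> Optional[str]:
        if key in self._l1:
            self._l1.move_to_end(key)  # mark as recently used
            return self._l1[key]
        value = self._tier2.get(key)
        if value is not None:
            self._l1_put(key, value)  # promote a tier-2 hit into L1
        return value

    def set(self, key: str, value: str) -> None:
        self._l1_put(key, value)
        self._tier2.set(key, value)


def cache_key(query: str, intent: str) -> str:
    """Hypothetical key scheme: normalize whitespace/case, namespace by intent."""
    return f"search:{intent}:{' '.join(query.lower().split())}"
```

Namespacing keys by intent is one possible design choice. It keeps results computed with different semantic ratios from overwriting each other.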

Privacy Principles

  1. Zero data retention: no search logs, no analytics, no cookies
  2. No corporate dependencies: all sources are open or privacy-respecting
  3. Encrypted transport: HTTPS everywhere, Tor optional
  4. No AI content detection fingerprinting: built-in bypass
  5. Open source: full transparency on all components

Getting Started

# Full release build
cargo build --release

# Run API server (port 9100)
cargo run --release

# Docker full stack
docker-compose -f docker-compose.dev.yml up -d --build

Project Status

IntentForge v2 is operational and actively developed. The core search, image indexing, and video discovery systems are functional. Ongoing work focuses on latency optimization, expanded source coverage, and enhanced personalization.

See docs/ROADMAP.md for planned improvements.

About

IntentForge v2: An intent-first, privacy-preserving discovery engine. Rust-powered meta-search with adaptive ranking, zero tracking, and sub-3s latency. Part of Oxiverse.
