OSINT Skill

Early Beta — functional but APIs change, Apify actors rotate, and edge cases are still being discovered.

Russian version

Systematic intelligence gathering on individuals. From a name or handle to a scored dossier with psychoprofile, career map, and entry points.

Compatible Agents

Works with any AI coding agent that supports the SKILL.md format:

| Agent | Install |
| --- | --- |
| Claude Code | cp -r osint/ ~/.claude/skills/osint/ |
| OpenClaw | Copy to workspace skills/ directory |
| Codex | Point to osint/SKILL.md in your agent config |
| OpenCode | Copy to project skills/ directory |
| Any SKILL.md agent | Place osint/ folder where your agent reads skills |

The skill uses standard tools (bash, curl, node, python3), and its instructions are written in Markdown, so there is no vendor lock-in.

Features

  • Phased pipeline (0 → 1 → 1.5 → 2 → 3 → 4 → 5 → 6): from quick search to deep research
  • Swarm Mode: coordinates 3-5 parallel sub-agents on Sonnet for speed
  • 55+ Apify actors embedded: Instagram (12), Facebook (14), TikTok (14), YouTube (5), Google Maps (4), LinkedIn, and more
  • Psychoprofile: MBTI / Big Five based on content analysis (YouTube transcripts, Telegram messages, blogs)
  • Confidence Scoring: every fact gets graded A/B/C/D by number of independent confirmations
  • Internal Intelligence: checks Telegram history, email, vault contacts BEFORE going external
  • Research Escalation: 4 levels from free to $0.50, from seconds to minutes
  • Budget tracking: spends ≤$0.50 without asking, asks permission above that
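
As an illustration of the Confidence Scoring feature, the sketch below grades a fact by how many independent sources confirm it. The README does not state the cut-offs, so the thresholds (3+, 2, 1, 0) and the rule "one hostname = one independent source" are assumptions, and `grade_fact` / `count_independent` are hypothetical names, not functions shipped with the skill.

```python
from urllib.parse import urlparse


def count_independent(source_urls):
    """Count distinct hostnames, so two pages on one site count once (assumed rule)."""
    hosts = {urlparse(u).netloc.lower().removeprefix("www.") for u in source_urls}
    hosts.discard("")  # ignore strings that did not parse as URLs
    return len(hosts)


def grade_fact(independent_sources):
    """Map a confirmation count to A/B/C/D. The cut-offs are illustrative only."""
    if independent_sources >= 3:
        return "A"
    if independent_sources == 2:
        return "B"
    if independent_sources == 1:
        return "C"
    return "D"


sources = [
    "https://www.example.com/profile",
    "https://example.com/interview",  # same site as above, not independent
    "https://example.org/bio",
]
print(grade_fact(count_independent(sources)))  # B
```

Requires Python 3.9+ for `str.removeprefix`.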

Quick Start

# Clone the repo
git clone https://github.com/smixs/osint-skill.git
cd osint-skill

# Copy to your agent's skills directory (example for Claude Code)
cp -r osint/ ~/.claude/skills/osint/

# Run self-diagnostics to check what's available
bash osint/scripts/diagnose.sh

Requirements

Required

| Tool | Purpose | Install |
| --- | --- | --- |
| curl | HTTP requests to APIs | Pre-installed on macOS/Linux |
| python3 | JSON parsing, MCP client | Pre-installed on macOS/Linux |
| jq | JSON processing | brew install jq / apt install jq |

For Apify actors (55+ platforms)

| Tool | Purpose | Install |
| --- | --- | --- |
| Node.js 18+ | Runs run_actor.js (embedded Apify runner) | nodejs.org |

Optional

| Tool | Purpose | Install |
| --- | --- | --- |
| mcpc | Dynamic actor discovery in Apify Store | npm install -g @apify/mcpc |

API Keys & Services

The skill uses graceful degradation — the more API keys you provide, the deeper it can dig. You need at least one search API to get started.
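
One way to picture graceful degradation is a check that reports which keyed services are usable before a run. This is a minimal sketch, not the logic in diagnose.sh: the environment variable names come from the tables in this README, while `available_services` is a hypothetical helper.

```python
import os

# Service name -> environment variable, as listed in this README.
SEARCH_KEYS = {
    "Jina AI": "JINA_API_KEY",
    "Parallel AI": "PARALLEL_API_KEY",
    "Perplexity API": "PERPLEXITY_API_KEY",
    "Exa AI": "EXA_API_KEY",
    "Tavily": "TAVILY_API_KEY",
}


def available_services(env=None):
    """Return the services whose key is set and non-blank."""
    env = os.environ if env is None else env
    return [name for name, var in SEARCH_KEYS.items() if env.get(var, "").strip()]


if __name__ == "__main__":
    found = available_services()
    print("Keyed search services:", ", ".join(found) if found else "none")
```

An empty result does not necessarily block a run, since Brave Search is listed as built into Claude Code.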

Free Tier

| Service | Env Variable | What It Does | Get It |
| --- | --- | --- | --- |
| Brave Search | (built into Claude Code) | 2,000 queries/month, basic web search | Built-in, nothing needed |
| Jina AI | JINA_API_KEY | URL → markdown reader, search, deepsearch | jina.ai/api-key |
| Apify | APIFY_API_TOKEN | Instagram, TikTok, YouTube, LinkedIn scraping. Free tier ~$5/month | console.apify.com |
| Parallel AI | PARALLEL_API_KEY | AI-powered search with reasoning and citations | platform.parallel.ai |

Paid (recommended)

| Service | Env Variable | What It Does | Cost | Get It |
| --- | --- | --- | --- | --- |
| Perplexity API | PERPLEXITY_API_KEY | Sonar (fast AI answers), Deep Research | ~$5/month | perplexity.ai/settings/api |
| Exa AI | EXA_API_KEY | Semantic search, people/company research | ~$5/month | dashboard.exa.ai |
| Tavily | TAVILY_API_KEY | Agent-optimized search, $0.005/request basic | ~$5/month | app.tavily.com |

Advanced

| Service | Env Variable | What It Does | Cost | Get It |
| --- | --- | --- | --- | --- |
| Bright Data | BRIGHTDATA_MCP_URL | CAPTCHA bypass, authwall bypass, Facebook scraping, Yandex search | ~$10/month+ | brightdata.com/mcp |

Setting Up Keys

Option 1 — Environment variables (recommended):

export PERPLEXITY_API_KEY="pplx-..."
export EXA_API_KEY="exa-..."
export APIFY_API_TOKEN="apify_api_..."
export JINA_API_KEY="jina_..."
export TAVILY_API_KEY="tvly-..."
export PARALLEL_API_KEY="..."
export BRIGHTDATA_MCP_URL="https://mcp.brightdata.com/..."

Option 2 — File fallback (supported by some scripts):

<workspace>/scripts/apify-api-token.txt
<workspace>/scripts/jina-api-key.txt
<workspace>/scripts/parallel-api-key.txt
<workspace>/scripts/brightdata-mcp-url.txt

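The two options can be combined as "environment first, file second". The sketch below shows that lookup order under stated assumptions: `load_key` is a hypothetical helper, the caller supplies the fallback path, and the shipped shell scripts may resolve keys differently.

```python
import os
from pathlib import Path
from typing import Optional


def load_key(env_var, fallback_file, env=None) -> Optional[str]:
    """Return the key from the environment, else from a one-line text file."""
    env = os.environ if env is None else env
    value = env.get(env_var, "").strip()
    if value:
        return value
    path = Path(fallback_file)
    if path.is_file():
        # Strip the trailing newline that editors usually add.
        return path.read_text(encoding="utf-8").strip() or None
    return None


if __name__ == "__main__":
    # The path is illustrative; point it at your own workspace.
    token = load_key("APIFY_API_TOKEN", "scripts/apify-api-token.txt")
    print("Apify token found" if token else "Apify token missing")
```

Keeping keys in environment variables avoids leaving secrets in files that could be committed by accident.
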
How It Works

Research Phases

Phase 0: Tooling Self-Check      → diagnose.sh, check available tools
Phase 1: Seed Collection         → parallel search across all engines
Phase 1.5: Internal Intelligence → Telegram, email, vault (BEFORE external sources)
Phase 2: Platform Extraction     → LinkedIn, Instagram, Facebook, TikTok, YouTube...
Phase 3: Cross-Reference         → facts verified, graded A/B/C/D
Phase 4: Psychoprofile           → MBTI, Big Five, communication style
Phase 5: Completeness Check      → 9 mandatory checks + Depth Score 1-10
Phase 6: Dossier Output          → formatted dossier from template
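
Phase 1 queries every engine in parallel. The sketch below shows one way to do that with a shared deadline, using only the Python standard library. It is not the implementation of first-volley.sh: `first_volley` here is a hypothetical function, and each engine is assumed to be a plain callable that takes a query and returns a list.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def first_volley(query, engines, timeout_s=30.0):
    """Run every engine callable in parallel and keep what finishes in time.

    Returns (results, errors): results maps engine name to its return value,
    errors maps engine name to "timeout" or the repr of the exception raised.
    """
    results, errors = {}, {}
    pool = ThreadPoolExecutor(max_workers=max(1, len(engines)))
    futures = {pool.submit(fn, query): name for name, fn in engines.items()}
    done, not_done = wait(list(futures), timeout=timeout_s)
    for fut in done:
        name = futures[fut]
        try:
            results[name] = fut.result()
        except Exception as exc:  # one failing engine must not sink the volley
            errors[name] = repr(exc)
    for fut in not_done:
        errors[futures[fut]] = "timeout"
    # Do not block on stragglers; threads already running finish in the background.
    pool.shutdown(wait=False, cancel_futures=True)
    return results, errors
```

Because the deadline is enforced by `concurrent.futures.wait`, this approach does not depend on platform-specific shell features. Requires Python 3.9+ for `cancel_futures`.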

Research Escalation (cheap → expensive)

Level 1: Quick Answers      → Perplexity Sonar, Brave, Tavily, Exa    (~$0.00)
Level 2: Source Verification → Jina read, Parallel extract             (~$0.01)
Level 3: Social Media        → Apify scrapers, Bright Data             (~$0.01-0.10)
Level 4: Deep Research       → Perplexity Deep, Exa Deep, Jina Deep   (~$0.05-0.50)
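
The escalation ladder interacts with the $0.50 auto-spend limit. The sketch below walks the levels in order and marks each as "run" or "ask". Several things are assumptions: the limit is treated as cumulative per investigation, each level is charged at the upper bound of its listed range, and `plan` is a hypothetical function. Amounts are integer cents to avoid floating-point drift.

```python
# (level, name, worst-case cost in cents), taken from the ranges listed above.
LEVELS = [
    (1, "Quick Answers", 0),
    (2, "Source Verification", 1),
    (3, "Social Media", 10),
    (4, "Deep Research", 50),
]
AUTO_LIMIT_CENTS = 50  # spend up to $0.50 without asking


def plan(max_level, spent_cents=0):
    """Return [(level_name, action)] where action is "run" or "ask"."""
    steps = []
    for level, name, cost in LEVELS:
        if level > max_level:
            break
        if spent_cents + cost <= AUTO_LIMIT_CENTS:
            steps.append((name, "run"))
            spent_cents += cost
        else:
            steps.append((name, "ask"))  # would exceed the limit: ask the user
    return steps


for name, action in plan(max_level=4):
    print(f"{name}: {action}")
```

Under these assumptions, levels 1 to 3 run automatically (11 cents worst case) and Deep Research triggers a permission prompt, because 11 + 50 exceeds the 50-cent limit.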

Embedded Scripts

| Script | Purpose |
| --- | --- |
| diagnose.sh | Self-diagnostics for all tools and APIs |
| perplexity.sh | search / sonar / deep research |
| tavily.sh | search / deep / extract |
| exa.sh | search / company / people / crawl / deep |
| first-volley.sh | Parallel search across all engines at once |
| merge-volley.sh | Deduplicate and group search results |
| apify.sh | LinkedIn / Instagram / any actor / store search |
| run-actor.sh | Universal Apify runner (55+ actors, polling, CSV/JSON export) |
| run_actor.js | Node.js engine powering run-actor.sh |
| jina.sh | read URL / search / deepsearch |
| parallel.sh | search / extract |
| brightdata.sh | scrape / search / search-geo / search-yandex |
| mcp-client.py | Lightweight MCP client for Bright Data (stdlib only) |
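
To give a feel for what "deduplicate and group" can mean for search results, here is a minimal sketch. It is not the code in merge-volley.sh. The record shape (dicts with `url`, `engine`, and an optional `title`) and the normalization rules (ignore scheme, a leading `www.`, and a trailing slash) are assumptions.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Build a comparison key that ignores scheme, 'www.' and a trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/")
    return f"{host}{path}" + (f"?{parts.query}" if parts.query else "")


def merge_results(results):
    """Collapse duplicate URLs and record which engines returned each one."""
    merged = {}
    for item in results:
        key = normalize_url(item["url"])
        entry = merged.setdefault(
            key, {"url": item["url"], "title": item.get("title", ""), "engines": []}
        )
        if item["engine"] not in entry["engines"]:
            entry["engines"].append(item["engine"])
    # URLs confirmed by more engines come first; ties keep first-seen order.
    return sorted(merged.values(), key=lambda e: -len(e["engines"]))
```

Ranking by the number of engines that returned a URL is a cheap proxy for relevance, not a measure of source independence.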

Project Structure

osint/
├── SKILL.md                          # Main skill file (452 lines)
├── references/
│   ├── tools.md                      # Full catalog of 55+ Apify actors + all tools
│   ├── platforms.md                  # Platform-specific extraction guide
│   ├── content-extraction.md         # YouTube/podcast/blog extraction
│   └── psychoprofile.md              # MBTI/Big Five methodology
├── assets/
│   └── dossier-template.md           # Output dossier template
└── scripts/
    ├── diagnose.sh                   # Self-check
    ├── run-actor.sh                  # Universal Apify runner (bash wrapper)
    ├── run_actor.js                  # Apify runner engine (Node.js, embedded)
    ├── package.json                  # ESM support for run_actor.js
    ├── apify.sh                      # Apify shortcuts
    ├── perplexity.sh                 # Perplexity API
    ├── tavily.sh                     # Tavily API
    ├── exa.sh                        # Exa AI API
    ├── jina.sh                       # Jina AI API
    ├── parallel.sh                   # Parallel AI API
    ├── brightdata.sh                 # Bright Data MCP
    ├── mcp-client.py                 # MCP client (Python, stdlib only)
    ├── first-volley.sh               # Parallel first search
    └── merge-volley.sh               # Result merging

Known Issues (Beta)

  • Shell injection: user input is interpolated into JSON without jq escaping. Do not run with untrusted input.
  • macOS: first-volley.sh uses tail --pid (Linux-only). Parallel searches work on macOS, but timeout logic may not trigger.
  • Apify actors: actor IDs can change or get removed without notice. Use apify.sh store-search to find alternatives.
  • Inconsistent key loading: Perplexity, Tavily, and Exa only load from env vars (no file fallback, unlike Apify/Jina/Parallel).
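
The injection issue comes from pasting user input into a JSON string by hand. A common mitigation is to let a JSON serializer do the escaping and to pass the body to curl on stdin rather than through a shell string. The sketch below shows that pattern in Python. The field names `query` and `max_results` and the helpers `build_payload` / `curl_command` are hypothetical and not taken from the skill's scripts.

```python
import json


def build_payload(query, max_results=10):
    """Serialize user input as JSON; quotes, newlines and $() are escaped for us."""
    return json.dumps({"query": query, "max_results": max_results})


def curl_command(url):
    """Argument list for subprocess.run (no shell); the body is read from stdin."""
    return [
        "curl", "-sS", "-X", "POST", url,
        "-H", "Content-Type: application/json",
        "--data-binary", "@-",
    ]


hostile = 'x"; rm -rf / #\n$(id)'
payload = build_payload(hostile)
# The hostile string survives a round trip as plain data.
assert json.loads(payload)["query"] == hostile
# To send it: subprocess.run(curl_command(url), input=payload.encode(), check=True)
```

In a pure shell script, the equivalent is to build the body with jq and pass user input as a jq variable instead of interpolating it into the string.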

Credits

License

MIT
