A Python-based CV/Resume automation pipeline using LangChain (LCEL) that automatically processes incoming CVs from Gmail, analyzes them against job descriptions using AI, and triggers notifications for top candidates.
- Gmail Integration: Supports both OAuth and IMAP authentication for Gmail
- PDF Text Extraction: Extracts text from CVs using PyPDF2 and pdfplumber (with fallback)
- AI-Powered Analysis: Uses LangChain with OpenRouter for structured output extraction
- OpenRouter Integration: Access multiple LLM providers through a single API
- Smart Notifications: Triggers ATS API submissions and email alerts for high-scoring candidates
- Console Dashboard: Colored terminal output showing candidate analysis results
- Persistence: Stores all processed candidates in SQLite database
- Deduplication: Tracks processed emails to prevent duplicate processing
- Daemon Mode: Can run continuously with configurable polling intervals

Pipeline overview:

    ┌─────────────┐     ┌──────────────┐     ┌─────────────────┐
    │    Gmail    │────▶│ PDF Extract  │────▶│    LangChain    │
    │ (OAuth/IMAP)│     │ (pdfplumber) │     │   CV Analysis   │
    └─────────────┘     └──────────────┘     └────────┬────────┘
                                                      │
                                              ┌───────▼───────┐
                                              │  Score >= 8?  │
                                              └───────┬───────┘
                                        ┌─────────────┼─────────────┐
                                        │             │             │
                                  ┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
                                  │  ATS API  │ │   Email   │ │  SQLite   │
                                  │  Submit   │ │  Notify   │ │   Save    │
                                  └───────────┘ └───────────┘ └───────────┘
                                                      │
                                                ┌─────▼─────┐
                                                │  Console  │
                                                │ Dashboard │
                                                └───────────┘

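
The branching step in the diagram can be sketched roughly as follows. This is a minimal illustration, not the code in `main.py`: the `Candidate` dataclass, the `route_candidate` function, and the four callables are hypothetical names, and it assumes that every candidate is saved and displayed while only candidates at or above the threshold are submitted to the ATS and emailed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    """Hypothetical stand-in for the analysis result of one CV."""
    name: str
    suitability_score: int


Action = Callable[[Candidate], None]


def route_candidate(
    candidate: Candidate,
    threshold: int = 8,
    *,
    submit_to_ats: Action,
    send_email: Action,
    save: Action,
    show: Action,
) -> List[str]:
    """Run the post-analysis steps and return the names of the steps taken."""
    actions: List[str] = []
    # Notifications are only triggered for candidates at or above the threshold.
    if candidate.suitability_score >= threshold:
        submit_to_ats(candidate)
        actions.append("ats")
        send_email(candidate)
        actions.append("email")
    # Persistence and console output apply to every candidate.
    save(candidate)
    actions.append("save")
    show(candidate)
    actions.append("console")
    return actions
```

Passing the side effects in as callables keeps the routing rule testable without a real ATS endpoint, SMTP server, or database. Error handling around each step is left out here for brevity.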
- Python 3.10 or higher
- OpenRouter API key (get one at https://openrouter.ai/)
- Gmail account (OAuth or App Password for IMAP)

1. Clone and install dependencies:

       cd CV_Filter
       pip install -r requirements.txt

2. Configure environment variables:

       cp .env.example .env
       # Edit .env with your API keys and settings

3. Choose Gmail authentication method:

   Option A: OAuth (Recommended for full features)

   - Go to Google Cloud Console
   - Create a new project or select existing
   - Enable the Gmail API
   - Create OAuth 2.0 credentials (Desktop application)
   - Download `credentials.json` and place it in the project directory
   - Set `GMAIL_AUTH_METHOD=oauth` in `.env`

   Option B: IMAP (Simpler setup)

   - Enable IMAP in Gmail settings
   - Create an App Password at https://myaccount.google.com/apppasswords
   - Set in `.env`:

         GMAIL_AUTH_METHOD=imap
         GMAIL_USER=your.email@gmail.com
         GMAIL_APP_PASSWORD=your_app_password

4. First run:

       python main.py

   If using OAuth, this will open a browser for authentication on first run.

| Variable | Description | Default |
|---|---|---|
| `OPENROUTER_API_KEY` | OpenRouter API key (required) | - |
| `OPENROUTER_MODEL` | Model to use | openai/gpt-4o |
| `OPENROUTER_SITE_URL` | Your site URL for OpenRouter | http://localhost |
| `OPENROUTER_APP_NAME` | App name for OpenRouter | CV Automation Tool |
| `GMAIL_AUTH_METHOD` | `oauth` or `imap` | `oauth` |
| `GMAIL_USER` | Gmail address (for IMAP) | - |
| `GMAIL_APP_PASSWORD` | Gmail App Password (for IMAP) | - |
| `SMTP_HOST` | SMTP server for email notifications | smtp.gmail.com |
| `SMTP_PORT` | SMTP port | 587 |
| `HR_EMAIL` | HR team email for notifications | - |
| `SUITABILITY_THRESHOLD` | Minimum score for notifications | 8 |
| `ATS_API_URL` | ATS API endpoint | placeholder |
| `DATABASE_PATH` | SQLite database path | candidates.db |
| `POLL_INTERVAL_SECONDS` | Daemon polling interval | 60 |
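
As an illustration of how these variables and their defaults could be read, here is a minimal standard-library sketch. The `Settings` dataclass and `load_settings` function are hypothetical names and cover only part of the table; the project's actual `config.py` may be organized differently (for example, around a settings library).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical subset of the variables listed in the table above."""
    openrouter_api_key: str
    openrouter_model: str
    gmail_auth_method: str
    smtp_host: str
    smtp_port: int
    suitability_threshold: int
    database_path: str
    poll_interval_seconds: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (os.environ by default), applying defaults."""
    env = os.environ if env is None else env

    api_key = env.get("OPENROUTER_API_KEY", "")
    if not api_key:
        raise ValueError("OPENROUTER_API_KEY is required")

    auth_method = env.get("GMAIL_AUTH_METHOD", "oauth").lower()
    if auth_method not in {"oauth", "imap"}:
        raise ValueError("GMAIL_AUTH_METHOD must be 'oauth' or 'imap'")

    return Settings(
        openrouter_api_key=api_key,
        openrouter_model=env.get("OPENROUTER_MODEL", "openai/gpt-4o"),
        gmail_auth_method=auth_method,
        smtp_host=env.get("SMTP_HOST", "smtp.gmail.com"),
        # Numeric values arrive as strings and are converted here.
        smtp_port=int(env.get("SMTP_PORT", "587")),
        suitability_threshold=int(env.get("SUITABILITY_THRESHOLD", "8")),
        database_path=env.get("DATABASE_PATH", "candidates.db"),
        poll_interval_seconds=int(env.get("POLL_INTERVAL_SECONDS", "60")),
    )
```

Accepting a plain mapping makes the loader easy to exercise in tests, for example `load_settings({"OPENROUTER_API_KEY": "sk-or-..."})`.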
Process all new emails once:

    python main.py

Run continuously, polling every 60 seconds:

    python main.py --daemon

With custom polling interval (5 minutes):

    python main.py --daemon --poll-interval 300

Load job description from file:

    python main.py --job-file job_description.txt

Specify a different OpenRouter model:

    python main.py --model anthropic/claude-3-5-sonnet

Lower the notification threshold:

    python main.py --threshold 7

See processing statistics:

    python main.py --stats

Show all command-line options:

    python main.py --help

    options:
      -h, --help            show this help message and exit
      --daemon, -d          Run continuously in daemon mode
      --poll-interval, -i   Polling interval in seconds (default: 60)
      --job-file, -j        Path to job description file
      --model, -m           OpenRouter model name
      --threshold, -t       Suitability threshold for notifications (default: 8)
      --max-emails, -n      Maximum emails to process per run (default: 50)
      --log-level, -l       Logging level (default: INFO)
      --stats, -s           Show database statistics and exit

The LangChain chain extracts the following information from each CV:

    from pydantic import BaseModel

    class CandidateAnalysis(BaseModel):
        name: str                    # Full name of the candidate
        email: str                   # Email address
        years_of_experience: float   # Total years of experience
        top_skills: list[str]        # Top 5-10 relevant skills
        suitability_score: int       # Score from 1-10
        fit_reason: str              # Why this score was given
        summary: str                 # 1-sentence summary

Suitability scores follow this scale:

- 1-3: Poor match, lacks most required qualifications
- 4-5: Below average match, missing key requirements
- 6-7: Average match, meets some requirements
- 8-9: Strong match, meets most requirements well
- 10: Exceptional match, exceeds requirements
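
For reference, the scale above can be expressed as a small lookup. This is only an illustrative sketch: the `score_band` helper and its labels are hypothetical and not part of the project.

```python
def score_band(score: int) -> str:
    """Map a 1-10 suitability score to the band described in the scale."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score <= 3:
        return "poor"
    if score <= 5:
        return "below average"
    if score <= 7:
        return "average"
    if score <= 9:
        return "strong"
    return "exceptional"
```

With the default threshold of 8, only the "strong" and "exceptional" bands trigger notifications.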

Project layout:

    CV_Filter/
    ├── main.py              # Main orchestration script
    ├── config.py            # Configuration and settings
    ├── models.py            # Pydantic models for structured output
    ├── gmail_service.py     # Gmail API integration (OAuth + IMAP)
    ├── pdf_extractor.py     # PDF text extraction
    ├── cv_chain.py          # LangChain CV analysis chain (OpenRouter)
    ├── notifications.py     # ATS API, Email, and Console notifications
    ├── database.py          # SQLite persistence layer
    ├── requirements.txt     # Python dependencies
    ├── .env.example         # Environment variables template
    └── README.md            # This file

The tool uses OpenRouter as the LLM provider, allowing access to multiple models:

- `openai/gpt-4o` (default)
- `anthropic/claude-3-5-sonnet`
- `google/gemini-pro-1.5`
- `meta-llama/llama-3.1-70b-instruct`

Select a model through the environment:

    # In .env
    OPENROUTER_API_KEY=sk-or-...
    OPENROUTER_MODEL=anthropic/claude-3-5-sonnet
    OPENROUTER_SITE_URL=https://mycompany.com
    OPENROUTER_APP_NAME=CV Automation Tool

Or select it in code:

    from cv_chain import CVAnalyzer

    # Use default model from settings
    analyzer = CVAnalyzer()

    # Use specific model
    analyzer = CVAnalyzer(model="openai/gpt-4o")

The tool sends a POST request to the configured ATS API endpoint with this payload:

    {
      "candidate_name": "John Doe",
      "candidate_email": "john@example.com",
      "years_of_experience": 5.0,
      "skills": ["Python", "JavaScript", "AWS"],
      "suitability_score": 8,
      "summary": "Experienced software engineer with strong Python skills...",
      "source": "cv_automation_tool",
      "cv_filename": "john_doe_resume.pdf",
      "submitted_at": "2024-01-15T10:30:00Z"
    }

The full JSON response from the ATS API is logged to the terminal for debugging.
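
A payload of this shape could be assembled and posted along the following lines. This is a hedged sketch rather than the code in `notifications.py`: the function names are hypothetical, the mapping from analysis fields to payload keys (`name` to `candidate_name`, `top_skills` to `skills`) is inferred from the example, and `urllib` is used only to keep the example free of third-party dependencies.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Optional


def build_ats_payload(candidate: dict, cv_filename: str,
                      now: Optional[datetime] = None) -> dict:
    """Assemble the JSON body from an analysis result (field mapping assumed)."""
    now = now or datetime.now(timezone.utc)
    return {
        "candidate_name": candidate["name"],
        "candidate_email": candidate["email"],
        "years_of_experience": candidate["years_of_experience"],
        "skills": candidate["top_skills"],
        "suitability_score": candidate["suitability_score"],
        "summary": candidate["summary"],
        "source": "cv_automation_tool",
        "cv_filename": cv_filename,
        # UTC timestamp in the same format as the example payload.
        "submitted_at": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def submit_to_ats(url: str, payload: dict, timeout: float = 10.0) -> dict:
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping payload construction separate from the HTTP call makes the payload easy to inspect or log before it is sent.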
For candidates scoring at or above `SUITABILITY_THRESHOLD`, an HTML email is sent to `HR_EMAIL` with:
- Candidate name, email, and experience
- Suitability score with color coding
- Top skills badges
- Fit reason explaining the score
- Summary of qualifications
- CV filename and processing timestamp
Configure SMTP settings in `.env`:

    SMTP_HOST=smtp.gmail.com
    SMTP_PORT=587
    SMTP_USER=your.email@gmail.com
    SMTP_PASSWORD=your_app_password
    HR_EMAIL=hr@yourcompany.com

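
With those settings, a notification could be built and sent with the standard library roughly as follows. The function names, subject line, HTML layout, and colors are all assumptions made for illustration; the project's actual template is not shown in this README.

```python
import smtplib
from email.message import EmailMessage
from html import escape


def build_notification(candidate: dict, sender: str, hr_email: str) -> EmailMessage:
    """Create a message with a plain-text body and an HTML alternative."""
    score = candidate["suitability_score"]
    msg = EmailMessage()
    msg["Subject"] = f"Top candidate: {candidate['name']} ({score}/10)"
    msg["From"] = sender
    msg["To"] = hr_email
    msg.set_content(
        f"{candidate['name']} <{candidate['email']}> scored {score}/10.\n"
        f"{candidate['fit_reason']}\n{candidate['summary']}"
    )
    # Simple color coding (assumed): green for 8 and above, orange otherwise.
    color = "green" if score >= 8 else "orange"
    skills = ", ".join(escape(skill) for skill in candidate["top_skills"])
    msg.add_alternative(
        f"<h2>{escape(candidate['name'])}</h2>"
        f"<p>Score: <b style='color:{color}'>{score}/10</b></p>"
        f"<p>Skills: {skills}</p>"
        f"<p>{escape(candidate['fit_reason'])}</p>"
        f"<p>{escape(candidate['summary'])}</p>",
        subtype="html",
    )
    return msg


def send_notification(msg: EmailMessage, host: str, port: int,
                      user: str, password: str) -> None:
    """Send the message over SMTP with STARTTLS (the usual mode for port 587)."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
```

Candidate fields are HTML-escaped because they come from LLM output derived from untrusted CV text.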
All candidates are displayed in the terminal with colored output:

    ★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★
    🎯 TOP CANDIDATE ALERT! 🎯
    Candidate: John Doe
    Email: john@example.com
    Experience: 5 years
    Skills: Python, JavaScript, AWS, Docker, PostgreSQL
    Score: 9/10
    Fit Reason:
    Strong match with 5 years of Python experience and cloud expertise...
    Summary:
    Experienced full-stack developer with 5 years...
    ★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★

Logs are written to both console and file (configured in .env):

    10:30:15 | INFO | main | Starting CV Automation Pipeline run
    10:30:16 | INFO | gmail_service | Found 3 emails with PDF attachments
    10:30:17 | INFO | pdf_extractor | pdfplumber extracted 2450 characters from 2 pages
    10:30:20 | INFO | cv_chain | Analysis complete: John Doe, Score: 9/10
    10:30:21 | INFO | notifications | Successfully submitted John Doe to ATS
    10:30:22 | INFO | notifications | Email notification sent successfully to hr@company.com
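
A log format like the one shown can be produced with the standard `logging` module. The sketch below is an assumption about how such output might be configured, not the project's actual setup; `configure_logging` and its `log_file` parameter are hypothetical names.

```python
import logging
from typing import Optional

# Matches lines such as "10:30:15 | INFO | main | Starting ..."
LOG_FORMAT = "%(asctime)s | %(levelname)s | %(name)s | %(message)s"
TIME_FORMAT = "%H:%M:%S"


def configure_logging(level: str = "INFO", log_file: Optional[str] = None) -> None:
    """Send log records to the console and, optionally, to a file."""
    formatter = logging.Formatter(LOG_FORMAT, datefmt=TIME_FORMAT)
    handlers: list = [logging.StreamHandler()]
    if log_file:
        handlers.append(logging.FileHandler(log_file, encoding="utf-8"))
    for handler in handlers:
        handler.setFormatter(formatter)
    # force=True replaces any handlers installed by an earlier call.
    logging.basicConfig(level=level, handlers=handlers, force=True)
```

Each module would then obtain its own logger with `logging.getLogger(__name__)`, which is what fills the third column.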

Each processed candidate is stored in SQLite with the following columns:

| Column | Type | Description |
|---|---|---|
| id | INTEGER | Primary key |
| message_id | TEXT | Gmail message ID (unique) |
| sender_email | TEXT | Email sender |
| attachment_filename | TEXT | PDF filename |
| name | TEXT | Candidate name |
| email | TEXT | Candidate email |
| years_of_experience | REAL | Years of experience |
| top_skills | TEXT | JSON array of skills |
| suitability_score | INTEGER | Score 1-10 |
| summary | TEXT | One-sentence summary |
| processed_at | TEXT | ISO timestamp |
| ats_submitted | INTEGER | Boolean flag |
| email_notified | INTEGER | Boolean flag |

Processed emails are tracked in a separate table:

| Column | Type | Description |
|---|---|---|
| message_id | TEXT | Gmail message ID (unique) |
| sender_email | TEXT | Sender address |
| subject | TEXT | Email subject |
| processed_at | TEXT | ISO timestamp |
| status | TEXT | Processing status |
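
Deduplication on `message_id` can be sketched with the standard `sqlite3` module as follows. The table name `processed_emails`, the helper names, and the status value are assumptions for illustration; only the column names come from the table above.

```python
import sqlite3
from datetime import datetime, timezone

# Table name is assumed; columns follow the table above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS processed_emails (
    message_id   TEXT UNIQUE NOT NULL,
    sender_email TEXT,
    subject      TEXT,
    processed_at TEXT,
    status       TEXT
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and make sure the tracking table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def mark_processed(conn: sqlite3.Connection, message_id: str,
                   sender_email: str, subject: str, status: str = "ok") -> bool:
    """Record an email; return False if this message_id was already recorded."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO processed_emails "
        "(message_id, sender_email, subject, processed_at, status) "
        "VALUES (?, ?, ?, ?, ?)",
        (message_id, sender_email, subject,
         datetime.now(timezone.utc).isoformat(), status),
    )
    conn.commit()
    # rowcount is 1 when a row was inserted and 0 when the insert was ignored.
    return cursor.rowcount == 1
```

Relying on the unique constraint lets the database reject duplicates in a single statement, without a separate lookup before each insert.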
- PDF extraction failures: Falls back from pdfplumber to PyPDF2
- LLM failures: Logged and email marked with error status
- ATS API failures: Logged but processing continues
- Email failures: Logged but processing continues
- Gmail API errors: Retried with exponential backoff
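
Two of the strategies above, the extractor fallback and retry with exponential backoff, can be sketched as follows. This is an illustrative outline under stated assumptions, not the project's code: the function names, retry count, and delays are hypothetical, and the third-party imports are deferred so the helpers can be exercised without either PDF library installed.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def extract_with_pdfplumber(path: str) -> str:
    import pdfplumber  # third-party; imported lazily
    with pdfplumber.open(path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_with_pypdf2(path: str) -> str:
    from PyPDF2 import PdfReader  # third-party; imported lazily
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_text(
    path: str,
    extractors: Sequence[Callable[[str], str]] = (extract_with_pdfplumber,
                                                  extract_with_pypdf2),
) -> str:
    """Try each extractor in order; return the first non-empty result."""
    last_error: Optional[Exception] = None
    for extractor in extractors:
        try:
            text = extractor(path).strip()
        except Exception as exc:  # any failure moves on to the next extractor
            last_error = exc
            continue
        if text:
            return text
    raise RuntimeError(f"No text could be extracted from {path}") from last_error


def retry_with_backoff(
    func: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, doubling the wait after each failure (1s, 2s, 4s, ...)."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

A real Gmail client would normally retry only on transient errors (rate limits, timeouts) rather than on every exception, and an empty extraction result is treated here as a failure so that scanned or image-only PDFs fall through to the next extractor.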
MIT License - see LICENSE file for details.