Safan05/cv-filtering

CV Automation Tool

A Python CV/resume automation pipeline built on LangChain (LCEL). It processes incoming CVs from Gmail, scores them against a job description using an LLM, and triggers notifications for top candidates.

Features

  • Gmail Integration: Supports both OAuth and IMAP authentication for Gmail
  • PDF Text Extraction: Extracts text from CVs using pdfplumber, falling back to PyPDF2 if extraction fails
  • AI-Powered Analysis: Uses LangChain with OpenRouter for structured output extraction
  • OpenRouter Integration: Access multiple LLM providers through a single API
  • Smart Notifications: Triggers ATS API submissions and email alerts for high-scoring candidates
  • Console Dashboard: Colored terminal output showing candidate analysis results
  • Persistence: Stores all processed candidates in SQLite database
  • Deduplication: Tracks processed emails to prevent duplicate processing
  • Daemon Mode: Can run continuously with configurable polling intervals

Architecture

┌─────────────┐     ┌──────────────┐     ┌─────────────────┐
│   Gmail     │────▶│  PDF Extract │────▶│   LangChain     │
│ (OAuth/IMAP)│     │  (pdfplumber)│     │   CV Analysis   │
└─────────────┘     └──────────────┘     └────────┬────────┘
                                                  │
                                          ┌───────▼───────┐
                                          │   Score >= 8? │
                                          └───────┬───────┘
                                    ┌─────────────┼─────────────┐
                                    │             │             │
                              ┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
                              │  ATS API  │ │   Email   │ │  SQLite   │
                              │  Submit   │ │  Notify   │ │   Save    │
                              └───────────┘ └───────────┘ └───────────┘
                                    │
                              ┌─────▼─────┐
                              │  Console  │
                              │ Dashboard │
                              └───────────┘
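
The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: `process_cv` and the injected `analyze`, `submit_ats`, `send_email`, `save`, and `show` callables are hypothetical stand-ins for the real modules. It assumes every candidate is saved and displayed (as the Features list states), while only candidates at or above the threshold trigger notifications.

```python
from typing import Callable


def process_cv(
    cv_text: str,
    analyze: Callable[[str], dict],
    submit_ats: Callable[[dict], None],
    send_email: Callable[[dict], None],
    save: Callable[[dict], None],
    show: Callable[[dict], None],
    threshold: int = 8,
) -> dict:
    """Run one extracted CV through the stages in the diagram (hypothetical sketch)."""
    result = analyze(cv_text)

    # Only candidates at or above the threshold trigger notifications.
    if result["suitability_score"] >= threshold:
        submit_ats(result)
        send_email(result)

    # Assumption: every candidate is persisted and shown, regardless of score.
    save(result)
    show(result)
    return result
```

Passing the stages in as callables keeps the sketch testable without Gmail, an LLM, or a database.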

Installation

Prerequisites

  • Python 3.10 or higher
  • OpenRouter API key (get one at https://openrouter.ai/)
  • Gmail account (OAuth or App Password for IMAP)

Setup

  1. Clone and install dependencies:

    cd CV_Filter
    pip install -r requirements.txt
  2. Configure environment variables:

    cp .env.example .env
    # Edit .env with your API keys and settings
  3. Choose Gmail authentication method:

    Option A: OAuth (Recommended for full features)

    • Go to Google Cloud Console
    • Create a new project or select existing
    • Enable the Gmail API
    • Create OAuth 2.0 credentials (Desktop application)
    • Download credentials.json and place it in the project directory
    • Set GMAIL_AUTH_METHOD=oauth in .env

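    Option B: IMAP (Simpler setup)

    • Create a Gmail App Password for the account
    • Set GMAIL_AUTH_METHOD=imap, GMAIL_USER, and GMAIL_APP_PASSWORD in .env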

  4. First run:

    python main.py

    If using OAuth, this will open a browser for authentication on first run.

Configuration

Environment Variables

Variable                Description                           Default
OPENROUTER_API_KEY      OpenRouter API key (required)         -
OPENROUTER_MODEL        Model to use                          openai/gpt-4o
OPENROUTER_SITE_URL     Your site URL for OpenRouter          http://localhost
OPENROUTER_APP_NAME     App name for OpenRouter               CV Automation Tool
GMAIL_AUTH_METHOD       oauth or imap                         oauth
GMAIL_USER              Gmail address (for IMAP)              -
GMAIL_APP_PASSWORD      Gmail App Password (for IMAP)         -
SMTP_HOST               SMTP server for email notifications   smtp.gmail.com
SMTP_PORT               SMTP port                             587
HR_EMAIL                HR team email for notifications       -
SUITABILITY_THRESHOLD   Minimum score for notifications       8
ATS_API_URL             ATS API endpoint                      placeholder
DATABASE_PATH           SQLite database path                  candidates.db
POLL_INTERVAL_SECONDS   Daemon polling interval               60
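
As an illustration of how these variables and defaults could be read, here is a minimal sketch using only the standard library. The `Settings` dataclass and `load_settings` function are hypothetical names, not the contents of config.py, and only a subset of the variables is shown.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for a subset of the documented variables."""
    openrouter_api_key: str
    openrouter_model: str
    gmail_auth_method: str
    suitability_threshold: int
    database_path: str
    poll_interval_seconds: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env

    api_key = env.get("OPENROUTER_API_KEY", "")
    if not api_key:
        raise ValueError("OPENROUTER_API_KEY is required")

    auth_method = env.get("GMAIL_AUTH_METHOD", "oauth").lower()
    if auth_method not in ("oauth", "imap"):
        raise ValueError("GMAIL_AUTH_METHOD must be 'oauth' or 'imap'")

    return Settings(
        openrouter_api_key=api_key,
        openrouter_model=env.get("OPENROUTER_MODEL", "openai/gpt-4o"),
        gmail_auth_method=auth_method,
        suitability_threshold=int(env.get("SUITABILITY_THRESHOLD", "8")),
        database_path=env.get("DATABASE_PATH", "candidates.db"),
        poll_interval_seconds=int(env.get("POLL_INTERVAL_SECONDS", "60")),
    )
```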

Usage

Basic Usage

Process all new emails once:

python main.py

Daemon Mode

Run continuously, polling every 60 seconds:

python main.py --daemon

With custom polling interval (5 minutes):

python main.py --daemon --poll-interval 300
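
A polling loop of this kind might look roughly like the sketch below. `run_daemon`, `run_once`, and `max_cycles` are hypothetical names used for illustration; the real daemon presumably runs until interrupted, and `max_cycles` exists here only so the loop can be exercised in a test.

```python
import time
from typing import Callable, Optional


def run_daemon(
    run_once: Callable[[], None],
    poll_interval: float = 60.0,
    max_cycles: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call run_once repeatedly, sleeping poll_interval seconds between runs."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        try:
            run_once()
        except Exception as exc:
            # Assumption: a failed run should not stop the daemon.
            print(f"Pipeline run failed: {exc}")
        cycles += 1
        # Skip the final sleep when a cycle limit has been reached.
        if max_cycles is None or cycles < max_cycles:
            sleep(poll_interval)
    return cycles
```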

Custom Job Description

Load job description from file:

python main.py --job-file job_description.txt

Use Different Model

Specify a different OpenRouter model:

python main.py --model anthropic/claude-3-5-sonnet

Adjust Threshold

Lower the notification threshold:

python main.py --threshold 7

View Statistics

See processing statistics:

python main.py --stats

Full Options

python main.py --help

options:
  -h, --help            show this help message and exit
  --daemon, -d          Run continuously in daemon mode
  --poll-interval, -i   Polling interval in seconds (default: 60)
  --job-file, -j        Path to job description file
  --model, -m           OpenRouter model name
  --threshold, -t       Suitability threshold for notifications (default: 8)
  --max-emails, -n      Maximum emails to process per run (default: 50)
  --log-level, -l       Logging level (default: INFO)
  --stats, -s           Show database statistics and exit
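
For reference, an `argparse` parser that accepts the options listed above could be written as in the following sketch. The flag names and defaults come from the help output; the `build_parser` name, the `int` argument types, and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="CV Automation Tool")
    parser.add_argument("--daemon", "-d", action="store_true",
                        help="Run continuously in daemon mode")
    parser.add_argument("--poll-interval", "-i", type=int, default=60,
                        help="Polling interval in seconds")
    parser.add_argument("--job-file", "-j",
                        help="Path to job description file")
    parser.add_argument("--model", "-m",
                        help="OpenRouter model name")
    parser.add_argument("--threshold", "-t", type=int, default=8,
                        help="Suitability threshold for notifications")
    parser.add_argument("--max-emails", "-n", type=int, default=50,
                        help="Maximum emails to process per run")
    parser.add_argument("--log-level", "-l", default="INFO",
                        help="Logging level")
    parser.add_argument("--stats", "-s", action="store_true",
                        help="Show database statistics and exit")
    return parser
```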

Structured Output Schema

The LangChain chain extracts the following information from each CV:

class CandidateAnalysis:
    name: str                    # Full name of the candidate
    email: str                   # Email address
    years_of_experience: float   # Total years of experience
    top_skills: list[str]        # Top 5-10 relevant skills
    suitability_score: int       # Score from 1-10
    fit_reason: str              # Why this score was given
    summary: str                 # 1-sentence summary

Suitability Score Guidelines

  • 1-3: Poor match, lacks most required qualifications
  • 4-5: Below average match, missing key requirements
  • 6-7: Average match, meets some requirements
  • 8-9: Strong match, meets most requirements well
  • 10: Exceptional match, exceeds requirements
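
The bands above can be expressed as a small lookup. This is only an illustrative helper; `score_band` and its return labels are hypothetical and are not part of the tool.

```python
def score_band(score: int) -> str:
    """Map a 1-10 suitability score to the band described in the guidelines."""
    if not 1 <= score <= 10:
        raise ValueError("suitability score must be between 1 and 10")
    if score <= 3:
        return "poor"
    if score <= 5:
        return "below average"
    if score <= 7:
        return "average"
    if score <= 9:
        return "strong"
    return "exceptional"
```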

Module Structure

CV_Filter/
├── main.py              # Main orchestration script
├── config.py            # Configuration and settings
├── models.py            # Pydantic models for structured output
├── gmail_service.py     # Gmail API integration (OAuth + IMAP)
├── pdf_extractor.py     # PDF text extraction
├── cv_chain.py          # LangChain CV analysis chain (OpenRouter)
├── notifications.py     # ATS API, Email, and Console notifications
├── database.py          # SQLite persistence layer
├── requirements.txt     # Python dependencies
├── .env.example         # Environment variables template
└── README.md            # This file

OpenRouter Integration

The tool uses OpenRouter as the LLM provider, allowing access to multiple models:

Popular Models Available

  • openai/gpt-4o (default)
  • anthropic/claude-3-5-sonnet
  • google/gemini-pro-1.5
  • meta-llama/llama-3.1-70b-instruct

Configuration Example

# In .env
OPENROUTER_API_KEY=sk-or-...
OPENROUTER_MODEL=anthropic/claude-3-5-sonnet
OPENROUTER_SITE_URL=https://mycompany.com
OPENROUTER_APP_NAME=CV Automation Tool

Programmatically

from cv_chain import CVAnalyzer

# Use default model from settings
analyzer = CVAnalyzer()

# Use specific model
analyzer = CVAnalyzer(model="openai/gpt-4o")

ATS Integration

The tool sends a POST request to the configured ATS API endpoint with this payload:

{
  "candidate_name": "John Doe",
  "candidate_email": "john@example.com",
  "years_of_experience": 5.0,
  "skills": ["Python", "JavaScript", "AWS"],
  "suitability_score": 8,
  "summary": "Experienced software engineer with strong Python skills...",
  "source": "cv_automation_tool",
  "cv_filename": "john_doe_resume.pdf",
  "submitted_at": "2024-01-15T10:30:00Z"
}

The full JSON response from the ATS API is logged to the terminal for debugging.
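
A minimal sketch of building and posting this payload is shown below. The function names are hypothetical, and the standard-library `urllib` is used only to keep the example dependency-free; the README does not say which HTTP client notifications.py uses, and the sketch assumes the ATS replies with JSON.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_ats_payload(candidate: dict, cv_filename: str) -> dict:
    """Map an analysis result onto the documented ATS payload fields."""
    return {
        "candidate_name": candidate["name"],
        "candidate_email": candidate["email"],
        "years_of_experience": candidate["years_of_experience"],
        "skills": candidate["top_skills"],
        "suitability_score": candidate["suitability_score"],
        "summary": candidate["summary"],
        "source": "cv_automation_tool",
        "cv_filename": cv_filename,
        # UTC timestamp in the same format as the example payload.
        "submitted_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def submit_to_ats(url: str, payload: dict, timeout: float = 10.0) -> dict:
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```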

Email Notifications

For candidates whose score meets or exceeds SUITABILITY_THRESHOLD, an HTML email is sent to HR_EMAIL with:

  • Candidate name, email, and experience
  • Suitability score with color coding
  • Top skills badges
  • Fit reason explaining the score
  • Summary of qualifications
  • CV filename and processing timestamp

Configure SMTP settings in .env:

SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
SMTP_USER=your.email@gmail.com
SMTP_PASSWORD=your_app_password
HR_EMAIL=hr@yourcompany.com
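
With these settings, sending an HTML notification could look roughly like the sketch below, which uses the standard-library `smtplib` and `email.message`. The function names, subject line, and HTML layout are assumptions for illustration; STARTTLS is assumed because port 587 is the usual submission port.

```python
import smtplib
from email.message import EmailMessage
from html import escape


def build_notification(candidate: dict, sender: str, hr_email: str) -> EmailMessage:
    """Build a plain-text plus HTML message for one candidate (hypothetical layout)."""
    msg = EmailMessage()
    msg["Subject"] = f"Top candidate: {candidate['name']} ({candidate['suitability_score']}/10)"
    msg["From"] = sender
    msg["To"] = hr_email

    # Plain-text fallback for clients that do not render HTML.
    msg.set_content(
        f"{candidate['name']} scored {candidate['suitability_score']}/10.\n"
        f"{candidate['fit_reason']}"
    )

    skills = ", ".join(escape(skill) for skill in candidate["top_skills"])
    html = (
        f"<h2>{escape(candidate['name'])}</h2>"
        f"<p>Score: <strong>{candidate['suitability_score']}/10</strong></p>"
        f"<p>Skills: {skills}</p>"
        f"<p>{escape(candidate['fit_reason'])}</p>"
    )
    msg.add_alternative(html, subtype="html")
    return msg


def send_notification(msg: EmailMessage, host: str, port: int, user: str, password: str) -> None:
    """Send the message over SMTP, upgrading the connection with STARTTLS."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
```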

Console Dashboard

All candidates are displayed in the terminal with colored output:

★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★
🎯 TOP CANDIDATE ALERT! 🎯
Candidate: John Doe
Email: john@example.com
Experience: 5 years
Skills: Python, JavaScript, AWS, Docker, PostgreSQL
Score: 9/10

Fit Reason:
  Strong match with 5 years of Python experience and cloud expertise...

Summary:
  Experienced full-stack developer with 5 years...
★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★

Logging

Logs are written to both console and file (configured in .env):

10:30:15 | INFO     | main | Starting CV Automation Pipeline run
10:30:16 | INFO     | gmail_service | Found 3 emails with PDF attachments
10:30:17 | INFO     | pdf_extractor | pdfplumber extracted 2450 characters from 2 pages
10:30:20 | INFO     | cv_chain | Analysis complete: John Doe, Score: 9/10
10:30:21 | INFO     | notifications | Successfully submitted John Doe to ATS
10:30:22 | INFO     | notifications | Email notification sent successfully to hr@company.com
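
The line format shown above can be reproduced with the standard `logging` module, as in this sketch. `LOG_FORMAT` and `configure_logging` are hypothetical names, and the `log_file` parameter is an assumption, since the README does not name the setting that selects the log file.

```python
import logging
from typing import Optional

# Produces lines such as: 10:30:15 | INFO     | main | Starting ...
LOG_FORMAT = "%(asctime)s | %(levelname)-8s | %(name)s | %(message)s"
DATE_FORMAT = "%H:%M:%S"


def configure_logging(level: str = "INFO", log_file: Optional[str] = None) -> None:
    """Attach a console handler, and optionally a file handler, to the root logger."""
    formatter = logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT)
    handlers = [logging.StreamHandler()]
    if log_file:
        handlers.append(logging.FileHandler(log_file))

    root = logging.getLogger()
    root.setLevel(level)
    for handler in handlers:
        handler.setFormatter(formatter)
        root.addHandler(handler)
```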

Database Schema

candidates table

Column                Type      Description
id                    INTEGER   Primary key
message_id            TEXT      Gmail message ID (unique)
sender_email          TEXT      Email sender
attachment_filename   TEXT      PDF filename
name                  TEXT      Candidate name
email                 TEXT      Candidate email
years_of_experience   REAL      Years of experience
top_skills            TEXT      JSON array of skills
suitability_score     INTEGER   Score 1-10
summary               TEXT      One-sentence summary
processed_at          TEXT      ISO timestamp
ats_submitted         INTEGER   Boolean flag
email_notified        INTEGER   Boolean flag

processed_emails table

Column         Type   Description
message_id     TEXT   Gmail message ID (unique)
sender_email   TEXT   Sender address
subject        TEXT   Email subject
processed_at   TEXT   ISO timestamp
status         TEXT   Processing status
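
To make the deduplication role of processed_emails concrete, here is a minimal `sqlite3` sketch. The column names and types follow the table above; the exact DDL, the `init_db` and `mark_processed` helpers, and the use of `INSERT OR IGNORE` are assumptions rather than the contents of database.py.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS processed_emails (
    message_id   TEXT UNIQUE,
    sender_email TEXT,
    subject      TEXT,
    processed_at TEXT,
    status       TEXT
);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def mark_processed(conn: sqlite3.Connection, message_id: str,
                   sender_email: str, subject: str, status: str) -> bool:
    """Record an email; return False if its message_id was already recorded."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO processed_emails "
        "(message_id, sender_email, subject, processed_at, status) "
        "VALUES (?, ?, ?, ?, ?)",
        (message_id, sender_email, subject,
         datetime.now(timezone.utc).isoformat(), status),
    )
    conn.commit()
    # rowcount is 0 when the UNIQUE constraint caused the insert to be ignored.
    return cursor.rowcount == 1
```

Relying on the UNIQUE constraint keeps the duplicate check and the insert in a single statement.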

Error Handling

  • PDF extraction failures: Falls back from pdfplumber to PyPDF2
  • LLM failures: Logged and email marked with error status
  • ATS API failures: Logged but processing continues
  • Email failures: Logged but processing continues
  • Gmail API errors: Retried with exponential backoff
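
Retrying with exponential backoff, as mentioned for Gmail API errors, can be sketched as follows. `retry_with_backoff` and its parameters are hypothetical; the actual retry counts and delays used by the tool are not documented here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, doubling the wait after each failure, up to max_attempts tries."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            # Re-raise once the final attempt has failed.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

A production version would usually catch only transient error types and add jitter to the delay.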

License

MIT License - see LICENSE file for details.
