
runzhao3/outlook-application-summary


Outlook Application Summary

Automated tool that reads job application emails from Outlook, summarizes them using AI, and syncs the information to a Notion database with intelligent deduplication and incremental sync.

Features

  • 📧 Email Processing: Fetches emails from Outlook using Microsoft Graph API
  • 🤖 AI Summarization: Extracts company, role, location, and application stage using OpenAI GPT-4o-mini
  • 🔄 Smart Deduplication: Aggregates multiple emails for the same application into a single lifecycle entry
  • 📊 Notion Integration: Syncs to Notion databases with automatic company and application tracking
  • 💾 Incremental Sync: 4-stage pipeline with intelligent caching - only processes new emails
  • 🧪 Fully Tested: Comprehensive pytest test suite for all modules

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Configure Environment Variables

Create a .env file in the project root:

# Microsoft Azure App Registration
TENANT_ID=your-tenant-id
CLIENT_ID=your-client-id

# OpenAI API
OPENAI_API_KEY=your-openai-api-key

# Notion API
NOTION_API_KEY=your-notion-integration-token
NOTION_COMPANIES_DB_ID=your-companies-database-id
NOTION_APPLICATIONS_DB_ID=your-applications-database-id

# Optional: User email for contact filtering
USER_EMAIL=your-email@example.com

Note: The public client flow (device code flow) does not require a CLIENT_SECRET.
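
A missing variable is easier to diagnose at startup than halfway through a run. The following is a minimal sketch of such a check, not the project's actual loader: the function name load_config and the constant names are hypothetical, and it assumes the .env file has already been loaded into the process environment (python-dotenv's load_dotenv() is one common way to do that).

```python
import os
from typing import Mapping

# Variable names come from the .env example above; the grouping is an assumption.
REQUIRED_VARS = (
    "TENANT_ID",
    "CLIENT_ID",
    "OPENAI_API_KEY",
    "NOTION_API_KEY",
    "NOTION_COMPANIES_DB_ID",
    "NOTION_APPLICATIONS_DB_ID",
)
OPTIONAL_VARS = ("USER_EMAIL",)


def load_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Hypothetical helper: collect settings and fail fast if one is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    config = {name: env[name] for name in REQUIRED_VARS}
    for name in OPTIONAL_VARS:
        # Optional values are included only when they are set and non-empty.
        if env.get(name):
            config[name] = env[name]
    return config
```

Passing a plain dictionary instead of os.environ keeps the check easy to test.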

3. Set Up Notion Databases

Create two databases in Notion:

Companies Database:

  • Name (Title) - Required
  • Location (Text) - Optional
  • Industry (Text) - Optional
  • Website (URL) - Optional
  • Notes (Text) - Optional

Job Applications Database:

  • Job Role (Title) - Required
  • Company (Relation → Companies) - Required
  • Location (Text) - Optional
  • Contact (Text) - Optional
  • Status (Status) - Required (options: Applied, Interview, Rejected, Offer, Withdrawn)
  • Application Date (Date) - Optional
  • Last Communication Date (Date) - Optional
  • Last Communication Type (Select) - Optional
  • Last Communication Notes (Text) - Optional
  • Email IDs (Text) - Required (comma-separated list of synced email IDs)
  • Notes (Text) - Optional

Important: Both databases must be connected to your Notion Integration.
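
To show how the property types above map onto a Notion API request, here is a sketch of the properties payload for one Job Applications page. The function build_application_properties and its parameters are hypothetical and may not match src/notion_writer.py; the nested shapes (title, relation, status, rich_text, date) follow the public Notion API page property format. Notion caps the length of a single text object, so a long Email IDs list may need splitting; this sketch ignores that.

```python
from typing import Any, Optional


def build_application_properties(
    job_role: str,
    company_page_id: str,
    status: str,
    email_ids: list[str],
    application_date: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build the `properties` object for a Notion page."""
    properties: dict[str, Any] = {
        # Title property: the page name.
        "Job Role": {"title": [{"text": {"content": job_role}}]},
        # Relation property: points at a page in the Companies database.
        "Company": {"relation": [{"id": company_page_id}]},
        # Status property: the name must match an option defined in Notion.
        "Status": {"status": {"name": status}},
        # Text property: comma-separated list of synced email IDs.
        "Email IDs": {"rich_text": [{"text": {"content": ", ".join(email_ids)}}]},
    }
    if application_date:
        # Date property: ISO 8601 date string, e.g. "2024-05-01".
        properties["Application Date"] = {"date": {"start": application_date}}
    return properties
```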

4. Run the Application

# Process emails (default: 100 emails)
python process_emails.py

# Process specific number of emails
python process_emails.py --limit 50

# Process from specific folder
python process_emails.py --folder "Applications" --limit 20

# Dry run (test without writing to Notion)
python process_emails.py --limit 5 --dry-run
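
For orientation, the flags above could be declared with argparse roughly as follows. This is a sketch, not the actual parser in process_emails.py: the function name build_parser is hypothetical, the default of 100 comes from the usage comment above, and the default folder is not stated in this README, so it is left as None here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(
        prog="process_emails.py",
        description="Summarize job application emails and sync them to Notion.",
    )
    parser.add_argument("--limit", type=int, default=100,
                        help="maximum number of emails to process")
    # Assumption: None means "use whatever folder the script treats as default".
    parser.add_argument("--folder", default=None,
                        help="Outlook folder to read from")
    parser.add_argument("--dry-run", action="store_true",
                        help="run the pipeline without writing to Notion")
    return parser
```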

How It Works

The system uses a 4-stage pipeline with intelligent caching:

  1. Stage 1 → Stage 2: Fetch emails from Outlook and cache raw email data
  2. Stage 2 → Stage 3: Summarize emails using AI and cache summaries
  3. Stage 3 → Stage 4: Deduplicate emails by Company + Role and sync to Notion
  4. Stage 4: Track synced emails using Email IDs stored in Notion

Each stage only processes items from the previous stage that haven't been processed yet, using cache files to track progress. This enables efficient incremental sync even after months of inactivity.
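
The incremental behaviour described above boils down to set differences between stages. The sketch below illustrates the idea under simplifying assumptions: every item is identified by an email ID, and the names PendingWork and plan_incremental_run are hypothetical rather than taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingWork:
    to_cache: list[str]      # fetched from Outlook but not yet cached
    to_summarize: list[str]  # cached emails without a cached summary
    to_sync: list[str]       # summaries whose email ID is not yet in Notion


def plan_incremental_run(
    fetched_ids: list[str],
    cached_email_ids: set[str],
    cached_summary_ids: set[str],
    synced_ids: set[str],
) -> PendingWork:
    """Hypothetical planner: work out what each stage still has to do."""
    to_cache = [i for i in fetched_ids if i not in cached_email_ids]
    # Once stage 2 has run, every fetched email is in the email cache.
    emails_after = cached_email_ids | set(fetched_ids)
    to_summarize = sorted(emails_after - cached_summary_ids)
    # Once stage 3 has run, every cached email has a summary.
    summaries_after = cached_summary_ids | emails_after
    to_sync = sorted(summaries_after - synced_ids)
    return PendingWork(to_cache, to_summarize, to_sync)
```

Because each stage looks only at what the previous stage has produced, an interrupted run can resume without repeating completed work.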

Project Structure

Outlook Application Summary New/
├── src/                    # Core modules (pure, no CLI)
│   ├── email_summary.py        # EmailSummary dataclass
│   ├── read_outlook_emails.py  # Outlook API integration
│   ├── summarize_emails.py     # AI summarization
│   ├── deduplicate.py          # Email deduplication logic
│   ├── notion_writer.py        # Notion API integration
│   ├── cache.py                # File-based caching
│   └── test/                   # Pytest test suite
├── cache/                  # Cache storage (gitignored)
│   ├── email/              # Cached raw emails (email_<id>.json)
│   └── summary/            # Cached summaries (summary_<id>.json)
├── process_emails.py       # Main orchestrator script
├── requirements.txt        # Python dependencies
└── README.md               # This file

Module Organization

  • src/: Pure modules with no CLI, file I/O, or environment variable loading
  • process_emails.py: Orchestrator script that ties everything together
  • src/test/: Pytest tests for all modules

Core Modules

src/email_summary.py

Data structure representing a summarized email with company, role, location, application stage, and contact information.
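
A dataclass along these lines would fit that description. The field names, types, and defaults below are assumptions made for illustration; the real EmailSummary in src/email_summary.py may differ.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class EmailSummary:
    """Illustrative shape only; field names are assumed, not copied from the project."""
    email_id: str                      # ID of the source Outlook message
    company: str
    role: str
    location: Optional[str] = None
    stage: str = "Applied"             # e.g. Applied, Interview, Rejected, Offer, Withdrawn
    contact: Optional[str] = None
    received_at: Optional[str] = None  # ISO 8601 timestamp of the email


# asdict() gives a JSON-friendly dictionary, convenient for file-based caching.
example = asdict(EmailSummary(email_id="m1", company="Acme", role="Engineer"))
```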

src/read_outlook_emails.py

Handles authentication with Microsoft Graph API and fetches emails from Outlook folders using device code flow.
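
As a rough illustration of the fetch step, the sketch below only builds the Microsoft Graph request URL for listing messages in a folder; it does not authenticate or send anything. The helper name build_messages_url and the selected fields are assumptions. The access token obtained through device code flow would be sent in an Authorization: Bearer header. Graph accepts well-known folder names such as inbox in this path, while a custom folder generally has to be addressed by its folder ID.

```python
from urllib.parse import quote, urlencode

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def build_messages_url(folder_id: str = "inbox", limit: int = 100) -> str:
    """Hypothetical helper: URL for listing the newest messages in a mail folder."""
    query = urlencode(
        {
            "$top": limit,
            # Assumed field selection; trim or extend as needed.
            "$select": "id,subject,from,receivedDateTime,body",
            "$orderby": "receivedDateTime desc",
        },
        safe="$,",        # keep OData "$" prefixes and comma lists readable
        quote_via=quote,  # encode spaces as %20 rather than "+"
    )
    return f"{GRAPH_BASE}/me/mailFolders/{quote(folder_id, safe='')}/messages?{query}"
```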

src/summarize_emails.py

Uses OpenAI GPT-4o-mini to extract structured information from emails. Cleans HTML content and extracts company, role, location, and application stage.
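
The HTML cleaning step can be approximated with the standard library alone. The sketch below is one possible approach and not necessarily what src/summarize_emails.py does; the names html_to_text and _TextExtractor are hypothetical. It drops script and style content, turns common block tags into line breaks, and collapses whitespace before the text would be handed to the model.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    SKIPPED = {"script", "style"}              # content that is never shown to a reader
    BLOCK = {"br", "p", "div", "tr", "li"}     # tags that should start a new line

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Hypothetical helper: reduce an HTML email body to plain text lines."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = "".join(parser.chunks).splitlines()
    # Collapse runs of spaces, tabs, and non-breaking spaces; drop empty lines.
    cleaned = (re.sub(r"[ \t\xa0]+", " ", line).strip() for line in lines)
    return "\n".join(line for line in cleaned if line)
```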

src/deduplicate.py

Aggregates multiple emails for the same job application (Company + Role) into a single ApplicationEntry with lifecycle tracking.
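
The aggregation can be pictured as grouping summaries by a normalized (company, role) key and replaying them in chronological order. The sketch below makes several assumptions: summaries are plain dictionaries with email_id, company, role, stage, and received_at keys, timestamps are ISO 8601 strings in one time zone so that string order matches time order, and the status of an entry is simply the stage of its most recent email. The real ApplicationEntry may carry more fields.

```python
from dataclasses import dataclass, field


@dataclass
class ApplicationEntry:
    company: str
    role: str
    status: str
    email_ids: list[str] = field(default_factory=list)


def _key(company: str, role: str) -> tuple[str, str]:
    # Case- and whitespace-insensitive key, so "Acme " and "acme" match.
    return (" ".join(company.lower().split()), " ".join(role.lower().split()))


def deduplicate(summaries: list[dict]) -> list[ApplicationEntry]:
    """Hypothetical helper: merge emails about the same application."""
    entries: dict[tuple[str, str], ApplicationEntry] = {}
    for summary in sorted(summaries, key=lambda s: s["received_at"]):
        key = _key(summary["company"], summary["role"])
        entry = entries.get(key)
        if entry is None:
            entry = ApplicationEntry(summary["company"], summary["role"], summary["stage"])
            entries[key] = entry
        entry.status = summary["stage"]  # the latest email decides the status
        entry.email_ids.append(summary["email_id"])
    return list(entries.values())
```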

src/notion_writer.py

Manages Notion database operations: creating/updating companies and job applications, fetching existing entries, and tracking synced email IDs.
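
Tracking synced emails through the comma-separated Email IDs property comes down to parsing the stored text, adding unseen IDs, and writing the result back. A minimal sketch, with hypothetical helper names:

```python
def parse_email_ids(raw: str) -> list[str]:
    """Split the stored comma-separated text into a clean list of IDs."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def merge_email_ids(existing_raw: str, new_ids: list[str]) -> tuple[str, list[str]]:
    """Return the updated property text and the IDs that were actually new."""
    existing = parse_email_ids(existing_raw)
    seen = set(existing)
    added: list[str] = []
    for email_id in new_ids:
        if email_id not in seen:
            seen.add(email_id)
            added.append(email_id)
    return ", ".join(existing + added), added
```

An empty list of added IDs means the application entry is already up to date and the Notion update can be skipped.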

src/cache.py

File-based caching system for emails and summaries, enabling incremental processing.
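
A file-per-item cache of this kind can be very small. The class below is an illustrative sketch rather than the contents of src/cache.py: the name FileCache and its methods are hypothetical, and it assumes email IDs are safe to use in file names. It follows the email_<id>.json and summary_<id>.json naming used in this README.

```python
import json
from pathlib import Path
from typing import Any, Optional


class FileCache:
    """Stores one JSON file per item, e.g. cache/email/email_<id>.json."""

    def __init__(self, root: Path, kind: str) -> None:
        self._dir = Path(root) / kind  # kind is "email" or "summary"
        self._kind = kind

    def _path(self, item_id: str) -> Path:
        # Assumption: the ID contains no characters that are invalid in file names.
        return self._dir / f"{self._kind}_{item_id}.json"

    def has(self, item_id: str) -> bool:
        # The existence of the file is the only cache state.
        return self._path(item_id).is_file()

    def load(self, item_id: str) -> Optional[dict[str, Any]]:
        path = self._path(item_id)
        if not path.is_file():
            return None
        return json.loads(path.read_text(encoding="utf-8"))

    def save(self, item_id: str, payload: dict[str, Any]) -> None:
        self._dir.mkdir(parents=True, exist_ok=True)
        text = json.dumps(payload, ensure_ascii=False, indent=2)
        self._path(item_id).write_text(text, encoding="utf-8")
```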

Development

Running Tests

# Run all tests
pytest src/test/ -v

# Run specific test file
pytest src/test/test_email_summary.py -v

# Run with coverage
pytest src/test/ --cov=src --cov-report=html

Module Principles

All modules in src/ follow these principles:

  • ✅ No CLI code
  • ✅ No file I/O helpers
  • ✅ No environment variable loading
  • ✅ Single responsibility
  • ✅ Pure functions where possible
  • ✅ Explicit dependencies
  • ✅ Full type hints
  • ✅ Comprehensive tests

Configuration

Azure App Registration Setup

  1. Go to Azure Portal → Azure Active Directory (now Microsoft Entra ID) → App registrations
  2. Create a new registration
  3. Note your Application (client) ID and Directory (tenant) ID
  4. Go to API permissions → Add permission → Microsoft Graph → Delegated permissions
  5. Add Mail.Read permission
  6. Grant admin consent
  7. Go to Authentication → Enable "Allow public client flows" → Save

OpenAI Setup

  1. Get API key from OpenAI Platform
  2. Add to .env file as OPENAI_API_KEY

Notion Setup

  1. Create Integration at Notion Integrations
  2. Get your integration token
  3. Create Companies and Job Applications databases
  4. Connect both databases to your integration
  5. Extract each database ID from its URL (the 32-character hexadecimal string, with any hyphens removed)
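
Extracting the ID by hand is error-prone, so a small helper can do it. The sketch below assumes the usual Notion URL shape, where the 32 hexadecimal characters sit at the end of the last path segment (optionally after a title slug) and any ?v=... query refers to a view, not the database. The function name extract_database_id is hypothetical.

```python
import re


def extract_database_id(url_or_id: str) -> str:
    """Hypothetical helper: return the 32-character database ID without hyphens."""
    path = url_or_id.split("?", 1)[0]           # drop the ?v=<view id> query part
    segment = path.rstrip("/").rsplit("/", 1)[-1]
    compact = segment.replace("-", "")          # remove slug and UUID hyphens
    match = re.search(r"[0-9a-fA-F]{32}$", compact)
    if match is None:
        raise ValueError(f"No 32-character Notion ID found in {url_or_id!r}")
    return match.group(0).lower()
```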

Troubleshooting

Authentication Issues

AADSTS7000218 Error:

  • Go to Azure Portal → App registrations → Your App → Authentication
  • Enable "Allow public client flows" → Save
  • Re-run the script

Permission Denied:

  • Ensure Mail.Read permission is granted with admin consent
  • Verify you've clicked "Grant admin consent" in Azure Portal

Notion Sync Issues

  • Verify database IDs are correct (32 characters, remove hyphens from URL)
  • Ensure both databases are connected to your integration
  • Check that property names match exactly (case-sensitive, e.g., "Status" not "status")
  • Ensure "Email IDs" property exists in Job Applications database (Text type)
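
Property mismatches can be caught before syncing by comparing the database schema with what the tool expects. The sketch below assumes you already have the database object returned by the Notion API's retrieve-database endpoint as a dictionary; its properties field maps each property name to an object with a type. The helper name and the expected-type table are hypothetical, and in the API a Text property is reported as rich_text.

```python
# Assumed subset of the Job Applications schema described in this README.
EXPECTED_APPLICATION_PROPERTIES = {
    "Job Role": "title",
    "Company": "relation",
    "Status": "status",
    "Email IDs": "rich_text",
}


def find_schema_problems(database: dict, expected: dict[str, str]) -> list[str]:
    """Hypothetical helper: list missing or wrongly typed properties."""
    actual = {
        name: prop.get("type")
        for name, prop in database.get("properties", {}).items()
    }
    problems: list[str] = []
    for name, expected_type in expected.items():
        if name not in actual:
            # Point out near misses that differ only by letter case.
            lookalike = next((a for a in actual if a.lower() == name.lower()), None)
            hint = f" (found {lookalike!r}; names are case-sensitive)" if lookalike else ""
            problems.append(f"missing property {name!r}{hint}")
        elif actual[name] != expected_type:
            problems.append(
                f"property {name!r} has type {actual[name]!r}, expected {expected_type!r}"
            )
    return problems
```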

Cache Issues

  • Cache is stored in cache/ directory (automatically created)
  • Delete cache/ directory to force reprocessing all emails
  • Cache is purely file-based - existence of files determines cache status
  • Each email gets two cache files: email_<id>.json and summary_<id>.json

Data Flow

Outlook Email → Cache (Stage 2)
             → Summarize (Stage 3)
             → Deduplicate by Company+Role
             → Sync to Notion (Stage 4)
             → Track Email IDs in Notion

License

This project is provided as-is for personal use.
