A Python Streamlit application that predicts Key Performance Indicators (KPIs) from past email marketing campaign results. This tool helps marketers optimize their email campaigns by predicting open rates, click rates, and opt-out rates based on historical data.
- KPI Prediction: Predict open rates, click rates, and opt-out rates for new email campaigns
- AI-Powered Subject Line Optimization: Generate and test alternative subject lines using Groq LLM API
- A/B/C/D Testing: Compare your subject line against AI-generated alternatives
- Age Group Analysis: Visualize how different age groups respond to your campaigns
- Model Training: Train, evaluate, and manage different model versions
- Feature Importance Analysis: Understand what factors influence your email performance
- Interactive Visualizations: Explore data through heatmaps and charts
The application expects two main data files:
Delivery data: a CSV file with semicolon (;) separator containing:
- InternalName: Delivery identifier
- Subject: Email subject line
- Date: Date and time of delivery
- Sendouts: Total number of emails sent
- Opens: Total number of opens
- Clicks: Total number of clicks
- Optouts: Total number of unsubscribes
- Dialog, Syfte, Product: Campaign metadata
- Preheader: Email preheader (for v2.0.0+ models)
Example:
InternalName;Subject;Date;Sendouts;Opens;Clicks;Optouts;Dialog;Syfte;Product
DM123456;Take the car to your next adventure;2024/06/10 15:59;14827;2559;211;9;F;VD;Mo
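As a minimal sketch of how a file in this layout could be read, the snippet below parses the example row with Python's standard `csv` module and derives the three KPIs. The function names and the rate definitions (each count divided by `Sendouts`) are assumptions for illustration, not necessarily what the application does internally.

```python
import csv
import io

# Example delivery data in the semicolon-separated layout described above.
DELIVERY_CSV = """InternalName;Subject;Date;Sendouts;Opens;Clicks;Optouts;Dialog;Syfte;Product
DM123456;Take the car to your next adventure;2024/06/10 15:59;14827;2559;211;9;F;VD;Mo
"""


def read_delivery_rows(text):
    """Parse semicolon-separated delivery data into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def kpi_rates(row):
    """Assumed KPI definitions: each count divided by Sendouts."""
    sendouts = int(row["Sendouts"])
    if sendouts == 0:
        # Avoid division by zero for deliveries that were never sent.
        return {"open_rate": 0.0, "click_rate": 0.0, "optout_rate": 0.0}
    return {
        "open_rate": int(row["Opens"]) / sendouts,
        "click_rate": int(row["Clicks"]) / sendouts,
        "optout_rate": int(row["Optouts"]) / sendouts,
    }


rows = read_delivery_rows(DELIVERY_CSV)
rates = kpi_rates(rows[0])
print(rows[0]["InternalName"], rates)
```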
Customer data: a CSV file with semicolon (;) separator containing:
- Primary key: Customer identifier
- InternalName: Delivery identifier to link with delivery data
- OptOut: If customer opted out (1/0)
- Open: If customer opened the email (1/0)
- Click: If customer clicked in the email (1/0)
- Gender: Customer gender
- Age: Customer age
- Bolag: Customer company/region connection
Example:
Primary key;OptOut;Open;Click;Gender;Age;InternalName;Bolag
12345678;0;1;0;Kvinna;69;DM123456;Stockholm
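The `InternalName` column is what ties a customer row to its delivery. A rough sketch of that link, using only the standard library, might look like the following; `attach_delivery` is a hypothetical helper, and the choice to copy only `Subject` and `Date` onto each customer row is an assumption made to keep the example short.

```python
import csv
import io

CUSTOMER_CSV = """Primary key;OptOut;Open;Click;Gender;Age;InternalName;Bolag
12345678;0;1;0;Kvinna;69;DM123456;Stockholm
"""

DELIVERY_CSV = """InternalName;Subject;Date;Sendouts;Opens;Clicks;Optouts;Dialog;Syfte;Product
DM123456;Take the car to your next adventure;2024/06/10 15:59;14827;2559;211;9;F;VD;Mo
"""


def read_rows(text):
    """Parse semicolon-separated text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def attach_delivery(customers, deliveries):
    """Hypothetical helper: add delivery fields to each matching customer row."""
    by_name = {d["InternalName"]: d for d in deliveries}
    joined = []
    for customer in customers:
        delivery = by_name.get(customer["InternalName"])
        if delivery is None:
            continue  # skip customers whose delivery is not in the delivery file
        merged = dict(customer)
        merged["Subject"] = delivery["Subject"]
        merged["Date"] = delivery["Date"]
        joined.append(merged)
    return joined


joined_rows = attach_delivery(read_rows(CUSTOMER_CSV), read_rows(DELIVERY_CSV))
print(joined_rows[0]["Primary key"], joined_rows[0]["Subject"])
```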
The application uses semantic versioning (Major.Minor.Patch) for models:
- v1.x.x: Basic models with subject line features only
- v2.x.x: Enhanced models with both subject line and preheader features
Each model version has its own documentation and performance metrics saved in the Docs/model_vX.X.X/ directory.
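To illustrate the versioning scheme, here is a small sketch that parses a version string, checks whether that version uses preheader features, and builds the matching documentation path. The helper names are hypothetical; only the Major.Minor.Patch format, the v1/v2 distinction, and the `Docs/model_vX.X.X/` pattern come from the description above.

```python
import re
from pathlib import PurePosixPath

# Accepts "2.0.0" as well as "v2.0.0".
VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_version(version):
    """Return (major, minor, patch) as integers, or raise ValueError."""
    match = VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"Not a Major.Minor.Patch version: {version!r}")
    return tuple(int(part) for part in match.groups())


def uses_preheader(version):
    """v2.x.x and later models include preheader features."""
    return parse_version(version)[0] >= 2


def docs_dir(version):
    """Build the documentation directory for a model version."""
    major, minor, patch = parse_version(version)
    return PurePosixPath("Docs") / f"model_v{major}.{minor}.{patch}"


print(parse_version("v2.0.0"), uses_preheader("1.3.2"), docs_dir("2.1.0"))
```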
- Feature Engineering: Extracts features from subject lines, preheaders, and campaign metadata
- XGBoost Model: Machine learning model to predict email performance
- Groq API Integration: Generates optimized subject line alternatives
- Age Group Analysis: Segments performance by customer age groups
- Model Versioning: Manages multiple model versions with performance documentation
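The feature engineering step can be pictured roughly as below. This is only a sketch: the specific features (length, word count, punctuation flags) and the function names are assumptions chosen for illustration, and the real application may extract a different set. It does show how a v2+ model could extend the subject line features with the same features computed on the preheader.

```python
def text_features(text, prefix):
    """Illustrative text features; the actual feature set is an assumption."""
    words = text.split()
    return {
        f"{prefix}_length": len(text),
        f"{prefix}_num_words": len(words),
        f"{prefix}_has_exclamation": int("!" in text),
        f"{prefix}_has_question": int("?" in text),
    }


def build_features(subject, preheader=None):
    """Subject line features, plus preheader features when one is given."""
    features = text_features(subject, "subject")
    if preheader is not None:
        features.update(text_features(preheader, "preheader"))
    return features


v1_features = build_features("Ready for a road trip?")
v2_features = build_features("Ready for a road trip?", "Book your car today!")
print(v1_features)
print(v2_features)
```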
The application supports various configuration options:
- Data Sources: Adjust file paths in the load_data() function
- Model Parameters: Configure hyperparameters when training new models
- Sample Weights: Adjust how the model weights high-performing campaigns
- Age Grouping: Modify age group definitions in the categorize_age() function
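As an example of what changing the age grouping could involve, the sketch below shows one possible shape for a `categorize_age()` function. The boundaries and labels here are assumptions, not the application's actual definitions; editing the two lists is all that is needed to change the groups in this version.

```python
import bisect

# Assumed boundaries: each value is the highest age included in its group.
AGE_UPPER_BOUNDS = [24, 34, 44, 54, 64]
AGE_LABELS = ["Under 25", "25-34", "35-44", "45-54", "55-64", "65+"]


def categorize_age(age):
    """Map an age in years to an age group label."""
    # bisect_left finds the first boundary that is >= age.
    return AGE_LABELS[bisect.bisect_left(AGE_UPPER_BOUNDS, age)]


for age in (24, 25, 64, 69):
    print(age, categorize_age(age))
```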
PredictKPI/
├── Data/
│ ├── customer_data.csv # Customer-level data
│ ├── delivery_data.csv # Delivery-level data
│ └── example_*.csv # Example data files
├── app/
│ ├── app.py # Main Streamlit application
│ ├── requirements.txt # Python dependencies
│ ├── models/ # Saved model files
│ └── Docs/ # Model documentation
├── Documentation.md # Detailed documentation
├── LICENSE # MIT License
├── README.md # This file
└── example.env # Example environment variables
Train new model versions with customized parameters:
- Navigate to the "Model Results" tab
- Expand "Retrain Model with Custom Parameters"
- Adjust model parameters and sample weight configuration
- Click "Retrain Model"
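One way to think about the sample weight configuration is as a rule that gives high-performing campaigns more influence during training. The sketch below is a hypothetical rule, with an assumed threshold and boost factor; the application's own weighting may differ. Weights of this kind could then be passed to a model that accepts per-sample weights, for example through the `sample_weight` argument of XGBoost's scikit-learn style `fit` method.

```python
def sample_weights(open_rates, threshold=0.25, boost=2.0):
    """Hypothetical rule: campaigns at or above the threshold count `boost` times."""
    return [boost if rate >= threshold else 1.0 for rate in open_rates]


# Illustrative open rates, not real campaign results.
weights = sample_weights([0.10, 0.25, 0.40])
print(weights)
```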
Analyze how different age groups interact with your campaigns:
- Navigate to the "Model Results" tab
- Expand "Age Group Analysis"
- Select which views to display (Overall, Dialog, Syfte, Product)
- Compare open rates, click rates, and opt-out rates across age groups
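The comparison behind these views amounts to averaging the 1/0 flags within each group. A minimal sketch follows, assuming each customer row already carries an `AgeGroup` label (a hypothetical column) alongside `Open` and `Dialog`; the rows are made-up illustrative values, not real campaign data.

```python
from collections import defaultdict


def rate_by_group(rows, group_keys, flag_key):
    """Share of rows with flag == 1, per combination of group_keys."""
    totals = defaultdict(lambda: [0, 0])  # group -> [hits, row count]
    for row in rows:
        group = tuple(row[key] for key in group_keys)
        totals[group][0] += int(row[flag_key])
        totals[group][1] += 1
    return {group: hits / count for group, (hits, count) in totals.items()}


# Illustrative rows only.
customers = [
    {"AgeGroup": "25-34", "Dialog": "F", "Open": "1"},
    {"AgeGroup": "25-34", "Dialog": "F", "Open": "0"},
    {"AgeGroup": "65+", "Dialog": "F", "Open": "1"},
    {"AgeGroup": "65+", "Dialog": "W", "Open": "0"},
]

overall = rate_by_group(customers, ["AgeGroup"], "Open")
per_dialog = rate_by_group(customers, ["AgeGroup", "Dialog"], "Open")
print(overall)
print(per_dialog)
```

Passing a different `flag_key` (for example a click or opt-out flag) or a different second grouping column gives the other views in the same way.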
Test your subject line against AI-generated alternatives:
- Navigate to the "Sendout Prediction" tab
- Enter your subject line (and preheader for v2+ models)
- Check the "GenAI" box
- Click "Send to Groq API"
- Compare the predicted performance of all versions
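Conceptually, the A/B/C/D comparison labels the original subject line as version A, labels up to three alternatives as B to D, scores each with the model, and ranks them. The sketch below shows that flow with a deliberately trivial stand-in predictor; `toy_predictor`, the helper names, and the alternative subject lines are all assumptions for illustration, and no call to the Groq API or the trained model is made here.

```python
import string


def label_variants(original, alternatives):
    """Label the original as A and up to three alternatives as B, C, D."""
    subjects = [original] + list(alternatives)[:3]
    return dict(zip(string.ascii_uppercase, subjects))


def rank_variants(variants, predict_open_rate):
    """Score every variant and return (label, score) pairs, best first."""
    scored = {label: predict_open_rate(subject) for label, subject in variants.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


def toy_predictor(subject):
    """Stand-in for the trained model: shorter subject lines score higher."""
    return 1 / (1 + len(subject))


variants = label_variants(
    "Take the car to your next adventure",
    [
        "Your next adventure starts here",
        "Ready for a road trip?",
        "Adventure awaits",
    ],
)
ranking = rank_variants(variants, toy_predictor)
for label, score in ranking:
    print(label, variants[label], round(score, 4))
```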