This project is my submission for Future Interns Data Science & Analytics - Task 2: Customer Retention & Churn Analysis.
The goal of this project is to analyze customer and subscription behavior for a SaaS-style business and turn the data into retention insights that a product manager, founder, or business stakeholder could actually use.
I treated this project like a real retention analytics case. Instead of only calculating churn, I looked at churn from different angles: account status, subscription churn, churn-event history, customer lifetime, support experience, feature usage, acquisition source, and customer cohorts.
For subscription-based businesses, growth does not only come from acquiring new customers. Growth also depends on keeping customers active, helping them experience value quickly, and understanding why they leave.
This analysis answers questions such as:
- Why are customers leaving?
- Which customer segments show higher churn risk?
- How long do customers typically stay active?
- Which retention drivers are visible in support, usage, and subscription data?
- What actions can the business take to reduce customer loss?
The dataset used in this project is the RavenStack Synthetic SaaS Dataset.
Dataset credit: River @ Rivalytics
The dataset is fully synthetic and designed for SaaS analytics practice. It includes multi-table customer, subscription, churn, support, and feature-usage data.
| File | Description |
|---|---|
| `ravenstack_accounts.csv` | Customer account profile, signup date, plan, country, referral source, and churn status |
| `ravenstack_subscriptions.csv` | Subscription-level start dates, end dates, billing details, plan changes, and churn status |
| `ravenstack_feature_usage.csv` | Product feature activity, usage volume, duration, errors, and beta-feature usage |
| `ravenstack_support_tickets.csv` | Support ticket response time, resolution time, priority, satisfaction, and escalation data |
| `ravenstack_churn_events.csv` | Churn events, churn reasons, refund amounts, upgrades, downgrades, reactivations, and feedback |
- Python
- Pandas
- NumPy
- Matplotlib
- ReportLab
- GitHub
- Customer retention analysis
- Cohort analysis
- Churn segmentation
- KPI reporting
- Business insight generation
The main dashboard is available in the `dashboard/` folder of this repository (`index.html`, with PDF and PNG exports). Key results:
| Metric | Result |
|---|---|
| Accounts analyzed | 500 |
| Subscriptions analyzed | 5,000 |
| Feature usage records | 25,000 |
| Support tickets | 2,000 |
| Churn events | 600 |
| Snapshot account churn rate | 22.0% |
| Subscription churn rate | 9.7% |
| Accounts with churn-event history | 70.4% |
| Median time to first churn event | 2.7 months |
| 3-month average subscription retention | 97.2% |
| 12-month average subscription retention | 90.9% |
| MRR tied to churned subscriptions | $1,179,139 |
The dataset contains more than one way to view churn:
- Snapshot account churn from `accounts.churn_flag`
- Subscription churn from ended subscriptions in `subscriptions`
- Churn-event history from `churn_events`
I kept these views separate because each one answers a different business question.
- Snapshot churn shows the current customer status.
- Subscription churn shows how often subscription records ended.
- Churn-event history shows churn behavior over time, including possible reactivation patterns.
This distinction makes the analysis more honest and useful for business decision-making.
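As a rough illustration of how the three views can diverge, the sketch below computes each rate on a tiny made-up dataset. Only `accounts.churn_flag` is named in this README; the `churn_flag` column on subscriptions, the `account_id` key, and the helper `churn_views` are assumptions for illustration, not the project's actual code.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the RavenStack files.
accounts = pd.DataFrame({
    "account_id": ["A1", "A2", "A3", "A4"],
    "churn_flag": [True, False, False, False],
})
subscriptions = pd.DataFrame({
    "account_id": ["A1", "A1", "A2", "A3", "A4"],
    "churn_flag": [True, False, False, False, False],  # assumed column name
})
churn_events = pd.DataFrame({
    # A1 has two churn events; A3 has one but is active in the snapshot.
    "account_id": ["A1", "A1", "A3"],
})


def churn_views(accounts, subscriptions, churn_events):
    """Return the three churn rates as fractions between 0 and 1."""
    has_event = accounts["account_id"].isin(churn_events["account_id"])
    return {
        "snapshot_account_churn": float(accounts["churn_flag"].mean()),
        "subscription_churn": float(subscriptions["churn_flag"].mean()),
        "accounts_with_churn_history": float(has_event.mean()),
    }


views = churn_views(accounts, subscriptions, churn_events)
print(views)
```

On this toy data one account in four is churned in the snapshot, while two in four have churn history, which is the same kind of gap the real tables show.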
The snapshot account churn rate is 22.0%, based on 110 churned accounts out of 500 accounts.
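However, 70.4% of accounts have at least one churn event in the churn-events table. The gap between 22.0% and 70.4% suggests that many accounts churned at some point and later reactivated, or recorded repeated churn events, so the snapshot flag alone understates how common churn behavior is.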
The subscription churn rate is 9.7%.
The median duration of churned subscriptions is only 1.4 months, which suggests that much of the churn risk is concentrated early in the customer lifecycle.
This makes onboarding, first value, and early product adoption very important.
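A median duration like the one above could be computed along these lines. This is a minimal sketch: the column names `start_date`, `end_date`, and `churn_flag`, the 30.44-day month conversion, and the helper name are assumptions, and the rows are made up.

```python
import pandas as pd

DAYS_PER_MONTH = 30.44  # average month length; an assumed conversion factor

# Made-up subscriptions: three churned, one still active (no end date).
subs = pd.DataFrame({
    "start_date": pd.to_datetime(["2024-01-01"] * 4),
    "end_date": pd.to_datetime(["2024-01-31", "2024-02-13", "2024-03-01", None]),
    "churn_flag": [True, True, True, False],
})


def median_churned_duration_months(subs):
    """Median lifetime, in months, of subscriptions that have churned."""
    churned = subs[subs["churn_flag"]]
    days = (churned["end_date"] - churned["start_date"]).dt.days
    return float(days.median() / DAYS_PER_MONTH)


median_months = median_churned_duration_months(subs)
print(round(median_months, 1))
```

The toy rows last 30, 43, and 60 days, so the median is 43 days, or about 1.4 months. Active subscriptions are excluded here because their final duration is not yet known.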
The largest churn reason is features, with 114 churn events.
Other major churn reasons include budget, support, unknown, competitor, and pricing.
This suggests that retention is not driven by only one issue. The business needs to improve product fit, support experience, and value communication together.
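A churn-reason ranking of this kind can be sketched with a simple frequency count. The column name `reason_code` is an assumption, and the rows below are invented for illustration.

```python
import pandas as pd

# Made-up churn events; reason_code is an assumed column name.
churn_events = pd.DataFrame({
    "reason_code": ["features", "budget", "features", "support",
                    "pricing", "features", "budget"],
})

# Count events per reason (sorted descending) and add each reason's share.
reason_counts = churn_events["reason_code"].value_counts()
reason_share = (reason_counts / reason_counts.sum()).round(3)
summary = pd.DataFrame({"events": reason_counts, "share": reason_share})
print(summary)
```

Keeping the share next to the raw count makes it easier to see whether the top reason dominates or is only slightly ahead of the others.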
Accounts acquired from events show a snapshot churn rate of 30.2%.
Partner-sourced accounts have a lower snapshot churn rate of 14.6%.
This suggests that event-based acquisition may bring in customers with weaker fit or different expectations.
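Segment comparisons like the one above come down to a grouped mean of the churn flag. The sketch below assumes a `referral_source` column alongside `churn_flag`; the helper `churn_by_segment` and the data are hypothetical.

```python
import pandas as pd

# Made-up accounts; not the RavenStack data.
accounts = pd.DataFrame({
    "referral_source": ["event"] * 4 + ["partner"] * 4 + ["organic"] * 2,
    "churn_flag": [True, True, False, False,
                   True, False, False, False,
                   False, False],
})


def churn_by_segment(df, segment_col, flag_col="churn_flag"):
    """Churn count and rate per segment, highest churn rate first."""
    flags = df[flag_col].astype(int)
    return (
        flags.groupby(df[segment_col])
        .agg(n_accounts="size", churned="sum", churn_rate="mean")
        .sort_values("churn_rate", ascending=False)
    )


by_source = churn_by_segment(accounts, "referral_source")
print(by_source)
```

The same helper could be pointed at an industry or country column. The `n_accounts` column is kept beside the rate because rates on small segments are noisy.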
The DevTools segment has the highest snapshot churn rate at 31.0%.
This segment may need more technical onboarding, clearer documentation, and stronger product education.
Downgraded subscriptions have a churn rate of 11.5%, compared with 9.6% for non-downgraded subscriptions.
A downgrade should be treated as a retention risk signal, not just a plan change.
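To put the two rates above side by side, the gap can be expressed both in percentage points and as a relative lift. The short calculation below uses only the figures reported in this README.

```python
# Churn rates reported above for downgraded vs non-downgraded subscriptions.
downgraded_rate = 0.115
not_downgraded_rate = 0.096

# Absolute gap in percentage points, and relative lift over the baseline.
absolute_gap_pp = (downgraded_rate - not_downgraded_rate) * 100
relative_lift = downgraded_rate / not_downgraded_rate - 1

print(f"Gap: {absolute_gap_pp:.1f} percentage points")
print(f"Relative lift: {relative_lift:.1%}")
```

That works out to roughly 1.9 percentage points, or about 20% higher churn relative to non-downgraded subscriptions. No significance test is reported here, so the gap is best read as a directional signal.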
Because churned subscriptions have a short median duration, the business should focus on the first 30 days.
Recommended actions:
- Create a structured onboarding checklist
- Track first feature usage within the first week
- Send targeted guidance to inactive new customers
- Offer onboarding calls for high-value accounts
- Monitor new accounts that have low usage or many early support tickets
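The monitoring step in the list above could start as a simple rule-based flag. In the sketch below, the thresholds, column names, and the helper `flag_early_risk` are all hypothetical; real cut-offs would need to be tuned against observed churn.

```python
import pandas as pd

MIN_FIRST_WEEK_EVENTS = 1  # hypothetical: below this counts as "low usage"
MAX_EARLY_TICKETS = 2      # hypothetical: above this counts as "many tickets"

# Made-up early-lifecycle metrics for four new accounts.
new_accounts = pd.DataFrame({
    "account_id": ["A1", "A2", "A3", "A4"],
    "first_week_usage_events": [0, 12, 5, 0],
    "tickets_first_30_days": [0, 1, 4, 3],
})


def flag_early_risk(df):
    """Mark accounts with low first-week usage or many early tickets."""
    out = df.copy()
    out["low_usage"] = out["first_week_usage_events"] < MIN_FIRST_WEEK_EVENTS
    out["many_tickets"] = out["tickets_first_30_days"] > MAX_EARLY_TICKETS
    out["needs_outreach"] = out["low_usage"] | out["many_tickets"]
    return out


flagged = flag_early_risk(new_accounts)
watchlist = flagged.loc[flagged["needs_outreach"], "account_id"].tolist()
print(watchlist)
```

Keeping the two conditions as separate columns lets a customer success team see why an account was flagged, not only that it was.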
Since feature-related churn is the largest churn reason, customers need to experience the most valuable product features earlier.
Recommended actions:
- Identify the top features used by retained customers
- Build in-app tips for underused features
- Create use-case based onboarding flows
- Track feature adoption by account segment
- Trigger customer success outreach when adoption is low
Downgrades should trigger a retention workflow.
Recommended actions:
- Automatically flag downgraded subscriptions
- Ask customers why they downgraded
- Offer a right-sized plan recommendation
- Monitor usage after downgrade
- Follow up before renewal
Event-sourced accounts have the highest snapshot churn rate among referral sources, at 30.2%.
Recommended actions:
- Review event messaging and customer expectations
- Improve qualification before conversion
- Create event-specific onboarding
- Compare event leads against partner and organic leads
- Focus on attracting better-fit customers, not only more signups
Pricing, budget, and competitor-related reasons together represent a major part of churn.
Recommended actions:
- Build save-offer playbooks
- Offer annual discounts for customers with stable usage
- Improve value-based messaging
- Show ROI or productivity benefits more clearly
- Create flexible plans for budget-sensitive customers
Support is one of the major churn reasons.
Recommended actions:
- Monitor high-priority tickets for churn risk
- Improve first-response times for urgent issues
- Track satisfaction after ticket closure
- Route repeated support issues to customer success
- Create support dashboards by segment and plan tier
The business should monitor retention continuously instead of reviewing churn only after customers leave.
Recommended dashboard sections:
- Monthly churn rate
- Churn reasons
- Cohort retention
- Downgrade-risk accounts
- Support-risk accounts
- Feature adoption by segment
- Churn by acquisition channel
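For the cohort retention section, one possible shape for the underlying table is a cohort-by-horizon matrix. The sketch below is an assumption-laden simplification: it expects a precomputed `lifetime_months` column (with infinity marking subscriptions that are still active) and a `cohort` label, neither of which is confirmed by this README, and it ignores cohorts that have not yet been observed for the full horizon.

```python
import numpy as np
import pandas as pd

# Made-up subscriptions; np.inf marks subscriptions that are still active.
subs = pd.DataFrame({
    "cohort": ["2024-01"] * 4 + ["2024-02"] * 2,
    "lifetime_months": [1, 4, np.inf, np.inf, 2, np.inf],
})


def cohort_retention(subs, horizons=(1, 3, 6)):
    """Share of each cohort whose lifetime reaches each horizon (in months)."""
    columns = {}
    for k in horizons:
        survived = subs["lifetime_months"] >= k
        columns[f"m{k}"] = survived.groupby(subs["cohort"]).mean()
    return pd.DataFrame(columns)


matrix = cohort_retention(subs)
print(matrix)
```

Each row is a signup cohort and each column is the share still retained at that horizon, which is the layout a retention heatmap is usually drawn from.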
FUTURE_DS_02_customer_retention_churn_analysis/
├── analysis/
│ ├── analysis_summary.json
│ ├── account_churn_by_country.csv
│ ├── account_churn_by_industry.csv
│ ├── account_churn_by_referral_source.csv
│ ├── account_churn_by_trial_status.csv
│ ├── churn_reason_summary.csv
│ ├── cohort_retention_matrix.csv
│ ├── feature_usage_summary.csv
│ ├── monthly_churn_trends.csv
│ ├── retention_curve.csv
│ ├── retention_driver_churn_history_comparison.csv
│ ├── retention_driver_snapshot_comparison.csv
│ ├── subscription_churn_by_billing_frequency.csv
│ ├── subscription_churn_by_country.csv
│ ├── subscription_churn_by_industry.csv
│ └── subscription_churn_by_plan.csv
├── charts/
│ ├── churn_by_industry.png
│ ├── churn_by_referral_source.png
│ ├── churn_reasons.png
│ ├── cohort_retention_heatmap.png
│ ├── customer_lifetime_distribution.png
│ ├── monthly_churn_trend.png
│ └── retention_curve.png
├── dashboard/
│ ├── Customer_Retention_Churn_Dashboard.pdf
│ ├── Customer_Retention_Churn_Dashboard.png
│ └── index.html
├── data/
│ ├── README.md
│ ├── raw/
│ │ ├── README.md
│ │ ├── ravenstack_accounts.csv
│ │ ├── ravenstack_churn_events.csv
│ │ ├── ravenstack_feature_usage.csv
│ │ ├── ravenstack_subscriptions.csv
│ │ └── ravenstack_support_tickets.csv
│ └── processed/
│   ├── account_level_retention_dataset.csv
│   ├── cohort_retention_long.csv
│   ├── monthly_retention_trends.csv
│   ├── subscription_level_cleaned.csv
│   └── support_usage_account_metrics.csv
├── docs/
│ ├── analysis_report.md
│ ├── linkedin_post.md
│ └── submission_summary.md
├── reports/
│ └── Customer_Retention_Churn_Analysis_Report.pdf
├── src/
│ └── churn_retention_analysis.py
├── .gitignore
├── LICENSE_DATA_NOTE.md
├── README.md
├── README_Customer_Retention_Churn_Analysis.md
└── requirements.txt
Install dependencies:

`pip install -r requirements.txt`

Run the analysis:

`python src/churn_retention_analysis.py`

The script reads the raw CSV files from `data/raw/`, regenerates the processed datasets, creates analysis tables, and exports dashboard visuals.
This task helped me understand how retention analytics connects directly to business growth. I learned that churn is not only a number, but a signal that can come from product value, customer expectations, pricing pressure, support experience, and onboarding quality.
I also learned the importance of separating different churn definitions. A snapshot churn flag, a subscription churn flag, and a churn-event table can each tell a different part of the customer story.
Umuhire Gatesi Lyse
GitHub: Lyse777
