A Machine Learning & Deep Learning system that predicts Test Case Coverage Percentage based on feature descriptions and test plans. This project helps QA teams and developers identify testing gaps early in the development cycle.
During software development, one critical question often arises too late: "Do we have enough test cases?"
Traditional approaches rely on:
- Manual code reviews (time-consuming)
- Post-deployment bug tracking (too late)
- Gut feeling from experienced QA engineers (not scalable)
This system solves that by:
- Analyzing feature descriptions and test plans before code is written
- Predicting coverage percentage (0-100%) instantly
- Identifying missing test scenarios automatically
- Providing domain-specific insights (Healthcare, Finance, E-commerce, etc.)
| Scenario | Without This Tool | With This Tool |
|---|---|---|
| Early Detection | Find gaps during QA phase | Find gaps during planning phase |
| Time Saved | 2-3 days of testing cycles | 30-second prediction |
| Cost | Fix bugs in production ($$$) | Prevent bugs before coding ($) |
| Coverage | Discover gaps through failures | Predict gaps proactively |
├── app/                    # Production-ready APIs
│   ├── app.py              # ML Model API (Gradient Boosting)
│   └── app_dl.py           # DL Model API (LSTM PyTorch)
├── artifacts/
│   ├── ml/                 # Trained ML models & plots
│   └── dl/                 # Trained DL models & vocabulary
├── data/
│   ├── raw/                # Original dataset (1000 samples)
│   └── processed/          # Cleaned & balanced data
├── notebooks/
│   ├── ML.ipynb            # Gradient Boosting experiments
│   └── DL.ipynb            # LSTM training & tuning
├── training/               # Automated training scripts
└── requirements.txt        # Python dependencies
We experimented with two approaches to see which performs better for this problem.
Why we tried Gradient Boosting:
- Fast inference for real-time APIs
- Interpretable (can explain predictions)
- Works well with structured features
Architecture:
Feature Description + Test Cases
↓
TF-IDF Vectorization (500 features)
↓
Domain Encoding (one-hot, 5 features)
↓
Engineered Features (6 features)
↓
Gradient Boosting Regressor
↓
Coverage Percentage (0-100)
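As a rough sketch of how the three feature groups above could be concatenated into one 511-dimensional input (500 TF-IDF + 5 domain + 6 engineered), consider the following. The six engineered features and the keyword sets are hypothetical, since the README does not list the real ones, and the TF-IDF part is stubbed with zeros.

```python
import re

DOMAINS = ["E-commerce", "Fintech", "Healthcare", "Logistics", "Social Media"]
TFIDF_SIZE = 500  # matches the "500 features" stage above

# Hypothetical keyword groups; the real engineered features are not documented.
NEGATIVE = {"invalid", "error", "failed", "expired", "denied"}
SECURITY = {"authentication", "authorization", "2fa", "encryption"}
EDGE = {"boundary", "maximum", "minimum", "limit", "concurrent"}


def one_hot_domain(domain):
    """Return a 5-element one-hot vector for the given domain."""
    return [1.0 if domain == d else 0.0 for d in DOMAINS]


def count_matching(word_sets, keywords):
    """Count test cases that contain at least one keyword."""
    return float(sum(1 for words in word_sets if words & keywords))


def engineered_features(description, test_cases):
    """Six illustrative numeric features (assumed, not the project's actual set)."""
    word_sets = [set(re.findall(r"[a-z0-9]+", tc.lower())) for tc in test_cases]
    n = len(test_cases)
    avg_len = sum(len(tc.split()) for tc in test_cases) / n if n else 0.0
    return [
        float(n),
        count_matching(word_sets, NEGATIVE),
        count_matching(word_sets, SECURITY),
        count_matching(word_sets, EDGE),
        avg_len,
        float(len(description.split())),
    ]


def build_feature_vector(tfidf_vector, domain, description, test_cases):
    """Concatenate TF-IDF, domain, and engineered features into one list."""
    if len(tfidf_vector) != TFIDF_SIZE:
        raise ValueError("expected a 500-dimensional TF-IDF vector")
    return (
        list(tfidf_vector)
        + one_hot_domain(domain)
        + engineered_features(description, test_cases)
    )


vector = build_feature_vector(
    [0.0] * TFIDF_SIZE,  # placeholder for a real TF-IDF vector
    "Fintech",
    "User login with email and password",
    ["Test valid login", "Test invalid password", "Test 2FA verification"],
)
print(len(vector))  # 511
```

The resulting list is what a regressor such as Gradient Boosting would receive as a single row.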
Results:
- Test R² Score: 0.641
- Mean Absolute Error: 6.21%
- Inference Time: 3-5ms
- Model Size: 2.5 MB
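For readers less familiar with the two accuracy metrics reported above, here is a minimal pure-Python sketch of their standard definitions. The sample numbers are made up for illustration and are not taken from the project's test set.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative coverage percentages (not project data)
actual = [40.0, 60.0, 80.0]
predicted = [45.0, 55.0, 80.0]

print(mean_absolute_error(actual, predicted))
print(r2_score(actual, predicted))
```

An MAE of 6.21% therefore means predictions are off by about 6 coverage points on average.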
What we learned:
- TF-IDF captures keyword importance well (e.g., "authentication", "validation")
- Domain-specific features matter (Healthcare needs compliance tests)
- Number of test cases alone isn't enough - quality matters
- Feature engineering > raw text for this problem size
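To illustrate why TF-IDF highlights words like "authentication", here is a small pure-Python sketch. It uses the smoothed IDF form documented for scikit-learn's `TfidfVectorizer` default (`smooth_idf=True`) but skips the L2 normalization step; whether the project uses those defaults is an assumption, and the three documents are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(documents):
    """Return one {term: weight} dict per document (no normalization)."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return weights


docs = [
    "Test valid login",
    "Test invalid password",
    "Test authentication token expiry",
]
scores = tfidf(docs)
# "test" appears in every document, so it is weighted lower than "authentication"
print(scores[2]["authentication"] > scores[2]["test"])  # True
```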
Why we tried the LSTM:
- Capture sequential patterns in text
- Learn word relationships automatically
- No manual feature engineering needed
Architecture:
Feature Description + Test Cases
↓
Word Tokenization
↓
Embedding Layer (96 dimensions)
↓
LSTM Layer (192 hidden units)
↓
Dense Layers (128 → 64 → 1)
↓
Coverage Percentage (0-100)
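The "Word Tokenization" step can be sketched as turning text into fixed-length integer sequences that an embedding layer can index. This is only an illustration: the reserved indices, `max_len`, and `min_count` are assumptions, not the project's actual settings.

```python
import re
from collections import Counter

PAD, UNK = 0, 1  # assumed reserved indices for padding and unknown words


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(texts, min_count=1):
    """Assign an integer id to every token seen at least min_count times."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for token, count in sorted(counts.items()):
        if count >= min_count:
            vocab[token] = len(vocab)
    return vocab


def encode(text, vocab, max_len=8):
    """Convert text to a list of ids, truncated or padded to max_len."""
    ids = [vocab.get(tok, UNK) for tok in tokenize(text)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


vocab = build_vocab(["Test invalid password", "Test valid login"])
print(encode("test invalid password", vocab))
print(encode("password test invalid", vocab))
```

The two encodings contain the same ids in a different order, which is the word-order information a sequence model can use and a bag-of-words model cannot.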
Results:
- Test R² Score: 0.6868
- Mean Absolute Error: 5.71%
- Inference Time: 150-360ms
- Model Size: 8 MB
- Parameters: 272,273
What we learned:
- LSTM captures context better ("test invalid password" vs "password test invalid")
- Embeddings learn semantic relationships (e.g., "authentication" ≈ "login")
- Slower but more accurate (about 7% relative improvement in R²)
- Needs more data to truly shine (1000 samples is borderline)
| Metric | Gradient Boosting | LSTM | Winner |
|---|---|---|---|
| Accuracy (R²) | 0.641 | 0.6868 | 🥇 LSTM (+7.1% relative) |
| Error (MAE) | 6.21% | 5.71% | 🥇 LSTM (-0.5 points) |
| Speed | 3-5ms | 150-360ms | 🥇 GB (~50x faster) |
| Model Size | 2.5 MB | 8 MB | 🥇 GB (3x smaller) |
| Interpretability | High | Low | 🥇 GB |
| Training Time | 2 minutes | 30 minutes | 🥇 GB |
Conclusion:
- Use Gradient Boosting for production APIs (speed matters)
- Use LSTM for batch processing or when accuracy is critical
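The headline differences in the comparison can be reproduced from the reported numbers with a few lines of arithmetic. The sketch below only restates figures from the table; the "low" and "high" speedup bounds are one way of reading the latency ranges.

```python
gb = {"r2": 0.641, "mae": 6.21, "latency_ms": (3, 5), "size_mb": 2.5}
lstm = {"r2": 0.6868, "mae": 5.71, "latency_ms": (150, 360), "size_mb": 8.0}

# Relative R² gain of the LSTM over Gradient Boosting
r2_gain_pct = (lstm["r2"] - gb["r2"]) / gb["r2"] * 100

# Absolute MAE reduction, in coverage percentage points
mae_drop_points = gb["mae"] - lstm["mae"]

# Speedup range: fastest LSTM vs slowest GB, and slowest LSTM vs fastest GB
speedup_low = lstm["latency_ms"][0] / gb["latency_ms"][1]
speedup_high = lstm["latency_ms"][1] / gb["latency_ms"][0]

size_ratio = lstm["size_mb"] / gb["size_mb"]

print(f"R2 gain: {r2_gain_pct:.1f}% relative")
print(f"MAE drop: {mae_drop_points:.2f} points")
print(f"GB speedup: {speedup_low:.0f}x to {speedup_high:.0f}x")
print(f"Size ratio: {size_ratio:.1f}x")
```

Comparing the low ends of both latency ranges (150 ms vs 3 ms) gives the 50x figure quoted in the table.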
- Total Samples: 1,000 test scenarios
- Domains: 5 (Fintech, Healthcare, E-commerce, Social Media, Logistics)
- Samples per Domain: 200 (perfectly balanced)
- Coverage Range: 26.67% to 94.12%
| Domain | Samples | Avg Coverage | Min | Max | Characteristics |
|---|---|---|---|---|---|
| E-commerce | 200 | 62.60% | 26.67% | 93.33% | Cart, checkout, payments |
| Fintech | 200 | 61.89% | 26.67% | 94.12% | Transactions, security, compliance |
| Healthcare | 200 | 62.23% | 26.67% | 93.33% | HIPAA, patient data, prescriptions |
| Logistics | 200 | 63.49% | 26.67% | 93.33% | Tracking, routing, GPS |
| Social Media | 200 | 62.30% | 26.67% | 93.33% | Profiles, moderation, feeds |
What makes coverage high (>80%)?
- Comprehensive test scenarios (10+ cases)
- Negative test cases included ("invalid", "error", "failed")
- Security tests present ("authentication", "authorization")
- Edge cases covered ("boundary", "maximum", "minimum")
- Compliance checks ("HIPAA", "GDPR", "PCI-DSS")
What makes coverage low (<40%)?
- Few test cases (1-3 only)
- Only happy path testing
- No security tests
- No edge cases
- Missing compliance requirements
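The high/low coverage signals listed above can be approximated with a simple keyword check. This is a hypothetical heuristic built from the keywords quoted in those lists, not the trained model's logic.

```python
import re

# Keyword groups taken from the lists above; matching on them is an assumed heuristic.
CATEGORIES = {
    "negative": {"invalid", "error", "failed"},
    "security": {"authentication", "authorization"},
    "edge": {"boundary", "maximum", "minimum"},
    "compliance": {"hipaa", "gdpr", "pci"},
}


def find_gaps(test_cases):
    """Return the names of test categories with no matching keyword."""
    words = set()
    for case in test_cases:
        words.update(re.findall(r"[a-z0-9]+", case.lower()))
    gaps = [name for name, keys in CATEGORIES.items() if not words & keys]
    if len(test_cases) <= 3:  # "Few test cases (1-3 only)"
        gaps.append("few test cases")
    return gaps


print(find_gaps(["Test valid login", "Test invalid password"]))
# ['security', 'edge', 'compliance', 'few test cases']
```

A check like this misses synonyms ("expired", "declined"), which is one reason a learned model can do better than fixed keyword lists.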
Below are 10 real examples from our training data, showing different coverage levels and why.
Feature Description:
Payment gateway integration for processing credit card transactions. System must
validate card details, process payments through third-party gateway, handle declined
transactions, implement retry logic for failed payments, store encrypted payment
tokens for future use, send email confirmations, and comply with PCI-DSS standards.
Transaction limits: $10,000 per transaction, $50,000 daily limit.
Test Cases:
✓ Test successful payment with valid card
✓ Test payment with expired card
✓ Test payment with insufficient funds
✓ Test payment exceeding transaction limit
✓ Test payment exceeding daily limit
✓ Test 3D Secure authentication flow
✓ Test card tokenization and storage
✓ Test payment retry mechanism
✓ Test declined transaction handling
✓ Test email confirmation delivery
✓ Test audit log creation for all transactions
Why Good Coverage (11 test cases):
- ✅ Happy path (valid card)
- ✅ Negative cases (expired, insufficient funds)
- ✅ Boundary testing (transaction limits)
- ✅ Security (3D Secure, tokenization)
- ✅ Compliance (audit logs, PCI-DSS)
What's Still Missing:
- Concurrent payment handling
- Refund scenarios
- Currency conversion edge cases
Feature Description:
Electronic Health Record (EHR) access system for healthcare providers. Doctors and
nurses can view patient medical history, lab results, prescriptions, and treatment
plans. System must enforce role-based access control, log all PHI access with
timestamp and reason, support emergency break-glass access for critical situations,
mask sensitive data for unauthorized roles, comply with HIPAA requirements, and
auto-lock sessions after 15 minutes of inactivity.
Test Cases:
✓ Test authorized doctor access to patient records
✓ Test nurse access with limited permissions
✓ Test unauthorized access denial
✓ Test emergency break-glass access with audit trail
✓ Test data masking for non-authorized fields
✓ Test session timeout after 15 minutes
✓ Test PHI access logging
✓ Test patient consent verification
✓ Test access from multiple devices
Why Medium Coverage (9 test cases):
- ✅ Role-based access (doctor, nurse)
- ✅ Security (unauthorized access, session timeout)
- ✅ Compliance (HIPAA, PHI logging)
- ✅ Emergency scenarios (break-glass)
What's Missing:
- ❌ Network failure scenarios
- ❌ Concurrent access conflicts
- ❌ Data export/backup tests
- ❌ Password complexity enforcement
- ❌ Multi-factor authentication
Feature Description:
Shopping cart and checkout functionality for online store. Users can add/remove
items, apply discount codes, select shipping methods, and complete purchase. Cart
should persist across sessions, calculate taxes based on location, validate inventory
availability, support guest checkout, handle concurrent modifications, and integrate
with payment gateway.
Test Cases:
✓ Test add single item to cart
✓ Test add multiple items to cart
✓ Test remove item from cart
✓ Test update item quantity
✓ Test apply valid discount code
✓ Test apply expired discount code
✓ Test apply invalid discount code
✓ Test cart persistence after logout
✓ Test guest checkout without registration
✓ Test inventory validation before checkout
✓ Test shipping cost calculation
✓ Test tax calculation based on zip code
✓ Test payment gateway integration
✓ Test order confirmation email
✓ Test concurrent cart modifications
Why Excellent Coverage (15 test cases):
- ✅ CRUD operations (add, remove, update)
- ✅ Positive & negative cases (valid/invalid/expired)
- ✅ Edge cases (concurrent modifications)
- ✅ Integration (payment gateway, email)
- ✅ Business logic (taxes, shipping, inventory)
- ✅ Session management (persistence, guest)
Comprehensive Testing = High Confidence!
Feature Description:
User profile management feature allowing users to update personal information, upload
profile picture, set privacy preferences, link social accounts, and manage notification
settings. Profile photos must be validated for size and format. Users can set profile
visibility to public, friends-only, or private.
Test Cases:
✓ Test update profile name
✓ Test upload valid profile picture
✓ Test upload oversized profile picture
✓ Test update email address
✓ Test update with duplicate email
✓ Test change privacy settings to public
✓ Test change privacy settings to private
✓ Test link Facebook account
Why Low Coverage (8 test cases):
- ✅ Basic CRUD (update name, email)
- ✅ Some validation (oversized photo)
- ⚠️ Limited edge cases
Critical Gaps:
- ❌ No security tests (password change, 2FA)
- ❌ No malicious upload tests (XSS, SQL injection)
- ❌ No rate limiting tests
- ❌ No data export/deletion (GDPR)
- ❌ No notification settings tests
- ❌ No concurrent update conflicts
Lesson: Basic functionality ≠ Good coverage. Security matters!
Feature Description:
Real-time package tracking system with GPS integration. Customers can track package
location, view delivery status, receive SMS/email notifications, estimate delivery
time, and report issues. System must validate tracking numbers, handle multiple
packages per order, detect GPS anomalies, support geofencing alerts, and maintain
delivery history for 90 days.
Test Cases:
✓ Test track package with valid tracking number
✓ Test track package with invalid tracking number
✓ Test real-time GPS location update
✓ Test delivery status change notifications
✓ Test SMS notification delivery
✓ Test email notification delivery
✓ Test geofencing alert when package enters delivery zone
✓ Test GPS anomaly detection
✓ Test multiple packages in single order
✓ Test delivery time estimation
✓ Test customer issue reporting
✓ Test delivery history retrieval
✓ Test tracking number validation
✓ Test location privacy settings
Why Excellent Coverage (14 test cases):
- ✅ Input validation (valid/invalid tracking)
- ✅ Real-time features (GPS, status updates)
- ✅ Notifications (SMS, email, geofencing)
- ✅ Edge cases (anomalies, multiple packages)
- ✅ Privacy (location settings)
- ✅ Data retention (90-day history)
Comprehensive + Domain-specific = Great coverage!
Feature Description:
User login with email and password, support 2FA, account lockout after 5 attempts
Test Cases:
✓ Test valid login
✓ Test invalid password
✓ Test account lockout
✓ Test 2FA verification
Why Very Low Coverage (4 test cases):
- ⚠️ Minimal testing (only 4 cases)
- ⚠️ Missing edge cases
- ⚠️ No security depth
Critical Gaps:
- ❌ No unlock mechanism tests
- ❌ No 2FA backup codes
- ❌ No rate limiting on login attempts
- ❌ No session management
- ❌ No password reset flow
- ❌ No brute force attack tests
- ❌ No audit logging
Lesson: Security features need DEEP testing, not surface-level!
Feature Description:
Digital prescription management system for doctors to create, modify, and send
prescriptions to pharmacies. System must validate drug interactions, check patient
allergies, enforce dosage limits, require digital signature from authorized prescriber,
support e-prescribing to pharmacies, maintain prescription history, implement drug
formulary checks, and comply with DEA regulations for controlled substances.
Test Cases:
✓ Test create new prescription with valid drug
✓ Test create prescription with patient allergy conflict
✓ Test detect dangerous drug-drug interactions
✓ Test validate dosage within safe limits
✓ Test validate dosage exceeding maximum limit
✓ Test digital signature requirement enforcement
✓ Test send prescription to pharmacy via e-prescribe
✓ Test controlled substance prescription with DEA validation
✓ Test prescription modification with audit trail
✓ Test prescription cancellation
✓ Test view prescription history
✓ Test formulary check for insurance coverage
✓ Test prescription renewal workflow
✓ Test unauthorized prescriber access denial
✓ Test duplicate prescription detection
✓ Test prescription for pediatric patient with weight-based dosage
Why Exceptional Coverage (16 test cases):
- ✅ Safety checks (allergies, interactions, dosage)
- ✅ Compliance (DEA, digital signature, audit)
- ✅ Business logic (formulary, insurance, renewal)
- ✅ Security (authorization, duplicate detection)
- ✅ Edge cases (pediatric, controlled substances)
- ✅ CRUD operations (create, modify, cancel, view)
This is what COMPREHENSIVE testing looks like!
Healthcare = High risk = Thorough testing required
Feature Description:
Automated refund processing system for returns. Customers can request refunds within
30 days, upload return shipping proof, and receive refund to original payment method.
Test Cases:
✓ Test refund request within 30 days
✓ Test refund request after 30 days
✓ Test refund to credit card
✓ Test refund status tracking
Why Low Coverage (4 test cases):
- ⚠️ Only 4 test scenarios
- ⚠️ Happy path focused
Critical Gaps:
- ❌ No partial refund tests
- ❌ No file upload validation (shipping proof)
- ❌ No concurrent refund requests
- ❌ No fraud detection tests
- ❌ No refund to different payment methods
- ❌ No email notification tests
- ❌ No refund failure scenarios
- ❌ No cancellation of refund requests
Lesson: Even "simple" features have complexity!
Feature Description:
AI-powered content moderation system that automatically detects and flags inappropriate
content including hate speech, violence, nudity, and spam. System must scan text,
images, and videos, provide confidence scores, allow manual review by moderators,
support user appeals, implement rate limiting to prevent abuse, maintain moderation
logs, and comply with platform community guidelines. False positive rate must be
below 5%.
Test Cases:
✓ Test detection of hate speech in text post
✓ Test detection of violent imagery
✓ Test detection of nudity in uploaded photos
✓ Test detection of spam content
✓ Test detection of self-harm content
✓ Test false positive handling for legitimate content
✓ Test confidence score calculation
✓ Test manual moderator review queue
✓ Test user appeal submission
✓ Test appeal decision notification
✓ Test rate limiting for flagged users
✓ Test moderation action audit logs
✓ Test multi-language content moderation
✓ Test context-aware moderation decisions
✓ Test automated content removal for high-confidence violations
✓ Test temporary account suspension for repeat violations
✓ Test compliance with community guidelines
Why Excellent Coverage (17 test cases):
- ✅ Multiple content types (text, image, video)
- ✅ Multiple violation types (hate, violence, spam)
- ✅ AI/ML validation (confidence scores, accuracy)
- ✅ Human-in-the-loop (manual review, appeals)
- ✅ System safeguards (rate limiting, logs)
- ✅ Multi-language support
- ✅ Compliance (guidelines, audit trails)
Complex AI system = Needs extensive testing!
Feature Description:
Automated driver assignment system that matches delivery orders with available drivers
based on location proximity, vehicle capacity, driver working hours, and priority
level. System must optimize routes, handle driver unavailability, support manual
override by dispatchers, track driver status in real-time, and maintain assignment
history.
Test Cases:
✓ Test assign order to nearest available driver
✓ Test assign order when no drivers available
✓ Test vehicle capacity validation before assignment
✓ Test driver working hours compliance
✓ Test high-priority order assignment
✓ Test route optimization after assignment
✓ Test driver unavailability handling
✓ Test manual override by dispatcher
✓ Test real-time driver status tracking
✓ Test assignment history logging
✓ Test reassignment after driver cancellation
✓ Test multiple orders to single driver
Why Good Coverage (12 test cases):
- ✅ Algorithm logic (proximity, capacity, hours)
- ✅ Edge cases (no drivers, unavailability)
- ✅ Priority handling
- ✅ Manual overrides
- ✅ Real-time tracking
- ✅ Audit trails (history)
Solid testing for an optimization algorithm!
For small datasets (1,000 samples), good features let a simple model stay competitive with a more complex one:
- TF-IDF captured keyword importance effectively
- Domain encoding was crucial (Healthcare ≠ E-commerce)
- Simple counts (# of test cases) surprisingly predictive
The model learned that:
- "Test invalid password" is better than just "Test login"
- Security keywords → need more tests
- Healthcare/Finance → need compliance tests
- More test cases alone ≠ better coverage (quality > quantity)
- LSTM performed better but not dramatically (7% improvement)
- With 10K+ samples, the gap would likely be larger
- For production with limited data, ML is more practical
Coverage correlates with:
- Number of test cases (r = 0.45)
- Presence of negative tests (r = 0.38)
- Security keywords (r = 0.32)
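- Domain (stated ordering: Healthcare > Finance > E-commerce, although the per-domain averages in the dataset table differ by less than 2 percentage points)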
Coverage does NOT correlate with:
- Feature description length
- Average test case length
- Number of complex words
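The r values above are Pearson correlation coefficients. A minimal sketch of how such a value could be computed is shown below; the five data points are invented for illustration and will not reproduce r = 0.45.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up example: number of test cases vs coverage percentage
num_test_cases = [4, 8, 9, 11, 15]
coverage = [35.0, 50.0, 62.0, 70.0, 90.0]

print(round(pearson_r(num_test_cases, coverage), 3))
```

Values near 0, as reported for description length, indicate no linear relationship with coverage.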
| Factor | Choose ML | Choose DL |
|---|---|---|
| Data size | < 5K samples | > 10K samples |
| Latency requirement | < 100ms | > 500ms OK |
| Infrastructure | CPU only | GPU available |
| Interpretability | Must explain | Black box OK |
| Accuracy requirement | 6% MAE acceptable | < 5% MAE needed |
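One possible way to encode the decision table as a function is sketched below. Treating any single "Choose ML" condition as decisive, and defaulting to ML for cases the table does not cover (for example 5K-10K samples), are assumptions of this sketch.

```python
def recommend_model(samples, max_latency_ms, gpu_available, must_explain):
    """Return "ML" or "DL" following the decision table above."""
    # Any hard constraint from the "Choose ML" column wins (assumption)
    if must_explain or not gpu_available:
        return "ML"
    if max_latency_ms < 100 or samples < 5000:
        return "ML"
    # DL only when both data size and latency budget allow it
    if samples > 10000 and max_latency_ms > 500:
        return "DL"
    return "ML"  # assumed default for cases the table leaves open


print(recommend_model(1000, 50, gpu_available=False, must_explain=True))
print(recommend_model(20000, 1000, gpu_available=True, must_explain=False))
```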
# 1. Clone the repository
git clone https://github.com/yourusername/test-coverage-prediction.git
cd test-coverage-prediction
# 2. Create virtual environment
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Verify models exist
ls artifacts/ml/test_coverage_model_balanced.pkl
ls artifacts/dl/test_coverage_pytorch_working.pkl
Option 1: ML Model (Fast, Production-ready)
python app.py
# API runs on http://localhost:8001
# Docs: http://localhost:8001/docs
Option 2: DL Model (More Accurate)
python app_dl.py
# API runs on http://localhost:8000
# Docs: http://localhost:8000/docs
curl -X POST "http://localhost:8001/predict" \
-H "Content-Type: application/json" \
-d '{
"feature_description": "User authentication system with email/password and 2FA support",
"input_test_cases": [
"Test valid login",
"Test invalid password",
"Test account lockout after 5 attempts",
"Test 2FA verification"
],
"domain": "security"
}'
Expected Response:
{
"predicted_coverage": 45.8,
"status": "Fair",
"metadata": {
"model_version": "4.0-Balanced",
"prediction_time_ms": 3.64,
"timestamp": "2025-12-22T10:30:00Z"
}
}
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | API information |
| `/health` | GET | Health check |
| `/predict` | POST | Get coverage prediction |
| `/domains` | GET | List supported domains |
| `/model-info` | GET | Model metadata |
| `/docs` | GET | Interactive Swagger UI |
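The `/predict` endpoint can also be called from Python using only the standard library. The sketch below assumes the ML API is running on port 8001 as in the quick-start commands; the helper names are hypothetical, and the length check mirrors the 10-5000 character limit from the request schema.

```python
import json
import urllib.request

API_URL = "http://localhost:8001/predict"  # ML API port from the quick-start


def build_predict_request(feature_description, test_cases, domain, url=API_URL):
    """Build (but do not send) a POST request for /predict."""
    if not 10 <= len(feature_description) <= 5000:
        raise ValueError("feature_description must be 10-5000 characters")
    payload = {
        "feature_description": feature_description,
        "input_test_cases": list(test_cases),
        "domain": domain,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(request, timeout=10):
    """Send the request and parse the JSON reply; needs a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_predict_request(
    "User authentication system with email/password and 2FA support",
    ["Test valid login", "Test invalid password"],
    "security",
)
print(request.get_method(), request.full_url)
# result = predict(request)  # uncomment once the API server is up
```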
{
"feature_description": "string (10-5000 chars)",
"input_test_cases": ["string", "string", ...],
"domain": "security|compliance|healthcare|finance|other"
}
{
"predicted_coverage": 65.5,
"status": "Good",
"metadata": {
"model_version": "4.0-Balanced",
"model_name": "Gradient Boosting (Balanced)",
"prediction_time_ms": 3.64,
"timestamp": "2025-12-22T10:30:00.123Z",
"num_features": 511
}
}
| Coverage | Status | Meaning |
|---|---|---|
| < 40% | Poor | Major testing gaps |
| 40-60% | Fair | Needs improvement |
| 60-80% | Good | Solid coverage |
| > 80% | Excellent | Comprehensive testing |
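The status bands in the table can be expressed as a small mapping function. How the API treats the exact boundary values is not stated, so the choices here (40 counts as Fair, 60 and 80 count as Good) are assumptions.

```python
def coverage_status(predicted_coverage):
    """Map a coverage percentage (0-100) to a status label."""
    if not 0 <= predicted_coverage <= 100:
        raise ValueError("coverage must be between 0 and 100")
    if predicted_coverage < 40:
        return "Poor"
    if predicted_coverage < 60:
        return "Fair"
    if predicted_coverage <= 80:  # "> 80%" is Excellent, so 80 itself is Good
        return "Good"
    return "Excellent"


print(coverage_status(45.8))  # Fair
print(coverage_status(65.5))  # Good
```

Both printed labels match the `status` values shown in the example responses.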
| Domain | Keywords | Typical Coverage Needs |
|---|---|---|
| Finance/Fintech | payment, transaction, banking, currency | High (compliance, security) |
| Healthcare | patient, medical, prescription, HIPAA | Very High (regulatory) |
| E-commerce | cart, checkout, order, inventory | Medium-High (user experience) |
| Social Media | profile, post, comment, moderation | Medium (content safety) |
| Logistics | tracking, delivery, driver, route | Medium-High (reliability) |
| Security | authentication, authorization, encryption | Very High (critical) |
| Compliance | GDPR, audit, regulation | Very High (legal) |
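The keyword column suggests a simple way to guess a domain when none is supplied. Whether the API does anything like this is not stated; the function below is a hypothetical helper that just counts keyword matches from the table.

```python
import re

# Keywords copied from the table above (lowercased)
DOMAIN_KEYWORDS = {
    "Finance/Fintech": {"payment", "transaction", "banking", "currency"},
    "Healthcare": {"patient", "medical", "prescription", "hipaa"},
    "E-commerce": {"cart", "checkout", "order", "inventory"},
    "Social Media": {"profile", "post", "comment", "moderation"},
    "Logistics": {"tracking", "delivery", "driver", "route"},
    "Security": {"authentication", "authorization", "encryption"},
    "Compliance": {"gdpr", "audit", "regulation"},
}


def guess_domain(feature_description):
    """Return the domain with the most keyword hits, or None if there are none."""
    words = set(re.findall(r"[a-z]+", feature_description.lower()))
    scores = {name: len(words & keys) for name, keys in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)  # ties resolve to the first table row
    return best if scores[best] > 0 else None


print(guess_domain("Real-time package tracking with driver route updates"))
print(guess_domain("Patient prescription refill with HIPAA logging"))
```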
# 1. Prepare your data in data/raw/
# Format: CSV with columns [domain, feature_description, input_test_cases, coverage_percentage]
# 2. Run Jupyter notebooks
jupyter notebook notebooks/
# 3. Open ML.ipynb for Gradient Boosting
# 4. Open DL.ipynb for LSTM
# Models will be saved to artifacts/
├── app/
│   ├── app.py              # ML API (Scikit-learn)
│   └── app_dl.py           # DL API (PyTorch)
├── artifacts/
│   ├── ml/                 # Trained ML models
│   └── dl/                 # Trained DL models
├── data/
│   ├── raw/                # Original datasets
│   └── processed/          # Cleaned data
├── notebooks/
│   ├── ML.ipynb            # ML experiments
│   └── DL.ipynb            # DL experiments
└── training/               # Training scripts
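Before retraining, it can help to check that a file placed in `data/raw/` has the four columns named in the retraining steps. The loader below is a minimal sketch: the function name is hypothetical, and the "; " separator between test cases in the sample row is an assumption, since the README does not specify how the test cases are stored in a cell.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "domain",
    "feature_description",
    "input_test_cases",
    "coverage_percentage",
]


def load_rows(csv_file):
    """Read rows from an open CSV file and validate columns and coverage range."""
    reader = csv.DictReader(csv_file)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for index, row in enumerate(reader, start=1):
        coverage = float(row["coverage_percentage"])
        if not 0.0 <= coverage <= 100.0:
            raise ValueError(f"data row {index}: coverage out of range")
        rows.append({**row, "coverage_percentage": coverage})
    return rows


# In-memory sample standing in for a real file under data/raw/
sample = io.StringIO(
    "domain,feature_description,input_test_cases,coverage_percentage\n"
    "Fintech,Payment gateway integration,Test valid card; Test expired card,55.0\n"
)
rows = load_rows(sample)
print(rows[0]["domain"], rows[0]["coverage_percentage"])
```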
Before writing test cases, get a coverage estimate:
Feature: Payment gateway integration
Prediction: 55% coverage
Action: Add security tests, edge cases, compliance checks
During PR review, validate test completeness:
Feature: User registration
Current tests: 5
Prediction: 40% coverage (Fair)
Reviewer: "Add password validation and rate limiting tests"
Estimate testing effort:
Feature: Complex workflow
Prediction: 35% coverage
Conclusion: Allocate 2 more days for test case development
For regulated industries:
Feature: Patient record access (Healthcare)
Prediction: 75% coverage
Auditor: "Need HIPAA logging tests to reach 90%+"
Contributions welcome! Areas for improvement:
- Add more training data (target: 10K samples)
- Implement SHAP/LIME for interpretability
- Add Transformer models (BERT, RoBERTa)
- Build web dashboard (Streamlit/React)
- Add A/B testing framework
- Implement model drift detection
How to contribute:
- Fork the repo
- Create feature branch (`git checkout -b feature/amazing`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to branch (`git push origin feature/amazing`)
- Open Pull Request
MIT License - see LICENSE file for details.
Vivek Chary
- GitHub: @vivek-541
- Twitter: @VivekCharyA
- FastAPI team for excellent web framework
- PyTorch & Scikit-learn communities
- All contributors to open-source ML ecosystem
- QA professionals who inspired this project
Questions or feedback? Open an issue or reach out:
- GitHub Issues: Create Issue
- Email: vivekchary541@gmail.com
⭐ If this project helps you, please star the repo!