feat: add validation and review pipeline for extraction workflow #417

Open
Arijit429 wants to merge 7 commits into fireform-core:main from Arijit429:validation-review-pipeline

@Arijit429

🚀 Summary

This PR adds a dedicated validation and review pipeline for the AI-generated structured outputs in FireForm’s extraction workflow. The goal is to make extraction more reliable, support human-in-the-loop verification of incomplete results, and move the system closer to a production-ready workflow.


✨ What Changed

Added a dedicated validator module

Created a new utility file:

src/utils/extraction_validator.py

This introduces the ExtractionValidator class responsible for:

  • validating extracted incident fields
  • detecting missing / malformed values
  • generating a confidence score
  • determining whether manual review is required
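
As a rough illustration of the responsibilities listed above, a validator of this kind could look like the sketch below. This is not the code from src/utils/extraction_validator.py: the required field names, the ValidationResult type, the validate method, the completeness-based score, and the review threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical required fields; the PR's actual field list may differ.
REQUIRED_FIELDS = ("incident_type", "location", "date")


@dataclass
class ValidationResult:
    """Hypothetical result container, not taken from the PR."""
    missing_fields: list = field(default_factory=list)
    confidence: float = 0.0
    requires_review: bool = True


class ExtractionValidator:
    """Sketch: checks required fields and derives a simple confidence score."""

    def __init__(self, required_fields=REQUIRED_FIELDS, review_threshold=1.0):
        self.required_fields = tuple(required_fields)
        # Assumption: anything below this score is flagged for manual review.
        self.review_threshold = review_threshold

    @staticmethod
    def _is_missing(value: Any) -> bool:
        # Treat None and blank / whitespace-only strings as missing.
        if value is None:
            return True
        if isinstance(value, str):
            return value.strip() == ""
        return False

    def validate(self, extracted: dict) -> ValidationResult:
        missing = [
            name for name in self.required_fields
            if self._is_missing(extracted.get(name))
        ]
        total = len(self.required_fields)
        # Assumed scoring rule: fraction of required fields that are present.
        confidence = 1.0 if total == 0 else (total - len(missing)) / total
        return ValidationResult(
            missing_fields=missing,
            confidence=confidence,
            requires_review=confidence < self.review_threshold,
        )
```

With these assumptions, an extraction whose location is blank would report location as missing and be flagged for review, while a fully populated one would pass.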

Integrated validation into extraction workflow

Updated:

src/file_manipulator.py

The extraction flow now includes a dedicated validation stage before PDF filling.

Updated flow

Frontend Input
   ↓
Structured LLM Extraction
   ↓
Fallback to Legacy Extraction (if needed)
   ↓
Validation Layer
   ↓
Confidence Scoring + requires_review
   ↓
PDF Filling
   ↓
Final Output
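
The flow above can be sketched as a single function that wires the stages together. The function name, the parameter names, and the shape of the returned dictionary are hypothetical and do not reflect the actual code in src/file_manipulator.py. The sketch also assumes, as the diagram suggests, that PDF filling still runs when review is required and that the flag is only reported alongside the output.

```python
from typing import Any, Callable, Optional


def run_extraction_pipeline(
    text: str,
    structured_extract: Callable[[str], Optional[dict]],
    legacy_extract: Callable[[str], Optional[dict]],
    validate: Callable[[dict], dict],
    fill_pdf: Callable[[dict], Any],
) -> dict:
    """Sketch of: extraction -> fallback -> validation -> PDF filling."""
    # 1. Try structured LLM extraction first.
    data = structured_extract(text)

    # 2. Fall back to legacy extraction if nothing usable came back.
    if not data:
        data = legacy_extract(text) or {}

    # 3. Validation stage: assumed to return a dict such as
    #    {"confidence": ..., "requires_review": ...}.
    report = validate(data)

    # 4. PDF filling runs after validation; the review flag travels with
    #    the result instead of blocking generation (an assumption).
    output = fill_pdf(data)

    return {"output": output, **report}
```

Passing the stages in as callables keeps the sketch independent of FireForm's real extractor and PDF filler, and makes each stage easy to stub when exercising the flow.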

📌 Before

Previously, extracted data was passed directly into the PDF generation workflow after structured extraction or the legacy fallback.

This could allow:

  • incomplete fields
  • empty values
  • malformed extraction outputs
  • low-confidence reports

to move forward without sufficient validation.


✅ After

With this PR, every extracted output now passes through a validation pipeline that:

  • checks required fields
  • identifies missing data
  • assigns a confidence score
  • flags incomplete outputs for manual review

This improves both reliability and safety of the generated reports.


🎯 Impact

  • Improves extraction consistency
  • Enables human-in-the-loop verification
  • Reduces the risk of generating incomplete reports
  • Strengthens production readiness
  • Improves maintainability through modular validation logic

🧪 Testing

Tested locally using FastAPI Swagger routes.

Verified:

  • successful extraction flow
  • fallback extraction path
  • validation output logging
  • requires_review generation
  • successful PDF output creation

🔮 Future Scope

This validation layer also provides a base for future improvements such as:

  • field-level confidence scoring
  • schema-based validation
  • advanced NLP entity verification
  • route-level validation reporting
