292 changes: 292 additions & 0 deletions .kiro/specs/photo-expense-creation/design.md
@@ -0,0 +1,292 @@
# Design Document

## Overview

The photo-based expense creation feature integrates AI-powered image processing into FinFlow's existing transaction creation workflow. Users can capture or upload photos of receipts, invoices, or bank statements, and the system will automatically extract transaction details using Google's Gemini AI model. The extracted data is then presented in the familiar transaction form for review and editing before saving.

This feature leverages the existing transaction schema and UI components while adding new capabilities for image processing, AI integration, and enhanced user experience.

## Architecture

### High-Level Flow
1. **Image Capture/Upload**: User selects photo option from transaction creation flow
2. **Client Upload**: Photo is uploaded to server endpoint via secure API
3. **Server Processing**: Server processes image using Google Gemini AI and returns structured data
4. **Client Review**: User reviews extracted data in familiar transaction form
5. **Save**: Transaction(s) saved using existing InstantDB transaction flow

### Client-Server Architecture
```
Client (React)                      Server (TanStack Start)
├── PhotoTransactionFlow            ├── processReceiptPhoto (Server Action)
│   ├── PhotoCaptureStep            │   ├── Authentication middleware
│   ├── PhotoProcessingStep         │   ├── Image validation
│   └── PhotoReviewStep             │   ├── Gemini AI processing
└── TransactionForm (existing)      │   └── Data structuring
                                    └── Return transactions array
```

### Server Action Design
```typescript
// Server Action: src/actions/process-receipt-photo.ts
// Signature only: `declare` keeps this valid TypeScript without a function body.
export declare function processReceiptPhoto(formData: FormData): Promise<{
  success: boolean;
  transactions: ExtractedTransaction[];
  error?: string;
}>;

// Usage from client:
const result = await processReceiptPhoto(formData);
```

## Components and Interfaces

### Core Components

#### PhotoTransactionFlow
Main orchestrator component that manages the photo-to-transaction workflow.

```typescript
interface PhotoTransactionFlowProps {
  onTransactionsExtracted: (transactions: ExtractedTransaction[]) => void;
  onCancel: () => void;
}

interface PhotoTransactionFlowState {
  step: 'capture' | 'processing' | 'review';
  image: File | null;
  extractedTransactions: ExtractedTransaction[];
  error: string | null;
  isProcessing: boolean;
}
```
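
The step transitions implied by `PhotoTransactionFlowState` could be modelled as a small reducer. The sketch below is one possible shape, not the component's actual implementation: the action names, `flowReducer`, and `initialFlowState` are hypothetical, and `imageName` stands in for the `File` so the example runs without a browser.

```typescript
// Hypothetical reducer for the capture -> processing -> review flow.
// Only the state fields mirror the design; every other name is an assumption.
type Step = 'capture' | 'processing' | 'review';

interface ExtractedTransactionLike {
  name: string;
  amount: number;
}

interface FlowState {
  step: Step;
  imageName: string | null; // stands in for `image: File | null`
  extractedTransactions: ExtractedTransactionLike[];
  error: string | null;
  isProcessing: boolean;
}

type FlowAction =
  | { kind: 'imageSelected'; imageName: string }
  | { kind: 'processingSucceeded'; transactions: ExtractedTransactionLike[] }
  | { kind: 'processingFailed'; error: string }
  | { kind: 'back' };

const initialFlowState: FlowState = {
  step: 'capture',
  imageName: null,
  extractedTransactions: [],
  error: null,
  isProcessing: false,
};

function flowReducer(state: FlowState, action: FlowAction): FlowState {
  switch (action.kind) {
    case 'imageSelected':
      // Selecting an image clears any earlier error and starts processing.
      return { ...state, step: 'processing', imageName: action.imageName, error: null, isProcessing: true };
    case 'processingSucceeded':
      // Ignore results that arrive after the user has left the processing step.
      if (state.step !== 'processing') return state;
      return { ...state, step: 'review', extractedTransactions: action.transactions, isProcessing: false };
    case 'processingFailed':
      if (state.step !== 'processing') return state;
      // On failure, return to capture so the user can retry or cancel.
      return { ...state, step: 'capture', error: action.error, isProcessing: false };
    case 'back':
      return initialFlowState;
  }
}
```

Ignoring late results in any step other than `processing` is one way to handle the user navigating away while the request is in flight.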

#### PhotoCaptureStep
Handles image capture from camera or file selection.

```typescript
interface PhotoCaptureStepProps {
  onImageSelected: (file: File) => void;
  onCancel: () => void;
}
```

#### PhotoProcessingStep
Shows loading state while AI processes the image.

```typescript
interface PhotoProcessingStepProps {
  image: File;
  onProcessingComplete: (transactions: ExtractedTransaction[]) => void;
  onError: (error: string) => void;
}
```

#### PhotoReviewStep
Displays extracted transactions for review and editing.

```typescript
interface PhotoReviewStepProps {
  transactions: ExtractedTransaction[];
  onTransactionsConfirmed: (transactions: ExtractedTransaction[]) => void;
  onBack: () => void;
}
```

### Data Models

#### ExtractedTransaction
Matches existing transaction schema with confidence indicators.

```typescript
interface ExtractedTransaction {
  name: string;
  amount: number;
  type: 'credit' | 'debit';
  category: string;
  transactionAt: string; // ISO date string
  // Per-field confidence reported by the AI extraction
  confidence: {
    name: number;
    amount: number;
    type: number;
    category: number;
    date: number; // confidence for transactionAt
  };
}
```
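
The review step could use the `confidence` object to point the user at fields worth double-checking. The following is a minimal sketch under two assumptions that the design does not state: scores fall in the range 0 to 1, and 0.6 is a reasonable cutoff. `fieldsNeedingReview` and `REVIEW_THRESHOLD` are hypothetical names.

```typescript
// Keys mirror the `confidence` object of ExtractedTransaction.
type ConfidenceKey = 'name' | 'amount' | 'type' | 'category' | 'date';
type Confidence = Record<ConfidenceKey, number>;

// Assumption: scores are in [0, 1]; the 0.6 cutoff is illustrative only.
const REVIEW_THRESHOLD = 0.6;

// Returns the fields whose confidence falls below the threshold,
// in a fixed order so the UI can highlight them consistently.
function fieldsNeedingReview(
  confidence: Confidence,
  threshold: number = REVIEW_THRESHOLD,
): ConfidenceKey[] {
  const keys: ConfidenceKey[] = ['name', 'amount', 'type', 'category', 'date'];
  return keys.filter((key) => confidence[key] < threshold);
}

const flagged = fieldsNeedingReview({
  name: 0.9,
  amount: 0.95,
  type: 0.8,
  category: 0.4,
  date: 0.55,
});
console.log(flagged); // [ 'category', 'date' ]
```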

#### Server Action Types
Server action interfaces for photo processing.

```typescript
// Client-side service
interface PhotoProcessingService {
  processPhoto(file: File): Promise<ExtractedTransaction[]>;
  validateImage(file: File): boolean;
}

// Server Action types
interface ProcessReceiptPhotoResult {
  success: boolean;
  transactions: ExtractedTransaction[];
  error?: string;
  processingTime?: number;
}

// Server Action function signature (`declare` because no body is shown here)
export declare function processReceiptPhoto(
  formData: FormData
): Promise<ProcessReceiptPhotoResult>;

// Server-side AI service (internal)
interface AIExtractionService {
  extractTransactions(imageBuffer: Buffer): Promise<ExtractedTransaction[]>;
  validateImageFormat(buffer: Buffer): boolean;
}
```

## Data Models

### Enhanced Transaction Schema
The existing transaction schema remains unchanged, but we add validation and mapping utilities:

```typescript
// Existing schema (unchanged)
transactions: i.entity({
  amount: i.number(),
  category: i.string(),
  name: i.string(),
  transactionAt: i.date().indexed(),
  type: i.string().indexed(),
})

// New validation schema for AI extraction
const aiExtractionSchema = z.object({
  name: z.string().check(z.minLength(1, "Name is required")),
  amount: z.number().check(z.minimum(0.01, "Amount is required")),
  type: z.enum(["credit", "debit"]),
  category: z.string().check(z.minLength(1, "Category is required")),
  transactionAt: z.string().check(z.minLength(1, "Transaction Date is required")),
});
```
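
One of the mapping utilities mentioned above could turn an extracted item into a record ready for the existing save flow. The sketch below is dependency-free and only approximates the validation rules; `toTransactionRecord` is a hypothetical name, and it assumes the save path accepts an ISO 8601 string for `transactionAt`.

```typescript
// Field names follow the transaction schema; the helper itself is hypothetical.
interface ExtractedFields {
  name: string;
  amount: number;
  type: 'credit' | 'debit';
  category: string;
  transactionAt: string;
}

// Returns a cleaned record, or null when a field fails the basic rules
// (non-empty name and category, amount of at least 0.01, parseable date).
function toTransactionRecord(extracted: ExtractedFields): ExtractedFields | null {
  const name = extracted.name.trim();
  const category = extracted.category.trim();
  const timestamp = Date.parse(extracted.transactionAt);

  if (name === '' || category === '') return null;
  if (!Number.isFinite(extracted.amount) || extracted.amount < 0.01) return null;
  if (Number.isNaN(timestamp)) return null;

  return {
    name,
    amount: extracted.amount,
    type: extracted.type,
    category,
    // Normalize whatever date format the AI returned to a full ISO timestamp.
    transactionAt: new Date(timestamp).toISOString(),
  };
}
```

Returning `null` rather than throwing lets the review step leave an invalid item editable instead of failing the whole batch.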

### Category Integration
The AI suggests categories guided by the user's saved categories in Legend State:

```typescript
// User categories from src/lib/legend-state.ts
categories$ = {
  credit: ["Income", "Investment", "Salary", "Other"],
  debit: [
    "Food & Dining",
    "Transportation",
    "Shopping",
    "Entertainment",
    "Bills & Utilities",
    "Healthcare",
    "Education",
    "Travel",
    "Other"
  ]
}
```

The AI prompt includes these common categories as guidance, and the client-side review step validates each suggestion against the user's actual saved categories. If the AI suggests a category that is not in the user's list, the client does one of the following:
1. Map it to the closest existing category
2. Default to "Other"
3. Allow the user to select from their saved categories during review
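
The first two fallbacks above can be sketched as a single lookup. This is an illustrative approach only: "closest" is interpreted here as a case-insensitive exact or substring match, which the design does not specify, and `resolveCategory` is a hypothetical name.

```typescript
type TransactionType = 'credit' | 'debit';
type CategoryMap = Record<TransactionType, string[]>;

// Maps an AI-suggested category onto the user's saved list.
// Assumption: "closest" means a case-insensitive exact or substring match.
function resolveCategory(
  suggested: string,
  type: TransactionType,
  categories: CategoryMap,
): string {
  const saved = categories[type];
  const wanted = suggested.trim().toLowerCase();
  if (wanted === '') return 'Other';

  // 1. Exact match, ignoring case.
  const exact = saved.find((c) => c.toLowerCase() === wanted);
  if (exact !== undefined) return exact;

  // 2. Loose match: one label contains the other.
  const partial = saved.find((c) => {
    const label = c.toLowerCase();
    return label.includes(wanted) || wanted.includes(label);
  });
  if (partial !== undefined) return partial;

  // 3. Nothing close enough: default to "Other".
  return 'Other';
}

const savedCategories: CategoryMap = {
  credit: ['Income', 'Investment', 'Salary', 'Other'],
  debit: ['Food & Dining', 'Transportation', 'Shopping', 'Travel', 'Other'],
};

console.log(resolveCategory('food', 'debit', savedCategories)); // Food & Dining
console.log(resolveCategory('Groceries', 'debit', savedCategories)); // Other
```

The third fallback, letting the user pick a category during review, remains available whichever value this lookup returns.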

### Image Processing Configuration
```typescript
interface ImageProcessingConfig {
  maxFileSize: number; // 10MB
  allowedTypes: string[]; // ['image/jpeg', 'image/png', 'image/webp']
  maxDimensions: { width: number; height: number };
  compressionQuality: number;
}
```
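
A client-side check against this configuration might look like the sketch below. It takes a structural `{ type, size }` object so that a browser `File` can be passed directly. `validateImage` here returns an error code rather than the boolean used in `PhotoProcessingService`, and treating 10MB as 10 × 1024 × 1024 bytes is an assumption.

```typescript
// Structural subset of a browser File, so the check is easy to test.
interface ImageLike {
  type: string; // MIME type, e.g. 'image/jpeg'
  size: number; // size in bytes
}

type ImageValidationError = 'INVALID_IMAGE' | 'FILE_TOO_LARGE' | 'UNSUPPORTED_FORMAT';

interface ImageLimits {
  maxFileSize: number;
  allowedTypes: string[];
}

// Assumption: "10MB" is read as 10 * 1024 * 1024 bytes.
const defaultLimits: ImageLimits = {
  maxFileSize: 10 * 1024 * 1024,
  allowedTypes: ['image/jpeg', 'image/png', 'image/webp'],
};

// Returns null when the image passes, otherwise the first failing check.
function validateImage(
  image: ImageLike,
  limits: ImageLimits = defaultLimits,
): ImageValidationError | null {
  if (image.size === 0 || !image.type.startsWith('image/')) return 'INVALID_IMAGE';
  if (!limits.allowedTypes.includes(image.type)) return 'UNSUPPORTED_FORMAT';
  if (image.size > limits.maxFileSize) return 'FILE_TOO_LARGE';
  return null;
}
```

Because the MIME type and size reported by the client can be spoofed, the server would still need to repeat these checks on the uploaded bytes.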

## Error Handling

### Error Types
```typescript
enum PhotoProcessingError {
  INVALID_IMAGE = 'INVALID_IMAGE',
  FILE_TOO_LARGE = 'FILE_TOO_LARGE',
  UNSUPPORTED_FORMAT = 'UNSUPPORTED_FORMAT',
  AI_SERVICE_ERROR = 'AI_SERVICE_ERROR',
  NETWORK_ERROR = 'NETWORK_ERROR',
  EXTRACTION_FAILED = 'EXTRACTION_FAILED',
  NO_TRANSACTIONS_FOUND = 'NO_TRANSACTIONS_FOUND'
}
```

### Error Recovery Strategies
1. **Image Validation Errors**: Show clear message with format/size requirements
2. **AI Service Errors**: Offer retry option or fallback to manual entry
3. **Network Errors**: Queue for retry when connection restored (PWA offline support)
4. **Extraction Failures**: Allow manual correction of extracted data
5. **No Data Found**: Suggest retaking photo or manual entry

### User-Friendly Error Messages
```typescript
const errorMessages = {
  [PhotoProcessingError.INVALID_IMAGE]: "Please select a valid image file",
  [PhotoProcessingError.FILE_TOO_LARGE]: "Image file is too large. Please choose a smaller image (max 10MB)",
  [PhotoProcessingError.AI_SERVICE_ERROR]: "Unable to process image. Please try again or enter details manually",
  [PhotoProcessingError.NO_TRANSACTIONS_FOUND]: "No transaction details found in image. Please try a clearer photo or enter details manually"
};
```
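
The message map covers four of the seven error codes, so a lookup needs a fallback for the rest. The sketch below shows one way to do that; `messageFor` and the wording of `FALLBACK_MESSAGE` are hypothetical, and the codes are restated as a string union so the example stands alone.

```typescript
// Same codes as PhotoProcessingError, restated as a union for a standalone example.
type ErrorCode =
  | 'INVALID_IMAGE'
  | 'FILE_TOO_LARGE'
  | 'UNSUPPORTED_FORMAT'
  | 'AI_SERVICE_ERROR'
  | 'NETWORK_ERROR'
  | 'EXTRACTION_FAILED'
  | 'NO_TRANSACTIONS_FOUND';

// Partial: only the codes that have a message in the design.
const knownMessages: Partial<Record<ErrorCode, string>> = {
  INVALID_IMAGE: 'Please select a valid image file',
  FILE_TOO_LARGE: 'Image file is too large. Please choose a smaller image (max 10MB)',
  AI_SERVICE_ERROR: 'Unable to process image. Please try again or enter details manually',
  NO_TRANSACTIONS_FOUND:
    'No transaction details found in image. Please try a clearer photo or enter details manually',
};

// Hypothetical wording for codes without a dedicated message.
const FALLBACK_MESSAGE =
  'Something went wrong while processing the photo. Please try again or enter details manually';

function messageFor(code: ErrorCode): string {
  return knownMessages[code] ?? FALLBACK_MESSAGE;
}
```

Typing the map as a full `Record<ErrorCode, string>` instead would turn a missing message into a compile-time error, which may be preferable once every code has agreed wording.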

## Testing Strategy

### Unit Testing
- **AIExtractionService**: Mock Gemini API responses, test data mapping
- **PhotoProcessingService**: Test image validation, error handling
- **DataMappingService**: Test conversion between AI response and transaction schema
- **Component Logic**: Test state management, user interactions

### Integration Testing
- **Photo Capture Flow**: Test camera access, file selection
- **AI Processing Pipeline**: Test end-to-end image processing with sample receipts
- **Transaction Creation**: Test integration with existing transaction creation flow
- **Error Scenarios**: Test various failure modes and recovery

### E2E Testing
- **Complete Photo Flow**: Capture photo → process → review → save
- **Multiple Transactions**: Test bank statement with multiple entries
- **Offline Behavior**: Test queuing and retry mechanisms
- **Cross-Device**: Test PWA behavior on mobile and desktop

### Test Data
- Sample receipt images (various formats, qualities)
- Bank statement images (single and multiple transactions)
- Edge cases (blurry images, foreign languages, unusual formats)
- Mock AI responses for consistent testing

## Implementation Considerations

### Performance Optimizations
1. **Image Compression**: Compress images before sending to AI service
2. **Caching**: Cache AI responses for identical images
3. **Progressive Loading**: Show immediate feedback during processing
4. **Background Processing**: Use Web Workers for image processing
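
For the image compression step, the target size can be computed before any pixels are touched. The sketch below covers only that arithmetic, scaling an image down to fit `maxDimensions` while keeping its aspect ratio. `fitWithin` is a hypothetical helper, and the 2048-pixel limit in the usage lines is an illustrative value, not one taken from the design.

```typescript
interface Dimensions {
  width: number;
  height: number;
}

// Scales `source` down so it fits inside `max`, preserving aspect ratio.
// Images that already fit are returned at their original size (never upscaled).
function fitWithin(source: Dimensions, max: Dimensions): Dimensions {
  const scale = Math.min(1, max.width / source.width, max.height / source.height);
  return {
    width: Math.round(source.width * scale),
    height: Math.round(source.height * scale),
  };
}

// Illustrative limit only; the design leaves maxDimensions unspecified.
const limit: Dimensions = { width: 2048, height: 2048 };

console.log(fitWithin({ width: 4000, height: 3000 }, limit)); // { width: 2048, height: 1536 }
console.log(fitWithin({ width: 800, height: 600 }, limit)); // { width: 800, height: 600 }
```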

### Security & Privacy
1. **Server-Side Processing**: AI API keys and processing happen securely on server
2. **Temporary Storage**: Images processed in memory, not stored permanently on server
3. **Data Sanitization**: Server validates and sanitizes all AI-extracted data
4. **Upload Security**: File type validation, size limits, and secure multipart handling
5. **User Consent**: Clear messaging about AI processing and data handling

### Accessibility
1. **Camera Access**: Graceful fallback if camera unavailable
2. **Screen Readers**: Proper ARIA labels for all photo flow steps
3. **Keyboard Navigation**: Full keyboard support for photo capture
4. **Visual Indicators**: Clear progress and status indicators

### PWA Integration
1. **Offline Queuing**: Queue photos for processing when offline
2. **Service Worker**: Cache AI processing logic where possible
3. **Native Feel**: Use device camera APIs for native experience
4. **Background Sync**: Process queued photos when connection restored
58 changes: 58 additions & 0 deletions .kiro/specs/photo-expense-creation/requirements.md
@@ -0,0 +1,58 @@
# Requirements Document

## Introduction

This feature enables users to create expense transactions by taking or uploading photos of receipts, invoices, or bank statements. The system will use AI to automatically extract transaction details from the image, reducing manual data entry and improving the user experience for expense tracking.

## Requirements

### Requirement 1

**User Story:** As a FinFlow user, I want to create expenses by taking a photo of a receipt, so that I can quickly log transactions without manual typing.

#### Acceptance Criteria

1. WHEN a user accesses the expense creation flow THEN the system SHALL provide an option to "Add from Photo"
2. WHEN a user selects "Add from Photo" THEN the system SHALL allow them to either take a new photo or select from their device gallery
3. WHEN a user captures or selects an image THEN the system SHALL process the image and extract transaction details automatically
4. WHEN the AI processing is complete THEN the system SHALL display the extracted transaction data in a list view like the one on the transactions page
5. WHEN the extracted data is displayed THEN the user SHALL be able to review it before saving
6. WHEN the extracted data is displayed THEN the user SHALL be able to click on a transaction and modify it
7. WHEN the user is satisfied with the extracted data THEN the user SHALL be able to save the transaction
8. WHEN the user saves the transaction THEN the system SHALL create a new transaction record in the database


### Requirement 2

**User Story:** As a FinFlow user, I want the system to accurately extract transaction information from receipt images, so that I don't have to manually enter all the details.

#### Acceptance Criteria

1. WHEN an image contains a receipt THEN the system SHALL extract the merchant name, amount, date, and suggest an appropriate category
2. WHEN multiple transactions are detected in a single image THEN the system SHALL present all transactions for individual review
3. WHEN the image quality is poor or unreadable THEN the system SHALL provide a clear error message and suggest retaking the photo
4. WHEN transaction details are extracted THEN the system SHALL map them to the existing transaction schema (name, amount, type, category, transactionAt)
5. IF the AI cannot determine a field with confidence THEN the system SHALL leave that field empty for manual entry

### Requirement 3

**User Story:** As a FinFlow user, I want clear feedback during photo processing, so that I understand what's happening and can take appropriate action if needed.

#### Acceptance Criteria

1. WHEN photo processing begins THEN the system SHALL display a loading indicator with progress information
2. WHEN processing fails THEN the system SHALL provide a clear error message with suggested next steps
3. WHEN processing succeeds THEN the system SHALL smoothly transition to the transaction review screen
4. WHEN the user navigates away during processing THEN the system SHALL handle the interruption gracefully

### Requirement 4

**User Story:** As a FinFlow user, I want clear feedback when photo processing is unavailable, so that I understand when I need an internet connection.

#### Acceptance Criteria

1. WHEN the user is offline THEN the system SHALL disable the "Add from Photo" option
2. WHEN the photo option is disabled THEN the system SHALL show a clear message that an internet connection is required
3. WHEN connectivity is restored THEN the system SHALL automatically enable the photo option
4. WHEN processing fails due to network issues THEN the system SHALL suggest checking the internet connection
5. WHEN the user attempts photo processing offline THEN the system SHALL gracefully redirect to manual entry
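
Criteria 1 to 3 could be met by deriving the photo option's state from connectivity. The sketch below is one possible shape, with connectivity injected so the logic can be tested outside a browser. In a browser, `isOnline` would typically read `navigator.onLine` and `subscribe` would listen for the `online` and `offline` events on `window`. `ConnectivitySource`, `photoOptionState`, `watchPhotoOption`, and the message text are all hypothetical.

```typescript
interface PhotoOptionState {
  enabled: boolean;
  message: string | null;
}

// Injected connectivity so the logic does not depend on browser globals.
interface ConnectivitySource {
  isOnline(): boolean;
  // Registers a listener for connectivity changes; returns an unsubscribe function.
  subscribe(listener: () => void): () => void;
}

// Hypothetical wording for the disabled state.
const OFFLINE_MESSAGE = 'An internet connection is required to add a transaction from a photo';

function photoOptionState(isOnline: boolean): PhotoOptionState {
  return isOnline
    ? { enabled: true, message: null }
    : { enabled: false, message: OFFLINE_MESSAGE };
}

// Emits the current state immediately, then again on every connectivity change.
function watchPhotoOption(
  source: ConnectivitySource,
  onChange: (state: PhotoOptionState) => void,
): () => void {
  const emit = (): void => onChange(photoOptionState(source.isOnline()));
  emit();
  return source.subscribe(emit);
}
```

`navigator.onLine` only reports whether the device has a network interface up, not whether the server is reachable, so criterion 4 still needs error handling on the request itself.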