lakshmanreddymv/EnterpriseDocumentRedactor

🔒 EnterpriseDocumentRedactor

An AI-powered Android app that automatically detects and redacts Personally Identifiable Information (PII) from documents: 100% on-device, zero network calls, zero data exposure.

📱 Portfolio Project by Lakshmana Reddy | Android Tech Lead | 12 years experience | 📍 Pleasanton, CA | GitHub

Kotlin Android ML Kit Zero Network HIPAA GDPR Room Hilt


📸 Screenshots

[Screenshots: Home Screen, Camera Scan, Biometric Auth, History Screen, Settings Screen, Redacted PDF]

✨ Features

🤖 AI & Detection

  • 100% On-Device AI: ML Kit Entity Extraction + OCR run entirely offline. No internet permission in the manifest.
  • 3-Layer PII Detection: ML Kit → Regex → Context-aware analysis. Each layer catches what the previous one misses.
  • 11 PII Types: Names, SSN, Credit Cards, Passports, Email, Phone, Address, DOB, Medical IDs, Financial accounts, Custom
  • Graceful Degradation: If the ML Kit model is unavailable, the Regex and Context layers still run. The app never shows blank results because of a missing model.
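
The graceful-degradation behaviour described above can be sketched in plain Kotlin. This is a minimal illustration, not the project's PiiDetector: the `Finding`, `DetectionLayer`, `ModelUnavailableException`, and `detectAll` names are hypothetical, and the single SSN pattern stands in for the full regex layer.

```kotlin
// Hypothetical types for illustration; the project's real signatures may differ.
data class Finding(val type: String, val start: Int, val end: Int)

class ModelUnavailableException(message: String) : Exception(message)

fun interface DetectionLayer {
    fun detect(text: String): List<Finding>
}

// Runs every layer in order. A layer whose model is unavailable contributes
// no findings instead of aborting the whole scan.
fun detectAll(text: String, layers: List<DetectionLayer>): List<Finding> =
    layers.flatMap { layer ->
        try {
            layer.detect(text)
        } catch (e: ModelUnavailableException) {
            emptyList()
        }
    }.distinct()

// Stand-in for Layer 1 when the entity model has not been downloaded.
val offlineModelLayer = DetectionLayer { throw ModelUnavailableException("entity model not downloaded") }

// Stand-in for Layer 2: a single illustrative SSN pattern.
val ssnRegexLayer = DetectionLayer { text ->
    Regex("""\b\d{3}-\d{2}-\d{4}\b""").findAll(text)
        .map { Finding("SSN", it.range.first, it.range.last + 1) }
        .toList()
}
```

Calling `detectAll(text, listOf(offlineModelLayer, ssnRegexLayer))` still returns the regex findings even though the first layer fails, which is the property the bullet describes.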

📄 Document Processing

  • True PDF Redaction: Pages are rendered to Bitmap, black boxes are drawn over the PII bounding boxes, and the result is re-exported as an image-based PDF. Copy-paste reveals nothing underneath.
  • Camera Scan: ML Kit Document Scanner for physical documents
  • File Picker: Open existing PDFs or images from device storage
  • User Review & Toggle: Red highlight overlay on the document. Tap any item to keep or redact it individually.
  • OOM Protection: Large images are downsampled before OCR using an inSampleSize guard.
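
The inSampleSize guard mentioned above usually follows the power-of-two calculation shown in the Android bitmap-loading guidance. The sketch below is that calculation as a pure function; the parameter names and the target dimensions are assumptions, and the project's own calculateInSampleSize() may differ.

```kotlin
// Returns the largest power-of-two sample size that keeps both decoded
// dimensions at or above the requested size. A value of 1 means no downsampling.
fun calculateInSampleSize(srcWidth: Int, srcHeight: Int, reqWidth: Int, reqHeight: Int): Int {
    var inSampleSize = 1
    if (srcHeight > reqHeight || srcWidth > reqWidth) {
        val halfHeight = srcHeight / 2
        val halfWidth = srcWidth / 2
        // Keep doubling while the next halving would still satisfy the request.
        while (halfHeight / inSampleSize >= reqHeight && halfWidth / inSampleSize >= reqWidth) {
            inSampleSize *= 2
        }
    }
    return inSampleSize
}
```

On Android the source dimensions would typically come from a first decode pass with `BitmapFactory.Options.inJustDecodeBounds = true`, and the result would be assigned to `inSampleSize` for the real decode. A 4000×3000 image with a 1000×750 target yields a sample size of 4.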

🔒 Enterprise Security

  • Biometric Authentication: Fingerprint/Face ID gates access to document history. A lock screen is shown on History entry.
  • Auto-Lock Timeout: The app locks after 1/5/15 minutes in the background. Configurable in Settings; a stored value of -1 means Never.
  • Secure File Deletion: Files are overwritten with zeros before deletion. Prevents forensic recovery. HIPAA compliant.
  • Zero Network Calls: No INTERNET permission in the manifest. StrictMode crashes the debug build if a network call sneaks in.
  • FLAG_SECURE: ReviewScreen + ResultScreen cannot be screenshotted or screen-recorded.
  • R8 Obfuscation: PII detection logic is not readable via jadx in release builds.
  • No PII Logging: Only counts are logged, never actual content.
  • No Cloud Backup: allowBackup=false in the manifest.
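
As a rough sketch of the overwrite-then-delete step, the function below zero-fills a file in fixed-size chunks before removing it. It uses only `java.io`; the function name and the default chunk size are assumptions for illustration, and the project's secureDelete() may be implemented differently.

```kotlin
import java.io.File
import java.io.RandomAccessFile

// Overwrites the file's contents with zeros in chunks, forces the write to
// storage, then deletes the file. Returns true if the file was removed.
fun secureDelete(file: File, chunkSize: Int = 64 * 1024): Boolean {
    if (!file.isFile) return false
    val zeros = ByteArray(chunkSize)
    RandomAccessFile(file, "rw").use { raf ->
        var remaining = raf.length()
        raf.seek(0)
        while (remaining > 0) {
            val n = minOf(remaining, chunkSize.toLong()).toInt()
            raf.write(zeros, 0, n)
            remaining -= n
        }
        raf.channel.force(true) // flush the zeros before the file entry disappears
    }
    return file.delete()
}
```

Writing in chunks keeps memory use constant regardless of file size, which matters for multi-page image PDFs.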

📋 History & Audit Trail

  • Complete Audit Log: Every redaction is logged to the Room DB with filename, timestamp, and item count.
  • Swipe to Delete: Swipe left on any history item to delete it, with a confirmation dialog.
  • Delete All: Clear the entire history with one tap + confirmation.
  • True Deletion: Removes both the Room record AND the PDF file from device storage.
  • Biometric Gate: The History screen requires biometric auth before showing documents.

⚙️ Settings & Compliance

  • Auto-Delete Policy: Documents are auto-deleted after 7/30/60/90 days. GDPR storage limitation compliant.
  • Configurable Auto-Lock: 1 min / 5 min / 15 min / Never. Persisted to SharedPreferences.
  • Retention Policy: Enforced on every app launch via RetentionPolicyManager.
  • Cache Cleanup: Temp files older than 24h are auto-deleted on startup.
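
The retention check that runs on launch can be reduced to a cutoff comparison. The sketch below assumes each stored document carries a creation timestamp; `StoredDocument` and `expiredDocuments` are hypothetical names, not the RetentionPolicyManager API.

```kotlin
import java.util.concurrent.TimeUnit

// Hypothetical record: the real entity likely has more fields.
data class StoredDocument(val id: Long, val createdAtMillis: Long)

// Returns the documents created before (now - retentionDays).
fun expiredDocuments(
    docs: List<StoredDocument>,
    retentionDays: Int,
    nowMillis: Long
): List<StoredDocument> {
    val cutoff = nowMillis - TimeUnit.DAYS.toMillis(retentionDays.toLong())
    return docs.filter { it.createdAtMillis < cutoff }
}
```

Passing the current time in as a parameter keeps the function deterministic and easy to unit test; the caller would then delete both the file and the database record for each returned document.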

🔄 Works Offline

  • Hospitals, courtrooms, secure government facilities: no WiFi required
  • After the first ML Kit model download, all processing stays on-device

🏗️ Architecture

Clean Architecture: 3 Strict Layers

UI Layer     →  knows only Domain (ViewModels + Use Cases)
Domain Layer →  knows nothing (pure Kotlin, zero Android imports)
Data Layer   →  knows only Domain (implements interfaces)
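
The dependency rule above can be illustrated with a small sketch: the domain owns the interface, a use case depends only on that interface, and the data layer supplies an implementation. The class names come from this README, but the method signatures, the `Document` fields, and the in-memory repository are assumptions for illustration.

```kotlin
// Domain layer: plain Kotlin, no Android imports.
data class Document(val id: Long, val fileName: String, val redactedCount: Int)

interface DocumentRepository {
    fun getHistory(): List<Document>
}

class GetDocumentHistoryUseCase(private val repository: DocumentRepository) {
    // Newest first, assuming ids grow over time.
    operator fun invoke(): List<Document> =
        repository.getHistory().sortedByDescending { it.id }
}

// Data layer stand-in: a real implementation would read from Room.
class InMemoryDocumentRepository(private val docs: List<Document>) : DocumentRepository {
    override fun getHistory(): List<Document> = docs
}
```

Because the use case sees only `DocumentRepository`, it can be tested with the in-memory stand-in and never needs to know whether Room, a file, or a fake sits behind it.
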
graph TB
subgraph UI["UI Layer - Jetpack Compose + MVVM"]
HS[HomeScreen] --> HVM[HomeViewModel]
RS[ReviewScreen] --> RVM[ReviewViewModel]
RES[ResultScreen] --> RESVM[ResultViewModel]
HIS[HistoryScreen] --> HIVM[HistoryViewModel]
SS[SettingsScreen] --> SVM[SettingsViewModel]
end
subgraph SECURITY["Security Layer"]
BAM[BiometricAuthManager]
ALM[AppLockManager]
RPM[RetentionPolicyManager]
end
subgraph DOMAIN["Domain Layer - Pure Kotlin"]
SUC[ScanDocumentUseCase]
RUC[RedactDocumentUseCase]
GHC[GetDocumentHistoryUseCase]
DR[DocumentRepository interface]
MOD[Document · RedactionItem · PiiType · RedactionResult]
end
subgraph DATA["Data Layer - Android + ML Kit"]
DRI[DocumentRepositoryImpl]
PD[PiiDetector - 3-layer ML]
DS[DocumentScanner - OCR]
PR[PdfRedactor - true redaction]
DB[Room Database]
end
subgraph DI["DI - Hilt"]
AM[AppModule]
end
HVM --> SUC
RVM --> RUC
HIVM --> GHC
SUC --> DR
RUC --> DR
GHC --> DR
DR -.->|implements| DRI
DRI --> PD
DRI --> DS
DRI --> PR
DRI --> DB
SVM --> ALM
SVM --> RPM
HIS --> BAM
AM -.->|provides all| DRI
style UI fill:#1a237e,color:#fff
style SECURITY fill:#b71c1c,color:#fff
style DOMAIN fill:#1b5e20,color:#fff
style DATA fill:#e65100,color:#fff
style DI fill:#4a148c,color:#fff

🤖 3-Layer PII Detection Pipeline

sequenceDiagram
actor User
participant VM as ViewModel
participant DS as DocumentScanner
participant PD as PiiDetector
participant L1 as Layer 1: ML Kit
participant L2 as Layer 2: Regex
participant L3 as Layer 3: Context
participant PDF as PdfRedactor
participant DB as Room DB
User->>VM: Scan / Open PDF
VM->>DS: OCR - extract text + bounding boxes
DS-->>VM: Text blocks with pixel positions
VM->>PD: detect(ocrText, pageIndex)
PD->>L1: ML Kit Entity Extraction
L1-->>PD: Names, Address, Phone, Email, Money
PD->>L2: Regex patterns
L2-->>PD: SSN, Credit Card, Passport, MRN, DOB
PD->>L3: Context-aware sliding window
L3-->>PD: Account numbers, Patient IDs
PD->>PD: IoU merge - deduplicate overlaps
PD-->>VM: List of RedactionItems
VM-->>User: ReviewScreen - red highlights

User->>VM: Tap Redact
VM->>PDF: render → black boxes → export image PDF
PDF-->>VM: RedactionResult with outputPath
VM->>DB: saveDocument for audit trail
VM-->>User: ResultScreen - share clean PDF

🔒 Security Architecture

graph LR
USER([User opens app]) --> BIOMETRIC{Biometric\nAuth}
BIOMETRIC -->|Pass| APP[Access Granted]
BIOMETRIC -->|Fail| LOCK[🔒 Locked Screen]
APP --> TIMEOUT{Background\n> timeout?}
TIMEOUT -->|Yes| LOCK
TIMEOUT -->|No| CONTINUE[Continue Session]
APP --> DELETE{Delete\nDocument}
DELETE --> OVERWRITE[Overwrite file\nwith zeros]
OVERWRITE --> FILEDELETE[File.delete]
FILEDELETE --> DBDELETE[Room record deleted]
style LOCK fill:#b71c1c,color:#fff
style APP fill:#1b5e20,color:#fff
style OVERWRITE fill:#e65100,color:#fff
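
The timeout branch in the diagram above amounts to comparing the time spent in the background against the configured limit. The sketch below shows that logic in isolation; `AutoLockTimer` is a hypothetical class, not the project's AppLockManager, and it treats -1 as Never, as described under Enterprise Security.

```kotlin
class AutoLockTimer(private val timeoutMinutes: Int) {
    private var backgroundedAtMillis: Long? = null

    // Call when the app moves to the background.
    fun onBackground(nowMillis: Long) {
        backgroundedAtMillis = nowMillis
    }

    // Call when the app returns; true means the lock screen should be shown.
    fun shouldLockOnForeground(nowMillis: Long): Boolean {
        val since = backgroundedAtMillis ?: return false
        backgroundedAtMillis = null
        if (timeoutMinutes == NEVER) return false
        return nowMillis - since >= timeoutMinutes * 60_000L
    }

    companion object {
        const val NEVER = -1
    }
}
```

On a device the timestamps would preferably come from a monotonic clock such as `SystemClock.elapsedRealtime()`, so that changing the wall-clock time cannot bypass the lock.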

📂 Project Structure

EnterpriseDocumentRedactor/
├── domain/                             ← Pure Kotlin, zero Android imports
│   ├── model/
│   │   ├── Document.kt
│   │   ├── RedactionItem.kt
│   │   ├── PiiType.kt                  # 11 PII types
│   │   ├── RedactionResult.kt
│   │   └── DocumentStatus.kt           # sealed class
│   ├── repository/
│   │   └── DocumentRepository.kt       # Interface
│   └── usecase/
│       ├── ScanDocumentUseCase.kt
│       ├── RedactDocumentUseCase.kt
│       └── GetDocumentHistoryUseCase.kt
│
├── data/
│   ├── ml/
│   │   ├── PiiDetector.kt              # 3-layer detection + IoU merge
│   │   ├── DocumentScanner.kt          # ML Kit OCR + OOM guard
│   │   └── ModelDownloadHelper.kt      # Download state + 30s timeout
│   ├── pdf/
│   │   └── PdfRedactor.kt              # True redaction: removes text layer
│   ├── local/
│   │   ├── DocumentDatabase.kt
│   │   ├── DocumentDao.kt
│   │   └── DocumentEntity.kt
│   └── repository/
│       └── DocumentRepositoryImpl.kt   # secureDelete() included
│
├── security/                           ← Enterprise security layer
│   ├── BiometricAuthManager.kt         # Fingerprint/Face ID + status enum
│   ├── AppLockManager.kt               # Auto-lock + configurable timeout
│   └── RetentionPolicyManager.kt       # Auto-delete old documents
│
├── di/
│   └── AppModule.kt
│
├── ui/
│   ├── home/
│   │   ├── HomeScreen.kt               # Camera scan + file picker + settings icon
│   │   └── HomeViewModel.kt
│   ├── review/
│   │   ├── ReviewScreen.kt             # PII highlights + toggle + category chips
│   │   ├── ReviewViewModel.kt
│   │   └── RedactionUiState.kt
│   ├── result/
│   │   ├── ResultScreen.kt             # Stats + share redacted PDF
│   │   └── ResultViewModel.kt          # Extracted to own file
│   ├── history/
│   │   ├── HistoryScreen.kt            # Swipe-to-delete + biometric gate
│   │   └── HistoryViewModel.kt
│   ├── settings/
│   │   ├── SettingsScreen.kt           # Security + retention UI
│   │   └── SettingsViewModel.kt        # Persists to SharedPreferences
│   └── components/
│       ├── PiiHighlightOverlay.kt
│       ├── CategorySummaryChip.kt
│       └── RedactionSummaryCard.kt
│
├── EnterpriseDocumentRedactorApp.kt    # Lifecycle observer + retention + cache cleanup
└── MainActivity.kt                     # Lock screen + NavHost (5 routes)

🛠️ Tech Stack

| Layer | Technology |
|---|---|
| Language | Kotlin 2.2.10 |
| UI | Jetpack Compose + Material3 |
| Architecture | Clean Architecture + MVVM + UDF |
| DI | Hilt 2.59.1 |
| Camera | ML Kit Document Scanner 16.0.0-beta4 |
| OCR | ML Kit Text Recognition 16.0.0 |
| PII Detection | ML Kit Entity Extraction 16.0.0-beta5 |
| PDF Redaction | Android PdfRenderer + PdfDocument API |
| Database | Room 2.7.1 |
| Biometric | AndroidX Biometric 1.1.0 |
| Image Loading | Coil 2.7.0 |
| Navigation | Navigation Compose 2.8.9 |
| Async | Coroutines + StateFlow + SharedFlow |
| Build | AGP 9.x, KSP 2.2.10-2.0.2, compileSdk 36 |

⚙️ Setup

Prerequisites

  • Android Studio Hedgehog or newer
  • Android device/emulator with Google Play Services (API 26+)
  • No API keys required: 100% on-device ML Kit

Clone & Run

git clone https://github.com/lakshmanreddymv-bot/EnterpriseDocumentRedactor.git
cd EnterpriseDocumentRedactor
./gradlew assembleDebug

Important: Use a Google Play emulator (not plain AOSP). On first launch the ML Kit model downloads once; after that the app works fully offline.


📋 Permissions

<uses-permission android:name="android.permission.CAMERA" />
<uses-permission android:name="android.permission.USE_BIOMETRIC" />
<uses-permission android:name="android.permission.USE_FINGERPRINT" />
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE"
    android:maxSdkVersion="32" />
<uses-permission android:name="android.permission.READ_MEDIA_IMAGES" />
<!-- NO INTERNET PERMISSION: by design -->

🔒 Security Architecture

| Feature | Implementation | Standard |
|---|---|---|
| Zero network | No INTERNET permission in manifest | HIPAA, GDPR |
| Network verification | StrictMode crashes on any network call in debug | Dev safety |
| Screen protection | FLAG_SECURE on Review + Result screens | HIPAA |
| Biometric lock | AndroidX BiometricPrompt (BIOMETRIC_STRONG) | HIPAA access control |
| Auto-lock timeout | Configurable: 1/5/15 min / Never, persisted to prefs | HIPAA |
| Secure deletion | Overwrite with zeros (64KB chunks) → File.delete() | HIPAA forensics |
| DB backup disabled | allowBackup=false in manifest | GDPR |
| Release obfuscation | R8 minification enabled | Security |
| No PII logging | Counts only, never text content | HIPAA |
| Data retention | Auto-delete after 7/30/60/90 days on launch | GDPR Art.5 |
| Audit trail | Timestamped Room DB log per redaction | HIPAA audit |

🧪 PII Detection: 11 Types Across 3 Layers

| Layer | Method | Detects |
|---|---|---|
| Layer 1 | ML Kit Entity Extraction | 👤 Name, 📍 Address, 📞 Phone, 📧 Email, 💰 Financial |
| Layer 2 | Precompiled Regex | 🔒 SSN, 💳 Credit Card, 🛂 Passport, 🏥 MRN, 📅 DOB |
| Layer 3 | Context-aware (60-char window) | 💰 Account numbers, 🏥 Patient IDs |
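
One way to read the Layer 3 row is that a bare digit run is only flagged when a trigger keyword appears shortly before it. The sketch below implements that idea with a 60-character look-behind window; the keyword list, the 8 to 12 digit pattern, and the `PiiMatch` type are assumptions for illustration, not the project's actual rules.

```kotlin
// Hypothetical result type for this sketch.
data class PiiMatch(val type: String, val range: IntRange)

// Illustrative values only; the real detector's patterns are not shown in this README.
val DIGIT_RUN = Regex("""\b\d{8,12}\b""")
val ACCOUNT_KEYWORDS = listOf("account", "acct")

// Flags a digit run only if a keyword occurs within `window` characters before it.
fun contextLayer(text: String, window: Int = 60): List<PiiMatch> =
    DIGIT_RUN.findAll(text)
        .filter { m ->
            val from = maxOf(0, m.range.first - window)
            val before = text.substring(from, m.range.first).lowercase()
            ACCOUNT_KEYWORDS.any { it in before }
        }
        .map { PiiMatch("ACCOUNT_NUMBER", it.range) }
        .toList()
```

With these assumptions, the number in "Acct no. 1234567890" is flagged, while the same digits after "Tracking" are left alone because no keyword precedes them.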

Overlap resolution: IoU > 0.3 merges duplicates. Highest confidence wins.
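
A minimal sketch of this overlap rule, assuming axis-aligned bounding boxes and a greedy pass that keeps the highest-confidence detection first. `Box`, `Detection`, `iou`, and `mergeOverlaps` are hypothetical names; only the 0.3 threshold and the "highest confidence wins" rule come from the text above.

```kotlin
data class Box(val left: Float, val top: Float, val right: Float, val bottom: Float) {
    val area: Float get() = maxOf(0f, right - left) * maxOf(0f, bottom - top)
}

data class Detection(val type: String, val box: Box, val confidence: Float)

// Intersection over Union of two boxes; 0 when they do not overlap.
fun iou(a: Box, b: Box): Float {
    val inter = Box(
        maxOf(a.left, b.left), maxOf(a.top, b.top),
        minOf(a.right, b.right), minOf(a.bottom, b.bottom)
    ).area
    val union = a.area + b.area - inter
    return if (union <= 0f) 0f else inter / union
}

// Visits detections from highest to lowest confidence and drops any that
// overlap an already kept detection by more than the threshold.
fun mergeOverlaps(items: List<Detection>, threshold: Float = 0.3f): List<Detection> {
    val kept = mutableListOf<Detection>()
    for (d in items.sortedByDescending { it.confidence }) {
        if (kept.none { iou(it.box, d.box) > threshold }) kept += d
    }
    return kept
}
```

Two boxes that share most of their area collapse to the more confident one, while a distant box on the same page is untouched.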

Graceful degradation: Layers 2+3 always run, even without Play Services.


📱 Real-World Use Cases

Legal Firm: Discovery Documents

Input:  20-page contract
Found:  Name, SSN, Email, Account number, Phone
Result: Clean PDF → opposing counsel (GDPR compliant)

Hospital: Patient De-identification

Input:  Patient intake form
Found:  Name, DOB, MRN, Insurance policy, Email
Result: Anonymous record → research team (HIPAA Safe Harbor)

Personal: Driving Licence

Input:  Driving licence scan
Found:  Full name, DOB, Address, Licence number
Result: Safe to email to insurance (only photo visible)

Bank: Audit Preparation

Input:  Loan application
Found:  Credit card, SSN, Account number
Result: PCI-DSS compliant version → auditor

🐛 Issues Faced & Fixed

| # | Problem | Root Cause | Fix |
|---|---|---|---|
| 1 | ML Kit "Something went wrong" | Plain AOSP emulator, no Play Services | Switch to Google Play emulator + ModelDownloadHelper fallback |
| 2 | OOM on multi-page PDF | 10 pages × 7.3MB bitmaps never recycled | bitmap.recycle() after OCR + onCleared() in ReviewViewModel |
| 3 | PdfDocument native leak | close() only on success path | try-finally around pdfDoc.close() |
| 4 | Coroutine cancellation broken | CancellationException swallowed in catch(Exception) | Rethrow CancellationException first |
| 5 | ClassCastException in Compose | context as Activity unsafe cast | findActivity() extension walking ContextWrapper chain |
| 6 | Gradle version conflicts | AGP 9.x + Kotlin 2.2.10 incompatibilities | KSP 2.0.2, Hilt 2.59.1, Room 2.7.1, compileSdk 36 |
| 7 | R8 disabled in release | isMinifyEnabled = false | Enabled R8 + ProGuard rules for ML Kit/Hilt/Room |
| 8 | Settings not persisting across launches | No SharedPreferences, in-memory only | SettingsViewModel with full SharedPreferences persistence |
| 9 | Large image OOM in scanImage() | No inSampleSize on BitmapFactory | calculateInSampleSize() guard + RGB_565 config before decode |
| 10 | Encrypted PDF cryptic crash | SecurityException not caught specifically | Catch SecurityException → "This PDF is password-protected" message |
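
Issue 4 is a common coroutine pitfall: a broad `catch (e: Exception)` also catches CancellationException, so a cancelled job keeps running. A minimal sketch of the fix, using only the Kotlin standard library, is shown below; `runCatchingCancellable` is a hypothetical helper name, not something taken from the project.

```kotlin
import kotlin.coroutines.cancellation.CancellationException

// Wraps a block in a Result, but lets cancellation propagate so the
// enclosing coroutine actually stops instead of treating it as a failure.
inline fun <T> runCatchingCancellable(block: () -> T): Result<T> =
    try {
        Result.success(block())
    } catch (e: CancellationException) {
        throw e // must come before the general catch
    } catch (e: Exception) {
        Result.failure(e)
    }
```

The order of the catch clauses is the whole point: the specific CancellationException branch has to appear before the general Exception branch, otherwise it is never reached.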

🗺️ Roadmap

  • SQLCipher: AES-256 Room database encryption
  • Tamper-proof audit log with SHA-256 hashing
  • Multi-language PII detection (Spanish, French, German)
  • Batch document processing
  • Password-protected PDF support
  • Export compliance certificate PDF
  • Custom PII rules: user-defined regex patterns

🀝 AI Android Portfolio

# Project Status Description
1 MySampleApplication-AI βœ… Complete AI Natural Language Search β€” Gemini API
2 FakeProductDetector βœ… Complete Dual-AI authentication β€” Gemini + Claude
3 EnterpriseDocumentRedactor βœ… Complete On-device PII redaction β€” 100% offline
4 Coming Soon πŸ”¨ Planning β€”

📄 License

MIT License. Copyright (c) 2026 Lakshmana Reddy


👨‍💻 Author

Lakshmana Reddy | Android Tech Lead | 12 years experience | 📍 Pleasanton, CA | 🔗 GitHub


Built with ❤️ and on-device AI, because some data should never leave your device
