FinCloud ☁️

FinCloud is a Python-based system designed to process and analyze financial documents such as salary slips, utility bills, bank statements, and more. It leverages OCR, natural language processing (NLP), and structured data extraction to provide insights and generate reports dynamically.

Features

OCR (Optical Character Recognition):
- Extracts structured and unstructured data from scanned images (e.g., salary slips, invoices).
Document Preprocessing:
- Image deskewing, noise removal, adaptive thresholding for better OCR accuracy.
Financial Insights:
- Extracts key metrics such as earnings, deductions, and net salary.
- Classifies expenses into categories like utilities, transportation, etc.
Multi-Document Support:
- Processes salary slips, bank statements, utility bills, Form 16, and more.
Accuracy Analysis:
- Calculates field extraction rates and internal consistency.
Data Storage and Visualization:
- Saves extracted data in structured formats (CSV/JSON) for seamless integration with analytics tools like Power BI.
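The accuracy analysis described above could look roughly like the sketch below. It is an illustration only: the field list, the function names, the tolerance, and the earnings and deductions figures are assumptions, not values taken from the project.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical list of fields expected on a salary slip (not taken from the project).
EXPECTED_FIELDS = ("company_name", "employee_name", "period",
                   "earnings", "deductions", "net_salary")


def field_extraction_rate(record: Mapping[str, object],
                          expected: Sequence[str] = EXPECTED_FIELDS) -> float:
    """Return the fraction of expected fields that were extracted (non-empty)."""
    if not expected:
        return 0.0
    found = sum(1 for name in expected if record.get(name) not in (None, ""))
    return found / len(expected)


def is_internally_consistent(record: Mapping[str, object],
                             tolerance: float = 1.0) -> Optional[bool]:
    """Check that earnings - deductions matches net salary within a tolerance.

    Returns None when any of the three amounts is missing or not numeric.
    """
    try:
        earnings = float(record["earnings"])
        deductions = float(record["deductions"])
        net = float(record["net_salary"])
    except (KeyError, TypeError, ValueError):
        return None
    return abs((earnings - deductions) - net) <= tolerance


# Made-up example record for illustration.
slip = {"company_name": "TechCorp", "employee_name": "John Doe", "period": "March 2025",
        "earnings": 75000, "deductions": 10000, "net_salary": 65000}
print(field_extraction_rate(slip))
print(is_internally_consistent(slip))
```

Returning `None` for records with missing amounts keeps "could not check" separate from "checked and failed", which matters when the rate is averaged over many documents.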

Technologies Used

Programming Language: Python
Libraries:
- pytesseract: For OCR functionality.
- OpenCV: For image preprocessing and enhancement.
- spaCy: For NLP and text analysis.
- pandas: For data manipulation and storage.
- matplotlib: For plotting trends in the extracted data.
- re and dateutil: For regex-based parsing and date handling.
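As a rough illustration of the regex-based parsing mentioned above, the sketch below pulls a net salary and a pay period out of OCR text. The patterns, the function names, and the sample text are assumptions made for this example. It uses the standard-library `datetime` instead of dateutil so that it runs without extra dependencies.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical pattern: a "Net Salary"/"Net Pay" label followed by an amount.
NET_SALARY_RE = re.compile(
    r"net\s+(?:salary|pay)\s*[:\-]?\s*(?:rs\.?|inr|₹)?\s*(\d[\d,]*(?:\.\d+)?)",
    re.IGNORECASE,
)
# Pay period written as a month name plus a four-digit year, e.g. "March 2025".
PERIOD_RE = re.compile(r"\b([A-Za-z]{3,9})\s+(\d{4})\b")


def parse_net_salary(text: str) -> Optional[float]:
    """Return the net salary as a float, or None if no labelled amount is found."""
    match = NET_SALARY_RE.search(text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def parse_period(text: str) -> Optional[datetime]:
    """Return the first 'Month YYYY' occurrence as a datetime, or None."""
    for month, year in PERIOD_RE.findall(text):
        for fmt in ("%B %Y", "%b %Y"):  # full and abbreviated month names
            try:
                return datetime.strptime(f"{month} {year}", fmt)
            except ValueError:
                continue
    return None


# Made-up OCR output for illustration.
ocr_text = "TechCorp\nPay Period: March 2025\nNet Salary: Rs. 65,000.00"
print(parse_net_salary(ocr_text))
print(parse_period(ocr_text))
```

Real OCR output is noisier than this sample, so patterns like these usually need tuning per document layout.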

Installation

Clone the repository:

```
git clone https://github.com/your_username/financial-document-intelligence.git
cd financial-document-intelligence
```

Install the required dependencies:
```
pip install -r requirements.txt
```
Ensure Tesseract OCR is installed:
- Download and install the Tesseract OCR engine for your operating system.
- Update the pytesseract.pytesseract.tesseract_cmd variable in the code with the correct path on your system.
Download the English NLP model:
```
python -m spacy download en_core_web_sm
```
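For the Tesseract path step above, one possible way to avoid hard-coding the location is to look it up at startup. This is a sketch under assumptions: `find_tesseract` and the fallback paths are hypothetical and may not match your installation. Only `pytesseract.pytesseract.tesseract_cmd` comes from the instructions above.

```python
import os
import shutil
from typing import Iterable, Optional

# Hypothetical fallback locations; adjust them to match your installation.
CANDIDATE_PATHS = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    "/usr/bin/tesseract",
    "/usr/local/bin/tesseract",
    "/opt/homebrew/bin/tesseract",
)


def find_tesseract(candidates: Iterable[str] = CANDIDATE_PATHS) -> Optional[str]:
    """Return the Tesseract executable path, or None if it cannot be found."""
    on_path = shutil.which("tesseract")  # Prefer whatever is on PATH.
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


tesseract_path = find_tesseract()
if tesseract_path is None:
    print("Tesseract not found; install it or set the path manually.")
else:
    print(f"Using Tesseract at {tesseract_path}")
    # With pytesseract installed, point it at the executable:
    # import pytesseract
    # pytesseract.pytesseract.tesseract_cmd = tesseract_path
```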

Dataset

-https://www.kaggle.com/datasets/mehaksingal/personal-financial-dataset-for-india

How to Use

Predefine Document Folders
- Organize your financial documents in folders corresponding to their types (e.g., salary slips, utility bills, etc.). Update the document_paths variable in the code with the folder paths.
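The exact shape of `document_paths` is not shown here, so the following is only a guess at what it might look like: a dictionary that maps each document type to a folder. The folder names, the extension set, and the `collect_documents` helper are all hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical folder layout; replace with the folders you actually use.
document_paths: Dict[str, str] = {
    "Salary Slip": "documents/salary_slips",
    "Utility Bill": "documents/utility_bills",
    "Bank Statement": "documents/bank_statements",
}

# Assumed set of scan formats; extend as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def collect_documents(paths: Dict[str, str]) -> Dict[str, List[Path]]:
    """Map each document type to the sorted image files found in its folder."""
    found: Dict[str, List[Path]] = {}
    for doc_type, folder in paths.items():
        directory = Path(folder)
        if not directory.is_dir():
            found[doc_type] = []  # Missing folders yield an empty list.
            continue
        found[doc_type] = sorted(
            p for p in directory.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
    return found


for doc_type, files in collect_documents(document_paths).items():
    print(f"{doc_type}: {len(files)} file(s)")
```

Listing the files before running OCR makes it easy to spot an empty or misnamed folder early.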
Run the Main Script
- Use the following command to process documents:
```
python main.py
```
View Extracted Insights
- After successful execution, all extracted data will be saved in the processed_results folder in CSV and JSON formats.
Analyze with Power BI
- To import the data into Power BI:
  1. Open Power BI Desktop.
  2. Select Get Data > Text/CSV.
  3. Choose the processed_results/extracted_data.csv file.
  4. Build visualizations using the imported data.

Folder Structure

financial-document-intelligence/
├── README.md                 # Documentation
├── requirements.txt          # Dependencies
├── main.py                   # Main processing script
├── salary_slip_processor.py  # Salary slip-specific logic
├── processed_results/        # Output directory for results
├── uploaded_files/           # Temporary file storage
├── utils/                    # Utility functions and reusable components
└── tests/                    # Unit tests for document processing

Sample Flow

Input:
- Upload a sample salary slip or utility bill (e.g., salary_slip.jpg).
Processing:
- The system extracts text using OCR and parses key information (e.g., company name, salary amounts, deductions).

Output:
- Saves structured data in a CSV file:

file_name,document_type,company_name,employee_name,period,total_amount,processed_date
salary_slip.jpg,Salary Slip,TechCorp,John Doe,March 2025,65000,2025-04-12
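A CSV in this shape can be read back with the standard library. The sketch below parses the sample row above and converts the amount and date columns to typed values. `load_records` is a hypothetical helper, not part of the project.

```python
import csv
import io
from datetime import date
from typing import Dict, List

SAMPLE_CSV = """file_name,document_type,company_name,employee_name,period,total_amount,processed_date
salary_slip.jpg,Salary Slip,TechCorp,John Doe,March 2025,65000,2025-04-12
"""


def load_records(handle) -> List[Dict[str, object]]:
    """Parse extracted rows, converting the amount and date columns."""
    records = []
    for row in csv.DictReader(handle):
        record: Dict[str, object] = dict(row)
        record["total_amount"] = float(row["total_amount"])
        record["processed_date"] = date.fromisoformat(row["processed_date"])
        records.append(record)
    return records


records = load_records(io.StringIO(SAMPLE_CSV))
print(records[0]["employee_name"], records[0]["total_amount"])
```

To read the real output instead of the inline sample, pass `open("processed_results/extracted_data.csv", newline="")` as the handle.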

Analytics:
- Visualize spending patterns and financial summaries in analytics tools.

Future Enhancements

- Add support for more document types like purchase receipts and tax forms.
- Improve OCR accuracy using advanced models like Google Vision or AWS Textract.
- Add a web-based GUI for file uploads and analytics.
- Automate recurring document ingestion using cloud services.

License

This project is licensed under the MIT License. See the LICENSE file for more details.
