
CSroseX/olx-pyhton-scraper

Financial Data Processing & Web Data Collection Suite

A comprehensive dual-module solution for automated financial data extraction and manual web marketplace data collection. This project demonstrates expertise in shell scripting for API integration, Python development for data management, and practical approaches to structured data handling in environments where traditional web scraping is constrained.

The suite addresses real-world challenges in data acquisition from diverse sources, employing bash scripting with curl for RESTful API consumption, Python for interactive CLI applications, and multi-format data serialization (JSON, CSV, TSV) for downstream analysis and database ingestion.

Key Technical Learnings:

  • RESTful API consumption and data transformation using shell utilities (curl, awk, sed)
  • Python-based interactive CLI development with structured data output
  • Multi-format data serialization strategies for analytical workflows
  • Error handling and data validation in production-grade shell scripts
  • File I/O operations with proper encoding management for international character sets

Tech Stack

  • Bash 4.0+ with curl and GNU awk/sed (AMFI NAV Data Extractor)
  • Python 3.6+ standard library only: json, csv, datetime (OLX Data Collector)

Project Modules

1. AMFI NAV Data Extractor

A bash-based automated pipeline for retrieving and processing daily Net Asset Value (NAV) data from the Association of Mutual Funds in India (AMFI) API. Implements ETL (Extract, Transform, Load) principles to convert semicolon-delimited raw data into analysis-ready tab-separated values.

Technical Highlights:

  • HTTP request handling via curl with error management
  • Stream processing with AWK for pattern matching and field extraction
  • Regular expression-based data validation
  • TSV output optimized for database ingestion and spreadsheet analysis
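The AWK transformation step can be sketched as follows. The field layout mirrors the typical semicolon-delimited AMFI NAVAll feed (Scheme Code;ISIN;ISIN;Scheme Name;NAV;Date), but the exact field positions, validation rules, and the sample record below are assumptions for illustration, not taken from data_extractor.sh:

```shell
# Hypothetical sketch: transform one semicolon-delimited AMFI-style record
# into the scheme-name + NAV TSV format described above. A real run would
# pipe the curl output of the NAVAll feed into the same awk program.
sample='119551;INF209K01157;-;Sample Liquid Fund - Growth;345.6789;01-Jan-2024'

printf '%s\n' "$sample" |
awk -F';' '
  # Keep only data rows: at least 5 fields and a numeric NAV in field 5.
  NF >= 5 && $5 ~ /^[0-9.]+$/ { print $4 "\t" $5 }
'
```

The regex guard on field 5 is what filters out the feed's header and section-title lines, which lack a numeric NAV.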

2. OLX Marketplace Data Collector

A Python-based interactive command-line interface (CLI) for structured collection and storage of e-commerce listing metadata from OLX India. Designed for scenarios where DOM parsing restrictions or anti-bot measures necessitate manual data entry.

Technical Highlights:

  • Object-oriented data modeling with Python dictionaries
  • Dual-format export (JSON with ISO 8601 timestamps, CSV with DictWriter)
  • UTF-8 encoding support for multi-language marketplace data
  • Interactive user prompts with input validation and flow control
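The dual-format export described above can be sketched as follows. The field names follow the README's "Data Fields Collected" list and the file-name pattern matches the documented olx_manual_data_YYYYMMDD_HHMMSS outputs, but the exact schema and JSON layout in web_scraper.py may differ:

```python
import csv
import json
from datetime import datetime

# Hypothetical listing records; field names mirror the README's
# "Data Fields Collected" list, not the actual web_scraper.py schema.
listings = [
    {"Title": "Used bicycle", "Price": "₹3,500", "Location": "Pune",
     "Date": "Today", "URL": "https://www.olx.in/item/example",
     "Description": "Good condition"},
]

stamp = datetime.now().strftime("%Y%m%d_%H%M%S")

# JSON export: wrap the records with an ISO 8601 collection timestamp.
with open(f"olx_manual_data_{stamp}.json", "w", encoding="utf-8") as f:
    json.dump({"collected_at": datetime.now().isoformat(),
               "listings": listings},
              f, ensure_ascii=False, indent=2)

# CSV export via DictWriter; utf-8 (ensure_ascii=False above, too)
# preserves multi-language marketplace text such as the ₹ symbol.
with open(f"olx_manual_data_{stamp}.csv", "w", encoding="utf-8",
          newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(listings[0]))
    writer.writeheader()
    writer.writerows(listings)
```

Both writers share the same timestamp, so each collection session produces one matched JSON/CSV pair.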

Installation & Dependencies

Prerequisites

# For AMFI NAV Data Extractor (Linux/macOS/WSL)
- bash (version 4.0+)
- curl
- awk (GNU awk recommended)

# For OLX Data Collector
- Python 3.6 or higher

Setup

# Clone the repository
git clone <repository-url>
cd temp_assignment

# Verify bash script permissions
chmod +x data_extractor/data_extractor.sh

# Python dependencies are part of standard library (json, csv, datetime)
# No additional pip installations required

Usage

AMFI NAV Data Extractor

Navigate to the data extraction module and execute the shell script:

cd data_extractor
./data_extractor.sh

Output:

  • amfi_nav_data.tsv - Tab-separated file containing scheme names and NAV values
  • Console displays record count and sample data preview

Use Cases:

  • Daily mutual fund performance tracking
  • Historical NAV database population
  • Financial analysis and reporting pipelines
  • Integration with data visualization tools

OLX Data Collector

Navigate to the web scraping module and run the Python script:

cd "olx scrapper"
python web_scraper.py

Interactive Menu Options:

  1. Manual Entry - Enter listing data interactively via CLI prompts
  2. Load Sample Data - Generate sample output files for testing
  3. Instructions - Display browser console snippet for advanced users

Output Files:

  • olx_manual_data_YYYYMMDD_HHMMSS.json - Structured JSON with metadata and ISO timestamps
  • olx_manual_data_YYYYMMDD_HHMMSS.csv - Comma-separated format for spreadsheet import

Data Fields Collected:

  • Title, Price, Location, Date, URL, Description

Use Cases:

  • Market research and competitive analysis
  • Price tracking and trend analysis
  • Database population for e-commerce analytics
  • Training datasets for machine learning models

Project Structure

temp_assignment/
├── data_extractor/
│   ├── data_extractor.sh       # Bash ETL pipeline for AMFI data
│   ├── amfi_nav_data.tsv       # Generated NAV dataset
│   └── readme.txt              # Module-specific documentation
├── olx scrapper/
│   ├── web_scraper.py          # Python CLI data collector
│   ├── olx_manual_data_*.json  # Generated JSON datasets
│   ├── olx_manual_data_*.csv   # Generated CSV datasets
│   └── readme.txt              # Module-specific documentation
└── README.md                    # This file

Technical Architecture

Data Pipeline Flow - AMFI Module:

AMFI API → HTTP GET (curl) → Raw Semicolon-Delimited Data → 
AWK Stream Processing → Field Extraction & Validation → 
TSV Output → File System Storage

Data Collection Flow - OLX Module:

User Input (CLI) → Python Dictionary Structures → 
Validation & Storage → Dual Serialization (JSON + CSV) → 
Timestamped File Output

Future Enhancements

  • Database integration (PostgreSQL/MySQL) for persistent storage
  • RESTful API wrapper for programmatic data access
  • Automated scheduling with cron jobs for daily NAV updates
  • Enhanced data validation with pandas DataFrames
  • Web dashboard for data visualization using Flask/FastAPI
  • Selenium-based automation for OLX data extraction (where permitted)

License

This project is available for portfolio and educational purposes.


Developed with focus on clean code architecture, data integrity, and production-ready error handling.
