📦 Huffman Compressor (C++)

A file compression and decompression tool implemented in C++ using Huffman coding, built to demonstrate efficient data encoding, file handling, and system-level programming.

1️⃣ Overview

This project implements a lossless compression technique that reduces file size by encoding characters based on their frequency of occurrence.

Frequently occurring characters → shorter binary codes
Rare characters → longer binary codes

The system supports:

File compression into a binary format
Accurate decompression back to the original content
Display of compression statistics and Huffman codes

2️⃣ Key Features

2.1 Core Functionalities

✅ Compress .txt files into .bin
✅ Decompress .bin files back to original text
✅ Generate optimal prefix-free Huffman codes
✅ Preserve data integrity (lossless compression)

2.2 Analytics & Debugging

📊 Compression ratio calculation
📊 Space saving percentage
📊 Character frequency analysis
📊 Display of shortest and longest codes

2.3 System-Level Design

⚙️ Efficient use of STL (priority_queue, map)
⚙️ Bit-level encoding logic
⚙️ Custom binary file format

3️⃣ How Huffman Coding Works

Step 1: Frequency Calculation

Count the frequency of each character in the input file.
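A minimal sketch of this counting step, assuming the input has already been read into a std::string. The helper name countFrequencies is hypothetical and not taken from the project's source; std::map is used only because the feature list mentions it.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper (assumed name): counts how often each byte occurs.
// Bytes are treated as unsigned so values above 127 map to valid keys.
std::map<unsigned char, std::size_t> countFrequencies(const std::string& data) {
    std::map<unsigned char, std::size_t> freq;
    for (unsigned char byte : data) {
        ++freq[byte];  // operator[] creates a zero entry on first sight
    }
    return freq;
}
```

For example, calling countFrequencies("abracadabra") yields five entries, with 'a' mapped to 5 and 'b' mapped to 2.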

Step 2: Build Min Heap

Insert one leaf node per character into a min-priority queue ordered by frequency.
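One way this could look with std::priority_queue, which is a max-heap by default and needs std::greater to behave as a min-heap. The Entry and MinHeap aliases and the buildMinHeap name are assumptions made for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <queue>
#include <utility>
#include <vector>

// (frequency, symbol) pairs; std::pair compares by frequency first.
using Entry = std::pair<std::size_t, unsigned char>;
// std::greater flips the default max-heap into a min-heap.
using MinHeap = std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>>;

// Hypothetical helper: pushes one entry per distinct symbol.
MinHeap buildMinHeap(const std::map<unsigned char, std::size_t>& freq) {
    MinHeap heap;
    for (const auto& [symbol, count] : freq) {
        heap.emplace(count, symbol);
    }
    return heap;
}
```

With this ordering, heap.top() always returns the least frequent symbol still in the queue.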

Step 3: Construct Huffman Tree

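Remove the two lowest-frequency nodes from the queue
Merge them into a new internal node whose frequency is the sum of the two
Insert the merged node back and repeat until a single root remains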
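The merge loop above can be sketched as follows. This version stores the tree in a flat vector and links children by index, which is a design choice made here for brevity; the project itself may use pointers. Node and buildTree are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <queue>
#include <utility>
#include <vector>

// Assumed layout: children are indices into the vector, -1 means "no child".
struct Node {
    std::size_t freq;
    unsigned char symbol;  // meaningful only for leaves
    int left;
    int right;
};

// Hypothetical helper: fills `nodes` and returns the root index (-1 if empty).
int buildTree(const std::map<unsigned char, std::size_t>& freq, std::vector<Node>& nodes) {
    using Entry = std::pair<std::size_t, int>;  // (frequency, node index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    nodes.clear();
    for (const auto& [symbol, count] : freq) {
        nodes.push_back({count, symbol, -1, -1});
        heap.emplace(count, static_cast<int>(nodes.size()) - 1);
    }
    while (heap.size() > 1) {
        // Take the two least frequent nodes...
        auto [freqA, a] = heap.top();
        heap.pop();
        auto [freqB, b] = heap.top();
        heap.pop();
        // ...and merge them under a new internal node.
        nodes.push_back({freqA + freqB, 0, a, b});
        heap.emplace(freqA + freqB, static_cast<int>(nodes.size()) - 1);
    }
    return heap.empty() ? -1 : heap.top().second;
}
```

The root's frequency always equals the total number of characters counted, which makes a handy sanity check.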

Step 4: Generate Codes

Traverse the tree:

Left → 0
Right → 1
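A sketch of this traversal as a recursive depth-first walk, assuming an index-linked tree like the one described in Step 3. Node and assignCodes are hypothetical names, and giving a single-symbol file the code "0" is a choice made here, since the left/right rule alone would produce an empty code.

```cpp
#include <map>
#include <string>
#include <vector>

// Assumed layout: children are indices into the vector, -1 means "no child".
struct Node {
    unsigned char symbol;  // meaningful only for leaves
    int left;
    int right;
};

// Hypothetical helper: walks the tree and records the path to every leaf.
void assignCodes(const std::vector<Node>& nodes, int index, std::string& prefix,
                 std::map<unsigned char, std::string>& codes) {
    if (index < 0) return;
    const Node& node = nodes[index];
    if (node.left < 0 && node.right < 0) {
        // Leaf: the accumulated path is the code. A lone root gets "0".
        codes[node.symbol] = prefix.empty() ? std::string("0") : prefix;
        return;
    }
    prefix.push_back('0');  // going left appends 0
    assignCodes(nodes, node.left, prefix, codes);
    prefix.back() = '1';    // going right appends 1
    assignCodes(nodes, node.right, prefix, codes);
    prefix.pop_back();      // restore the caller's prefix
}
```

Because every symbol sits at a leaf, no code can be a prefix of another, which is what makes decoding unambiguous.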

Step 5: Encode File

Replace each character with its binary code and pack the resulting bits into the output file.
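A minimal sketch of the bit packing, assuming codes are held as strings of '0' and '1' characters. The encode name, the most-significant-bit-first order, and the separate bit count are all assumptions; the project's actual binary format is not documented here.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: packs each character's code into bytes, most
// significant bit first. `bitCount` receives the number of meaningful bits
// so a decoder can ignore the zero padding in the final byte.
// Throws std::out_of_range if `text` contains a character with no code.
std::vector<std::uint8_t> encode(const std::string& text,
                                 const std::map<unsigned char, std::string>& codes,
                                 std::size_t& bitCount) {
    std::vector<std::uint8_t> out;
    bitCount = 0;
    for (unsigned char ch : text) {
        for (char bit : codes.at(ch)) {
            if (bitCount % 8 == 0) out.push_back(0);  // start a new byte
            if (bit == '1') {
                out.back() |= static_cast<std::uint8_t>(0x80u >> (bitCount % 8));
            }
            ++bitCount;
        }
    }
    return out;
}
```

With the codes a = 1, b = 01, c = 00, the text "abca" becomes the six bits 101001, stored as the single byte 0xA4.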

Step 6: Decode File

Reconstruct the original text by walking the stored tree bit by bit: each time a leaf is reached, emit its character and restart from the root.
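Decoding can be sketched as a bit-by-bit walk down the tree. This assumes an index-linked tree, bits stored most significant first, and a known count of valid bits; Node and decode are hypothetical names rather than the project's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed layout: children are indices into the vector, -1 means "no child".
struct Node {
    unsigned char symbol;  // meaningful only for leaves
    int left;
    int right;
};

// Hypothetical helper: follows one edge per bit and emits a character at each leaf.
std::string decode(const std::vector<std::uint8_t>& bytes, std::size_t bitCount,
                   const std::vector<Node>& nodes, int root) {
    std::string out;
    if (root < 0) return out;  // empty tree, nothing to decode
    int current = root;
    for (std::size_t i = 0; i < bitCount; ++i) {
        // Read bit i, most significant bit of each byte first.
        const bool bit = ((bytes[i / 8] >> (7 - i % 8)) & 1u) != 0;
        const Node& node = nodes[current];
        const bool isLeaf = node.left < 0 && node.right < 0;
        if (!isLeaf) current = bit ? node.right : node.left;
        const Node& next = nodes[current];
        if (next.left < 0 && next.right < 0) {
            out.push_back(static_cast<char>(next.symbol));
            current = root;  // restart for the next character
        }
    }
    return out;
}
```

Passing the valid bit count explicitly keeps the padding bits in the last byte from being decoded as extra characters.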

4️⃣ Project Structure

huffman-compressor-cpp/
│
├── src/
│   ├── main.cpp            # Entry point
│   ├── Huffman.cpp         # Core algorithm
│   ├── FileHandler.cpp     # File I/O operations
│   └── utils.cpp           # Helper functions
│
├── data/
│   ├── input.txt           # Input file
│   ├── out.bin             # Compressed output
│   └── restored.txt        # Decompressed output
│
├── build/                  # Build directory (ignored)
├── CMakeLists.txt          # Build configuration
├── README.md
└── .gitignore

5️⃣ Build Instructions

5.1 Using g++ / clang++

clang++ -std=c++17 main.cpp Huffman.cpp FileHandler.cpp utils.cpp -o huff

5.2 Using CMake (Recommended)

rm -rf build
mkdir build
cd build
cmake ..
make

6️⃣ Usage

6.1 Compress File

./huff compress data/input.txt data/out.bin

6.2 Decompress File

./huff decompress data/out.bin data/restored.txt

7️⃣ Sample Execution

[+] Reading: data/input.txt
[+] File size: 40 bytes, unique chars: 16

--- Compression Stats ---
Original size   : 40 bytes
Compressed size : 19 bytes
Compression ratio: 0.47
Space saving     : 52.50%
-------------------------

[✓] Compressed → data/out.bin
[✓] Decompressed → data/restored.txt
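The figures in this sample are consistent with ratio = compressed size / original size and saving = (1 - ratio) × 100: 19 / 40 is about 0.475, giving a 52.50% saving. A minimal sketch under that assumption, with the hypothetical names Stats and computeStats:

```cpp
#include <cstddef>

// Hypothetical result type for the two reported metrics.
struct Stats {
    double ratio;          // compressed size / original size
    double savingPercent;  // (1 - ratio) * 100
};

// Hypothetical helper: returns zeros for an empty input to avoid dividing by zero.
Stats computeStats(std::size_t originalBytes, std::size_t compressedBytes) {
    if (originalBytes == 0) return {0.0, 0.0};
    const double ratio =
        static_cast<double>(compressedBytes) / static_cast<double>(originalBytes);
    return {ratio, (1.0 - ratio) * 100.0};
}
```

A ratio above 1.0 (and a negative saving) would indicate that the output grew, which can happen on small or high-entropy inputs once header overhead is counted.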

8️⃣ Performance Analysis

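Metric	Description
Time Complexity	O(n log n) to build the tree for n distinct characters, plus one pass over the file for counting and one for encoding
Space Complexity	O(n) for the tree and code table
Compression Efficiency	Depends on how skewed the character frequencies are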

Observations:

High repetition → Better compression
Random text → Lower compression efficiency
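These observations can be quantified with Shannon entropy, which gives a lower bound on the average bits per character that any per-character code, Huffman included, can reach. The sketch below is an illustration only; entropyBitsPerSymbol is a hypothetical helper and not part of the project.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <string>

// Hypothetical helper: Shannon entropy of the byte distribution, in bits per byte.
double entropyBitsPerSymbol(const std::string& data) {
    if (data.empty()) return 0.0;
    std::array<std::size_t, 256> counts{};  // zero-initialized
    for (unsigned char byte : data) ++counts[byte];

    double entropy = 0.0;
    for (std::size_t count : counts) {
        if (count == 0) continue;  // absent bytes contribute nothing
        const double p = static_cast<double>(count) / static_cast<double>(data.size());
        entropy -= p * std::log2(p);
    }
    return entropy;
}
```

A file made of one repeated character has entropy 0, while four equally likely characters give 2 bits per character; the closer the value gets to 8, the less a byte-level Huffman code can save.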

9️⃣ Technical Concepts Used

Data Structures:
- Binary Trees
- Priority Queue (Min Heap)
- Hash Maps
Algorithms:
- Greedy Algorithm (Huffman Encoding)
- Tree Traversal (DFS)
System Concepts:
- File I/O
- Binary encoding
- Memory management

🔟 Limitations

Works best for text files
Compression efficiency drops for highly random data
Does not yet support streaming of large files

1️⃣1️⃣ Future Improvements

🔥 GUI interface for ease of use
🔥 Support for large files (GB scale)
🔥 Combine with LZ77 (ZIP-like compression)
🔥 Multi-threaded compression
🔥 Drag-and-drop file support

1️⃣2️⃣ Why This Project Matters

This project demonstrates:

Strong understanding of data structures & algorithms
Practical implementation of compression systems
Experience with file handling and system-level programming

1️⃣3️⃣ Resume Description

Huffman Compression Tool (C++): Developed a file compression system using Huffman encoding that achieves up to 58% space reduction, with a custom binary encoding and decoding pipeline.

1️⃣4️⃣ Author

Manish Nalumachu

⭐ Final Note

This project is a strong demonstration of combining algorithmic thinking with real-world system implementation, making it highly valuable for software engineering interviews.
