A high-performance file compression and decompression system implemented in C++ using the Huffman Encoding algorithm, designed to demonstrate efficient data encoding, file handling, and system-level programming.
This project implements a lossless compression technique that reduces file size by encoding characters based on their frequency of occurrence.
- Frequently occurring characters β shorter binary codes
- Rare characters β longer binary codes
The system supports:
- File compression into a binary format
- Accurate decompression back to the original content
- Display of compression statistics and Huffman codes
- β
Compress
.txtfiles into.bin - β
Decompress
.binfiles back to original text - β Generate optimal prefix-free Huffman codes
- β Preserve data integrity (lossless compression)
- π Compression ratio calculation
- π Space saving percentage
- π Character frequency analysis
- π Display of shortest and longest codes
- βοΈ Efficient use of STL (
priority_queue,map) - βοΈ Bit-level encoding logic
- βοΈ Custom binary file format
Count frequency of each character in the input file.
Insert characters into a priority queue based on frequency.
- Remove two lowest frequency nodes
- Merge into a new node
- Repeat until one root remains
Traverse the tree:
- Left β
0 - Right β
1
Replace each character with its binary code.
Reconstruct original text using the stored tree.
huffman-compressor-cpp/
β
βββ src/
β βββ main.cpp # Entry point
β βββ Huffman.cpp # Core algorithm
β βββ FileHandler.cpp # File I/O operations
β βββ utils.cpp # Helper functions
β
βββ data/
β βββ input.txt # Input file
β βββ out.bin # Compressed output
β βββ restored.txt # Decompressed output
β
βββ build/ # Build directory (ignored)
βββ CMakeLists.txt # Build configuration
βββ README.md
βββ .gitignore
clang++ -std=c++17 main.cpp Huffman.cpp FileHandler.cpp utils.cpp -o huffrm -rf build
mkdir build
cd build
cmake ..
make./huff compress data/input.txt data/out.bin./huff decompress data/out.bin data/restored.txt[+] Reading: data/input.txt
[+] File size: 40 bytes, unique chars: 16
--- Compression Stats ---
Original size : 40 bytes
Compressed size : 19 bytes
Compression ratio: 0.47
Space saving : 52.50%
-------------------------
[β] Compressed β data/out.bin
[β] Decompressed β data/restored.txt
| Metric | Description |
|---|---|
| Time Complexity | O(n log n) |
| Space Complexity | O(n) |
| Compression Efficiency | Depends on character frequency |
- High repetition β Better compression
- Random text β Lower compression efficiency
-
Data Structures:
- Binary Trees
- Priority Queue (Min Heap)
- Hash Maps
-
Algorithms:
- Greedy Algorithm (Huffman Encoding)
- Tree Traversal (DFS)
-
System Concepts:
- File I/O
- Binary encoding
- Memory management
- Works best for text files
- Compression efficiency reduces for highly random data
- Does not yet support large-scale file streaming
- π₯ GUI interface for ease of use
- π₯ Support for large files (GB scale)
- π₯ Combine with LZ77 (ZIP-like compression)
- π₯ Multi-threaded compression
- π₯ Drag-and-drop file support
This project demonstrates:
- Strong understanding of data structures & algorithms
- Practical implementation of compression systems
- Experience with file handling and system-level programming
Huffman Compression Tool (C++) Developed a file compression system using Huffman Encoding achieving up to 58% space reduction, with custom binary encoding and decoding pipeline.
Manish Nalumachu
This project is a strong demonstration of combining algorithmic thinking with real-world system implementation, making it highly valuable for software engineering interviews.