Skip to content

manish08k/huffman-compressor-cpp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

2 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ“¦ Huffman Compressor (C++)

A high-performance file compression and decompression system implemented in C++ using the Huffman Encoding algorithm, designed to demonstrate efficient data encoding, file handling, and system-level programming.


1️⃣ Overview

This project implements a lossless compression technique that reduces file size by encoding characters based on their frequency of occurrence.

  • Frequently occurring characters β†’ shorter binary codes
  • Rare characters β†’ longer binary codes

The system supports:

  • File compression into a binary format
  • Accurate decompression back to the original content
  • Display of compression statistics and Huffman codes

2️⃣ Key Features

2.1 Core Functionalities

  • βœ… Compress .txt files into .bin
  • βœ… Decompress .bin files back to original text
  • βœ… Generate optimal prefix-free Huffman codes
  • βœ… Preserve data integrity (lossless compression)

2.2 Analytics & Debugging

  • πŸ“Š Compression ratio calculation
  • πŸ“Š Space saving percentage
  • πŸ“Š Character frequency analysis
  • πŸ“Š Display of shortest and longest codes

2.3 System-Level Design

  • βš™οΈ Efficient use of STL (priority_queue, map)
  • βš™οΈ Bit-level encoding logic
  • βš™οΈ Custom binary file format

3️⃣ How Huffman Coding Works

Step 1: Frequency Calculation

Count frequency of each character in the input file.

Step 2: Build Min Heap

Insert characters into a priority queue based on frequency.

Step 3: Construct Huffman Tree

  • Remove two lowest frequency nodes
  • Merge into a new node
  • Repeat until one root remains

Step 4: Generate Codes

Traverse the tree:

  • Left β†’ 0
  • Right β†’ 1

Step 5: Encode File

Replace each character with its binary code.

Step 6: Decode File

Reconstruct original text using the stored tree.


4️⃣ Project Structure

huffman-compressor-cpp/
β”‚
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ main.cpp            # Entry point
β”‚   β”œβ”€β”€ Huffman.cpp         # Core algorithm
β”‚   β”œβ”€β”€ FileHandler.cpp     # File I/O operations
β”‚   └── utils.cpp           # Helper functions
β”‚
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ input.txt           # Input file
β”‚   β”œβ”€β”€ out.bin             # Compressed output
β”‚   └── restored.txt        # Decompressed output
β”‚
β”œβ”€β”€ build/                  # Build directory (ignored)
β”œβ”€β”€ CMakeLists.txt          # Build configuration
β”œβ”€β”€ README.md
└── .gitignore

5️⃣ Build Instructions

5.1 Using g++ / clang++

clang++ -std=c++17 main.cpp Huffman.cpp FileHandler.cpp utils.cpp -o huff

5.2 Using CMake (Recommended)

rm -rf build
mkdir build
cd build
cmake ..
make

6️⃣ Usage

6.1 Compress File

./huff compress data/input.txt data/out.bin

6.2 Decompress File

./huff decompress data/out.bin data/restored.txt

7️⃣ Sample Execution

[+] Reading: data/input.txt
[+] File size: 40 bytes, unique chars: 16

--- Compression Stats ---
Original size   : 40 bytes
Compressed size : 19 bytes
Compression ratio: 0.47
Space saving     : 52.50%
-------------------------

[βœ“] Compressed β†’ data/out.bin
[βœ“] Decompressed β†’ data/restored.txt

8️⃣ Performance Analysis

Metric Description
Time Complexity O(n log n)
Space Complexity O(n)
Compression Efficiency Depends on character frequency

Observations:

  • High repetition β†’ Better compression
  • Random text β†’ Lower compression efficiency

9️⃣ Technical Concepts Used

  • Data Structures:

    • Binary Trees
    • Priority Queue (Min Heap)
    • Hash Maps
  • Algorithms:

    • Greedy Algorithm (Huffman Encoding)
    • Tree Traversal (DFS)
  • System Concepts:

    • File I/O
    • Binary encoding
    • Memory management

πŸ”Ÿ Limitations

  • Works best for text files
  • Compression efficiency reduces for highly random data
  • Does not yet support large-scale file streaming

1️⃣1️⃣ Future Improvements

  • πŸ”₯ GUI interface for ease of use
  • πŸ”₯ Support for large files (GB scale)
  • πŸ”₯ Combine with LZ77 (ZIP-like compression)
  • πŸ”₯ Multi-threaded compression
  • πŸ”₯ Drag-and-drop file support

1️⃣2️⃣ Why This Project Matters

This project demonstrates:

  • Strong understanding of data structures & algorithms
  • Practical implementation of compression systems
  • Experience with file handling and system-level programming

1️⃣3️⃣ Resume Description

Huffman Compression Tool (C++) Developed a file compression system using Huffman Encoding achieving up to 58% space reduction, with custom binary encoding and decoding pipeline.


1️⃣4️⃣ Author

Manish Nalumachu


⭐ Final Note

This project is a strong demonstration of combining algorithmic thinking with real-world system implementation, making it highly valuable for software engineering interviews.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors