Canonical Huffman Compressor

Compress Files!!!

📝 Problem Description

In modern computing, efficient data storage and transmission are critical. Standard text files (ASCII/UTF-8) use a fixed width of 8 bits per character, which is wasteful for characters that appear frequently (e.g., 'e', 'a', ' ').

The Solution: This project implements Canonical Huffman Coding, a lossless data compression algorithm widely used in industrial standards like ZIP, GZIP, and JPEG.

Efficiency: We assign shorter binary codes to frequent characters and longer codes to rare ones.
Real-World Optimization: Unlike basic implementations that store the entire tree structure, our compressor uses Canonical state (storing only bit lengths) and Bit Packing (writing actual binary bits, not ASCII '0's and '1's). This mimics professional compression tools, achieving significantly smaller file sizes.

🚀 Instructions to Run

Prerequisites

GHC (Glasgow Haskell Compiler) (Version 8.0 or higher)
Cabal (Haskell build tool) - Usually installed with GHC

1. Build with Cabal (Recommended)

The project uses Cabal for dependency management and includes parallel processing support.

# Build the project (downloads dependencies automatically)
cabal build

# Or use cabal install to make it available globally
cabal install

Benefits of using Cabal:

Automatic dependency management (parallel, bytestring, containers)
Built-in parallel processing support (uses all CPU cores)
Optimized compilation with -O2 flag
Reproducible builds

2. Usage

Using Cabal (Recommended):

# Compress a file (parallel processing enabled automatically)
cabal run compressor -- -c <input_file> <output_file>

# Decompress a file
cabal run compressor -- -d <input_file> <output_file>

# Show help
cabal run compressor -- -h

Alternative: Direct Compilation with GHC

If you prefer not to use Cabal:

ghc -O2 --make Main.hs -o compressor -package containers -package bytestring -package parallel -threaded -rtsopts
./compressor -c data.txt data.huff

3. Examples

# Compress 'data.txt' into 'data.huff'
cabal run compressor -- -c data.txt data.huff

# Decompress 'data.huff' back to 'restored.txt'
cabal run compressor -- -d data.huff restored.txt

# Verify the files match
diff data.txt restored.txt

Flags:

-c : Compress a file
-d : Decompress a file
-h : Show help message

💻 Sample Input/Output

Compression Workflow

Command:

cabal run compressor -- -c example.txt compressed.huff

Output:

Reading example.txt...
Analyzing frequencies (parallel)...
Building Canonical Huffman Tree...
Encoding and Bit-Packing...
Compressed to compressed.huff
Compression Complete.

Decompression Workflow

Command:

cabal run compressor -- -d compressed.huff restored.txt

Output:

Reading compressed.huff...
Reconstructing Tree from Canonical Lengths...
Decoding...
Writing to restored.txt...
Decompression Complete.

Performance Metrics

$ time cabal run compressor -- -c large_test.txt output.huff

real    0m1.047s   # Wall-clock time
user    0m1.107s   # CPU time (parallel processing)
sys     0m0.869s   # System I/O time

Analysis: user + sys > real proves parallel processing is working! Multiple CPU cores are being utilized simultaneously.

🧠 Functional Programming Concepts Used

This project strictly adheres to Functional Programming principles to ensure reliability and testability.

1. Pure Functions & Immutability

The core logic resides in Processing.hs and uses no mutable state. Functions take input and produce output deterministically.

Example: canonicalCodes transforms a length table into code assignments without modifying external memory.
```
-- Processing.hs
canonicalCodes :: BitLenTable -> CodeTable
```

2. Recursion

We use recursion instead of loops for tree traversal and bit stream decoding. This allows us to handle potentially infinite data streams or complex tree structures safely.

Example: The decode function uses a recursive state machine to traverse the Huffman Tree.

-- Processing.hs
go (b:bs) node bitIdx
    | bitIdx < 0 = go bs node 7 -- Recursive call to next byte

3. Algebraic Data Types (ADTs)

We define a custom data structure to model the Huffman Tree, ensuring type safety and preventing invalid tree states.

Example:

-- DataTypes.hs
data HuffmanTree = Leaf Char Int
                 | Node Int HuffmanTree HuffmanTree

4. Higher-Order Functions

We utilize Haskell's powerful list processing functions (map, sortBy, foldl) to manipulate data concisely.

Example: Sorting characters by frequency and length:

-- Processing.hs
let sorted = sortBy (\(c1, l1) (c2, l2) -> ...) lenTable

5. Concurrency & Parallelism

The implementation uses Haskell's parallel library for concurrent frequency counting, automatically distributing work across multiple CPU cores.

Example: Parallel frequency analysis in Utils.hs:

frequencyCountParallel :: String -> [(Char, Int)]
frequencyCountParallel str = 
    let chunks = chunksOf chunkSize str
        localCounts = parMap rdeepseq frequencyCount chunks
    in mergeFrequencies localCounts

Benefits:
- Achieves 2-4x speedup on large files
- No manual thread management required
- Race-condition free (pure functions guarantee safety)

6. Modular Design

The code is separated into single-responsibility modules:

Main.hs: CLI Argument Parsing and IO Orchestration.
Processing.hs: Pure compression/decompression logic.
IOHandler.hs: Binary file reading/writing.
DataTypes.hs: Type definitions.
Utils.hs: General helper functions and parallel processing.

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
.gitignore		.gitignore
DataTypes.hs		DataTypes.hs
IOHandler.hs		IOHandler.hs
Main.hs		Main.hs
Processing.hs		Processing.hs
README.md		README.md
Utils.hs		Utils.hs
huffman-compressor.cabal		huffman-compressor.cabal

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Canonical Huffman Compressor

📝 Problem Description

🚀 Instructions to Run

Prerequisites

1. Build with Cabal (Recommended)

2. Usage

3. Examples

💻 Sample Input/Output

Compression Workflow

Decompression Workflow

Performance Metrics

🧠 Functional Programming Concepts Used

1. Pure Functions & Immutability

2. Recursion

3. Algebraic Data Types (ADTs)

4. Higher-Order Functions

5. Concurrency & Parallelism

6. Modular Design

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

Malisha4065/HuffmanCodingHaskell

Folders and files

Latest commit

History

Repository files navigation

Canonical Huffman Compressor

📝 Problem Description

🚀 Instructions to Run

Prerequisites

1. Build with Cabal (Recommended)

2. Usage

3. Examples

💻 Sample Input/Output

Compression Workflow

Decompression Workflow

Performance Metrics

🧠 Functional Programming Concepts Used

1. Pure Functions & Immutability

2. Recursion

3. Algebraic Data Types (ADTs)

4. Higher-Order Functions

5. Concurrency & Parallelism

6. Modular Design

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages