Huffman Coding

Optimal prefix coding algorithm for lossless data compression. Fundamental technique in digital communications and information theory.

01 Introduction

Huffman coding is a greedy algorithm developed by David A. Huffman in 1952 while he was a Ph.D. student at MIT. It provides an optimal method for constructing variable-length prefix codes, minimizing the expected codeword length for a given probability distribution of symbols.

📊

Optimal Prefix Code

Produces the minimum redundancy code for given symbol probabilities. No other prefix code can achieve better compression.

🌳

Binary Tree Structure

Constructs a full binary tree where leaves represent symbols and paths represent codes (0=left, 1=right).

Greedy Approach

At each step, combines the two least probable symbols/sets, ensuring optimal substructure.

Key Characteristics

  • Lossless Compression: Original data can be perfectly reconstructed
  • Variable-Length Codes: Frequent symbols get shorter codes, rare symbols get longer codes
  • Prefix-Free: No code is a prefix of another, enabling unambiguous decoding
  • Statistical Coding: Requires knowledge of symbol frequencies/probabilities

02 Theoretical Foundation

Information Theory Basics

Huffman coding is rooted in Claude Shannon's information theory. The fundamental concept is that information content is inversely proportional to probability.

Self-Information of symbol x with probability p(x):

I(x) = -log₂(p(x)) bits

Entropy (Average Information):

H(X) = -Σ p(x) log₂(p(x))

Huffman code achieves average length L satisfying: H(X) ≤ L < H(X) + 1
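These definitions are easy to check numerically; a short Python sketch (the three-symbol distribution is purely illustrative):

```python
import math

def self_information(p):
    """I(x) = -log2(p(x)), in bits."""
    return -math.log2(p)

def entropy(probs):
    """H(X) = -sum of p(x) log2 p(x), in bits/symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative skewed three-symbol source
probs = [0.5, 0.25, 0.25]
print(self_information(0.5))  # 1.0 -- a 50% symbol carries exactly 1 bit
print(entropy(probs))         # 1.5 bits/symbol
```

Note that a symbol with probability 0.5 gets exactly 1 bit of self-information, which is why Huffman assigns very frequent symbols very short codes.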

Kraft-McMillan Inequality

For any prefix code with codeword lengths l₁, l₂, ..., lₙ:

Σ 2^(-lᵢ) ≤ 1

A Huffman code satisfies this inequality with equality (a complete code): every leaf of the full binary tree carries a symbol, so no codeword space is wasted. Completeness alone does not prove optimality; that follows from the exchange argument outlined below.
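The inequality can be verified with exact rational arithmetic; a minimal sketch (the length multiset 1, 3, 3, 3, 4, 4 is the one produced in the worked example of Section 04):

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Compute the Kraft sum of 2^(-l_i) exactly, with no float rounding."""
    return sum(Fraction(1, 2 ** l) for l in lengths)

print(kraft_sum([1, 3, 3, 3, 4, 4]))  # 1   -> complete code, equality holds
print(kraft_sum([1, 2, 3]))           # 7/8 -> valid prefix code, not complete
```

A sum greater than 1 (e.g. lengths 1, 1, 2) means no prefix code with those lengths can exist at all.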

Optimality Proof

Why Huffman is Optimal

  1. Property 1: In an optimal code, if p(a) < p(b), then l(a) ≥ l(b)
  2. Property 2: In an optimal code, there exist two longest codewords differing only in the last bit
  3. Property 3: The two least probable symbols have codewords of equal maximum length

Huffman's algorithm satisfies all three properties by construction, guaranteeing optimality.

03 Huffman Algorithm

Algorithm Steps

  1. Frequency Analysis: Calculate occurrence probability (or frequency) for each symbol in the source data.
  2. Initialize Priority Queue: Create a leaf node for each symbol and add it to a min-heap/priority queue ordered by frequency.
  3. Tree Construction: While more than one node in the queue:
    • Remove the two nodes with lowest frequencies
    • Create new internal node with frequency = sum of two nodes
    • Make the two nodes left and right children (0=left, 1=right)
    • Insert new node into queue
  4. Code Assignment: The remaining node is the root. Traverse from root to leaves, appending 0 for left branches and 1 for right branches.
  5. Encoding: Replace each source symbol with its corresponding Huffman code.

Pseudocode

function HuffmanCoding(S):
    // S: set of symbols with frequencies
    n ← |S|
    Q ← priority queue ordered by frequency
    
    for each symbol s in S:
        insert Node(s, s.freq) into Q
    
    for i ← 1 to n-1:
        z ← new Node()
        z.left ← x ← Extract-Min(Q)
        z.right ← y ← Extract-Min(Q)
        z.freq ← x.freq + y.freq
        Insert(Q, z)
    
    return Extract-Min(Q)  // Root of Huffman tree

function AssignCodes(node, code):
    if node is leaf:
        node.code ← code
    else:
        AssignCodes(node.left, code + "0")
        AssignCodes(node.right, code + "1")
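The pseudocode above maps almost directly onto Python's standard-library heapq; a runnable sketch (names like `huffman_tree` and `assign_codes` are illustrative, and the exact codewords depend on how frequency ties are broken):

```python
import heapq
from itertools import count

class Node:
    def __init__(self, freq, symbol=None, left=None, right=None):
        self.freq, self.symbol = freq, symbol
        self.left, self.right = left, right

def huffman_tree(freqs):
    """Build a Huffman tree from {symbol: frequency}; returns the root."""
    tie = count()  # tie-breaker so heapq never has to compare Node objects
    q = [(f, next(tie), Node(f, s)) for s, f in freqs.items()]
    heapq.heapify(q)
    while len(q) > 1:
        fx, _, x = heapq.heappop(q)  # two least frequent nodes
        fy, _, y = heapq.heappop(q)
        heapq.heappush(q, (fx + fy, next(tie), Node(fx + fy, left=x, right=y)))
    return q[0][2]

def assign_codes(node, code="", table=None):
    """Traverse the tree: append 0 for left branches, 1 for right branches."""
    if table is None:
        table = {}
    if node.symbol is not None:           # leaf
        table[node.symbol] = code or "0"  # single-symbol edge case
    else:
        assign_codes(node.left, code + "0", table)
        assign_codes(node.right, code + "1", table)
    return table

codes = assign_codes(huffman_tree({"A": 0.45, "B": 0.13, "C": 0.12,
                                   "D": 0.16, "E": 0.09, "F": 0.05}))
print(codes)  # code lengths are 1, 3, 3, 3, 4, 4; exact bits depend on ties
```

Different tie-breaking can yield different (but equally optimal) code tables; only the multiset of code lengths is guaranteed.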

Complexity Analysis

Operation                         Complexity   Explanation
Building priority queue           O(n)         Single heapify pass over the symbols
Tree construction (n-1 rounds)    O(n log n)   Each extract-min and insert is O(log n)
Code assignment                   O(n)         One traversal of the tree
Total time                        O(n log n)   Dominated by the heap operations
Space                             O(n)         Storage for the tree and code table

04 Worked Example

Problem Statement

Given a source with symbols {A, B, C, D, E, F} and probabilities:

Symbol       A      B      C      D      E      F
Probability  0.45   0.13   0.12   0.16   0.09   0.05

Step-by-Step Construction

Step 1: Initial Priority Queue

Queue: [F(0.05), E(0.09), C(0.12), B(0.13), D(0.16), A(0.45)]
Sorted by increasing frequency

Step 2: Iteration 1 - Combine F and E

Remove: F(0.05), E(0.09)
Create: Node(0.14) with children F(0), E(1)
New Queue: [C(0.12), B(0.13), Node(0.14), D(0.16), A(0.45)]

Step 3: Iteration 2 - Combine C and B

Remove: C(0.12), B(0.13)
Create: Node(0.25) with children C(0), B(1)
New Queue: [Node(0.14), D(0.16), Node(0.25), A(0.45)]

Step 4: Iteration 3 - Combine Node(0.14) and D

Remove: Node(0.14), D(0.16)
Create: Node(0.30) with children Node(0.14)(0), D(1)
New Queue: [Node(0.25), Node(0.30), A(0.45)]

Step 5: Iteration 4 - Combine Node(0.25) and Node(0.30)

Remove: Node(0.25), Node(0.30)
Create: Node(0.55) with children Node(0.25)(0), Node(0.30)(1)
New Queue: [A(0.45), Node(0.55)]

Step 6: Final Combination

Remove: A(0.45), Node(0.55)
Create: Root(1.00) with children A(0), Node(0.55)(1)
Final Tree Complete!

Final Huffman Tree Visualization

Root (1.00)
├─ 0 → A (0.45)
└─ 1 → Node (0.55)
    ├─ 0 → Node (0.25)
    │   ├─ 0 → C (0.12)
    │   └─ 1 → B (0.13)
    └─ 1 → Node (0.30)
        ├─ 0 → Node (0.14)
        │   ├─ 0 → F (0.05)
        │   └─ 1 → E (0.09)
        └─ 1 → D (0.16)

Final Code Table

Symbol  Probability  Huffman Code  Length  Contribution (p × l)
A       0.45         0             1       0.45
B       0.13         101           3       0.39
C       0.12         100           3       0.36
D       0.16         111           3       0.48
E       0.09         1101          4       0.36
F       0.05         1100         4       0.20

Average code length L = 2.24 bits/symbol
Entropy H(X) ≈ 2.22 bits/symbol
Efficiency H/L ≈ 99.1%
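The totals can be reproduced in a few lines of Python (probabilities and code lengths copied from the rows above):

```python
import math

probs   = {"A": 0.45, "B": 0.13, "C": 0.12, "D": 0.16, "E": 0.09, "F": 0.05}
lengths = {"A": 1, "B": 3, "C": 3, "D": 3, "E": 4, "F": 4}

avg_len = sum(probs[s] * lengths[s] for s in probs)       # L = sum of p_i * l_i
entropy = -sum(p * math.log2(p) for p in probs.values())  # H(X)

print(f"L = {avg_len:.2f} bits/symbol")         # 2.24
print(f"H = {entropy:.2f} bits/symbol")         # 2.22
print(f"efficiency = {entropy / avg_len:.1%}")  # 99.1%
```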

Analysis Results

Encoding Example

Encoding the string "BAD":

B → 101
A → 0
D → 111

BAD → 101 0 111 = 1010111 (7 bits)

Fixed-length (3 bits each, e.g. A=000, B=001, D=011): 001 000 011 = 001000011 (9 bits)
Savings: (9 − 7)/9 ≈ 22.2%
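Because no code is a prefix of another, decoding is an unambiguous left-to-right scan of the bitstream; a minimal sketch using the code table from the worked example (the `decode` helper is illustrative):

```python
def decode(bits, codebook):
    """Decode a bitstring using a prefix-free codebook {symbol: code}."""
    inverse = {code: sym for sym, code in codebook.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:  # a prefix-free code matches at most one symbol
            out.append(inverse[buf])
            buf = ""
    assert buf == "", "bitstream ended in the middle of a codeword"
    return "".join(out)

codes = {"A": "0", "B": "101", "C": "100", "D": "111", "E": "1101", "F": "1100"}
print(decode("1010111", codes))  # BAD
```

No separators are needed between codewords: the moment the buffer matches a code, that symbol is emitted and the scan restarts.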

05 Applications in Communication

🗜️

File Compression

GZIP, ZIP, Brotli (combined with LZ77)

🖼️

Image Coding

JPEG (DC coefficients), PNG (DEFLATE)

🎵

Audio Compression

MP3 (Huffman coding of quantized values)

📹

Video Coding

H.264/AVC, H.265/HEVC, AV1

📡

Wireless Comms

Efficient data transmission protocols

🔒

Cryptography

Randomness extraction, entropy coding

Variants and Extensions

Adaptive Huffman Coding

Updates tree dynamically as data arrives. Used when symbol statistics are unknown beforehand (e.g., Vitter's algorithm).

n-ary Huffman Coding

Extension to D-ary trees (D > 2). Useful when channel supports multiple symbols or for byte-aligned coding.

Length-Limited Huffman

Constrains maximum codeword length. Important for hardware implementations with limited buffer sizes.

Canonical Huffman

Codes are lexicographically ordered by length. Enables faster decoding with lookup tables.
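A sketch of canonical code assignment from the code lengths alone, which is essentially how DEFLATE (RFC 1951) reconstructs its Huffman tables on the decoder side; the function name is illustrative:

```python
def canonical_codes(lengths):
    """Assign canonical Huffman codes from {symbol: code_length}."""
    code, prev_len, table = 0, 0, {}
    # Canonical order: shorter codes first, ties broken alphabetically
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= (length - prev_len)  # left-shift when moving to longer codes
        table[sym] = format(code, f"0{length}b")
        code += 1
        prev_len = length
    return table

# Lengths from the Section 04 example; the codewords differ from that tree,
# but the lengths (and hence the compression) are identical
print(canonical_codes({"A": 1, "B": 3, "C": 3, "D": 3, "E": 4, "F": 4}))
# {'A': '0', 'B': '100', 'C': '101', 'D': '110', 'E': '1110', 'F': '1111'}
```

Since the whole table is recoverable from the length list, an encoder only needs to transmit one small integer per symbol rather than the tree itself.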

Comparison with Other Codes

Method        Type             Optimality                   Speed      Use Case
Huffman       Variable-length  Optimal for integer lengths  Fast       General compression
Arithmetic    Fractional       Near entropy limit           Slower     High compression
Shannon-Fano  Variable-length  Suboptimal                   Fast       Historical/educational
LZ77/LZ78     Dictionary       Universal                    Very fast  Repetitive data
Run-Length    Simple           Context-dependent            Fastest    Bitmaps, fax

06 Practice Problems

Problem 1: Basic Encoding

Given symbols {X, Y, Z} with probabilities {0.6, 0.3, 0.1}, construct the Huffman code and calculate the average code length.

Step 1: Combine Z(0.1) and Y(0.3) → Node(0.4)
Step 2: Combine Node(0.4) and X(0.6) → Root(1.0)

Codes:
X: 1 (length 1)
Y: 01 (length 2)
Z: 00 (length 2)

Average Length = 0.6×1 + 0.3×2 + 0.1×2 = 1.4 bits/symbol
Entropy = 1.295 bits/symbol
Efficiency = 92.5%

Problem 2: Efficiency Calculation

For a source with 8 equiprobable symbols, what is the Huffman coding efficiency compared to fixed-length coding?

For 8 equiprobable symbols: p = 1/8 = 0.125 each

Huffman will create a balanced tree:
All codes have length 3 (since 2^3 = 8)

Fixed-length: 3 bits/symbol (⌈log₂8⌉ = 3)
Huffman: 3 bits/symbol

Efficiency = 100% (no improvement for uniform distribution)

Entropy H(X) = -8 × (1/8) × log₂(1/8) = 3 bits/symbol
This demonstrates Huffman coding works best with skewed distributions.

Problem 3: Communication System Design

A sensor transmits 4 types of packets: Data (70%), Control (20%), Ack (8%), and Error (2%). Design a Huffman code for this protocol and calculate the bandwidth savings if the channel capacity is 1000 bits/second.

Probabilities: D:0.70, C:0.20, A:0.08, E:0.02

Construction:
1. Combine E(0.02) + A(0.08) = Node(0.10)
2. Combine Node(0.10) + C(0.20) = Node(0.30)
3. Combine Node(0.30) + D(0.70) = Root(1.00)

Codes:
D: 1 (1 bit)
C: 01 (2 bits)
A: 001 (3 bits)
E: 000 (3 bits)

Average length = 0.7×1 + 0.2×2 + 0.08×3 + 0.02×3 = 1.4 bits/packet
Fixed-length would require 2 bits/packet (⌈log₂4⌉ = 2)

Bandwidth savings: (2 - 1.4)/2 × 100% = 30%
Effective throughput on a 1000 bit/s link: 1000/1.4 ≈ 714 packets/sec (Huffman) vs 1000/2 = 500 packets/sec (fixed-length)

07 Summary & Key Takeaways

Essential Formulas

Average Code Length: L = Σ pᵢ × lᵢ

Efficiency: η = H(X)/L × 100%

Redundancy: R = L - H(X) bits/symbol

Variance: σ² = Σ pᵢ(lᵢ - L)²