Laboratory Objectives

Understanding lossless data compression through optimal prefix coding

Primary Objectives

  • Understand the concept of entropy and information content in data sources
  • Learn the algorithm for constructing optimal prefix codes
  • Analyze the properties of Huffman codes (prefix-free, optimal)
  • Calculate coding efficiency and redundancy

Learning Outcomes

  • Construct Huffman trees for given symbol probability distributions
  • Encode and decode messages using Huffman codes
  • Compare Huffman coding with fixed-length coding schemes
  • Apply Huffman coding in practical data compression scenarios

Pre-requisites

Probability Theory

Basic understanding of probability distributions and random variables

Binary Trees

Knowledge of tree data structures and traversal algorithms

Information Theory

Familiarity with entropy and information content concepts

Theoretical Background

Fundamental principles of optimal prefix coding

1. Introduction to Huffman Coding

Huffman coding, developed by David A. Huffman in 1952, is a lossless data compression algorithm that assigns variable-length codes to input symbols based on their frequencies. More frequent symbols receive shorter codes, while less frequent symbols receive longer codes.

"Huffman coding produces optimal prefix codes that minimize the expected code length for a given probability distribution of symbols."

2. Information Entropy

The entropy H(X) of a discrete random variable X measures the average amount of information produced by the source:

H(X) = −Σᵢ p(xᵢ) log₂ p(xᵢ) bits/symbol

where p(xᵢ) is the probability of symbol xᵢ. Entropy represents the theoretical minimum average code length achievable.
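As an illustration, the entropy formula can be evaluated directly. This is a minimal Python sketch, not part of the lab software; the function name `entropy` is chosen here for illustration:

```python
import math

def entropy(probs):
    """Shannon entropy in bits/symbol for a list of probabilities."""
    # Zero-probability symbols contribute nothing (p·log2 p → 0 as p → 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75
```

For the four-symbol distribution shown, the result matches the worked example below: 1.75 bits/symbol.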

3. Huffman Algorithm

Step 1: Sort Symbols

Arrange all symbols in decreasing order of probability.

Step 2: Combine Least Probable

Combine the two symbols with the lowest probabilities into a new node with probability equal to their sum.

Step 3: Repeat

Repeat Step 2 until only one node remains (the root).

Step 4: Assign Codes

Traverse the tree from root to leaves, assigning '0' to left branches and '1' to right branches.
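The four steps above can be sketched with a priority queue, which always keeps the two least probable nodes at hand. This is an illustrative Python sketch, not the lab's own implementation; the name `huffman_codes` is chosen here, and other implementations may break probability ties differently:

```python
import heapq
from itertools import count

def huffman_codes(probs):
    """Build a Huffman code table from a {symbol: probability} dict."""
    ticket = count()  # unique tie-breaker so the heap never compares tree nodes
    heap = [(p, next(ticket), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Steps 1-3: repeatedly merge the two least probable nodes
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(ticket), (left, right)))
    # Step 4: traverse from the root, '0' on left branches, '1' on right
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"  # degenerate single-symbol source
    walk(heap[0][2], "")
    return codes

print(huffman_codes({"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}))
# {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
```

The sorted order of Step 1 is maintained implicitly by the heap rather than by an explicit sort.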

4. Key Properties

Prefix-Free (Instantaneous)

No code word is a prefix of another code word, enabling unambiguous decoding without delimiters

Optimality

Produces the minimum expected code length among all prefix codes for a given distribution

Kraft Inequality

Satisfies Σ 2^(−lᵢ) ≤ 1, where lᵢ is the length of the i-th code word

Efficiency

η = H(X)/L̄, where L̄ is the average code length. Huffman coding achieves η close to 100%, and reaches exactly 100% when every symbol probability is a negative power of 2
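The Kraft inequality is easy to check numerically. The snippet below (illustrative Python, not part of the lab materials) uses the code table from the worked example that follows:

```python
# Code table from the worked example (A=0, B=10, C=110, D=111)
codes = {"A": "0", "B": "10", "C": "110", "D": "111"}
kraft_sum = sum(2.0 ** -len(c) for c in codes.values())
print(kraft_sum)  # 1.0
```

The sum equals exactly 1 because every leaf position of the full binary code tree is used.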

5. Worked Example

Consider a source with symbols A, B, C, D with probabilities:

Symbol | Probability | Huffman Code | Length | Contribution
-------|-------------|--------------|--------|------------------
A      | 0.5         | 0            | 1      | 0.5 × 1 = 0.5
B      | 0.25        | 10           | 2      | 0.25 × 2 = 0.5
C      | 0.125       | 110          | 3      | 0.125 × 3 = 0.375
D      | 0.125       | 111          | 3      | 0.125 × 3 = 0.375

Entropy H(X): 1.75 bits/symbol

Average Length L̄: 1.75 bits/symbol

Efficiency: 100%

Redundancy: 0%

Laboratory Procedure

Step-by-step experimental guidelines

Experiment 1: Manual Huffman Tree Construction

Step 1.1: Problem Setup

Given a source with symbols {A, B, C, D, E} with probabilities {0.4, 0.2, 0.2, 0.1, 0.1}, construct the Huffman tree manually.

Step 1.2: Sorting

Arrange symbols in decreasing order of probability:

A(0.4) ≥ B(0.2) = C(0.2) ≥ D(0.1) = E(0.1)

Step 1.3: Tree Construction

  1. Combine D(0.1) and E(0.1) → Node1(0.2)
  2. Combine B(0.2) and C(0.2) → Node2(0.4)
  3. Combine Node1(0.2) and A(0.4) → Node3(0.6)
  4. Combine Node2(0.4) and Node3(0.6) → Root(1.0)

Step 1.4: Code Assignment

Traverse from root, assigning 0 to left branches and 1 to right branches. Record the path to each leaf node.
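Step 1.4 can also be carried out programmatically. The sketch below (illustrative Python, not part of the lab software) encodes the tree from Step 1.3 as nested tuples; the left/right placement chosen here is one valid orientation, and other orientations yield different but equally optimal codes:

```python
# Tree from Step 1.3: Root = (Node2, Node3), Node2 = (B, C),
# Node3 = (Node1, A), Node1 = (D, E). Left/right placement is a choice.
tree = (("B", "C"), (("D", "E"), "A"))

def assign_codes(node, prefix=""):
    """Return {symbol: code}, with '0' for left branches and '1' for right."""
    if not isinstance(node, tuple):  # leaf: record the accumulated path
        return {node: prefix}
    codes = assign_codes(node[0], prefix + "0")
    codes.update(assign_codes(node[1], prefix + "1"))
    return codes

print(assign_codes(tree))
# {'B': '00', 'C': '01', 'D': '100', 'E': '101', 'A': '11'}
```

With this orientation the average length is 0.4×2 + 0.2×2 + 0.2×2 + 0.1×3 + 0.1×3 = 2.2 bits/symbol, regardless of which orientation you pick.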

Experiment 2: Encoding and Decoding

Step 2.1: Encoding

Using the Huffman codes from the worked example in the Theoretical Background (A = 0, B = 10, C = 110, D = 111), encode the message: "ABACAD"

A→0 | B→10 | A→0 | C→110 | A→0 | D→111

Encoded: 01001100111 (11 bits)

Fixed-length coding (2 bits/symbol for a four-symbol alphabet) would require: 12 bits

Step 2.2: Decoding

Decode the bitstream: 1011000111

  1. Start at the root and read bits until a leaf is reached
  2. 10 → B
  3. 110 → C
  4. 0 → A
  5. 0 → A
  6. 111 → D

Decoded: BCAAD
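Both directions can be sketched in a few lines of Python. This is an illustrative sketch using the code table from the worked example (A = 0, B = 10, C = 110, D = 111), not part of the lab software:

```python
codes = {"A": "0", "B": "10", "C": "110", "D": "111"}

def encode(message, codes):
    """Concatenate the code word of each symbol."""
    return "".join(codes[sym] for sym in message)

def decode(bits, codes):
    """Accumulate bits until they match a code word; prefix-freeness
    guarantees each match is unambiguous."""
    inverse = {c: s for s, c in codes.items()}
    symbols, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:  # reached a leaf of the code tree
            symbols.append(inverse[buf])
            buf = ""
    return "".join(symbols)

print(encode("ABACAD", codes))      # 01001100111
print(decode("1011000111", codes))  # BCAAD
```

Note that no delimiters are needed between code words, which is exactly the instantaneous property described earlier.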

Experiment 3: Performance Analysis

Step 3.1: Calculate Metrics

1. Calculate Entropy: H = −Σ pᵢ log₂ pᵢ

2. Calculate Average Length: L̄ = Σ pᵢ lᵢ

3. Calculate Efficiency: η = (H/L̄) × 100%

4. Calculate Redundancy: R = L̄ − H

5. Calculate Variance: σ² = Σ pᵢ(lᵢ − L̄)²
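All five metrics can be computed in one pass. This is an illustrative Python sketch; the code lengths assume one valid assignment from the Experiment 1 tree, where A, B, C receive 2-bit codes and D, E receive 3-bit codes:

```python
import math

probs = {"A": 0.4, "B": 0.2, "C": 0.2, "D": 0.1, "E": 0.1}
lengths = {"A": 2, "B": 2, "C": 2, "D": 3, "E": 3}  # one valid Huffman assignment

H = -sum(p * math.log2(p) for p in probs.values())          # entropy
L = sum(probs[s] * lengths[s] for s in probs)               # average length
eta = H / L * 100                                           # efficiency
R = L - H                                                   # redundancy
var = sum(probs[s] * (lengths[s] - L) ** 2 for s in probs)  # variance

print(round(H, 4), round(L, 2), round(eta, 1), round(R, 4), round(var, 2))
# 2.1219 2.2 96.5 0.0781 0.16
```

Efficiency falls short of 100% here because the probabilities are not all negative powers of 2.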

Step 3.2: Comparison

Compare Huffman coding with:

  • Fixed-length binary coding
  • Shannon-Fano coding
  • Arithmetic coding (theoretical comparison)

Important Notes

  • Ensure probabilities sum to exactly 1.0 (normalize if necessary)
  • Huffman codes are not unique - different valid trees may exist for the same distribution
  • When two nodes have equal probability, the choice of which to place left/right affects the specific codes but not the average length

Interactive Simulation

The virtual lab provides two interactive tools: a tree builder that visualizes Huffman tree construction step-by-step, and a Huffman Calculator that encodes and decodes messages instantly. The encoder accepts a code table in the format {"symbol": "code", ...}; the decoder accepts the inverse table {"code": "symbol", ...}. Use both tools to check your manual results from the experiments above.

Laboratory Report Guidelines

Structure and requirements for your submission

Report Structure

1. Title Page

  • Experiment Title: "Huffman Coding for Data Compression"
  • Student Name, ID, and Group
  • Date of Experiment
  • Course Name and Code

2. Objectives

State the aims of the experiment in your own words. Include both theoretical understanding and practical skills to be acquired.

3. Theory

Summarize the theoretical background including:

  • Concept of entropy and information content
  • Prefix codes and their properties
  • Huffman algorithm description
  • Optimality proof overview

4. Procedure & Observations

Document all steps performed:

  • Step-by-step tree construction (include diagrams)
  • Code assignment process
  • Encoding/decoding examples
  • Screenshots of simulation results

5. Calculations & Analysis

Present all calculations in tabular form:

  • Entropy calculation with formula
  • Average code length computation
  • Efficiency and redundancy analysis
  • Comparison with fixed-length coding
  • Variance of code lengths

6. Conclusion

Summarize findings and discuss:

  • Whether results match theoretical expectations
  • Sources of error or sub-optimality
  • Practical applications of Huffman coding
  • Limitations observed

Grading Rubric

Component            | Weight | Criteria                                              | Points
---------------------|--------|-------------------------------------------------------|-------
Theory Understanding | 20%    | Correct explanation of concepts, entropy calculations | 20
Procedure Execution  | 25%    | Correct tree construction, proper code assignment     | 25
Calculations         | 25%    | Accuracy of metrics, proper formula application       | 25
Analysis             | 20%    | Comparison, interpretation of results, conclusions    | 20
Presentation         | 10%    | Organization, clarity, diagrams, grammar              | 10

Submission Requirements

Format Specifications

  • PDF format only
  • Times New Roman or Arial, 12pt
  • 1.5 line spacing
  • Maximum 15 pages
  • Include page numbers

Content Requirements

  • Minimum 3 different probability distributions analyzed
  • Hand-drawn or software-generated tree diagrams
  • Screenshots from virtual lab simulation
  • Sample calculations shown in detail

Sample Calculation Format

// Example calculation for entropy

Given: p(A)=0.5, p(B)=0.25, p(C)=0.25

H = -[0.5×log₂(0.5) + 0.25×log₂(0.25) + 0.25×log₂(0.25)]

H = -[0.5×(-1) + 0.25×(-2) + 0.25×(-2)]

H = -[-0.5 - 0.5 - 0.5]

H = 1.5 bits/symbol

// Average code length

L̄ = Σ pᵢ × lᵢ = 0.5×1 + 0.25×2 + 0.25×2

L̄ = 1.5 bits/symbol

// Efficiency

η = H/L̄ × 100% = 1.5/1.5 × 100%

η = 100%
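The sample calculation above can be checked mechanically. This is an illustrative Python sketch, not part of the required report:

```python
import math

p = {"A": 0.5, "B": 0.25, "C": 0.25}
l = {"A": 1, "B": 2, "C": 2}  # code lengths matching the example

H = -sum(pi * math.log2(pi) for pi in p.values())  # entropy
L = sum(p[s] * l[s] for s in p)                    # average code length
print(H, L, H / L * 100)  # 1.5 1.5 100.0
```

Efficiency is exactly 100% here because every probability is a negative power of 2, so each optimal code length equals the symbol's information content.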