Laboratory Objectives

Understanding lossless data compression through optimal prefix coding

Primary Objectives

  • Understand the concept of entropy and information content in data sources
  • Learn the algorithm for constructing optimal prefix codes
  • Analyze the properties of Huffman codes (prefix-free, optimal)
  • Calculate coding efficiency and redundancy

Learning Outcomes

  • Construct Huffman trees for given symbol probability distributions
  • Encode and decode messages using Huffman codes
  • Compare Huffman coding with fixed-length coding schemes
  • Apply Huffman coding in practical data compression scenarios

Pre-requisites

Probability Theory

Basic understanding of probability distributions and random variables

Binary Trees

Knowledge of tree data structures and traversal algorithms

Information Theory

Familiarity with entropy and information content concepts

Theoretical Background

Fundamental principles of optimal prefix coding

1. Introduction to Huffman Coding

Huffman coding, developed by David A. Huffman in 1952, is a lossless data compression algorithm that assigns variable-length codes to input symbols based on their frequencies. More frequent symbols receive shorter codes, while less frequent symbols receive longer codes.

"Huffman coding produces optimal prefix codes that minimize the expected code length for a given probability distribution of symbols."

2. Information Entropy

The entropy H(X) of a discrete random variable X measures the average amount of information produced by the source:

H(X) = −Σᵢ p(xᵢ) log₂ p(xᵢ) bits/symbol

where p(xᵢ) is the probability of symbol xᵢ. Entropy represents the theoretical minimum average code length achievable.
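As an illustration, the entropy formula can be evaluated directly. This is a minimal Python sketch, not part of the lab software; the function name `entropy` is chosen here for illustration:

```python
import math

def entropy(probs):
    """Shannon entropy in bits/symbol for a list of probabilities."""
    # Zero-probability symbols contribute nothing (p·log2 p → 0 as p → 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75
```

For the four-symbol distribution shown, the result matches the worked example below: 1.75 bits/symbol.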

3. Huffman Algorithm

Step 1: Sort Symbols

Arrange all symbols in decreasing order of probability.

Step 2: Combine Least Probable

Combine the two symbols with the lowest probabilities into a new node with probability equal to their sum.

Step 3: Repeat

Repeat Step 2 until only one node remains (the root).

Step 4: Assign Codes

Traverse the tree from root to leaves, assigning '0' to left branches and '1' to right branches.
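The four steps above can be sketched with a priority queue, which always keeps the two least probable nodes at hand. This is an illustrative Python sketch, not the lab's own implementation; the name `huffman_codes` is chosen here, and other implementations may break probability ties differently:

```python
import heapq
from itertools import count

def huffman_codes(probs):
    """Build a Huffman code table from a {symbol: probability} dict."""
    ticket = count()  # unique tie-breaker so the heap never compares tree nodes
    heap = [(p, next(ticket), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Steps 1-3: repeatedly merge the two least probable nodes
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(ticket), (left, right)))
    # Step 4: traverse from the root, '0' on left branches, '1' on right
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"  # degenerate single-symbol source
    walk(heap[0][2], "")
    return codes

print(huffman_codes({"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}))
# {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
```

The sorted order of Step 1 is maintained implicitly by the heap rather than by an explicit sort.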

4. Key Properties

Prefix-Free (Instantaneous)

No code word is a prefix of another code word, enabling unambiguous decoding without delimiters

Optimality

Produces the minimum expected code length among all prefix codes for a given distribution

Kraft Inequality

Satisfies Σ 2^(−lᵢ) ≤ 1, where lᵢ is the length of the i-th code word

Efficiency

η = H(X)/L̄, where L̄ is the average code length. Huffman coding achieves η close to 100%, and reaches exactly 100% when every symbol probability is a negative power of 2
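The Kraft inequality is easy to check numerically. The snippet below (illustrative Python, not part of the lab materials) uses the code table from the worked example that follows:

```python
# Code table from the worked example (A=0, B=10, C=110, D=111)
codes = {"A": "0", "B": "10", "C": "110", "D": "111"}
kraft_sum = sum(2.0 ** -len(c) for c in codes.values())
print(kraft_sum)  # 1.0
```

The sum equals exactly 1 because every leaf position of the full binary code tree is used.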

5. Worked Example

Consider a source with symbols A, B, C, D with probabilities:

Symbol | Probability | Huffman Code | Length | Contribution
-------|-------------|--------------|--------|------------------
A      | 0.5         | 0            | 1      | 0.5 × 1 = 0.5
B      | 0.25        | 10           | 2      | 0.25 × 2 = 0.5
C      | 0.125       | 110          | 3      | 0.125 × 3 = 0.375
D      | 0.125       | 111          | 3      | 0.125 × 3 = 0.375

Entropy H(X): 1.75 bits/symbol

Average Length L̄: 1.75 bits/symbol

Efficiency: 100%

Redundancy: 0%

Laboratory Procedure

Step-by-step experimental guidelines

Experiment 1: Manual Huffman Tree Construction

Step 1.1: Problem Setup

Given a source with symbols {A, B, C, D, E} with probabilities {0.4, 0.2, 0.2, 0.1, 0.1}, construct the Huffman tree manually.

Step 1.2: Sorting

Arrange symbols in decreasing order of probability:

A(0.4) ≥ B(0.2) = C(0.2) ≥ D(0.1) = E(0.1)

Step 1.3: Tree Construction

  1. Combine D(0.1) and E(0.1) → Node1(0.2)
  2. Combine B(0.2) and C(0.2) → Node2(0.4)
  3. Combine Node1(0.2) and A(0.4) → Node3(0.6)
  4. Combine Node2(0.4) and Node3(0.6) → Root(1.0)

Step 1.4: Code Assignment

Traverse from root, assigning 0 to left branches and 1 to right branches. Record the path to each leaf node.
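Step 1.4 can also be carried out programmatically. The sketch below (illustrative Python, not part of the lab software) encodes the tree from Step 1.3 as nested tuples; the left/right placement chosen here is one valid orientation, and other orientations yield different but equally optimal codes:

```python
# Tree from Step 1.3: Root = (Node2, Node3), Node2 = (B, C),
# Node3 = (Node1, A), Node1 = (D, E). Left/right placement is a choice.
tree = (("B", "C"), (("D", "E"), "A"))

def assign_codes(node, prefix=""):
    """Return {symbol: code}, with '0' for left branches and '1' for right."""
    if not isinstance(node, tuple):  # leaf: record the accumulated path
        return {node: prefix}
    codes = assign_codes(node[0], prefix + "0")
    codes.update(assign_codes(node[1], prefix + "1"))
    return codes

print(assign_codes(tree))
# {'B': '00', 'C': '01', 'D': '100', 'E': '101', 'A': '11'}
```

With this orientation the average length is 0.4×2 + 0.2×2 + 0.2×2 + 0.1×3 + 0.1×3 = 2.2 bits/symbol, regardless of which orientation you pick.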

Experiment 2: Encoding and Decoding

Step 2.1: Encoding

Using the Huffman codes from the worked example in the Theoretical Background (A = 0, B = 10, C = 110, D = 111), encode the message: "ABACAD"

A→0 | B→10 | A→0 | C→110 | A→0 | D→111

Encoded: 01001100111 (11 bits)

Fixed-length coding (2 bits/symbol for a four-symbol alphabet) would require: 12 bits

Step 2.2: Decoding

Decode the bitstream: 1011000111

  1. Start at the root and read bits until a leaf is reached
  2. 10 → B
  3. 110 → C
  4. 0 → A
  5. 0 → A
  6. 111 → D

Decoded: BCAAD
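Both directions can be sketched in a few lines of Python. This is an illustrative sketch using the code table from the worked example (A = 0, B = 10, C = 110, D = 111), not part of the lab software:

```python
codes = {"A": "0", "B": "10", "C": "110", "D": "111"}

def encode(message, codes):
    """Concatenate the code word of each symbol."""
    return "".join(codes[sym] for sym in message)

def decode(bits, codes):
    """Accumulate bits until they match a code word; prefix-freeness
    guarantees each match is unambiguous."""
    inverse = {c: s for s, c in codes.items()}
    symbols, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:  # reached a leaf of the code tree
            symbols.append(inverse[buf])
            buf = ""
    return "".join(symbols)

print(encode("ABACAD", codes))      # 01001100111
print(decode("1011000111", codes))  # BCAAD
```

Note that no delimiters are needed between code words, which is exactly the instantaneous property described earlier.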

Experiment 3: Performance Analysis

Step 3.1: Calculate Metrics

1. Calculate Entropy: H = −Σ pᵢ log₂ pᵢ

2. Calculate Average Length: L̄ = Σ pᵢ lᵢ

3. Calculate Efficiency: η = (H/L̄) × 100%

4. Calculate Redundancy: R = L̄ − H

5. Calculate Variance: σ² = Σ pᵢ(lᵢ − L̄)²
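All five metrics can be computed in one pass. This is an illustrative Python sketch; the code lengths assume one valid assignment from the Experiment 1 tree, where A, B, C receive 2-bit codes and D, E receive 3-bit codes:

```python
import math

probs = {"A": 0.4, "B": 0.2, "C": 0.2, "D": 0.1, "E": 0.1}
lengths = {"A": 2, "B": 2, "C": 2, "D": 3, "E": 3}  # one valid Huffman assignment

H = -sum(p * math.log2(p) for p in probs.values())          # entropy
L = sum(probs[s] * lengths[s] for s in probs)               # average length
eta = H / L * 100                                           # efficiency
R = L - H                                                   # redundancy
var = sum(probs[s] * (lengths[s] - L) ** 2 for s in probs)  # variance

print(round(H, 4), round(L, 2), round(eta, 1), round(R, 4), round(var, 2))
# 2.1219 2.2 96.5 0.0781 0.16
```

Efficiency falls short of 100% here because the probabilities are not all negative powers of 2.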

Step 3.2: Comparison

Compare Huffman coding with:

  • Fixed-length binary coding
  • Shannon-Fano coding
  • Arithmetic coding (theoretical comparison)

Important Notes

  • Ensure probabilities sum to exactly 1.0 (normalize if necessary)
  • Huffman codes are not unique - different valid trees may exist for the same distribution
  • When two nodes have equal probability, the choice of which to place left/right affects the specific codes but not the average length

Interactive Simulation

The virtual lab provides two interactive tools: a tree builder that visualizes Huffman tree construction step-by-step, and a Huffman Calculator that encodes and decodes messages instantly. The encoder accepts a code table in the format {"symbol": "code", ...}; the decoder accepts the inverse table {"code": "symbol", ...}. Use both tools to check your manual results from the experiments above.

Laboratory Report Guidelines

Structure and requirements for your submission

Report Structure

1. Title Page

  • Experiment Title: "Huffman Coding for Data Compression"
  • Student Name, ID, and Group
  • Date of Experiment
  • Course Name and Code

2. Objectives

State the aims of the experiment in your own words. Include both theoretical understanding and practical skills to be acquired.

3. Theory

Summarize the theoretical background including:

  • Concept of entropy and information content
  • Prefix codes and their properties
  • Huffman algorithm description
  • Optimality proof overview

4. Procedure & Observations

Document all steps performed:

  • Step-by-step tree construction (include diagrams)
  • Code assignment process
  • Encoding/decoding examples
  • Screenshots of simulation results

5. Calculations & Analysis

Present all calculations in tabular form:

  • Entropy calculation with formula
  • Average code length computation
  • Efficiency and redundancy analysis
  • Comparison with fixed-length coding
  • Variance of code lengths

6. Conclusion

Summarize findings and discuss:

  • Whether results match theoretical expectations
  • Sources of error or sub-optimality
  • Practical applications of Huffman coding
  • Limitations observed

Grading Rubric

Component            | Weight | Criteria                                              | Points
---------------------|--------|-------------------------------------------------------|-------
Theory Understanding | 20%    | Correct explanation of concepts, entropy calculations | 20
Procedure Execution  | 25%    | Correct tree construction, proper code assignment     | 25
Calculations         | 25%    | Accuracy of metrics, proper formula application       | 25
Analysis             | 20%    | Comparison, interpretation of results, conclusions    | 20
Presentation         | 10%    | Organization, clarity, diagrams, grammar              | 10

Submission Requirements

Format Specifications

  • PDF format only
  • Times New Roman or Arial, 12pt
  • 1.5 line spacing
  • Maximum 15 pages
  • Include page numbers

Content Requirements

  • Minimum 3 different probability distributions analyzed
  • Hand-drawn or software-generated tree diagrams
  • Screenshots from virtual lab simulation
  • Sample calculations shown in detail

Sample Calculation Format

// Example calculation for entropy

Given: p(A)=0.5, p(B)=0.25, p(C)=0.25

H = -[0.5×log₂(0.5) + 0.25×log₂(0.25) + 0.25×log₂(0.25)]

H = -[0.5×(-1) + 0.25×(-2) + 0.25×(-2)]

H = -[-0.5 - 0.5 - 0.5]

H = 1.5 bits/symbol

// Average code length

L̄ = Σ pᵢ × lᵢ = 0.5×1 + 0.25×2 + 0.25×2

L̄ = 1.5 bits/symbol

// Efficiency

η = H/L̄ × 100% = 1.5/1.5 × 100%

η = 100%
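The sample calculation above can be checked mechanically. This is an illustrative Python sketch, not part of the required report:

```python
import math

p = {"A": 0.5, "B": 0.25, "C": 0.25}
l = {"A": 1, "B": 2, "C": 2}  # code lengths matching the example

H = -sum(pi * math.log2(pi) for pi in p.values())  # entropy
L = sum(p[s] * l[s] for s in p)                    # average code length
print(H, L, H / L * 100)  # 1.5 1.5 100.0
```

Efficiency is exactly 100% here because every probability is a negative power of 2, so each optimal code length equals the symbol's information content.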