Interactive Virtual Laboratory

Shannon-Fano Entropy Coding

Explore the fundamental technique of lossless data compression developed by Claude Shannon and Robert Fano. Construct prefix codes, visualize binary trees, and analyze coding efficiency through interactive simulations.

Theoretical Background

Understanding the fundamentals of Shannon-Fano coding, entropy, and prefix codes.

What is Shannon-Fano Coding?

Shannon-Fano coding, named after Claude Elwood Shannon and Robert Fano, is a technique for constructing a prefix code based on a set of symbols and their probabilities (estimated or measured). It is one of the earliest entropy encoding techniques for lossless data compression.

The algorithm produces fairly efficient variable-length encodings; when the two smaller sets produced by a partitioning are of equal probability, the one bit of information used to distinguish them is used most efficiently.

Key Properties

  • Prefix-free: No codeword is a prefix of another codeword, ensuring unique decodability.
  • Variable-length: More frequent symbols get shorter codes; less frequent symbols get longer codes.
  • Suboptimal: Unlike Huffman coding, Shannon-Fano does not always produce optimal prefix codes.
  • Bound: The average code length L satisfies H(X) ≤ L ≤ H(X) + 1, where H(X) is the entropy.
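The prefix-free property can be checked mechanically. A minimal sketch in Python (the function name `is_prefix_free` is illustrative, not part of the lab software): sorting the codewords lexicographically places any prefix directly among the words it prefixes, so comparing adjacent pairs suffices.

```python
def is_prefix_free(codes):
    """Check that no codeword is a prefix of another (unique decodability)."""
    words = sorted(codes.values())  # a prefix sorts before every word it prefixes
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

print(is_prefix_free({"A": "0", "B": "10", "C": "110", "D": "111"}))  # True
print(is_prefix_free({"A": "0", "B": "01"}))                          # False: "0" prefixes "01"
```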

Shannon-Fano Algorithm Steps

1. List Probabilities
2. Sort Symbols
3. Split List
4. Assign Bits
5. Repeat

Step 1: Develop Probability List

For a given list of symbols, develop a corresponding list of probabilities or frequency counts so that each symbol's relative frequency of occurrence is known.

Step 2: Sort by Frequency

Sort the list of symbols according to frequency, with the most frequently occurring symbols at the left and the least common at the right (descending order).

Step 3: Divide into Two Parts

Divide the list into two parts, with the total frequency counts of the left part being as close to the total of the right as possible (minimize the difference).

Step 4: Assign Binary Digits

The left part of the list is assigned the binary digit 0, and the right part is assigned the digit 1.

Step 5: Recursive Application

Recursively apply steps 3 and 4 to each of the two halves, subdividing groups and adding bits to the codes until each symbol has become a corresponding code leaf on the tree.
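The five steps above can be sketched in Python. This is an illustrative implementation (the name `shannon_fano` is our own), assuming symbol probabilities are supplied as a dict:

```python
def shannon_fano(probs):
    """Shannon-Fano coding: sort, split to balance probability mass, recurse.

    probs: dict mapping symbol -> probability (or frequency).
    Returns a dict mapping symbol -> binary codeword string.
    """
    # Steps 1-2: list the symbols and sort by descending probability.
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    codes = {}

    def split(group, prefix):
        if len(group) == 1:                    # Step 5 termination: leaf reached
            codes[group[0][0]] = prefix or "0"  # degenerate 1-symbol alphabet gets "0"
            return
        total = sum(p for _, p in group)
        # Step 3: find the split point minimizing |left sum - right sum|.
        running, best_i, best_diff = 0.0, 1, float("inf")
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(total - 2 * running)    # = |left - right|
            if diff < best_diff:
                best_diff, best_i = diff, i
        # Step 4: left part gets '0', right part gets '1'; then recurse.
        split(group[:best_i], prefix + "0")
        split(group[best_i:], prefix + "1")

    split(items, "")
    return codes

print(shannon_fano({"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}))
# {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
```

For this input the first split is {A} = 0.4 versus {B, C, D} = 0.6 (difference 0.2), not {A, B} versus {C, D} (difference 0.4), which is why A receives a one-bit code.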

Information Entropy

Entropy H(X) measures the average amount of information contained in each symbol from a source. It represents the theoretical minimum average code length.

H(X) = -Σ p(x) × log₂(p(x))

Where p(x) is the probability of symbol x. The Shannon-Fano code guarantees that the average code length L satisfies: H(X) ≤ L ≤ H(X) + 1

Example Calculation
Symbol A: p = 0.5 → -0.5 × log₂(0.5) = 0.5
Symbol B: p = 0.25 → -0.25 × log₂(0.25) = 0.5
Symbol C: p = 0.25 → -0.25 × log₂(0.25) = 0.5
Entropy H(X) = 0.5 + 0.5 + 0.5 = 1.5 bits
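The entropy formula translates directly into a few lines of Python, which reproduce the example above (a sketch; the name `entropy` is illustrative):

```python
import math

def entropy(probs):
    """H(X) = -sum p(x) * log2 p(x), in bits per symbol.

    Terms with p = 0 contribute nothing (the limit p*log p -> 0), so skip them.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.25]))  # 1.5
```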

Interactive Simulation

Enter symbols with their frequencies or probabilities to generate Shannon-Fano codes and visualize the coding tree.


Laboratory Procedure

Step-by-step guide for conducting the Shannon-Fano coding experiment.

1 Aim of the Experiment

  • To understand the concept of entropy and variable-length coding
  • To implement the Shannon-Fano algorithm for constructing prefix codes
  • To analyze the efficiency of Shannon-Fano coding compared to the entropy limit
  • To visualize the binary tree structure of prefix codes

2 Equipment Required

  • Computer with web browser
  • Calculator (or use built-in simulation)
  • Notebook for recording observations
  • Reference material on information theory

3 Step-by-Step Procedure

1. Problem Setup

Select a set of symbols (minimum 4, maximum 8) with their corresponding probabilities or frequencies. Ensure that probabilities sum to 1.0 if using probability mode.

Example: A(0.4), B(0.3), C(0.2), D(0.1)

2. Sort Symbols

Arrange the symbols in decreasing order of probability (or frequency). The most probable symbol should be at the top/left.

3. First Division

Divide the sorted list into two groups such that the sum of probabilities in each group is as close as possible. Assign '0' to the left group and '1' to the right group.

Note: If exact equality is not possible, minimize the absolute difference between group sums.

4. Recursive Division

For each group containing more than one symbol, repeat the division process. Append '0' for left subgroups and '1' for right subgroups to the existing codes.

5. Termination

Continue until each group contains exactly one symbol. The accumulated bits form the final code for that symbol.

6. Analysis

Calculate the average code length, entropy, and coding efficiency using the formulas provided in the Analysis section.

4 Precautions

  • Ensure that the sum of all probabilities equals exactly 1.0
  • When splitting groups, always choose the division that minimizes the difference in total probability between left and right groups
  • Verify that no code word is a prefix of another code word (prefix-free property)
  • Double-check calculations for entropy and average code length
  • Record all intermediate steps for verification
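The first precaution can be automated. A small illustrative check (the name `validate_input` and the tolerance are our own choices; a tolerance is needed because floating-point sums rarely equal 1.0 exactly):

```python
def validate_input(probs, tol=1e-9):
    """Check the precautions: probabilities positive and summing to 1."""
    if any(p <= 0 for p in probs.values()):
        raise ValueError("all probabilities must be positive")
    if abs(sum(probs.values()) - 1.0) > tol:
        raise ValueError("probabilities must sum to 1.0")
    return True

print(validate_input({"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}))  # True
```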

Analysis & Calculations

Formulas and methods for evaluating the performance of Shannon-Fano coding.

Key Formulas

Entropy (H)

H(X) = -Σ p(x) × log₂(p(x))

Measures average information content in bits per symbol

Average Code Length (L)

L = Σ p(x) × l(x)

Where l(x) is the length of code for symbol x

Efficiency (η)

η = H(X) / L × 100%

Ratio of entropy to average code length

Redundancy (R)

R = L - H(X)

Difference between actual and minimum possible length
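The four formulas combine into a short Python sketch (the function name is illustrative, not part of the lab software). With dyadic probabilities (powers of 1/2), code lengths can equal −log₂ p(x) exactly, so efficiency reaches 100%:

```python
import math

def coding_metrics(probs, codes):
    """Entropy H, average code length L, efficiency, and redundancy."""
    H = -sum(p * math.log2(p) for p in probs.values() if p > 0)  # H(X)
    L = sum(p * len(codes[s]) for s, p in probs.items())         # L = sum p(x)*l(x)
    return {"H": H, "L": L, "efficiency": H / L, "redundancy": L - H}

# Dyadic example: each codeword length equals -log2 of its probability.
m = coding_metrics({"A": 0.5, "B": 0.25, "C": 0.25},
                   {"A": "0", "B": "10", "C": "11"})
print(f"H = {m['H']} bits, L = {m['L']} bits, "
      f"efficiency = {100 * m['efficiency']:.1f}%")  # 1.5, 1.5, 100.0%
```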

Worked Example

Given symbols with probabilities:

A: 0.4,  B: 0.3,  C: 0.2,  D: 0.1

Step 1: Sort

A(0.4), B(0.3), C(0.2), D(0.1)

Step 2: First Split

Group 1: A(0.4) → 0

Group 2: B(0.3) + C(0.2) + D(0.1) = 0.6 → 1

(Splitting after A gives a difference of 0.2, smaller than the 0.4 obtained by splitting after B, so this is the division that minimizes the difference.)

Step 3: Second Split

A is alone, so its code is complete: A → 0

Group 2 splits evenly: B(0.3) → 10; C(0.2) + D(0.1) = 0.3 → 11

Step 4: Third Split

C(0.2) → 110, D(0.1) → 111

Results
Entropy H(X): 1.846 bits
Average Length L: 0.4×1 + 0.3×2 + 0.2×3 + 0.1×3 = 1.9 bits
Efficiency: 1.846 / 1.9 ≈ 97.2%
Redundancy: 1.9 − 1.846 = 0.054 bits

Shannon-Fano vs Huffman Coding

Aspect            | Shannon-Fano                           | Huffman
------------------|----------------------------------------|---------------------------------------------
Optimality        | Not always optimal                     | Always optimal (minimum redundancy)
Approach          | Top-down (divide and conquer)          | Bottom-up (merge lowest probabilities)
Complexity        | O(n log n), sorting dominated          | O(n log n), priority queue
Code Length Bound | H ≤ L ≤ H + 1                          | H ≤ L ≤ H + 1 (tighter in practice)
Usage             | Educational, historical (ZIP implode)  | Widely used in compression (JPEG, MP3, etc.)
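The "Approach" row can be illustrated by sketching Huffman's bottom-up merge for contrast (the helper name `huffman_lengths` is hypothetical; it tracks only code lengths, not the codewords themselves):

```python
import heapq
import itertools

def huffman_lengths(probs):
    """Huffman's bottom-up approach: repeatedly merge the two least probable
    groups; every symbol in a merged group gains one bit of code length.
    Contrast with Shannon-Fano's top-down splitting."""
    counter = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(p, next(counter), {s: 0}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)   # two smallest probabilities
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: l + 1 for s, l in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    return heap[0][2]

lengths = huffman_lengths({"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1})
print(lengths)  # lengths 1, 2, 3, 3 for A, B, C, D
```

For this particular distribution Huffman's average length (0.4×1 + 0.3×2 + 0.2×3 + 0.1×3 = 1.9 bits) coincides with Shannon-Fano's; on other distributions Huffman can be strictly shorter, never longer.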