Explore the fundamental technique of lossless data compression developed by Claude Shannon and Robert Fano. Construct prefix codes, visualize binary trees, and analyze coding efficiency through interactive simulations.
Understanding the fundamentals of Shannon-Fano coding, entropy, and prefix codes.
Shannon-Fano coding, named after Claude Elwood Shannon and Robert Fano, is a technique for constructing a prefix code based on a set of symbols and their probabilities (estimated or measured). It is one of the earliest entropy encoding techniques for lossless data compression.
The algorithm produces fairly efficient variable-length encodings; when the two smaller sets produced by a partitioning are of equal probability, the one bit of information used to distinguish them is used most efficiently.
List Probabilities
Sort Symbols
Split List
Assign Bits
Repeat
Develop Probability List
For a given list of symbols, develop a corresponding list of probabilities or frequency counts so that each symbol's relative frequency of occurrence is known.
Sort by Frequency
Sort the list of symbols according to frequency, with the most frequently occurring symbols at the left and the least common at the right (descending order).
Divide into Two Parts
Divide the list into two parts, with the total frequency counts of the left part being as close to the total of the right as possible (minimize the difference).
Assign Binary Digits
The left part of the list is assigned the binary digit 0, and the right part is assigned the digit 1.
Recursive Application
Recursively apply steps 3 and 4 to each of the two halves, subdividing groups and adding bits to the codes until each symbol has become a corresponding code leaf on the tree.
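The five steps above can be sketched as a short recursive function. This is a minimal illustration, not the simulator's actual implementation; the function and variable names are ours:

```python
from typing import Dict, List, Tuple

def shannon_fano(symbols: List[Tuple[str, float]]) -> Dict[str, str]:
    """Build a Shannon-Fano prefix code from (symbol, probability) pairs."""
    # Steps 1-2: list the probabilities and sort, most probable first.
    items = sorted(symbols, key=lambda sp: sp[1], reverse=True)
    codes = {s: "" for s, _ in items}

    def divide(group: List[Tuple[str, float]]) -> None:
        if len(group) < 2:
            return  # a single symbol is a leaf; its code is complete
        total = sum(p for _, p in group)
        # Step 3: choose the split point that minimizes the difference
        # between the two parts' probability totals.
        best_i, best_diff, running = 1, float("inf"), 0.0
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(2 * running - total)  # |left total - right total|
            if diff < best_diff:
                best_i, best_diff = i, diff
        # Step 4: append 0 to the left part, 1 to the right part.
        for s, _ in group[:best_i]:
            codes[s] += "0"
        for s, _ in group[best_i:]:
            codes[s] += "1"
        # Step 5: recurse on each part until every symbol is a leaf.
        divide(group[:best_i])
        divide(group[best_i:])

    divide(items)
    return codes
```

For example, `shannon_fano([("A", 0.4), ("B", 0.3), ("C", 0.2), ("D", 0.1)])` yields the prefix code A→0, B→10, C→110, D→111.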
Entropy H(X) measures the average amount of information contained in each symbol from a source. It represents the theoretical minimum average code length.
Where p(x) is the probability of symbol x. The Shannon-Fano code guarantees that the average code length L satisfies: H(X) ≤ L ≤ H(X) + 1
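Entropy follows directly from the definition; the four-symbol distribution below is just an illustrative example:

```python
import math

probs = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}  # example source
# H(X) = -sum of p(x) * log2(p(x)) over all symbols
H = -sum(p * math.log2(p) for p in probs.values())
print(round(H, 3))  # 1.846 bits/symbol
```

No lossless code for this source can average fewer than about 1.846 bits per symbol.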
Enter symbols with their frequencies or probabilities to generate Shannon-Fano codes and visualize the coding tree.
| Symbol | Probability | Code | Length | Contribution |
|---|---|---|---|---|
| Enter symbols and click "Generate Codes" to see results | | | | |
Tree will appear here after code generation
Step-by-step guide for conducting the Shannon-Fano coding experiment.
Select a set of symbols (minimum 4, maximum 8) with their corresponding probabilities or frequencies. Ensure that probabilities sum to 1.0 if using probability mode.
Arrange the symbols in decreasing order of probability (or frequency). The most probable symbol should be at the top/left.
Divide the sorted list into two groups such that the sum of probabilities in each group is as close as possible. Assign '0' to the left group and '1' to the right group.
For each group containing more than one symbol, repeat the division process. Append '0' for left subgroups and '1' for right subgroups to the existing codes.
Continue until each group contains exactly one symbol. The accumulated bits form the final code for that symbol.
Calculate the average code length, entropy, and coding efficiency using the formulas provided in the Analysis section.
Formulas and methods for evaluating the performance of Shannon-Fano coding.
Entropy (H)
H(X) = -Σ p(x) × log₂(p(x))
Measures average information content in bits per symbol
Average Code Length (L)
L = Σ p(x) × l(x)
Where l(x) is the length of code for symbol x
Efficiency (η)
η = H(X) / L × 100%
Ratio of entropy to average code length
Redundancy (R)
R = L - H(X)
Difference between actual and minimum possible length
Given symbols with probabilities:
Step 1: Sort
A(0.4), B(0.3), C(0.2), D(0.1)
Step 2: First Split
Group 1: A(0.4) → 0
Group 2: B(0.3) + C(0.2) + D(0.1) = 0.6 → 1
(Splitting A | BCD gives totals 0.4 vs 0.6, a difference of 0.2 — closer than AB | CD at 0.7 vs 0.3.)
Step 3: Second Split
B(0.3) → 10
C(0.2) + D(0.1) = 0.3 → 11
Step 4: Third Split
C(0.2) → 110, D(0.1) → 111
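Applying the minimize-difference rule strictly at every split gives this source code lengths 1, 2, 3, 3 (A→0, B→10, C→110, D→111). Plugging these into the Analysis formulas, as a sanity-check sketch:

```python
import math

probs = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
lengths = {"A": 1, "B": 2, "C": 3, "D": 3}  # from codes 0, 10, 110, 111

H = -sum(p * math.log2(p) for p in probs.values())  # entropy (bits/symbol)
L = sum(probs[s] * lengths[s] for s in probs)       # average code length
efficiency = H / L * 100                            # eta, in percent
redundancy = L - H                                  # R = L - H(X)
print(round(L, 2), round(efficiency, 1))  # 1.9 97.2
```

The result respects the bound H(X) ≤ L ≤ H(X) + 1, with only about 0.054 bits of redundancy per symbol.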
| Aspect | Shannon-Fano | Huffman |
|---|---|---|
| Optimality | Not always optimal | Always optimal (minimum redundancy) |
| Approach | Top-down (divide and conquer) | Bottom-up (merge lowest probabilities) |
| Complexity | O(n log n) - sorting dominated | O(n log n) - priority queue |
| Code Length Bound | H ≤ L ≤ H + 1 | H ≤ L ≤ H + 1 (tighter in practice) |
| Usage | Educational, historical (ZIP implode) | Widely used in compression (JPEG, MP3, etc.) |
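The optimality gap in the first row can be demonstrated concretely. For the classic five-symbol source 0.35, 0.17, 0.17, 0.16, 0.15, Shannon-Fano's divide procedure produces code lengths 2, 2, 2, 3, 3 (average 2.31 bits/symbol), while Huffman's bottom-up merging does slightly better. A minimal sketch of computing Huffman code lengths (symbol names are illustrative):

```python
import heapq
import itertools

def huffman_lengths(probs):
    """Return {symbol: code length} for an optimal Huffman code."""
    counter = itertools.count()  # tie-breaker so dicts are never compared
    heap = [(p, next(counter), {s: 0}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees, deepening their leaves by 1.
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    return heap[0][2]

probs = {"a": 0.35, "b": 0.17, "c": 0.17, "d": 0.16, "e": 0.15}
lengths = huffman_lengths(probs)
avg = sum(probs[s] * lengths[s] for s in probs)
print(round(avg, 2))  # 2.3 bits/symbol, vs 2.31 for Shannon-Fano
```

Huffman assigns the 0.35 symbol a 1-bit code here, which Shannon-Fano's top-down split never considers for this source — hence the small but real redundancy gap.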