Explore the fundamental technique of lossless data compression developed by Claude Shannon and Robert Fano. Construct prefix codes, visualize binary trees, and analyze coding efficiency through interactive simulations.
Understanding the fundamentals of Shannon-Fano coding, entropy, and prefix codes.
Shannon-Fano coding, named after Claude Elwood Shannon and Robert Fano, is a technique for constructing a prefix code based on a set of symbols and their probabilities (estimated or measured). It is one of the earliest entropy encoding techniques for lossless data compression.
The algorithm produces fairly efficient variable-length encodings; when the two smaller sets produced by a partitioning are of equal probability, the one bit of information used to distinguish them is used most efficiently.
List Probabilities
Sort Symbols
Split List
Assign Bits
Repeat
Develop Probability List
For a given list of symbols, develop a corresponding list of probabilities or frequency counts so that each symbol's relative frequency of occurrence is known.
Sort by Frequency
Sort the list of symbols according to frequency, with the most frequently occurring symbols at the left and the least common at the right (descending order).
Divide into Two Parts
Divide the list into two parts, with the total frequency counts of the left part being as close to the total of the right as possible (minimize the difference).
Assign Binary Digits
The left part of the list is assigned the binary digit 0, and the right part is assigned the digit 1.
Recursive Application
Recursively apply steps 3 and 4 to each of the two halves, subdividing groups and adding bits to the codes until each symbol has become a corresponding code leaf on the tree.
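The five steps above can be sketched as a short recursive function. This is a minimal illustration, not the simulator's actual implementation; the function and variable names are ours:

```python
from typing import Dict, List, Tuple

def shannon_fano(symbols: List[Tuple[str, float]]) -> Dict[str, str]:
    """Build a Shannon-Fano prefix code from (symbol, probability) pairs."""
    # Steps 1-2: list the probabilities and sort, most probable first.
    items = sorted(symbols, key=lambda sp: sp[1], reverse=True)
    codes = {s: "" for s, _ in items}

    def divide(group: List[Tuple[str, float]]) -> None:
        if len(group) < 2:
            return  # a single symbol is a leaf; its code is complete
        total = sum(p for _, p in group)
        # Step 3: choose the split point that minimizes the difference
        # between the two parts' probability totals.
        best_i, best_diff, running = 1, float("inf"), 0.0
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(2 * running - total)  # |left total - right total|
            if diff < best_diff:
                best_i, best_diff = i, diff
        # Step 4: append 0 to the left part, 1 to the right part.
        for s, _ in group[:best_i]:
            codes[s] += "0"
        for s, _ in group[best_i:]:
            codes[s] += "1"
        # Step 5: recurse on each part until every symbol is a leaf.
        divide(group[:best_i])
        divide(group[best_i:])

    divide(items)
    return codes
```

For example, `shannon_fano([("A", 0.4), ("B", 0.3), ("C", 0.2), ("D", 0.1)])` yields the prefix code A→0, B→10, C→110, D→111.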
Entropy H(X) measures the average amount of information contained in each symbol from a source. It represents the theoretical minimum average code length.
Where p(x) is the probability of symbol x. The Shannon-Fano code guarantees that the average code length L satisfies: H(X) ≤ L ≤ H(X) + 1
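Entropy follows directly from the definition; the four-symbol distribution below is just an illustrative example:

```python
import math

probs = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}  # example source
# H(X) = -sum of p(x) * log2(p(x)) over all symbols
H = -sum(p * math.log2(p) for p in probs.values())
print(round(H, 3))  # 1.846 bits/symbol
```

No lossless code for this source can average fewer than about 1.846 bits per symbol.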
Enter symbols with their frequencies or probabilities to generate Shannon-Fano codes and visualize the coding tree.
| Symbol | Probability | Code | Length | Contribution |
|---|---|---|---|---|
| Enter symbols and click "Generate Codes" to see results | | | | |
Tree will appear here after code generation
Step-by-step guide for conducting the Shannon-Fano coding experiment.
Select a set of symbols (minimum 4, maximum 8) with their corresponding probabilities or frequencies. Ensure that probabilities sum to 1.0 if using probability mode.
Arrange the symbols in decreasing order of probability (or frequency). The most probable symbol should be at the top/left.
Divide the sorted list into two groups such that the sum of probabilities in each group is as close as possible. Assign '0' to the left group and '1' to the right group.
For each group containing more than one symbol, repeat the division process. Append '0' for left subgroups and '1' for right subgroups to the existing codes.
Continue until each group contains exactly one symbol. The accumulated bits form the final code for that symbol.
Calculate the average code length, entropy, and coding efficiency using the formulas provided in the Analysis section.
Formulas and methods for evaluating the performance of Shannon-Fano coding.
Entropy (H)
H(X) = -Σ p(x) × log₂(p(x))
Measures average information content in bits per symbol
Average Code Length (L)
L = Σ p(x) × l(x)
Where l(x) is the length of code for symbol x
Efficiency (η)
η = H(X) / L × 100%
Ratio of entropy to average code length
Redundancy (R)
R = L - H(X)
Difference between actual and minimum possible length
Given symbols with probabilities:
Step 1: Sort
A(0.4), B(0.3), C(0.2), D(0.1)
Step 2: First Split
Group 1: A(0.4) → 0
Group 2: B(0.3) + C(0.2) + D(0.1) = 0.6 → 1
(Splitting A | BCD gives totals 0.4 vs 0.6, a difference of 0.2 — closer than AB | CD at 0.7 vs 0.3.)
Step 3: Second Split
B(0.3) → 10
C(0.2) + D(0.1) = 0.3 → 11
Step 4: Third Split
C(0.2) → 110, D(0.1) → 111
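Applying the minimize-difference rule strictly at every split gives this source code lengths 1, 2, 3, 3 (A→0, B→10, C→110, D→111). Plugging these into the Analysis formulas, as a sanity-check sketch:

```python
import math

probs = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
lengths = {"A": 1, "B": 2, "C": 3, "D": 3}  # from codes 0, 10, 110, 111

H = -sum(p * math.log2(p) for p in probs.values())  # entropy (bits/symbol)
L = sum(probs[s] * lengths[s] for s in probs)       # average code length
efficiency = H / L * 100                            # eta, in percent
redundancy = L - H                                  # R = L - H(X)
print(round(L, 2), round(efficiency, 1))  # 1.9 97.2
```

The result respects the bound H(X) ≤ L ≤ H(X) + 1, with only about 0.054 bits of redundancy per symbol.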
| Aspect | Shannon-Fano | Huffman |
|---|---|---|
| Optimality | Not always optimal | Always optimal (minimum redundancy) |
| Approach | Top-down (divide and conquer) | Bottom-up (merge lowest probabilities) |
| Complexity | O(n log n) - sorting dominated | O(n log n) - priority queue |
| Code Length Bound | H ≤ L ≤ H + 1 | H ≤ L ≤ H + 1 (tighter in practice) |
| Usage | Educational, historical (ZIP implode) | Widely used in compression (JPEG, MP3, etc.) |
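The optimality gap in the first row can be demonstrated concretely. For the classic five-symbol source 0.35, 0.17, 0.17, 0.16, 0.15, Shannon-Fano's divide procedure produces code lengths 2, 2, 2, 3, 3 (average 2.31 bits/symbol), while Huffman's bottom-up merging does slightly better. A minimal sketch of computing Huffman code lengths (symbol names are illustrative):

```python
import heapq
import itertools

def huffman_lengths(probs):
    """Return {symbol: code length} for an optimal Huffman code."""
    counter = itertools.count()  # tie-breaker so dicts are never compared
    heap = [(p, next(counter), {s: 0}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees, deepening their leaves by 1.
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    return heap[0][2]

probs = {"a": 0.35, "b": 0.17, "c": 0.17, "d": 0.16, "e": 0.15}
lengths = huffman_lengths(probs)
avg = sum(probs[s] * lengths[s] for s in probs)
print(round(avg, 2))  # 2.3 bits/symbol, vs 2.31 for Shannon-Fano
```

Huffman assigns the 0.35 symbol a 1-bit code here, which Shannon-Fano's top-down split never considers for this source — hence the small but real redundancy gap.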