What is Shannon-Fano Coding?
Shannon-Fano coding is a technique for constructing a prefix code based on a set of symbols and their probabilities (estimated or measured). It is suboptimal in the sense that it does not always achieve the lowest possible expected code word length like Huffman coding, but it is the first step towards understanding entropy-based compression.
Named after Claude Shannon and Robert Fano, the algorithm works by recursively dividing the set of symbols into two sets of approximately equal probability. A '0' is assigned to one set and a '1' to the other, repeating until all sets contain only one symbol.
Key Characteristics
- ✓ Lossless compression technique.
- ✓ Prefix code (no code is a prefix of another).
- ✓ Top-down approach (unlike Huffman's bottom-up).
- ✓ Efficiency depends heavily on the splitting accuracy.
Entropy Relationship
Shannon-Fano aims to approximate the Shannon Entropy limit ($H$), which represents the theoretical minimum average code length.
The Algorithm Procedure
Sort
List the symbols in decreasing order of probability. The most frequent symbol comes first.
Split
Divide the list into two parts such that the total probability of the left part is as close as possible to the total probability of the right part.
Recurse
Assign '0' to the left part and '1' to the right part. Recursively apply steps 2 and 3 to each subset until no further division is possible.
Shannon-Fano Code Generator
Enter symbols and probabilities to generate the code tree.
Input Data
Code Tree & Results
| Symbol | Probability | Code | Length |
|---|---|---|---|
| Enter data and click "Generate Codes" | |||
Shannon-Fano vs. Huffman
| Feature | Shannon-Fano | Huffman |
|---|---|---|
| Approach | Top-down (Probabilistic Split) | Bottom-up (Merge Lowest Prob) |
| Optimality | Suboptimal (Not always best) | Optimal (Minimum redundancy) |
| Complexity | Generally faster to construct | Requires sorting/heap |
| Tree Shape | Tends to be more balanced | Can be very skewed |
| Usage | Educational, Historical (ZIP sometimes) | JPEG, MP3, PKZIP, GZIP |
Did you know?
While Huffman is generally better, Shannon-Fano is sometimes used in specific implementations of the ZIP file format (specifically "Shrink" and "Reduce" methods) because it can be faster to decode in hardware.