Huffman Coding

Total Encoding Length means the total number of bits needed for the “prefix codes”

ABL(Averyage Number of bits per letter) is simply TEL / n (n is the total number of letters).

An optimal prefix codes is that with the smallest ABL.

Definition: A binary tree is full if each internal node has 2 children.

Clamin: The binary tree corresponding to the optimal prefix code is full.

Proof:

Let K be an optimal prefix code. Assume there is a T such that T is a binary tree corresponding to an optimal prefix code K and T is not full – T has a node u that has exactly one child v.

  1. If u is the root, we can just make a new tree T’ by deleting u and making v the root.
  2. If u is not root, we can make a new tree T
    1. by deleting u and mking v the child of w.

And it is true that, by either case, the number of bits needed decreases, and we can stilll make them intact for each other.

Let K’ be such prefix code. We have ABL(K’) < ABL(K). It is against the fact that K is an optimal prefix code. So, the assumption that there is a T such that T is a binary tree corresponding to an optimal prefix code K and T is not full – T has a node u that has exactly one child v, is false.

But how to find such a tree?

Corollary: There is an optimal binary tree T*, in which the two lowest frequency letters are assigned to leaves that are siblinsg.

Here comes the Huffman Algorithm

function HuffmanEncoding(S, freq)
  if S == {a, b}
    y(a) = 0
    y(b) = 1
  else
    Let y* and z* be the two lowest frequency letters
    Form a new alphabet S' = S - (y*, z*) + w, with freq(w) = freq(y*) + freq(z*)
    y' = HuffmanEncoding(S', freq)
    Extend y' to a prefix code y for S as follows:
      y(y*) = y'(w).append(0)
      y(z*) = y'(w).append(1)
  return y