허프만 코딩

<시간/>

허프만 코딩은 무손실 데이터 압축 알고리즘입니다. 이 알고리즘에서 가변 길이 코드는 다른 문자를 입력하기 위해 할당됩니다. 코드 길이는 문자가 사용되는 빈도와 관련이 있습니다. 가장 자주 사용되는 문자에는 가장 작은 코드가 있고 가장 적게 사용되는 문자에는 더 긴 코드가 있습니다.

크게 두 부분이 있습니다. 첫 번째는 Huffman 트리를 생성하고 다른 하나는 트리를 탐색하여 코드를 찾습니다.

예를 들어 일부 문자열 "YYYZXXYYX"를 고려하면 문자 Y의 빈도는 X보다 크고 문자 Z는 빈도가 가장 낮습니다. 따라서 Y의 코드 길이는 X보다 작고 X의 코드는 Z보다 작습니다.

빈도에 따라 각 문자에 코드를 할당하는 복잡성은 O(n log n)

입력 − "ACCEBFFFFAAXXBLKE"와 같이 다른 문자가 포함된 문자열
출력 − 다른 문자에 대한 코드:

Data: K, Frequency: 1, Code: 0000
Data: L, Frequency: 1, Code: 0001
Data: E, Frequency: 2, Code: 001
Data: F, Frequency: 4, Code: 01
Data: B, Frequency: 2, Code: 100
Data: C, Frequency: 2, Code: 101
Data: X, Frequency: 2, Code: 110
Data: A, Frequency: 3, Code: 111

알고리즘

허프만 코딩(문자열)

입력 − 다른 문자의 문자열입니다.

출력 − 각 개별 문자에 대한 코드입니다.

Begin
   define a node with character, frequency, left and right child of the node for Huffman tree.
   create a list ‘freq’ to store frequency of each character, initially all are 0
   for each character c in the string do
      increase the frequency for character ch in freq list.
   done
   for all type of character ch do
      if the frequency of ch is non zero then add ch and its frequency as a node of priority queue Q.
   done
   while Q is not empty do
      remove item from Q and assign it to left child of node
      remove item from Q and assign to the right child of node
      traverse the node to find the assigned code
   done
End

traverseNode(n:노드, 코드)

입력 − 허프만 트리의 노드 n과 이전 호출에서 할당된 코드

출력 − 각 문자에 할당된 코드

if left child of node n ≠ φ then
   traverseNode(leftChild(n), code+’0’) //traverse through the left child
   traverseNode(rightChild(n), code+’1’) //traverse through the right child
else
   display the character and data of current node.

예시

#include<iostream>
#include<queue>
#include<string>
using namespace std;
struct node{
   int freq;
   char data;
   const node *child0, *child1;
   node(char d, int f = -1){ //assign values in the node
      data = d;
      freq = f;
      child0 = NULL;
      child1 = NULL;
   }
   node(const node *c0, const node *c1){
      data = 0;
      freq = c0->freq + c1->freq;
      child0=c0;
      child1=c1;
   }
   bool operator<( const node &a ) const { //< operator performs to find priority in queue
      return freq >a.freq;
   }
   void traverse(string code = "")const{
      if(child0!=NULL){
         child0->traverse(code+'0'); //add 0 with the code as left child
         child1->traverse(code+'1'); //add 1 with the code as right child
      }else{
         cout << "Data: " << data<< ", Frequency: "<<freq << ", Code: " << code<<endl;
      }
   }
};
void huffmanCoding(string str){
   priority_queue<node> qu;
   int frequency[256];
   for(int i = 0; i<256; i++)
      frequency[i] = 0; //clear all frequency
   for(int i = 0; i<str.size(); i++){
      frequency[int(str[i])]++; //increase frequency
   }
   for(int i = 0; i<256; i++){
      if(frequency[i]){
         qu.push(node(i, frequency[i]));
      }
   }
   while(qu.size() >1){
      node *c0 = new node(qu.top()); //get left child and remove from queue
      qu.pop();
      node *c1 = new node(qu.top()); //get right child and remove from queue
      qu.pop();
      qu.push(node(c0, c1)); //add freq of two child and add again in the queue
   }
   cout << "The Huffman Code: "<<endl;
   qu.top().traverse(); //traverse the tree to get code
}
main(){
   string str = "ACCEBFFFFAAXXBLKE"; //arbitray string to get frequency
   huffmanCoding(str);
}

출력

The Huffman Code:
Data: K, Frequency: 1, Code: 0000
Data: L, Frequency: 1, Code: 0001
Data: E, Frequency: 2, Code: 001
Data: F, Frequency: 4, Code: 01
Data: B, Frequency: 2, Code: 100
Data: C, Frequency: 2, Code: 101
Data: X, Frequency: 2, Code: 110
Data: A, Frequency: 3, Code: 111