Trie Data Structure

1. Trie Introduction

Tree data structure has many variants like Binary Search Tree, R- Tree, X-tree etc. Trie is one of them. It is similar to Binary Search Tree. Trie was first developed by Rene de la Briandais in 1959. Trie also called Prefix Tree is an ordered tree data structure that is used for indexing and searching the string sequence in a text. The difference between Trie and Binary Search Tree is that In Trie, the position of the node determines which key it is connected with, instead of using that node for storing keys. Here the keys are the strings.

One interesting fact is its pronunciation. Since, the main idea behind this data structure is retrieval of data. That is why it is named as Trie. Because of its origin, it is pronounced as tree. But to distinguish it from tree data structure, it is pronounced as try.

Trie consists of nodes and edges. Edges connect parent node to its child nodes. Each node contains an array of pointers, these pointers are 26 in number. One pointer for each alphabet. Empty string is associated with the root node and all the successors of a node have a common prefix string associated with the parent node. Generally, the values are associated with the leaf nodes. On the basis of prefix, strings are stored in top to bottom fashion. That is why, Trie is also known as Prefix Tree.

2. Binary Trie

2.1 In Binary Trie, only the leaf nodes contain keys.

In this section, we’ll discuss some properties of Binary trie.

  • Binary trie follows order of keys strictly. For each node, all keys in its left subtree are smaller than keys in its right subtree. Hence, lexicographically ordered.
  • The structure of the trie is dependent on its keys only.
  • The traversal of trie is done in sorted order.
  • If there are two keys, binary representation of one key is prefix of the other, then these two keys cannot coexist in trie. This condition is automatically satisfied when there are same number of bits of all keys. But if number of bits is not same for all the keys. Then, this condition may not be satisfied. To solve this problem, the concept of end bit was introduced.

2.2 Binary tries with end bits

  • As we said earlier, the main objective behind this end bit concept is to store keys having binary representations as prefix of each other.
  • The end bit stored in a node is used to represent the end of a stored key. In other words, an end bit containing in a node will be true if that node represents key.
  • In addition to leaf nodes, internal nodes can also contains keys.
  • The pre-order traversal provides keys in lexicographical order.

3. Multi way Trie

Multiway tries are also called R-ary tries.

  • Multiway trie strictly follows the order of keys. That means, For each node, all the keys in left subtree are smaller than that of right subtree. Hence, giving lexicographical ordering.
  • To reach up to a node say ‘N’, the sequence of the digits which will be followed would be the same sequence as prefix contained by all the keys of the subtree having ‘N’ as its root.
  • If the radix is R=2^b that means number of bits per digit is ‘b’ and the maximum number of bits contained in a key is ‘B’. Then, worst case height containing ‘K’ keys will be B/b.
  • The main disadvantage of this data structure is its additional space requirement.

4. Basic Operations in Trie

In this section, we’ll discuss different operations of trie.

Before moving towards the operations, Let’s first create the basic entities that is node and a class for using it.

NODE

public class Node
{
public char key { get; set; }
public bool leaf { get; set; }
public Sort_D< char, Node> Links { get; private set; }
public Node(char key, bool leaf = false)
{
key = key;
Terminal = leaf;
Links = new Sort_D< char, TrieNode>();
}
}

Trie class

public class Trie
{
private Node root { get; set; }
public Trie()
{
// Creating dummy root 
Root = new Node(default(char));
}
}

Now, We are all set to discuss basic operations. First we’ll discuss the Insert operation.

Insertion : Here for adding an element, we need to search first. First it checks if no more nodes are there, it inserts the new node and then that new chain is followed. Here we are not considering empty strings.

public void insert(string a)
{
if (string.IsNullOrEmpty(a))
{
return;
}
var b = root;
for var c in a
{
if (!b.Links.ContainsKey(c))
{
// No more nodes are there
b.Links.Insert(c, new Node(c));
}
b= b.Links[c];
}
// No more characters, verify the leaf
b.leaf= true;
}

Search: In search operations, it scans through each part of the trie until no more parts or more specifically nodes remain.

Public bool search(string a)
{
var b = root;
for var c in a
{
if (!b.Links.ContainsKey(c))
{
// No match found
return false;
}
b = b.Links[c];
}
// 
return b.leaf;
}

Auto-complete : With a given prefix string, using this algorithm we can determine a list of available suffixes. This will help in completing chains using its child nodes.

public List< string> AutoComplete(string a)
{
var res = new List< string>();
var b = root;
for var c in a
{
if (!b.Links.ContainsKey(c))
{
return new List< string>();
}
b = b.Links[c];
}
// Generate all the possible suffixes
AutoComplete(b, string.Empty, res);
return res ;
}
private void AutoComplete(Node node, string temp, List< string> suffixes)
{
if (node.leaf)
{
suffixes.Add(temp);
}
For var l in node.Links
{
AutoComplete(link.Value, temp + link.Value.Key, suffixes);
}
}

Deletion : Find the node you want to delete. The main complication in this operation is to delete only the matching chain while preserving the suffix chain.

Pseudocode for recursive approach:

public void delete(string a)
{
delete(root, a);
}
private bool delete(Node root, string a)
{
if (string.IsNullOrEmpty(a))
{
// Base case: check for leaf node
if (root.leaf)
{
root.leaf = false;
return root.Links.Count == 0;
}
return false;
}
else
{
if (root.Links.ContainsKey(a[0]))
{
// iFound a matching chain, go further down
if (delete(root.Links[a[0]], a.Substring(1)))
{
// The previous recursive step says we can remove the child node
root.Links.delete(a[0]);
return root.Links.Count == 0 && root.Terminal == false;
}
}
return false;
}
}

Now comes the iterative approach:

public void delete(string a)
{
var b = root;
for var c in a
{
if (!b.Links.ContainsKey(c))
{
return;
}
b = b.Links[c];
}
if (b.leaf)
{
b.leaf = false;
// move upside looking for the nodes for removing them
while (!b.Terminal && b.Links.Count == 0 && b.Parent != null)
{
b.Parent.Links.delete(b.key);
b = b.Parent;
}
}
}

5. Program in Java

In this section, we have given complete implementation of Trie

class Node {
char c;
HashMap< Character, Node> children = new HashMap< Character, Node>();
boolean leaf;
public Node() {}
public Node(char c){
this.c = c;
}
}
public class Trie {
private Node root;
public Trie() {
root = new Node();
}
// Inserts a word into the trie.
public void insert(String word) {
HashMap< Character, Node> child = root.child;
for(int a=0; a< word.length(); a++){
char c = word.charAt(a);
Node n;
if(child.containsKey(c)){
n = child.get(c);
}else{
n= new Node(c);
child.put(c, n);
}
child = c.children;
//set leaf node
if(a==word.length()-1)
n.leaf = true;    
}
}
// Returns if the word is present in trie.
public boolean search(String word) {
Node n = searchNode(word);
if(n != null && n.leaf) 
return true;
else
return false;
}
// returns if any word with the given prefix is found
public boolean startsWith(String pre) {
if(search(pre) == null) 
return false;
else
return true;
}
public Node search(String str){
Map< Character, Node> children = root.child; 
Node t = null;
for(int a=0; a< str.length(); a++){
char c = str.charAt(a);
if(child.containsKey(c)){
t = child.get(c);
child = t.child;
}else{
return null;
}
}
return t;
}
}

6. Analysis of Trie

To compare between characters stored in the trie and the characters of input string, at most 26 iterations needs to be done. Therefore, it will take O(n) time where n is the length of the input string.

Insertion will have the same asymptotic complexity.
Tries lie between the binary search trees and hash table.

Search: Search operation is faster in a trie as compared to binary search tree. But it is slower than that of hash table, where the time complexity is O(1) with an O(n) hash execution.

Trie Construction: For constructing trie, time complexity is O(Nn) where N is the number of elements and n is the length of largest element.

Both Insertion and deletion takes O(n) time.