Understanding Prefix Trees

An overview of the Trie data structure

Aniruddha Karajgi
6 min readJan 8, 2024
Image by Author created using draw.io

About

This post explains the trie data structure and provides a simple implementation. The goal is to provide a quick reference for easy reviewing.

Tries are k-ary search trees. The nodes are mostly used to represent characters while traversing the trie locates words.

Table of Contents

Usecase
Design
Implementation
Complexity
Examples

The Usecase

Tries — or prefix trees — are useful for sequential word search. Common features like autocomplete or spell-checking are more efficient when implemented with a prefix tree.

The Design

Overview

The high-level overview of a trie is straightforward.

  • There’s a root node that represents the head of your trie.
  • The children of this root node represent all unique starting letters of the words in your trie.
  • The process continues: the second letters are children of these nodes, and so forth.
  • The end of each word is explicitly mentioned, say, with a boolean flag in that node.
A trie data structure. The * indicates word endings: Image by Author created using draw.io

Operations

We’ll define the following operations for our trie data structure:

  • add(word: str) : This operation adds a string to the trie.
  • search(word: str) : This operation checks if word exists in the trie
  • starts_with(word: str) : This operation checks if word is the prefix of any string in the trie.
  • remove(word: str) : This operation removes word from the trie

The Implementation

The TrieNode class

We’ll first define a TrieNode class. Each node in our trie will be an instance of this class.

As discussed in the design section, each node will have the following attributes to construct a simple trie:

  • a character value: val
  • a boolean flag to indicate if a word ends on this node: is_end
  • and a dictionary of children of the form str: TrieNode : children
class TrieNode:
def __init__(self, val=None):
self.val = val
self.children = dict()
self.is_end = False

Now, let’s define the Trie class and its methods.

The Trie class

We first instantiate the Trie’s root node. It’s an instance of our TrieNode class.

class Trie:

def __init__(self):
self.rootNode = TrieNode()

The add method

Next, let’s implement the add method. At a high level, it works like this:

  • start at the root node.
  • For every character in word, check if that character already exists in one of the current node’s children.
  • If it does, set the current node to the child that represents that character.
  • If it doesn’t, add a child that represents that character.
  • Set the last node’s is_end attribute to True, since it now represents the word we just added.
def add(self, word: str) -> None:
"""
insert word into the trie
"""
curr_node = self.rootNode

for char in word:
if char not in curr_node.children:
curr_node.children[char] = TrieNode(char)

curr_node = curr_node.children[char]

curr_node.is_end = True

The search method

This method follows a similar logic for traversing the trie.

  • Traverse the trie, looking for children that represent the current character of the word we’re searching for.
  • If at any step, there’s no such child, return False.
  • Once the last character has been found, check if that node has is_end set to True. Return True only if that’s the case.
def search(self, word: str) -> bool:
"""
check if word exists in the trie
"""
curr_node = self.rootNode

for char in word:
if char not in curr_node.children:
return False

curr_node = curr_node.children[char]

return curr_node.is_end

The starts_with method

The method is exactly like the search method, except that it doesn’t matter if the last node we land on represents a complete word or not.

def starts_with(self, prefix: str) -> bool:
"""
check if word is the prefix of some string
in the trie.
"""
curr_node = self.rootNode

for char in prefix:
if char not in curr_node.children:
return False

curr_node = curr_node.children[char]

return True

The remove method

This is the most complex method to implement so far. While there are a few ways to implement this, the general logic goes like this:

  • Traverse the trie looking for each character in the word to be removed.
  • Once the last node is found, set its is_end attribute to False, since it no longer represents the word we’ve just removed.
  • Next, if this last node has no children, we can simply delete it. Likewise, its parent node may also not be needed.
  • Traverse up the trie deleting nodes that are no longer needed — that is, they have no children.
def remove(self, word: str) -> None:
"""
Remove word from the trie.
"""
curr_node = self.rootNode
branch = [curr_node]

# traverse the tree and keep track of the nodes encountered
for char in word:
curr_node = curr_node.children[char]
branch.append(curr_node)

# set the last node's is_end to False
curr_node.is_end = False

# go back up the tree, deleting nodes that no longer have any children
n = len(branch)
for i in range(n-2, -1):
j = i + 1
if not branch[j].children:
del branch[i].children[branch[j].val]

return None

This implementation of remove assumes that word exists in the trie. But handling that edge case is very straightforward, so I won’t be discussing that.

The remove method can also be conveniently implemented with a stack.

The complete implementation

class TrieNode:
def __init__(self, val=None):
self.val = val
self.children = dict()
self.is_end = False


class Trie:

def __init__(self):
self.rootNode = TrieNode()

def add(self, word: str) -> None:
"""
insert word into the trie
"""
curr_node = self.rootNode

for char in word:
if char not in curr_node.children:
curr_node.children[char] = TrieNode(char)

curr_node = curr_node.children[char]

curr_node.is_end = True

def search(self, word: str) -> bool:
"""
check if word exists in the trie
"""
curr_node = self.rootNode

for char in word:
if char not in curr_node.children:
return False

curr_node = curr_node.children[char]

return curr_node.is_end

def starts_with(self, word: str) -> bool:
"""
check if word is the prefix of some string
in the trie.
"""
curr_node = self.rootNode

for char in word:
if char not in curr_node.children:
return False

curr_node = curr_node.children[char]

return True

def remove(self, word: str) -> None:
"""
Remove word from the trie.
"""
curr_node = self.rootNode
branch = [curr_node]

# traverse the tree and keep track of the nodes encountered
for char in word:
curr_node = curr_node.children[char]
branch.append(curr_node)

# set the last node's is_end to False
curr_node.is_end = False

# go back up the tree, deleting nodes that no longer have any children
n = len(branch)
for i in range(n-2, -1):
j = i + 1
if not branch[j].children:
del branch[i].children[branch[j].val]

return None

Usage

Using this implementation comes down to instantiating the Trie class and using its methods to add and search for words.

trie = Trie();
trie.add("apple");

trie.search("apple"); # return True
trie.search("app"); # return False
trie.starts_with("app"); # return True

trie.add("app");
trie.search("app"); # return True

Complexity Analysis

Considering an average word length of n , average insertion time is linear as we iterate through each character, creating a node when required.

Searching and prefix searching (startswith) are also linear, since we use the same traversal logic.

Removal is linear as well since we go down the trie once and then go up.

Add: O(n)

Search: O(n)

Prefix-search: O(n)

Remove: O(n)

--

--

Aniruddha Karajgi

Data Scientist at Tekion | Samsung Research | GSoC | CS at BITS Pilani ’21 | polaris000.com | LinkedIn: linkedin.com/in/polaris000/