Understanding Prefix Trees
An overview of the Trie data structure
About
This post explains the trie data structure and provides a simple implementation. The goal is to provide a quick reference for easy reviewing.
Tries are k-ary search trees. The nodes are mostly used to represent characters while traversing the trie locates words.
Table of Contents
Usecase
Design
Implementation
Complexity
Examples
The Usecase
Tries — or prefix trees — are useful for sequential word search. Common features like autocomplete or spell-checking are more efficient when implemented with a prefix tree.
The Design
Overview
The high-level overview of a trie is straightforward.
- There’s a root node that represents the head of your trie.
- The children of this root node represent all unique starting letters of the words in your trie.
- The process continues: the second letters are children of these nodes, and so forth.
- The end of each word is explicitly mentioned, say, with a boolean flag in that node.
Operations
We’ll define the following operations for our trie data structure:
add(word: str)
: This operation adds a string to the trie.search(word: str)
: This operation checks ifword
exists in the triestarts_with(word: str)
: This operation checks ifword
is the prefix of any string in the trie.remove(word: str)
: This operation removesword
from the trie
The Implementation
The TrieNode class
We’ll first define a TrieNode
class. Each node in our trie will be an instance of this class.
As discussed in the design section, each node will have the following attributes to construct a simple trie:
- a character value:
val
- a boolean flag to indicate if a word ends on this node:
is_end
- and a dictionary of children of the form
str: TrieNode
:children
class TrieNode:
def __init__(self, val=None):
self.val = val
self.children = dict()
self.is_end = False
Now, let’s define the Trie
class and its methods.
The Trie class
We first instantiate the Trie’s root node. It’s an instance of our TrieNode
class.
class Trie:
def __init__(self):
self.rootNode = TrieNode()
The add method
Next, let’s implement the add
method. At a high level, it works like this:
- start at the root node.
- For every character in
word
, check if that character already exists in one of the current node’s children. - If it does, set the current node to the child that represents that character.
- If it doesn’t, add a child that represents that character.
- Set the last node’s
is_end
attribute to True, since it now represents the word we just added.
def add(self, word: str) -> None:
"""
insert word into the trie
"""
curr_node = self.rootNode
for char in word:
if char not in curr_node.children:
curr_node.children[char] = TrieNode(char)
curr_node = curr_node.children[char]
curr_node.is_end = True
The search method
This method follows a similar logic for traversing the trie.
- Traverse the trie, looking for children that represent the current character of the word we’re searching for.
- If at any step, there’s no such child, return
False
. - Once the last character has been found, check if that node has
is_end
set toTrue
. ReturnTrue
only if that’s the case.
def search(self, word: str) -> bool:
"""
check if word exists in the trie
"""
curr_node = self.rootNode
for char in word:
if char not in curr_node.children:
return False
curr_node = curr_node.children[char]
return curr_node.is_end
The starts_with method
The method is exactly like the search
method, except that it doesn’t matter if the last node we land on represents a complete word or not.
def starts_with(self, prefix: str) -> bool:
"""
check if word is the prefix of some string
in the trie.
"""
curr_node = self.rootNode
for char in prefix:
if char not in curr_node.children:
return False
curr_node = curr_node.children[char]
return True
The remove method
This is the most complex method to implement so far. While there are a few ways to implement this, the general logic goes like this:
- Traverse the trie looking for each character in the word to be removed.
- Once the last node is found, set its
is_end
attribute toFalse
, since it no longer represents the word we’ve just removed. - Next, if this last node has no children, we can simply delete it. Likewise, its parent node may also not be needed.
- Traverse up the trie deleting nodes that are no longer needed — that is, they have no children.
def remove(self, word: str) -> None:
"""
Remove word from the trie.
"""
curr_node = self.rootNode
branch = [curr_node]
# traverse the tree and keep track of the nodes encountered
for char in word:
curr_node = curr_node.children[char]
branch.append(curr_node)
# set the last node's is_end to False
curr_node.is_end = False
# go back up the tree, deleting nodes that no longer have any children
n = len(branch)
for i in range(n-2, -1):
j = i + 1
if not branch[j].children:
del branch[i].children[branch[j].val]
return None
This implementation of remove
assumes that word
exists in the trie. But handling that edge case is very straightforward, so I won’t be discussing that.
The remove
method can also be conveniently implemented with a stack.
The complete implementation
class TrieNode:
def __init__(self, val=None):
self.val = val
self.children = dict()
self.is_end = False
class Trie:
def __init__(self):
self.rootNode = TrieNode()
def add(self, word: str) -> None:
"""
insert word into the trie
"""
curr_node = self.rootNode
for char in word:
if char not in curr_node.children:
curr_node.children[char] = TrieNode(char)
curr_node = curr_node.children[char]
curr_node.is_end = True
def search(self, word: str) -> bool:
"""
check if word exists in the trie
"""
curr_node = self.rootNode
for char in word:
if char not in curr_node.children:
return False
curr_node = curr_node.children[char]
return curr_node.is_end
def starts_with(self, word: str) -> bool:
"""
check if word is the prefix of some string
in the trie.
"""
curr_node = self.rootNode
for char in word:
if char not in curr_node.children:
return False
curr_node = curr_node.children[char]
return True
def remove(self, word: str) -> None:
"""
Remove word from the trie.
"""
curr_node = self.rootNode
branch = [curr_node]
# traverse the tree and keep track of the nodes encountered
for char in word:
curr_node = curr_node.children[char]
branch.append(curr_node)
# set the last node's is_end to False
curr_node.is_end = False
# go back up the tree, deleting nodes that no longer have any children
n = len(branch)
for i in range(n-2, -1):
j = i + 1
if not branch[j].children:
del branch[i].children[branch[j].val]
return None
Usage
Using this implementation comes down to instantiating the Trie class and using its methods to add and search for words.
trie = Trie();
trie.add("apple");
trie.search("apple"); # return True
trie.search("app"); # return False
trie.starts_with("app"); # return True
trie.add("app");
trie.search("app"); # return True
Complexity Analysis
Considering an average word length of n
, average insertion time is linear as we iterate through each character, creating a node when required.
Searching and prefix searching (startswith
) are also linear, since we use the same traversal logic.
Removal is linear as well since we go down the trie once and then go up.
Add: O(n)
Search: O(n)
Prefix-search: O(n)
Remove: O(n)
Examples
These are some questions on Leetcode to give you a good idea of how tries work in practice.