blog/content/posts/2024-06-30-trie/index.md

2.3 KiB

title date draft description tags categories series favorite disable_feed
Trie 2024-06-30T11:07:49+01:00 false A cool map
algorithms
data structures
python
programming
Cool algorithms
false false

This time, let's talk about the Trie, which is a tree-based mapping structure most often used for string keys.

What does it do?

A Trie can be used to map a set of string keys to their corresponding values, without the need for a hash function. This also means you won't suffer from hash collisions, though the tree-based structure will probably translate to slower performance than a good hash table.

A Trie is especially useful to represent a dictionary of words in the case of spell correction, as it can easily be used to fuzzy match words under a given edit distance (think Levenshtein distance)

Implementation

This implementation will be in Python for exposition purposes, even though it already has a built-in dict.

Representation

Creating a new Trie is easy: the root node starts off empty and without any mapped values.

class Trie[T]:
    _children: dict[str, Trie[T]]
    _value: T | None

    def __init__(self):
        # Each letter is mapped to a Trie
        self._children = defaultdict(Trie)
        # If we match a full string, we store the mapped value
        self._value = None

We're using a defaultdict for the children for ease of implementation in this post. In reality, I would encourage you exit early when you can't match a given character.

The string key will be implicit by the position of a node in the tree: the empty string at the root, one-character strings as its direct children, etc...

An exact match look-up is easily done: we go down the tree until we've exhausted the key. At that point we've either found a mapped value or not.

def get(self, key: str) -> T | None:
    # Have we matched the full key?
    if not key:
        # Store the `T` if mapped, `None` otherwise
        return self._value
    # Otherwise, recurse on the child corresponding to the first letter
    return self._children[key[0]].get(key[1:])