blog/content/posts/2024-06-30-trie/index.md

117 lines
3.4 KiB
Markdown
Raw Normal View History

2024-06-30 13:35:08 +02:00
---
title: "Trie"
date: 2024-06-30T11:07:49+01:00
draft: false # I don't care for draft mode, git has branches for that
description: "A cool map"
tags:
- algorithms
- data structures
- python
categories:
- programming
series:
- Cool algorithms
favorite: false
disable_feed: false
---
This time, let's talk about the [_Trie_][wiki], which is a tree-based mapping
structure most often used for string keys.
[wiki]: https://en.wikipedia.org/wiki/Trie
<!--more-->
2024-06-30 13:36:17 +02:00
## What does it do?
A _Trie_ can be used to map a set of string keys to their corresponding values,
without the need for a hash function. This also means you won't suffer from hash
collisions, though the tree-based structure will probably translate to slower
performance than a good hash table.
A _Trie_ is especially useful to represent a dictionary of words in the case of
spell correction, as it can easily be used to fuzzy match words under a given
edit distance (think [Levenshtein distance])
[Levenshtein distance]: https://en.wikipedia.org/wiki/Levenshtein_distance
2024-06-30 13:36:42 +02:00
## Implementation
This implementation will be in Python for exposition purposes, even though
it already has a built-in `dict`.
### Representation
Creating a new `Trie` is easy: the root node starts off empty and without any
mapped values.
```python
class Trie[T]:
_children: dict[str, Trie[T]]
_value: T | None
def __init__(self):
# Each letter is mapped to a Trie
self._children = defaultdict(Trie)
# If we match a full string, we store the mapped value
self._value = None
```
We're using a `defaultdict` for the children for ease of implementation in this
post. In reality, I would encourage you exit early when you can't match a given
character.
The string key will be implicit by the position of a node in the tree: the empty
string at the root, one-character strings as its direct children, etc...
2024-06-30 13:37:04 +02:00
### Search
An exact match look-up is easily done: we go down the tree until we've exhausted
the key. At that point we've either found a mapped value or not.
```python
def get(self, key: str) -> T | None:
# Have we matched the full key?
if not key:
# Store the `T` if mapped, `None` otherwise
return self._value
# Otherwise, recurse on the child corresponding to the first letter
return self._children[key[0]].get(key[1:])
```
2024-06-30 13:37:21 +02:00
### Insertion
Adding a new value to the _Trie_ is similar to a key lookup, only this time we
store the new value instead of returning it.
```python
def insert(self, key: str, value: T) -> bool:
# Have we matched the full key?
if not key:
# Check whether we're overwriting a previous mapping
was_mapped = self._value is None
# Store the corresponding value
self._value = value
# Return whether we've performed an overwrite
return was_mapped
# Otherwise, recurse on the child corresponding to the first letter
return self._children[key[0]].insert(key[1:], value)
```
2024-06-30 13:37:48 +02:00
### Removal
Removal should also look familiar.
```python
def remove(self, key: str) -> bool:
# Have we matched the full key?
if not key:
was_mapped = self._value is None
# Remove the value
self._value = None
# Return whether it was mapped
return was_mapped
# Otherwise, recurse on the child corresponding to the first letter
return self._children[key[0]].remove(key[1:])
```