117 lines
3.4 KiB
Markdown
117 lines
3.4 KiB
Markdown
---
|
|
title: "Trie"
|
|
date: 2024-06-30T11:07:49+01:00
|
|
draft: false # I don't care for draft mode, git has branches for that
|
|
description: "A cool map"
|
|
tags:
|
|
- algorithms
|
|
- data structures
|
|
- python
|
|
categories:
|
|
- programming
|
|
series:
|
|
- Cool algorithms
|
|
favorite: false
|
|
disable_feed: false
|
|
---
|
|
|
|
This time, let's talk about the [_Trie_][wiki], which is a tree-based mapping
|
|
structure most often used for string keys.
|
|
|
|
[wiki]: https://en.wikipedia.org/wiki/Trie
|
|
|
|
<!--more-->
|
|
|
|
## What does it do?
|
|
|
|
A _Trie_ can be used to map a set of string keys to their corresponding values,
|
|
without the need for a hash function. This also means you won't suffer from hash
|
|
collisions, though the tree-based structure will probably translate to slower
|
|
performance than a good hash table.
|
|
|
|
A _Trie_ is especially useful to represent a dictionary of words in the case of
|
|
spell correction, as it can easily be used to fuzzy match words under a given
|
|
edit distance (think [Levenshtein distance])
|
|
|
|
[Levenshtein distance]: https://en.wikipedia.org/wiki/Levenshtein_distance
|
|
|
|
## Implementation
|
|
|
|
This implementation will be in Python for exposition purposes, even though
|
|
it already has a built-in `dict`.
|
|
|
|
### Representation
|
|
|
|
Creating a new `Trie` is easy: the root node starts off empty and without any
|
|
mapped values.
|
|
|
|
```python
|
|
class Trie[T]:
|
|
_children: dict[str, Trie[T]]
|
|
_value: T | None
|
|
|
|
def __init__(self):
|
|
# Each letter is mapped to a Trie
|
|
self._children = defaultdict(Trie)
|
|
# If we match a full string, we store the mapped value
|
|
self._value = None
|
|
```
|
|
|
|
We're using a `defaultdict` for the children for ease of implementation in this
|
|
post. In reality, I would encourage you exit early when you can't match a given
|
|
character.
|
|
|
|
The string key will be implicit by the position of a node in the tree: the empty
|
|
string at the root, one-character strings as its direct children, etc...
|
|
|
|
### Search
|
|
|
|
An exact match look-up is easily done: we go down the tree until we've exhausted
|
|
the key. At that point we've either found a mapped value or not.
|
|
|
|
```python
|
|
def get(self, key: str) -> T | None:
|
|
# Have we matched the full key?
|
|
if not key:
|
|
# Store the `T` if mapped, `None` otherwise
|
|
return self._value
|
|
# Otherwise, recurse on the child corresponding to the first letter
|
|
return self._children[key[0]].get(key[1:])
|
|
```
|
|
|
|
### Addition
|
|
|
|
Adding a new value to the _Trie_ is similar to a key lookup, only this time we
|
|
store the new value instead of returning it.
|
|
|
|
```python
|
|
def insert(self, key: str, value: T) -> bool:
|
|
# Have we matched the full key?
|
|
if not key:
|
|
# Check whether we're overwriting a previous mapping
|
|
was_mapped = self._value is None
|
|
# Store the corresponding value
|
|
self._value = value
|
|
# Return whether we've performed an overwrite
|
|
return was_mapped
|
|
# Otherwise, recurse on the child corresponding to the first letter
|
|
return self._children[key[0]].insert(key[1:], value)
|
|
```
|
|
|
|
### Removal
|
|
|
|
Removal should also look familiar.
|
|
|
|
```python
|
|
def remove(self, key: str) -> bool:
|
|
# Have we matched the full key?
|
|
if not key:
|
|
was_mapped = self._value is None
|
|
# Remove the value
|
|
self._value = None
|
|
# Return whether it was mapped
|
|
return was_mapped
|
|
# Otherwise, recurse on the child corresponding to the first letter
|
|
return self._children[key[0]].remove(key[1:])
|
|
```
|