diff --git a/content/posts/2024-07-06-gap-buffer/index.md b/content/posts/2024-07-06-gap-buffer/index.md
index 763628d..ace8fd9 100644
--- a/content/posts/2024-07-06-gap-buffer/index.md
+++ b/content/posts/2024-07-06-gap-buffer/index.md
@@ -121,71 +121,3 @@ def grow(self, capacity: int) -> None:
     self._buf = new_buf
     self._gap_end += added_capacity
 ```
-
-### Insertion
-
-Inserting text at the cursor's position means filling up the gap in the middle
-of the buffer. To do so we must first make sure that the gap is big enough, or
-grow the buffer accordingly.
-
-Then inserting the text is simply a matter of copying its characters in place,
-and moving the start of the gap further right.
-
-```python
-def insert(self, val: str) -> None:
-    # Ensure we have enouh space to insert the whole string
-    if len(val) > self.gap_length:
-        self.grow(max(self.capacity * 2, self.string_length + len(val)))
-    # Fill the gap with the given string
-    self._buf[self._gap_start : self._gap_start + len(val)] = val
-    self._gap_start += len(val)
-```
-
-### Deletion
-
-Removing text from the buffer simply expands the gap in the corresponding
-direction, shortening the string's prefix/suffix. This makes it very cheap.
-
-The methods are named after the `backspace` and `delete` keys on the keyboard.
-
-```python
-def backspace(self, dist: int = 1) -> None:
-    assert dist <= self.prefix_length
-    # Extend gap to the left
-    self._gap_start -= dist
-
-def delete(self, dist: int = 1) -> None:
-    assert dist <= self.suffix_length
-    # Extend gap to the right
-    self._gap_end += dist
-```
-
-### Moving the cursor
-
-Moving the cursor along the buffer will shift letters from one side of the gap
-to the other, moving them accross from prefix to suffix and back.
-
-I find Python's list slicing not quite as elegant to read as a `memmove`, though
-it does make for a very small and efficient implementation.
- -```python -def left(self, dist: int = 1) -> None: - assert dist <= self.prefix_length - # Shift the needed number of characters from end of prefix to start of suffix - self._buf[self._gap_end - dist : self._gap_end] = self._buf[ - self._gap_start - dist : self._gap_start - ] - # Adjust indices accordingly - self._gap_start -= dist - self._gap_end -= dist - -def right(self, dist: int = 1) -> None: - assert dist <= self.suffix_length - # Shift the needed number of characters from start of suffix to end of prefix - self._buf[self._gap_start : self._gap_start + dist] = self._buf[ - self._gap_end : self._gap_end + dist - ] - # Adjust indices accordingly - self._gap_start += dist - self._gap_end += dist -``` diff --git a/content/posts/2024-07-14-bloom-filter/index.md b/content/posts/2024-07-14-bloom-filter/index.md deleted file mode 100644 index 93107d4..0000000 --- a/content/posts/2024-07-14-bloom-filter/index.md +++ /dev/null @@ -1,97 +0,0 @@ ---- -title: "Bloom Filter" -date: 2024-07-14T17:46:40+01:00 -draft: false # I don't care for draft mode, git has branches for that -description: "Probably cool" -tags: - - algorithms - - data structures - - python -categories: - - programming -series: -- Cool algorithms -favorite: false -disable_feed: false ---- - -The [_Bloom Filter_][wiki] is a probabilistic data structure for set membership. - -The filter can be used as an inexpensive first step when querying the actual -data is quite costly (e.g: as a first check for expensive cache lookups or large -data seeks). - -[wiki]: https://en.wikipedia.org/wiki/Bloom_filter - - - -## What does it do? - -A _Bloom Filter_ can be understood as a hash-set which can either tell you: - -* An element is _not_ part of the set. -* An element _may be_ part of the set. - -More specifically, one can tweak the parameters of the filter to make it so that -the _false positive_ rate of membership is quite low. 
-
-I won't be going into those calculations here, but they are quite trivial to
-compute, or one can just look up appropriate values for their use case.
-
-## Implementation
-
-I'll be using Python, which has the nifty ability of representing bitsets
-through its built-in big integers quite easily.
-
-We'll be assuming a `BIT_COUNT` of 64 here, but the implementation can easily be
-tweaked to use a different number, or even change it at construction time.
-
-### Representation
-
-A `BloomFilter` is just a set of bits and a list of hash functions.
-
-```python
-BIT_COUNT = 64
-
-class BloomFilter[T]:
-    _bits: int
-    _hash_functions: list[Callable[[T], int]]
-
-    def __init__(self, hash_functions: list[Callable[[T], int]]) -> None:
-        # Filter is initially empty
-        self._bits = 0
-        self._hash_functions = hash_functions
-```
-
-### Inserting a key
-
-To add an element to the filter, we take the output from each hash function and
-use that to set a bit in the filter. This combination of bit will identify the
-element, which we can use for lookup later.
-
-```python
-def insert(self, val: T) -> None:
-    # Iterate over each hash
-    for f in self._hash_functions:
-        n = f(val) % BIT_COUNT
-        # Set the corresponding bit
-        self._bit |= 1 << n
-```
-
-### Querying a key
-
-Because the _Bloom Filter_ does not actually store its elements, but some
-derived data from hashing them, it can only definitely say if an element _does
-not_ belong to it. Otherwise, it _may_ be part of the set, and should be checked
-against the actual underlying store.
-
-```python
-def may_contain(self, val: T) -> bool:
-    for f in self._hash_functions:
-        n = f(val) % BIT_COUNT
-        # If one of the bits is unset, the value is definitely not present
-        if not (self._bit & (1 << n)):
-            return False
-    # All bits were matched, `val` is likely to be part of the set
-    return True
-```
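
Note on the removed Bloom filter snippets: the constructor initialises `self._bits`, while `insert` and `may_contain` use `self._bit`, so the fragments would raise `AttributeError` if assembled exactly as written. They also use `Callable` without importing it. Below is a minimal, self-contained sketch of the same idea with the attribute name made consistent. It makes a few assumptions that are not in the post: `Generic[T]` stands in for the `class BloomFilter[T]` syntax (which needs Python 3.12) so the sketch runs on older interpreters, and the two hash helpers `sha256_hash` and `blake2_hash` are hypothetical, picked only for illustration.

```python
import hashlib
from typing import Callable, Generic, TypeVar

T = TypeVar("T")

BIT_COUNT = 64


class BloomFilter(Generic[T]):
    def __init__(self, hash_functions: list[Callable[[T], int]]) -> None:
        # Filter is initially empty
        self._bits = 0
        self._hash_functions = hash_functions

    def insert(self, val: T) -> None:
        for f in self._hash_functions:
            # Set the bit selected by each hash function
            self._bits |= 1 << (f(val) % BIT_COUNT)

    def may_contain(self, val: T) -> bool:
        # A single unset bit proves the value was never inserted
        return all(
            self._bits & (1 << (f(val) % BIT_COUNT))
            for f in self._hash_functions
        )


# Hypothetical hash helpers (assumption, not part of the original post)
def sha256_hash(val: str) -> int:
    return int.from_bytes(hashlib.sha256(val.encode()).digest()[:8], "big")


def blake2_hash(val: str) -> int:
    return int.from_bytes(hashlib.blake2b(val.encode()).digest()[:8], "big")


bloom: BloomFilter[str] = BloomFilter([sha256_hash, blake2_hash])
bloom.insert("gap buffer")
print(bloom.may_contain("gap buffer"))  # True: inserted values are never reported absent
```

A `False` from `may_contain` is definitive. A `True` only means the real store should be consulted, since unrelated values can happen to set the same bits.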