diff --git a/content/posts/2024-07-06-gap-buffer/index.md b/content/posts/2024-07-06-gap-buffer/index.md
deleted file mode 100644
index e70e3e2..0000000
--- a/content/posts/2024-07-06-gap-buffer/index.md
+++ /dev/null
@@ -1,191 +0,0 @@
----
-title: "Gap Buffer"
-date: 2024-07-06T21:27:19+01:00
-draft: false # I don't care for draft mode, git has branches for that
-description: "As featured in GNU Emacs"
-tags:
-- algorithms
-- data structures
-- python
-categories:
-- programming
-series:
-- Cool algorithms
-favorite: false
-disable_feed: false
----
-
-The [_Gap Buffer_][wiki] is a popular data structure for text editors to
-represent files and editable buffers, the most famous of them probably being
-[GNU Emacs][emacs].
-
-[wiki]: https://en.wikipedia.org/wiki/Gap_buffer
-[emacs]: https://www.gnu.org/software/emacs/manual/html_node/elisp/Buffer-Gap.html
-
-
-
-## What does it do?
-
-A _Gap Buffer_ is simply a list of characters, similar to a normal string, with
-the added twist of splitting it into two sides: the prefix and suffix, on either
-side of the cursor. In between them, a gap is left to allow for quick
-insertion at the cursor.
-
-Moving the cursor moves the gap around the buffer, the prefix and suffix getting
-shorter/longer as required.
-
-## Implementation
-
-I'll be writing a sample implementation in Python, as with the rest of the
-series. I don't think it showcases the elegance of the _Gap Buffer_ in action
-like a C implementation full of `memmove`s would, but it does make it short and
-sweet.
-
-### Representation
-
-We'll be representing the gap buffer as an actual list of characters.
-
-Given that Python doesn't _have_ characters, let's settle for a list of strings,
-each representing a single character...
-
-```python
-Char = str
-
-class GapBuffer:
-    # List of characters, contains prefix and suffix of string with gap in the middle
-    _buf: list[Char]
-    # The gap is contained between [start, end) (i.e: buf[start:end])
-    _gap_start: int
-    _gap_end: int
-
-    # Visual representation of the gap buffer:
-    # This is a very [                     ]long string.
-    # |<----------------------------------------------->| capacity
-    # |<------------>|                      |<-------->| string
-    #                |<------------------->| gap
-    # |<------------>| prefix
-    #                                       |<-------->| suffix
-    def __init__(self, initial_capacity: int = 16) -> None:
-        assert initial_capacity > 0
-        # Initialize an empty gap buffer
-        self._buf = [""] * initial_capacity
-        self._gap_start = 0
-        self._gap_end = initial_capacity
-```
-
-### Accessors
-
-I'm mostly adding these for exposition, and to make it easier to write `assert`s
-later.
-
-```python
-@property
-def capacity(self) -> int:
-    return len(self._buf)
-
-@property
-def gap_length(self) -> int:
-    return self._gap_end - self._gap_start
-
-@property
-def string_length(self) -> int:
-    return self.capacity - self.gap_length
-
-@property
-def prefix_length(self) -> int:
-    return self._gap_start
-
-@property
-def suffix_length(self) -> int:
-    return self.capacity - self._gap_end
-```
-
-### Growing the buffer
-
-I've written this method in a somewhat non-idiomatic manner, to make it closer
-to how it would look in C using `realloc` instead.
-
-It would be more efficient to use slicing to insert the needed extra capacity
-directly, instead of making a new buffer and copying characters over.
-
-```python
-def grow(self, capacity: int) -> None:
-    assert capacity >= self.capacity
-    # Create a new buffer with the new capacity
-    new_buf = [""] * capacity
-    # Move the prefix/suffix to their place in the new buffer
-    added_capacity = capacity - len(self._buf)
-    new_buf[: self._gap_start] = self._buf[: self._gap_start]
-    new_buf[self._gap_end + added_capacity :] = self._buf[self._gap_end :]
-    # Use the new buffer, account for added capacity
-    self._buf = new_buf
-    self._gap_end += added_capacity
-```
-
-### Insertion
-
-Inserting text at the cursor's position means filling up the gap in the middle
-of the buffer. To do so we must first make sure that the gap is big enough, or
-grow the buffer accordingly.
-
-Then inserting the text is simply a matter of copying its characters in place,
-and moving the start of the gap further right.
-
-```python
-def insert(self, val: str) -> None:
-    # Ensure we have enough space to insert the whole string
-    if len(val) > self.gap_length:
-        self.grow(max(self.capacity * 2, self.string_length + len(val)))
-    # Fill the gap with the given string
-    self._buf[self._gap_start : self._gap_start + len(val)] = val
-    self._gap_start += len(val)
-```
-
-### Deletion
-
-Removing text from the buffer simply expands the gap in the corresponding
-direction, shortening the string's prefix/suffix. This makes it very cheap.
-
-The methods are named after the `backspace` and `delete` keys on the keyboard.
-
-```python
-def backspace(self, dist: int = 1) -> None:
-    assert dist <= self.prefix_length
-    # Extend gap to the left
-    self._gap_start -= dist
-
-def delete(self, dist: int = 1) -> None:
-    assert dist <= self.suffix_length
-    # Extend gap to the right
-    self._gap_end += dist
-```
-
-### Moving the cursor
-
-Moving the cursor along the buffer will shift letters from one side of the gap
-to the other, moving them across from prefix to suffix and back.
-
-I find Python's list slicing not quite as elegant to read as a `memmove`, though
-it does make for a very small and efficient implementation.
-
-```python
-def left(self, dist: int = 1) -> None:
-    assert dist <= self.prefix_length
-    # Shift the needed number of characters from end of prefix to start of suffix
-    self._buf[self._gap_end - dist : self._gap_end] = self._buf[
-        self._gap_start - dist : self._gap_start
-    ]
-    # Adjust indices accordingly
-    self._gap_start -= dist
-    self._gap_end -= dist
-
-def right(self, dist: int = 1) -> None:
-    assert dist <= self.suffix_length
-    # Shift the needed number of characters from start of suffix to end of prefix
-    self._buf[self._gap_start : self._gap_start + dist] = self._buf[
-        self._gap_end : self._gap_end + dist
-    ]
-    # Adjust indices accordingly
-    self._gap_start += dist
-    self._gap_end += dist
-```
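
To see the pieces of the deleted post working together, here is a minimal, self-contained sketch that condenses its methods into one class and drives them with a short editing session. Two things in it are assumptions rather than part of the post: the `text()` helper is hypothetical, added only so the contents can be inspected, and `grow` uses the slice-insertion variant the post mentions in passing instead of the `realloc`-style copy it actually shows.

```python
class GapBuffer:
    """Condensed version of the post's gap buffer, for illustration only."""

    def __init__(self, initial_capacity: int = 16) -> None:
        assert initial_capacity > 0
        self._buf: list[str] = [""] * initial_capacity
        # The gap occupies buf[_gap_start:_gap_end]
        self._gap_start = 0
        self._gap_end = initial_capacity

    def grow(self, capacity: int) -> None:
        # Assumption: slice-insertion variant, widening the gap in place
        assert capacity >= len(self._buf)
        added_capacity = capacity - len(self._buf)
        self._buf[self._gap_end : self._gap_end] = [""] * added_capacity
        self._gap_end += added_capacity

    def insert(self, val: str) -> None:
        gap_length = self._gap_end - self._gap_start
        if len(val) > gap_length:
            string_length = len(self._buf) - gap_length
            self.grow(max(len(self._buf) * 2, string_length + len(val)))
        # Fill the start of the gap with the new characters
        self._buf[self._gap_start : self._gap_start + len(val)] = val
        self._gap_start += len(val)

    def backspace(self, dist: int = 1) -> None:
        assert 0 <= dist <= self._gap_start
        # Extend the gap to the left; the old characters are simply abandoned
        self._gap_start -= dist

    def left(self, dist: int = 1) -> None:
        assert 0 <= dist <= self._gap_start
        # Shift characters from the end of the prefix to the start of the suffix
        self._buf[self._gap_end - dist : self._gap_end] = self._buf[
            self._gap_start - dist : self._gap_start
        ]
        self._gap_start -= dist
        self._gap_end -= dist

    def text(self) -> str:
        # Hypothetical helper (not in the post): prefix + suffix, skipping the gap
        return "".join(self._buf[: self._gap_start]) + "".join(
            self._buf[self._gap_end :]
        )


buf = GapBuffer(initial_capacity=4)
buf.insert("This is a very string.")  # larger than the gap, so the buffer grows
buf.left(len("string."))              # move the cursor back to just before "string."
buf.insert("long ")
before = buf.text()                   # This is a very long string.
buf.backspace(len("very long "))
after = buf.text()                    # This is a string.
print(before)
print(after)
```

Note that `backspace` never touches the list contents: the deleted characters stay in the gap region until a later insertion overwrites them, which is why `text()` has to skip `buf[_gap_start:_gap_end]` rather than filter out empty strings.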