title: Gap Buffer
date: 2024-07-06T21:27:19+01:00
draft: false
description: As featured in GNU Emacs
tags: algorithms, data structures, python
categories: programming
series: Cool algorithms
favorite: false
disable_feed: false

The Gap Buffer is a popular data structure for text editors to represent files and editable buffers. The most famous of those editors is probably GNU Emacs.

What does it do?

A Gap Buffer is simply a list of characters, similar to a normal string, but with the added twist of splitting it into two sides: the prefix and the suffix, on either side of the cursor. In between them, a gap is left to allow for quick insertion at the cursor.

Moving the cursor moves the gap around the buffer, with the prefix and suffix growing or shrinking as required.
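To make the layout concrete, here is a minimal sketch of how the same text sits in the buffer for two cursor positions. The helpers `layout` and `show` are hypothetical names used only for this illustration, and are not part of the implementation that follows.

```python
def layout(text: str, cursor: int, capacity: int) -> list[str]:
    """Lay out `text` in `capacity` slots, with the gap at `cursor`."""
    assert 0 <= cursor <= len(text) <= capacity
    gap = [""] * (capacity - len(text))
    # Prefix, then the gap, then the suffix
    return list(text[:cursor]) + gap + list(text[cursor:])


def show(buf: list[str]) -> str:
    # Render empty gap slots as "_" so the gap is visible
    return "".join(c if c else "_" for c in buf)


print(show(layout("hello", 2, 8)))  # he___llo
print(show(layout("hello", 4, 8)))  # hell___o
```

Moving the cursor from 2 to 4 only shifts two characters across the gap; the rest of the buffer stays where it is.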

Implementation

I'll be writing a sample implementation in Python, in keeping with the rest of the series. This does not showcase the elegance of the Gap Buffer in action the way a C implementation full of memmoves would.

Representation

We'll be representing the gap buffer as an actual list of characters.

Given that Python doesn't have a character type, we'll have to settle for a list of strings, each representing a single character...

Char = str

class GapBuffer:
    # List of characters, contains prefix and suffix of string with gap in the middle
    _buf: list[Char]
    # The gap is contained between [start, end) (i.e: buf[start:end])
    _gap_start: int
    _gap_end: int

    # Visual representation of the gap buffer:
    # This is a very  [                     ]long string.
    # |<----------------------------------------------->| capacity
    # |<------------>|                       |<-------->| string
    #                 |<------------------->|             gap
    # |<------------>|                                    prefix
    #                                        |<-------->| suffix
    def __init__(self, initial_capacity: int = 16) -> None:
        assert initial_capacity > 0
        # Initialize an empty gap buffer
        self._buf = [""] * initial_capacity
        self._gap_start = 0
        self._gap_end = initial_capacity
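As a quick illustration of this representation, the text held by the buffer can be recovered by joining everything outside the gap. The sketch below uses a hypothetical free function `to_string` that takes the three fields as plain arguments, so that it runs on its own.

```python
def to_string(buf: list[str], gap_start: int, gap_end: int) -> str:
    # Skip the gap: everything in buf[gap_start:gap_end] is unused space
    return "".join(buf[:gap_start]) + "".join(buf[gap_end:])


# Hand-built state matching the diagram: prefix, a 5-slot gap, suffix
buf = list("This is a very ") + [""] * 5 + list("long string.")
print(to_string(buf, 15, 20))  # This is a very long string.
```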

Accessors

I'm mostly adding these for exposition, and to make it easier to write asserts later.

@property
def capacity(self) -> int:
    return len(self._buf)

@property
def gap_length(self) -> int:
    return self._gap_end - self._gap_start

@property
def string_length(self) -> int:
    return self.capacity - self.gap_length

@property
def prefix_length(self) -> int:
    return self._gap_start

@property
def suffix_length(self) -> int:
    return self.capacity - self._gap_end
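These quantities are tied together by a few simple relations, which is the kind of thing the asserts can check. A small sketch, using a hypothetical standalone `check_invariants` that recomputes the same values from plain integers:

```python
def check_invariants(capacity: int, gap_start: int, gap_end: int) -> None:
    # Same formulas as the accessors, on plain integers
    gap_length = gap_end - gap_start
    string_length = capacity - gap_length
    prefix_length = gap_start
    suffix_length = capacity - gap_end

    # The gap lies inside the buffer and is never negative in size
    assert 0 <= gap_start <= gap_end <= capacity
    # The string is exactly the prefix followed by the suffix
    assert prefix_length + suffix_length == string_length
    # Prefix, gap and suffix together fill the whole buffer
    assert prefix_length + gap_length + suffix_length == capacity


check_invariants(16, 0, 16)   # freshly initialized, empty buffer
check_invariants(32, 15, 20)  # some text on both sides of the gap
```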

Growing the buffer

I've written this method in a somewhat non-idiomatic manner, to make it closer to how it would look in C using realloc.

It would be more efficient to use slicing to insert the needed extra capacity directly, instead of making a new buffer and copying characters over.

def grow(self, capacity: int) -> None:
    assert capacity >= self.capacity
    # Create a new buffer with the new capacity
    new_buf = [""] * capacity
    # Move the prefix/suffix to their place in the new buffer
    added_capacity = capacity - len(self._buf)
    new_buf[: self._gap_start] = self._buf[: self._gap_start]
    new_buf[self._gap_end + added_capacity :] = self._buf[self._gap_end :]
    # Use the new buffer, account for added capacity
    self._buf = new_buf
    self._gap_end += added_capacity
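For comparison, the slicing approach mentioned earlier might look roughly like the sketch below. It is written as a hypothetical free function, `grow_in_place`, which takes the buffer and gap bounds as arguments and returns the new gap end, so it runs without the rest of the class.

```python
def grow_in_place(
    buf: list[str], gap_start: int, gap_end: int, capacity: int
) -> int:
    """Widen the gap so that len(buf) == capacity; return the new gap end."""
    assert 0 <= gap_start <= gap_end <= len(buf)
    assert capacity >= len(buf)
    added_capacity = capacity - len(buf)
    # Assigning to the empty slice at gap_end inserts the new slots there,
    # shifting the suffix to the right
    buf[gap_end:gap_end] = [""] * added_capacity
    return gap_end + added_capacity


buf = list("he") + [""] + list("llo")
gap_start, gap_end = 2, 3
gap_end = grow_in_place(buf, gap_start, gap_end, 8)
print(buf)      # ['h', 'e', '', '', '', 'l', 'l', 'o']
print(gap_end)  # 5
```

The prefix is left untouched and the list is modified in place, so no second buffer needs to be built by hand.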