---
title: "Gap Buffer"
date: 2024-07-06T21:27:19+01:00
draft: false
description: "As featured in GNU Emacs"
tags:
  - algorithms
  - data structures
  - python
categories:
  - programming
series:
  - Cool algorithms
favorite: false
disable_feed: false
---

The Gap Buffer is a popular data structure that text editors use to represent files and editable buffers, the most famous of them probably being GNU Emacs.

What does it do?

A Gap Buffer is simply a list of characters, similar to a normal string, with the added twist of splitting it into two sides: the prefix and the suffix, on either side of the cursor. In between them, a gap is left to allow for quick insertion at the cursor.

Moving the cursor moves the gap around the buffer, the prefix and suffix getting shorter/longer as required.
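
To make the layout concrete, here is a minimal sketch using a plain list. The sample text, the gap size of four slots and the `text` helper are all hypothetical, picked purely for illustration.

```python
# Hypothetical layout for "Hello world" with the cursor right after "Hello".
# Empty strings stand in for the unused slots of the gap.
buf = list("Hello") + [""] * 4 + list(" world")
gap_start = 5            # first free slot, i.e. the cursor position
gap_end = gap_start + 4  # one past the last free slot


def text(buf: list[str], gap_start: int, gap_end: int) -> str:
    """Return the stored string by joining the prefix and suffix, skipping the gap."""
    return "".join(buf[:gap_start]) + "".join(buf[gap_end:])


print(text(buf, gap_start, gap_end))  # Hello world
```

The list holds 15 slots for an 11-character string: the 4 spare slots are what makes the next few insertions at the cursor cheap.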

Implementation

I'll be writing a sample implementation in Python, as with the rest of the [series]({{< ref "/series/cool-algorithms/">}}). I don't think it showcases the elegance of the Gap Buffer in action like a C implementation full of memmoves would, but it does make it short and sweet.

Representation

We'll be representing the gap buffer as an actual list of characters.

Given that Python doesn't have a dedicated character type, let's settle for a list of strings, each representing a single character...

Char = str

class GapBuffer:
    # List of characters, contains prefix and suffix of string with gap in the middle
    _buf: list[Char]
    # The gap is the half-open range [start, end), i.e. buf[start:end]
    _gap_start: int
    _gap_end: int

    # Visual representation of the gap buffer:
    # This is a very  [                     ]long string.
    # |<----------------------------------------------->| capacity
    # |<------------>|                       |<-------->| string
    #                 |<------------------->|             gap
    # |<------------>|                                    prefix
    #                                        |<-------->| suffix
    def __init__(self, initial_capacity: int = 16) -> None:
        assert initial_capacity > 0
        # Initialize an empty gap buffer
        self._buf = [""] * initial_capacity
        self._gap_start = 0
        self._gap_end = initial_capacity

Accessors

I'm mostly adding these for exposition, and making it easier to write asserts later.

@property
def capacity(self) -> int:
    return len(self._buf)

@property
def gap_length(self) -> int:
    return self._gap_end - self._gap_start

@property
def string_length(self) -> int:
    return self.capacity - self.gap_length

@property
def prefix_length(self) -> int:
    return self._gap_start

@property
def suffix_length(self) -> int:
    return self.capacity - self._gap_end
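
As an example of the kind of asserts these accessors enable, here is a small sketch of the invariants they should always satisfy. `check_invariants` is a hypothetical standalone helper that takes the three raw numbers instead of a `GapBuffer`, so that it can run on its own.

```python
def check_invariants(capacity: int, gap_start: int, gap_end: int) -> None:
    """Assert the relationships between the lengths exposed by the accessors."""
    prefix_length = gap_start
    gap_length = gap_end - gap_start
    suffix_length = capacity - gap_end
    string_length = capacity - gap_length

    # The gap indices must stay ordered and within the buffer
    assert 0 <= gap_start <= gap_end <= capacity
    # Prefix, gap and suffix tile the whole buffer with no overlap
    assert prefix_length + gap_length + suffix_length == capacity
    # The stored string is exactly the prefix followed by the suffix
    assert prefix_length + suffix_length == string_length


check_invariants(capacity=16, gap_start=0, gap_end=16)  # freshly created, empty buffer
check_invariants(capacity=15, gap_start=5, gap_end=9)   # 5-char prefix, 4-slot gap, 6-char suffix
```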

Growing the buffer

I've written this method in a somewhat non-idiomatic manner, to keep it close to how it would look in C using realloc.

It would be more efficient to use slicing to insert the needed extra capacity directly, instead of making a new buffer and copying characters over.

def grow(self, capacity: int) -> None:
    assert capacity >= self.capacity
    # Create a new buffer with the new capacity
    new_buf = [""] * capacity
    # Move the prefix/suffix to their place in the new buffer
    added_capacity = capacity - len(self._buf)
    new_buf[: self._gap_start] = self._buf[: self._gap_start]
    new_buf[self._gap_end + added_capacity :] = self._buf[self._gap_end :]
    # Use the new buffer, account for added capacity
    self._buf = new_buf
    self._gap_end += added_capacity
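
For comparison, the slicing approach mentioned above could look something like the following sketch. `grow_in_place` is a hypothetical free function working on a bare list, so it is not a drop-in replacement for the method.

```python
def grow_in_place(buf: list[str], gap_end: int, added_capacity: int) -> int:
    """Widen the gap of `buf` by `added_capacity` slots and return the new gap end."""
    assert added_capacity >= 0
    # Assigning to an empty slice inserts the new slots at gap_end,
    # shifting the suffix to the right without building a second list by hand
    buf[gap_end:gap_end] = [""] * added_capacity
    return gap_end + added_capacity


buf = list("ab") + [""] + list("cd")  # prefix "ab", 1-slot gap, suffix "cd"
gap_end = grow_in_place(buf, gap_end=3, added_capacity=2)
print(buf)      # ['a', 'b', '', '', '', 'c', 'd']
print(gap_end)  # 5
```

The prefix never moves, so only the end of the gap needs adjusting, same as in the method above.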

Insertion

Inserting text at the cursor's position means filling up the gap in the middle of the buffer. To do so we must first make sure that the gap is big enough, or grow the buffer accordingly.

Then inserting the text is simply a matter of copying its characters in place, and moving the start of the gap further right.

def insert(self, val: str) -> None:
    # Ensure we have enough space to insert the whole string
    if len(val) > self.gap_length:
        self.grow(max(self.capacity * 2, self.string_length + len(val)))
    # Fill the gap with the given string
    self._buf[self._gap_start : self._gap_start + len(val)] = val
    self._gap_start += len(val)
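
The growth policy is easier to see with numbers. The sketch below isolates it in a hypothetical `capacity_after_insert` function, using the same `max(...)` expression as `insert`.

```python
def capacity_after_insert(capacity: int, string_length: int, insert_length: int) -> int:
    """Return the buffer capacity once `insert_length` characters have been inserted."""
    gap_length = capacity - string_length
    if insert_length <= gap_length:
        # The text fits in the gap: no growth needed
        return capacity
    # Otherwise at least double, or grow just enough for one very large insertion
    return max(capacity * 2, string_length + insert_length)


print(capacity_after_insert(16, 10, 3))   # 16: fits in the 6 free slots
print(capacity_after_insert(16, 10, 7))   # 32: doubling is enough
print(capacity_after_insert(16, 10, 40))  # 50: doubling would not be enough
```

Doubling keeps the number of reallocations low when typing one character at a time, while the second argument to `max` covers pasting a large chunk of text in one go.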

Deletion

Removing text from the buffer simply expands the gap in the corresponding direction, shortening the string's prefix/suffix. This makes deletion very cheap: no characters are copied, only an index changes.

The methods are named after the backspace and delete keys on the keyboard.

def backspace(self, dist: int = 1) -> None:
    assert dist <= self.prefix_length
    # Extend gap to the left
    self._gap_start -= dist

def delete(self, dist: int = 1) -> None:
    assert dist <= self.suffix_length
    # Extend gap to the right
    self._gap_end += dist
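
One consequence worth spelling out: the deleted characters are never erased, they just end up inside the gap where nothing reads them. Here is a minimal sketch on a bare list, with hypothetical sample text and gap size.

```python
# "Hello world" with the cursor after "Hello" and a 3-slot gap
buf = list("Hello") + [""] * 3 + list(" world")
gap_start, gap_end = 5, 8

gap_start -= 2  # backspace twice: "lo" is swallowed by the gap
gap_end += 1    # delete once: the space is swallowed too

# The stale characters are still physically present in the list...
print(buf[gap_start:gap_end])  # ['l', 'o', '', '', '', ' ']

# ...but reading the prefix and suffix skips right over them
text = "".join(buf[:gap_start] + buf[gap_end:])
print(text)  # Helworld
```

The stale slots simply get overwritten by whatever is inserted next.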

Moving the cursor

Moving the cursor along the buffer will shift letters from one side of the gap to the other, moving them across from prefix to suffix and back.

I find Python's list slicing not quite as elegant to read as a memmove, though it does make for a very small and efficient implementation.

def left(self, dist: int = 1) -> None:
    assert dist <= self.prefix_length
    # Shift the needed number of characters from end of prefix to start of suffix
    self._buf[self._gap_end - dist : self._gap_end] = self._buf[
        self._gap_start - dist : self._gap_start
    ]
    # Adjust indices accordingly
    self._gap_start -= dist
    self._gap_end -= dist

def right(self, dist: int = 1) -> None:
    assert dist <= self.suffix_length
    # Shift the needed number of characters from start of suffix to end of prefix
    self._buf[self._gap_start : self._gap_start + dist] = self._buf[
        self._gap_end : self._gap_end + dist
    ]
    # Adjust indices accordingly
    self._gap_start += dist
    self._gap_end += dist
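
Putting it all together

To see every operation working in concert, here is a condensed, self-contained sketch of the class followed by a short editing session. It makes a few assumptions of its own: `text()` is a hypothetical helper for reading the string back, `grow` uses slice insertion rather than the realloc-style copy, and the asserts also reject negative distances.

```python
class GapBuffer:
    def __init__(self, initial_capacity: int = 16) -> None:
        assert initial_capacity > 0
        self._buf = [""] * initial_capacity
        self._gap_start = 0
        self._gap_end = initial_capacity

    def text(self) -> str:
        # Hypothetical helper: join the prefix and suffix, skipping the gap
        return "".join(self._buf[: self._gap_start] + self._buf[self._gap_end :])

    def grow(self, capacity: int) -> None:
        added = capacity - len(self._buf)
        assert added >= 0
        # Insert the extra slots at the end of the gap, shifting the suffix right
        self._buf[self._gap_end : self._gap_end] = [""] * added
        self._gap_end += added

    def insert(self, val: str) -> None:
        gap_length = self._gap_end - self._gap_start
        if len(val) > gap_length:
            string_length = len(self._buf) - gap_length
            self.grow(max(len(self._buf) * 2, string_length + len(val)))
        self._buf[self._gap_start : self._gap_start + len(val)] = val
        self._gap_start += len(val)

    def backspace(self, dist: int = 1) -> None:
        assert 0 <= dist <= self._gap_start
        self._gap_start -= dist

    def delete(self, dist: int = 1) -> None:
        assert 0 <= dist <= len(self._buf) - self._gap_end
        self._gap_end += dist

    def left(self, dist: int = 1) -> None:
        assert 0 <= dist <= self._gap_start
        # Copy the end of the prefix to the slots just before the suffix
        self._buf[self._gap_end - dist : self._gap_end] = self._buf[
            self._gap_start - dist : self._gap_start
        ]
        self._gap_start -= dist
        self._gap_end -= dist

    def right(self, dist: int = 1) -> None:
        assert 0 <= dist <= len(self._buf) - self._gap_end
        # Copy the start of the suffix to the slots just after the prefix
        self._buf[self._gap_start : self._gap_start + dist] = self._buf[
            self._gap_end : self._gap_end + dist
        ]
        self._gap_start += dist
        self._gap_end += dist


buf = GapBuffer(initial_capacity=4)
buf.insert("Hello wrld")  # 10 characters into 4 slots: forces a grow
buf.left(3)               # cursor now sits between "w" and "r"
buf.insert("o")           # fixes the typo: "Hello world"
buf.right(3)              # back to the end of the text
buf.backspace(5)          # removes "world"
buf.insert("there")       # "Hello there"
buf.left(6)               # cursor right after "Hello"
buf.delete()              # removes the space
buf.insert(", ")
print(buf.text())         # Hello, there
```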