different string representation (buffer gap)

Tue Feb 3 12:09:19 EST 2004

Hi all --

I'm contemplating writing a simple Emacs-like editor in Python
(for fun and for the experience of doing so).  While reading
Craig Finseth's "The Craft of Text Editing":

  http://www.finseth.com/~fin/craft/

I came across the "buffer gap" representation for the text data
of the buffer.  Very briefly, it keeps the unused space of the
character array at the editing point, so that as long as the gap has
room left, an insert or delete at that point takes constant time.  Of
course, moving the editing point means copying the characters between
the old and new positions so that the gap moves with you, but...
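To make the idea concrete, here is a minimal sketch of a gap buffer in modern Python 3 (not the Python of 2004). The class name `GapBuffer`, its methods, and the one-byte-per-character assumption are purely illustrative, not taken from Finseth's book: the text lives in a `bytearray`, the gap is the free region `[gap_start, gap_end)`, and moving the gap copies only the characters it passes over.

```python
class GapBuffer:
    """Minimal gap buffer over a bytearray (hypothetical sketch).

    Assumes one byte per character (e.g. ASCII/Latin-1 text). The logical
    text is self._buf[:self._gap_start] + self._buf[self._gap_end:].
    """

    def __init__(self, text=b"", capacity=16):
        size = max(capacity, 2 * len(text))
        self._buf = bytearray(size)
        self._buf[:len(text)] = text
        self._gap_start = len(text)  # editing point: first free slot
        self._gap_end = size         # first occupied slot after the gap

    def __len__(self):
        return len(self._buf) - (self._gap_end - self._gap_start)

    def _move_gap(self, pos):
        """Move the gap to logical position pos; cost is O(distance moved)."""
        if not 0 <= pos <= len(self):
            raise IndexError(pos)
        if pos < self._gap_start:
            # Text in [pos, gap_start) shifts to just before gap_end.
            n = self._gap_start - pos
            self._buf[self._gap_end - n:self._gap_end] = self._buf[pos:self._gap_start]
            self._gap_start -= n
            self._gap_end -= n
        elif pos > self._gap_start:
            # The n characters just after the gap shift down to gap_start.
            n = pos - self._gap_start
            self._buf[self._gap_start:self._gap_start + n] = \
                self._buf[self._gap_end:self._gap_end + n]
            self._gap_start += n
            self._gap_end += n

    def _ensure_gap(self, needed):
        """Widen the gap if fewer than `needed` free slots remain."""
        if self._gap_end - self._gap_start < needed:
            extra = max(needed, len(self._buf))  # grow geometrically
            self._buf[self._gap_end:self._gap_end] = bytes(extra)
            self._gap_end += extra

    def insert(self, pos, data):
        """Insert bytes at pos; no shifting if pos is already the gap position."""
        self._move_gap(pos)
        self._ensure_gap(len(data))
        self._buf[self._gap_start:self._gap_start + len(data)] = data
        self._gap_start += len(data)

    def delete(self, pos, count):
        """Delete count characters at pos; they simply become part of the gap."""
        if count < 0 or pos < 0 or pos + count > len(self):
            raise IndexError((pos, count))
        self._move_gap(pos)
        self._gap_end += count

    def __bytes__(self):
        # Materializes the text: an O(n) copy, only needed for a real string.
        return bytes(self._buf[:self._gap_start] + self._buf[self._gap_end:])


buf = GapBuffer(b"hello world")
buf.insert(5, b",")    # moves the gap once, then fills it
buf.delete(0, 1)       # moves the gap back to the start
buf.insert(0, b"J")    # gap is already here: no text is shifted
print(bytes(buf))      # b'Jello, world'
```

Note that deletion never copies anything: the deleted characters are just absorbed into the gap. Doubling the buffer in `_ensure_gap` keeps a long run of inserts cheap on average.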

Anyway, I'm wondering what straightforward ways there are to leverage /
implement this representation in Python.  Ideally, one could use a
BufferGap class transparently anywhere a Python string is expected,
for example with the standard regular expressions.  Glancing quickly at
regexmodule.c and its use of PyString_Whatever, I'm not certain this is
easy to do efficiently (must one copy the buffer's contents into a
native Python string before one can run something like a regular
expression match on them?).
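On the regex question, one hedged option: Python's `re` cannot search two separate segments the way the GNU regex engine used by Emacs does with `re_search_2` (which takes the text before and after the gap as two strings). In Python 3, however, a bytes pattern will scan any bytes-like object, including a `memoryview` of a `bytearray`. So instead of building a new string, one can close the gap by sliding the post-gap text down to meet the pre-gap text, then search the buffer in place. `search_closing_gap` is a hypothetical helper, and the `(buf, gap_start, gap_end)` layout is an assumption.

```python
import re


def search_closing_gap(pattern, buf, gap_start, gap_end):
    """Search the logical text of a gap buffer without building a new string.

    Hypothetical helper. `buf` is a bytearray whose logical text is
    buf[:gap_start] + buf[gap_end:]; `pattern` must be a bytes pattern.
    Returns (span or None, new_gap_start, new_gap_end).
    """
    tail = len(buf) - gap_end
    # Close the gap: slide the post-gap text down to meet the pre-gap text.
    # This copies only the `tail` bytes; the gap ends up at the physical end.
    buf[gap_start:gap_start + tail] = buf[gap_end:]
    text_len = gap_start + tail
    # re scans the bytearray in place through a memoryview; releasing the
    # view afterwards lets the bytearray be resized again.
    with memoryview(buf)[:text_len] as view:
        match = re.search(pattern, view)
        span = match.span() if match else None
    return span, text_len, len(buf)


buf = bytearray(b"hello" + bytes(5) + b" world")  # gap occupies [5, 10)
span, gap_start, gap_end = search_closing_gap(rb"lo w", buf, 5, 10)
print(span)  # (3, 7): a match that straddled the old gap
```

This still copies the text after the gap once, and it leaves the gap at the end of the buffer, so the next edit elsewhere must move it back; what it avoids is allocating a second full-size string. The returned span is in logical offsets and stays valid only until the next edit.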

Does anyone have ideas / suggestions on how to represent an editing
buffer so that it stays as (transparently) compatible as possible with
the Python standard library's string operations, yet remains efficient
for editing?  If one is embedding the interpreter in the editor (or
writing the editor in pure Python) and using Python for editor
extensibility, it seems desirable to keep complexity down for extension
writers and to let them think of the buffer as a string.
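As for letting extension writers treat the buffer as a string, one partial approach is a thin read-only facade that supports `len()`, indexing, slicing, and iteration over the gap buffer's logical text, building a real `str` only when code asks for one. This is a hypothetical sketch: the `GapTextView` name, the `(buf, gap_start, gap_end)` layout, and the Latin-1 decoding are assumptions. It does not make the object a `str`, so APIs that demand an actual string (such as `re` with a `str` pattern) still need `str(view)`.

```python
class GapTextView:
    """Read-only, str-like view of a gap buffer (hypothetical sketch).

    `buf` is a bytearray whose logical text is buf[:gap_start] + buf[gap_end:],
    assumed to hold Latin-1 text (one byte per character).
    """

    def __init__(self, buf, gap_start, gap_end):
        self._buf = buf
        self._gap_start = gap_start
        self._gap_len = gap_end - gap_start

    def __len__(self):
        return len(self._buf) - self._gap_len

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(len(self))
            if step != 1:
                return str(self)[key]  # uncommon case: fall back to a full copy
            gs, gap = self._gap_start, self._gap_len
            # Copy only the requested characters, skipping over the gap.
            before = self._buf[min(start, gs):min(stop, gs)]
            after = self._buf[max(start, gs) + gap:max(stop, gs) + gap]
            return (before + after).decode("latin-1")
        if key < 0:
            key += len(self)
        if not 0 <= key < len(self):
            raise IndexError("GapTextView index out of range")
        if key >= self._gap_start:
            key += self._gap_len  # logical index -> physical index past the gap
        return chr(self._buf[key])

    def __str__(self):
        # O(n): build a real str for APIs that insist on one.
        return (self._buf[:self._gap_start]
                + self._buf[self._gap_start + self._gap_len:]).decode("latin-1")


text = GapTextView(bytearray(b"hello" + bytes(5) + b" world"), 5, 10)
print(len(text), text[4], text[3:8], str(text))  # 11 o lo wo hello world
```

The object only behaves like a string for what it implements: a substring test such as `"lo" in text` falls back to per-character iteration and returns False, so `__contains__`, `find`, `startswith` and friends would each need their own method.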

Thanks...