different string representation (buffer gap)

Neil Hodgson nhodgson at bigpond.net.au
Thu Feb 5 07:28:59 EST 2004


manders2k:

> I'm not sure how much of a
> performance bottleneck having this very low-level component written in
> python will be on modern machines; probably not such a big deal.

   The performance bottleneck in split buffers is often the cost of copying
array ranges. I once wrote a patch for Python's array class to provide
copying within an array but the patch contents didn't make it to SourceForge
and I haven't had time to follow it up.

http://mail.python.org/pipermail/patches/2003-April/012043.html

> Writing a buffer class and fiddling with pointers and whatnot actually
> sounds easier to do in C++ than in emulating this style of thing in
> Python (then again, I'm a heck of a lot more comfortable with C++ than
> Python at this point, so that might not speak to the difficulty of the
> task).

   Split buffers don't need to use pointers. I have written several split
buffer implementations including

* the implementation in Scintilla (scintilla/src/CellBuffer.[h,cxx])
http://cvs.sourceforge.net/viewcvs.py/scintilla/scintilla/

* a templated C++ implementation
http://mailman.lyra.org/pipermail/scintilla-interest/2002-March/000903.html

* a generic implementation that is part of my SinkWorld project written in a
subset of C++ that can be automatically translated into Java or C#
http://cvs.sourceforge.net/viewcvs.py/scintilla/sinkworld/
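
   To illustrate both points above (no pointers needed, and gap movement
being a range copy), here is a minimal sketch of a split buffer in Python.
It is not taken from any of the implementations listed; the class name
GapBuffer, its method names and the growth policy are all assumptions made
for the example. The gap is tracked with two integer indices, and moving it
is done with bytearray slice assignment, which builds a temporary copy of
the range being moved.

```python
class GapBuffer:
    """Hypothetical split (gap) buffer: one bytearray with a movable gap."""

    def __init__(self, initial=b"", gap_size=16):
        self._buf = bytearray(initial) + bytearray(gap_size)
        self._gap_start = len(initial)   # index of the first unused byte
        self._gap_end = len(self._buf)   # index one past the last unused byte

    def __len__(self):
        return len(self._buf) - (self._gap_end - self._gap_start)

    def _move_gap(self, pos):
        """Slide the gap so it starts at pos; the cost is the range copied."""
        if pos < self._gap_start:
            n = self._gap_start - pos
            # Copy the n bytes before the gap to the end of the gap.
            self._buf[self._gap_end - n:self._gap_end] = \
                self._buf[pos:self._gap_start]
            self._gap_start -= n
            self._gap_end -= n
        elif pos > self._gap_start:
            n = pos - self._gap_start
            # Copy the n bytes after the gap to the start of the gap.
            self._buf[self._gap_start:self._gap_start + n] = \
                self._buf[self._gap_end:self._gap_end + n]
            self._gap_start += n
            self._gap_end += n

    def _grow(self, needed):
        """Widen the gap so it can hold at least `needed` bytes."""
        gap = self._gap_end - self._gap_start
        extra = max(needed - gap, len(self._buf))
        # Insert `extra` unused bytes at the end of the gap.
        self._buf[self._gap_end:self._gap_end] = bytearray(extra)
        self._gap_end += extra

    def insert(self, pos, data):
        if not 0 <= pos <= len(self):
            raise IndexError("insert position out of range")
        if len(data) > self._gap_end - self._gap_start:
            self._grow(len(data))
        self._move_gap(pos)
        self._buf[pos:pos + len(data)] = data
        self._gap_start += len(data)

    def delete(self, pos, count):
        if pos < 0 or count < 0 or pos + count > len(self):
            raise IndexError("delete range out of bounds")
        self._move_gap(pos)
        self._gap_end += count   # deleted bytes simply become part of the gap

    def to_bytes(self):
        return bytes(self._buf[:self._gap_start] + self._buf[self._gap_end:])


buf = GapBuffer(b"hello world", gap_size=4)
buf.insert(5, b",")
print(buf.to_bytes())
```

   Repeated edits near the same position leave the gap where it is, so only
the first edit in a new location pays for the copy.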

   Also in SinkWorld is lv (in lv.h), a data structure based on a split
buffer that partitions a document into segments such as lines. The line
starts could be stored in a standard split buffer, but inserting text would
then require adding the inserted length to every following line start
position. To avoid this, there is also a 'step': every position after the
step position has the step value added to its stored value. The step is
moved to the position where text is being inserted or deleted but, due to
locality of modification, the move is mostly short.
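
   The step idea can be sketched in Python as follows. This is not the lv
code; the class name Partitioning and its methods are hypothetical, and a
plain list stands in for the split buffer that would hold the line starts.
Entries at or after the step index have a pending offset that is added when
they are read, so an edit only touches the entries between the old and new
step positions.

```python
class Partitioning:
    """Hypothetical line-start table with a lazily applied offset ('step')."""

    def __init__(self):
        self._starts = [0]     # stored start position of each line
        self._step_from = 1    # entries with index >= this have a pending offset
        self._step_delta = 0   # the pending offset

    def position(self, line):
        """Return the true start position of the given line."""
        value = self._starts[line]
        if line >= self._step_from:
            value += self._step_delta
        return value

    def _move_step(self, to):
        """Move the step to index `to`, fixing up only the entries crossed."""
        if to > self._step_from:
            # Entries leaving the pending region get the offset applied.
            for i in range(self._step_from, to):
                self._starts[i] += self._step_delta
        elif to < self._step_from:
            # Entries entering the pending region get the offset removed.
            for i in range(to, self._step_from):
                self._starts[i] -= self._step_delta
        self._step_from = to

    def insert_text(self, line, length):
        """Text of `length` was inserted in `line` (negative for deletion)."""
        self._move_step(line + 1)
        self._step_delta += length

    def insert_line(self, line, position):
        """Add a new line start at index `line` with true position `position`."""
        self._move_step(line)
        self._starts.insert(line, position)
        self._step_from = line + 1

    def positions(self):
        return [self.position(i) for i in range(len(self._starts))]


lines = Partitioning()
lines.insert_line(1, 3)     # document "ab\ncd\nef": lines start at 0, 3, 6
lines.insert_line(2, 6)
lines.insert_text(0, 2)     # two characters typed into the first line
print(lines.positions())
```

   Typing several characters into the same line only increases the pending
offset; no stored line start is rewritten until the step has to move.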

> What I guess I wish were the case is that I could implement the
> "string interface" on my BufferGap, so that everywhere that Python (at
> the C API level) expects a string, a BufferGap could be used instead.

   IIRC, at one stage there was explicit support in Python (perhaps in the
buffer class) for multiple segment buffers, but it was never used so it has
probably rotted.

> That way, all the libraries that inspect and operate on strings would
> work transparently, without having to be recoded (copy / paste, end up
> with a lot of mostly identical, redundant code) to operate on this
> other string representation.

   I'd like to see this implemented and have been meaning to look into it
myself.

   Neil




