RE Module Performance

Steven D'Aprano steve+comp.lang.python at pearwood.info
Thu Jul 25 03:02:21 EDT 2013


On Thu, 25 Jul 2013 00:34:24 +1000, Chris Angelico wrote:

> But mainly, I'm just wondering how many people here have any basis from
> which to argue the point he's trying to make. I doubt most of us have
> (a) implemented an editor widget, or (b) tested multiple different
> internal representations to learn the true pros and cons of each. And
> even if any of us had, that still wouldn't have any bearing on PEP 393,
> which is about applications, not editor widgets. As stated above, Python
> strings before AND after PEP 393 are poor choices for an editor, ergo
> arguing from that standpoint is pretty useless.

That's a misleading way to put it. Using immutable strings as editor 
buffers might be a bad way to implement all but the most trivial, low-
performance (i.e. slow) editor, but the basic concept of PEP 393, picking 
an internal representation of the text based on its contents, is not. 
That's just normal. The only difference with PEP 393 is that the choice 
is made on the fly, at runtime, instead of decided in advance by the 
programmer.
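You can watch CPython make that runtime choice. The sketch below is CPython-specific (3.3+, where PEP 393 applies), and the helper name bytes_per_char is made up for illustration. It compares the sizes of two strings that differ by one character, so the fixed object header cancels out and only the per-character storage is left.

```python
import sys

def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Hypothetical helper: per-character storage CPython picks for text made of `ch`.

    Subtracting the sizes of two strings that differ in length by one
    cancels the fixed header, leaving the width chosen under PEP 393.
    """
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)

# One sample character from each range mentioned in this thread.
for label, ch in [("Latin-1", "\u00e9"), ("BMP", "\u20ac"), ("SMP", "\U0001F600")]:
    print(label, bytes_per_char(ch))
```

On CPython this prints 1, 2 and 4 for the three cases. Nobody declared a width anywhere; the interpreter picked it from the contents of each string.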

I expect that the PEP 393 concept of optimizing memory per string buffer 
would work well in an editor. However the internal buffer is arranged, 
you can safely assume that each chunk of text (word, sentence, paragraph, 
buffer...) will very rarely shift from "all Latin 1" to "all BMP" to 
"includes SMP chars". For example, entering an SMP character will require 
the chunk to be up-cast immediately from 1 byte per char to 4 bytes per 
char, which is relatively pricey, but it's a one-off cost. Down-casting 
when the SMP character is deleted doesn't need to be done immediately; it 
can be deferred until the application is idle.
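A rough sketch of that scheme, assuming a hypothetical Chunk class that stores its text in a bytearray at a fixed 1, 2 or 4 bytes per character (encoded as Latin-1, UTF-16-LE or UTF-32-LE respectively). This is only an illustration of the idea, not how CPython or any real editor does it, and lone surrogate characters are not supported. Inserting wider text up-casts at once; deleting only sets a flag, and compact() does the down-cast later.

```python
# Hypothetical illustration: fixed-width chunk with eager up-cast, lazy down-cast.
CODECS = {1: "latin-1", 2: "utf-16-le", 4: "utf-32-le"}

def width_needed(text: str) -> int:
    """Smallest fixed width (bytes per char) that can hold every char in text."""
    top = max(map(ord, text), default=0)
    if top < 0x100:
        return 1
    if top < 0x10000:
        return 2
    return 4

class Chunk:
    def __init__(self, text: str = "") -> None:
        self.width = width_needed(text)
        self.buf = bytearray(text.encode(CODECS[self.width]))
        self.compactable = False  # True if the chunk may now be wider than needed

    def text(self) -> str:
        return self.buf.decode(CODECS[self.width])

    def _recode(self, width: int) -> None:
        # Copy the whole chunk into a new width: the "relatively pricey" step.
        self.buf = bytearray(self.text().encode(CODECS[width]))
        self.width = width

    def insert(self, pos: int, s: str) -> None:
        need = width_needed(s)
        if need > self.width:
            self._recode(need)  # up-cast immediately, a one-off cost
        i = pos * self.width
        self.buf[i:i] = s.encode(CODECS[self.width])

    def delete(self, start: int, end: int) -> None:
        del self.buf[start * self.width:end * self.width]
        self.compactable = True  # don't down-cast now, just remember we could

    def compact(self) -> None:
        """Meant to be called when the application is idle."""
        if self.compactable:
            need = width_needed(self.text())
            if need < self.width:
                self._recode(need)
            self.compactable = False
```

For example, Chunk("hello") starts at 1 byte per char, inserting " \U0001F600" widens it to 4, deleting those two characters leaves it at 4, and only compact() brings it back to 1.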

If the chunks are relatively small (say, a paragraph rather than multiple 
pages of text) then even that initial conversion will be invisible. A 
fast touch typist hits a key about once every tenth of a second; if it takes a 
millisecond to convert the chunk, you wouldn't even notice the delay. You 
can copy and up-cast a lot of bytes in a millisecond.
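To check that arithmetic on your own machine, here is a rough CPython timing sketch. The paragraph text and the KEYSTROKE_INTERVAL constant are invented for illustration, and the numbers will vary with hardware. Appending one SMP character to a Latin-1 paragraph of about 2,250 characters forces a copy of the whole thing into a 4-bytes-per-char string.

```python
import sys
import timeit

KEYSTROKE_INTERVAL = 0.1  # seconds between keys for a fast typist (hypothetical constant)
paragraph = "The quick brown fox jumps over the lazy dog. " * 50  # 2,250 ASCII chars

# Concatenating an SMP character builds a new string at 4 bytes per char.
widened = paragraph + "\U0001F600"
print(sys.getsizeof(paragraph), sys.getsizeof(widened))

runs = 10_000
per_upcast = timeit.timeit(lambda: paragraph + "\U0001F600", number=runs) / runs
print(f"{per_upcast * 1e6:.1f} us per up-cast; "
      f"{KEYSTROKE_INTERVAL / per_upcast:,.0f} up-casts fit between keystrokes")
```

On typical hardware the up-cast takes on the order of microseconds, far below the gap between keystrokes.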


-- 
Steven


