RE Module Performance

Wed Jul 24 11:00:39 EDT 2013

On 07/24/2013 08:34 AM, Chris Angelico wrote:
> Frankly, Python's strings are a *terrible* internal representation
> for an editor widget - not because of PEP 393, but simply because
> they are immutable, and every keypress would result in a rebuilding
> of the string. On the flip side, I could quite plausibly imagine
> using a list of strings; whenever text gets inserted, the string gets
> split at that point, and a new string created for the insert (which
> also means that an Undo operation simply removes one entire string).
> In this usage, the FSR is beneficial, as it's possible to have
> different strings at different widths.

Very good point.  This seems to be exactly what trips jmf up in general.
His pseudo-benchmarks are bogus for this very reason: no one uses Python
strings in this fashion, and an editor certainly would not.  Then again,
his past arguments have not mentioned editors.  It does make me wonder
whether jmf is using Python strings appropriately, or even realizes that
they are immutable.
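
To make the list-of-strings idea concrete, here is a rough sketch of how
such a buffer might look.  The PieceBuffer class and its method names are
made up for illustration; this is not taken from any real editor widget,
and a real one would likely need more (undo, deletion, faster lookup).

```python
class PieceBuffer:
    """Hypothetical text buffer stored as a list of immutable str pieces."""

    def __init__(self, text=""):
        self.pieces = [text] if text else []

    def insert(self, pos, new_text):
        """Insert new_text at character offset pos by splitting one piece."""
        if pos < 0:
            raise IndexError("negative offsets are not handled in this sketch")
        if not new_text:
            return
        offset = 0  # character offset of the current piece's first character
        for i, piece in enumerate(self.pieces):
            if pos <= offset + len(piece):
                cut = pos - offset
                # Only this one piece is split; all other pieces are untouched.
                parts = [piece[:cut], new_text, piece[cut:]]
                self.pieces[i:i + 1] = [p for p in parts if p]
                return
            offset += len(piece)
        if pos != offset:
            raise IndexError("insert position past end of buffer")
        self.pieces.append(new_text)

    def text(self):
        return "".join(self.pieces)


buf = PieceBuffer("hello world")
buf.insert(5, ",")
print(buf.pieces)  # ['hello', ',', ' world']
print(buf.text())  # hello, world
```

Each insert builds at most two short slices of a single piece instead of
rebuilding the whole document string, and every piece is free to use its
own internal width under the FSR.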

> But mainly, I'm just wondering how many people here have any basis 
> from which to argue the point he's trying to make. I doubt most of
> us have (a) implemented an editor widget, or (b) tested multiple 
> different internal representations to learn the true pros and cons
> of each. 

Maybe, but thinking about it logically, FSR and UCS-4 are equivalent in
pros and cons (both index by code point over the full Unicode range),
and the cons of using UCS-2 (the old narrow builds) are well known:
UCS-2 cannot represent code points outside the Basic Multilingual Plane
without resorting to surrogate pairs.  This is all in PEP 393, of course.
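
A small illustration of the 16-bit problem, assuming Python 3.3 or later.
The character chosen is arbitrary; any code point above U+FFFF behaves the
same way.

```python
import sys

ch = "\U0001F600"  # one code point outside the Basic Multilingual Plane

# With PEP 393 strings (Python 3.3+) this is a single code point.
print(len(ch))  # 1

# Encoded in 16-bit units, the same character needs a surrogate pair.
# The old narrow builds stored strings this way, so len() reported 2.
units = len(ch.encode("utf-16-le")) // 2
print(units)  # 2

# Since 3.3 every build covers the full Unicode range.
print(hex(sys.maxunicode))  # 0x10ffff
```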

His most recent argument, that Python should use UTF as its internal
representation, is very strange to be honest.  The cons of UTF are
apparent and widely known.  The main con is that in a variable-width
encoding such as UTF-8 or UTF-16, indexing a position within the string
is O(n), because every character before that position has to be scanned.
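
To show where the O(n) comes from, here is a rough sketch of indexing
into UTF-8 data by code point.  The names utf8_seq_len and utf8_index are
hypothetical helpers written for this post, and the sketch assumes the
input is already valid UTF-8.

```python
def utf8_seq_len(lead):
    """Length in bytes of the UTF-8 sequence that starts with byte `lead`."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def utf8_index(data, n):
    """Return the n-th code point of UTF-8 bytes, scanning from the start."""
    if n < 0:
        raise IndexError("negative index not handled in this sketch")
    i = 0  # byte offset of the current code point
    for _ in range(n):
        # We cannot jump straight to character n: the width of every
        # earlier character has to be read first, hence O(n).
        if i >= len(data):
            raise IndexError("index out of range")
        i += utf8_seq_len(data[i])
    if i >= len(data):
        raise IndexError("index out of range")
    return data[i:i + utf8_seq_len(data[i])].decode("utf-8")


text = "a\u00e9\u20ac\U0001F600z"   # characters of 1, 2, 3, 4 and 1 bytes
data = text.encode("utf-8")
print(utf8_index(data, 4) == text[4])  # True
```

A fixed-width representation can compute the byte offset directly from
the index, which is why text[4] on a str does not need this loop.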