flaming vs accuracy [was Re: Performance of int/long in Python 3]

Ian Kelly ian.g.kelly at gmail.com
Thu Mar 28 12:33:46 EDT 2013


On Thu, Mar 28, 2013 at 7:34 AM, jmfauth <wxjmfauth at gmail.com> wrote:
> The flexible string representation takes the problem from the
> other side, it attempts to work with the characters by using
> their representations and it (can only) fails...

This is false.  As I've pointed out to you before, the FSR does not
divide characters up by representation.  It divides them up by
codepoint -- more specifically, by the *bit-width* of the codepoint.
We call the internal format of the string "ASCII" or "Latin-1" or
"UCS-2" for conciseness and a point of reference, but fundamentally
all of the FSR formats are simply arrays of fixed-width *codepoints* -- you
know, those things you keep harping on.  The major optimization
performed by the FSR is to consistently truncate the leading zero
bytes from each codepoint when it is possible to do so safely.  But
however far this truncation is applied, the string is
*always* internally just an array of codepoints, and the same
algorithms apply for all representations.
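
To make that concrete, here is a rough sketch that can be run on
CPython 3.3 or later.  It assumes that sys.getsizeof reflects the
internal storage of a str, which is a CPython implementation detail
and not a language guarantee.  The helper names bytes_per_codepoint
and codepoints are made up for this illustration.

```python
import sys

def bytes_per_codepoint(ch, n=1000):
    """Estimate storage per character by comparing two string lengths.

    Subtracting the two sizes cancels the fixed object header, leaving
    only the per-codepoint cost (a CPython implementation detail).
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

def codepoints(s):
    """Return the codepoints of s; works the same for every internal width."""
    return [ord(c) for c in s]

# One sample character per internal format named in the post.
samples = {
    "ASCII": "a",             # U+0061
    "Latin-1": "\xe9",        # U+00E9
    "UCS-2": "\u20ac",        # U+20AC
    "UCS-4": "\U0001F600",    # U+1F600
}

# On CPython 3.3+ the widths are expected to be 1, 1, 2 and 4 bytes.
widths = {name: bytes_per_codepoint(ch) for name, ch in samples.items()}
print(widths)

# Whatever the width, indexing and len() operate on codepoints.
mixed = "a\xe9\u20ac\U0001F600"
print(len(mixed), [hex(cp) for cp in codepoints(mixed)])
```

The only thing that changes between the four cases is how many
leading zero bytes are dropped; len(), indexing and ord() see the
same codepoints either way.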


