RE Module Performance

wxjmfauth at gmail.com wxjmfauth at gmail.com
Wed Jul 24 09:40:55 EDT 2013


On Saturday, 13 July 2013 01:13:47 UTC+2, Michael Torrie wrote:
> On 07/12/2013 09:59 AM, Joshua Landau wrote:
> > If you're interested, the basic idea is that strings now use a
> > variable number of bytes to encode their values depending on whether
> > values outside of the ASCII range and some other range are used, as an
> > optimisation.
>
> "Variable number of bytes" is a problematic way of saying it.  UTF-8 is a
> variable-number-of-bytes encoding scheme where each character can be 1,
> 2, 3, or 4 bytes, depending on the Unicode character.  As you can
> imagine, this sort of encoding scheme would be very slow to do slicing
> with (looking up a character at a certain position).  Python uses
> fixed-width encoding schemes, so they preserve the O(1) lookup speeds,
> but Python will use 1, 2, or 4 bytes for every character in the string,
> depending on what is needed.  Just in case the OP might have
> misunderstood what you are saying.
>
> jmf sees the case where a string is promoted from one width to another,
> and thinks that the brief slowdown in string operations to accomplish
> this is a problem.  In reality I have never seen anyone use the types of
> string operations his pseudo-benchmarks use, and in general Python 3's
> string behavior is pretty fast.  And apparently much more correct than
> if jmf's ideas of Unicode were implemented.
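
For readers following along, the difference described in the quoted text
can be observed with a small sketch. It assumes CPython 3.3 or later
(other implementations may store strings differently), and the helper
names utf8_len and bytes_per_char are made up for this illustration. It
measures storage indirectly through sys.getsizeof, so treat it as a rough
demonstration rather than a description of the internals.

```python
import sys


def utf8_len(ch):
    """Number of bytes UTF-8 needs to encode the single character ch."""
    return len(ch.encode("utf-8"))


def bytes_per_char(ch):
    """Estimate the per-character storage width used for a string made
    only of ch, by comparing the sizes of two strings that differ in
    length by 1000 characters (the fixed object overhead cancels out).
    Assumption: CPython 3.3+, where sys.getsizeof reflects that storage."""
    short = ch * 1000
    longer = ch * 2000
    return (sys.getsizeof(longer) - sys.getsizeof(short)) // 1000


if __name__ == "__main__":
    samples = [
        ("a", "ASCII letter"),
        ("\u00e9", "Latin-1 letter (e with acute accent)"),
        ("\u20ac", "euro sign, in the BMP"),
        ("\U0001F600", "emoji, outside the BMP"),
    ]
    for ch, label in samples:
        # UTF-8 width varies per character; the in-memory width is the
        # same for every character of a given string.
        print(label, "| UTF-8 bytes:", utf8_len(ch),
              "| in-memory bytes per character:", bytes_per_char(ch))
```

Under that assumption, the UTF-8 lengths for the four samples are 1, 2, 3
and 4 bytes, while the in-memory widths are 1, 1, 2 and 4 bytes per
character. Because the width is uniform within one string, indexing stays
a constant-time operation; the cost is that adding a single wider
character to a string means building a new string at the wider width.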

------

Sorry, but you do not understand Unicode: what a Unicode
Transformation Format (UTF) is, what the goal of a UTF is, and
why it is important for an implementation to work with a UTF.

A short example: properly writing an editor with something like
the FSR (Flexible String Representation) is simply impossible.

jmf


