RE Module Performance

Michael Torrie torriem at gmail.com
Wed Jul 24 10:47:36 EDT 2013


On 07/24/2013 07:40 AM, wxjmfauth at gmail.com wrote:
> Sorry, you are not understanding Unicode. What is a Unicode
> Transformation Format (UTF), what is the goal of a UTF and
> why it is important for an implementation to work with a UTF.

Really?  Enlighten me.

Personally, I would never use a variable-width UTF (UTF-8 or UTF-16) as
the *in-memory* representation of a unicode string if it were up to me.
Why?  Because characters in those encodings are not uniform in byte
width, so reaching a given position in the string is terribly slow: you
always have to scan from the beginning of the string.  That's O(n) per
access, compared to O(1) for the FSR (the Flexible String Representation
of PEP 393).  Surely you understand this.  Do you dispute this fact?
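
To make the cost difference concrete, here is a rough sketch.  It is not
how CPython or any real implementation is written; utf8_index and
utf32_index are names made up for this illustration.  The first has to
walk the bytes from the start to find the i-th code point, the second
jumps straight to byte offset 4*i.

```python
def utf8_index(data: bytes, i: int) -> str:
    """Return the i-th code point of UTF-8 data by scanning from the start: O(n)."""
    if i < 0:
        raise IndexError("negative index not supported in this sketch")
    count = -1
    start = None
    for pos, byte in enumerate(data):
        # Any byte that is not a continuation byte (10xxxxxx) starts a code point.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == i:
                start = pos
            elif count == i + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("index out of range")
    # The requested code point was the last one in the buffer.
    return data[start:].decode("utf-8")


def utf32_index(data: bytes, i: int) -> str:
    """Return the i-th code point of UTF-32-LE data: fixed width, so O(1)."""
    if i < 0 or 4 * i + 4 > len(data):
        raise IndexError("index out of range")
    return data[4 * i:4 * i + 4].decode("utf-32-le")


# Sample text mixing 1-, 2-, 3- and 4-byte UTF-8 sequences.
text = "na\u00efve \u20acuro \U0001D11E clef"
as_utf8 = text.encode("utf-8")
as_utf32 = text.encode("utf-32-le")

for i in range(len(text)):
    assert utf8_index(as_utf8, i) == text[i]
    assert utf32_index(as_utf32, i) == text[i]
```

Both functions return the same characters; the only difference is how
much work each lookup does as the string gets longer.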

UTF is a great choice for interchange, though, and indeed that's what it
was designed for.

Are you calling for UTF to be adopted as the internal, in-memory
representation of unicode?  Or would you simply settle for UCS-4?
Please be clear here.  What are you saying?

> Short example. Writing an editor with something like the
> FSR is simply impossible (properly).

How? FSR is just an implementation detail.  It could be UCS-4 and it
would also work.
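
A small sketch of what "implementation detail" means here, assuming
CPython 3.3 or later.  The three strings need different storage widths
per character, yet len() and indexing behave identically for all of
them.  sys.getsizeof is only printed to show that the storage differs;
the exact numbers depend on the interpreter build.

```python
import sys

# Three strings of 100 code points each, needing 1, 2 and 4 bytes per
# character under the FSR (assumption: CPython 3.3 or later).
samples = {
    "ascii": "a" * 100,
    "bmp": "\u20ac" * 100,
    "astral": "\U0001D11E" * 100,
}

for name, s in samples.items():
    # The visible behaviour is the same for every string.
    assert len(s) == 100
    assert s[50] == s[0]
    assert len(s[10:20]) == 10
    # Only the memory footprint differs, and code cannot see that
    # through normal string operations.
    print(name, len(s), sys.getsizeof(s))
```

An editor built on top of str sees only code points, so it would work
the same way if the storage underneath were plain UCS-4.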




