How do I display unicode value stored in a string variable using ord()

Neil Hodgson nhodgson at iinet.net.au
Tue Aug 21 03:03:33 EDT 2012


Steven D'Aprano:

> Using variable-sized strings like UTF-8 and UTF-16 for in-memory
> representations is a terrible idea because you can't assume that people
> will only ever want to index the first or last character. On average,
> you need to scan half the string, one character at a time. In Big-Oh, we
> can ignore the factor of 1/2 and just say we scan the string, O(N).

    In the majority of cases you can avoid excessive scanning by
caching the most recent index->offset result. If the next requested
index is nearer to the cached index than to the beginning of the string,
iterate from the cached offset instead. This converts many operations,
such as a loop that indexes each character in turn, from quadratic to
linear. Locality of reference is common and can often be exploited.
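
    A minimal sketch of that caching scheme, assuming a string held as
UTF-8 bytes. The class name CachedUtf8, its methods, and the steps
counter are hypothetical names invented for illustration; this is not
how any particular Python implementation stores strings.

```python
def is_continuation(byte):
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return (byte & 0xC0) == 0x80


class CachedUtf8:
    """Hypothetical wrapper: character indexing over UTF-8 bytes,
    remembering the most recent index->offset result."""

    def __init__(self, text):
        self.data = text.encode("utf-8")
        self.cached_index = 0   # character index of the last lookup
        self.cached_offset = 0  # byte offset of that character
        self.steps = 0          # characters stepped over (illustration only)

    def offset_of(self, index):
        if index < 0:
            raise IndexError(index)
        data = self.data
        # Start from the cached position if it is nearer than the beginning.
        if abs(index - self.cached_index) < index:
            i, off = self.cached_index, self.cached_offset
        else:
            i, off = 0, 0
        while i < index:  # walk forward one character at a time
            if off >= len(data):
                raise IndexError(index)
            off += 1
            while off < len(data) and is_continuation(data[off]):
                off += 1
            i += 1
            self.steps += 1
        while i > index:  # walk backward one character at a time
            off -= 1
            while is_continuation(data[off]):
                off -= 1
            i -= 1
            self.steps += 1
        if off >= len(data):
            raise IndexError(index)
        self.cached_index, self.cached_offset = i, off
        return off

    def char_at(self, index):
        start = self.offset_of(index)
        end = start + 1
        while end < len(self.data) and is_continuation(self.data[end]):
            end += 1
        return self.data[start:end].decode("utf-8")
```

    With this sketch, a loop calling char_at(0), char_at(1), ... steps
over each character once in total, rather than rescanning from the
start on every call. Requests far from both the start and the cached
index still cost a scan.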

    However, exposing the variable-length nature of UTF-8 allows the
application to choose efficient techniques for more cases.
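
    One illustration of what that can look like, as a sketch only: if
the application works with byte offsets directly, a search result can
be used for slicing without any character counting. The function name
split_at_colon is hypothetical. It relies on the fact that an ASCII
byte such as ":" never occurs inside a multi-byte UTF-8 sequence.

```python
def split_at_colon(data):
    """Split UTF-8 bytes at the first ':' using byte offsets only."""
    sep = data.find(b":")  # a byte offset, not a character index
    if sep < 0:
        return data, b""
    # Slicing by byte offset is constant-time indexing into the buffer;
    # no character positions are ever computed.
    return data[:sep], data[sep + 1:].lstrip()


key, value = split_at_colon("naïve café: 日本語".encode("utf-8"))
print(key.decode("utf-8"), "|", value.decode("utf-8"))
```

    Decoding happens only at the edges, when the pieces are displayed.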

    Neil


