How do I display unicode value stored in a string variable using ord()

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sun Aug 19 06:51:26 EDT 2012


On Sun, 19 Aug 2012 01:11:56 -0700, Paul Rubin wrote:

> Steven D'Aprano <steve+comp.lang.python at pearwood.info> writes:
>>     result = text[end:]
> 
> if end not near the end of the original string, then this is O(N) even
> with fixed-width representation, because of the char copying.

Technically, yes. But it's a straight copy of a chunk of memory, which 
means it's fast: your OS and hardware try to make straight memory 
copies as fast as possible. Big-Oh analysis frequently glosses over 
implementation details like that.

Of course, that assumption gets shaky when you start talking about extra 
large blocks, and it falls apart completely when your OS starts paging 
memory to disk.

But if it helps to avoid irrelevant technical details, change it to 
text[end:end+10] or something.


> if it is near the end, by knowing where the string data area ends, I
> think it should be possible to scan backwards from the end, recognizing
> what bytes can be the beginning of code points and counting off the
> appropriate number.  This is O(1) if "near the end" means "within a
> constant".
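
For concreteness, here is a minimal sketch of that backward scan in 
Python. It assumes the bytes are valid UTF-8, and the function name 
utf8_offset_from_end is made up for illustration. The work done is 
proportional to the number of bytes scanned, not to the length of the 
whole string.

```python
def utf8_offset_from_end(data: bytes, k: int) -> int:
    """Return the byte offset where the k-th code point from the end starts.

    Assumes `data` is valid UTF-8 and k >= 1.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    pos = len(data)
    remaining = k
    while pos > 0:
        pos -= 1
        # Continuation bytes look like 0b10xxxxxx; any other byte
        # is the first byte of a code point.
        if data[pos] & 0xC0 != 0x80:
            remaining -= 1
            if remaining == 0:
                return pos
    raise IndexError("data holds fewer than k code points")


data = "naïve café ☃".encode("utf-8")
start = utf8_offset_from_end(data, 3)
print(data[start:].decode("utf-8"))  # the last three code points
```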

You know, I think you are misusing Big-Oh analysis here. It really 
wouldn't be helpful for me to say "Bubble Sort is O(1) if you only sort 
lists with a single item". Well, yes, that is absolutely true, but that's 
a special case that doesn't give you any insight into why using Bubble 
Sort as your general purpose sort routine is a terrible idea.

Using variable-width encodings like UTF-8 and UTF-16 for in-memory 
representations is a terrible idea because you can't assume that people 
will only ever want to index the first or last character. On average, 
you need to scan half the string, one character at a time. In Big-Oh, we 
can ignore the factor of 1/2 and just say we scan the string, O(N).
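
To illustrate, here is a rough sketch of what indexing by code point 
looks like when the data is held as UTF-8 bytes. It assumes valid UTF-8, 
and utf8_index is a hypothetical helper, not anything from the standard 
library. The loop has to walk the bytes from the start until it has 
counted off n lead bytes.

```python
def utf8_index(data: bytes, n: int) -> str:
    """Return the n-th code point (0-based) of valid UTF-8 data by scanning."""
    if n < 0:
        raise IndexError("negative index not supported in this sketch")
    count = -1
    for pos, byte in enumerate(data):
        # A byte that is not a continuation byte (0b10xxxxxx)
        # starts a new code point.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == n:
                end = pos + 1
                # Take in the continuation bytes that belong to it.
                while end < len(data) and data[end] & 0xC0 == 0x80:
                    end += 1
                return data[pos:end].decode("utf-8")
    raise IndexError("index out of range")


text = "naïve café ☃"
encoded = text.encode("utf-8")
print(utf8_index(encoded, 9))  # same character as text[9]
```

With a fixed-width array the same lookup is a single offset calculation, 
however far into the string the index is.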

That's why languages tend to use fixed-width character arrays for strings. 
Haskell is an exception: its built-in String type is a linked list of 
characters, which requires traversing the string to jump to an index. The 
documentation for its Text library even warns:

[quote]
If you think of a Text value as an array of Char values (which it is 
not), you run the risk of writing inefficient code.

An idiom that is common in some languages is to find the numeric offset 
of a character or substring, then use that number to split or trim the 
searched string. With a Text value, this approach would require two O(n) 
operations: one to perform the search, and one to operate from wherever 
the search ended. 
[end quote]

http://hackage.haskell.org/packages/archive/text/0.11.2.2/doc/html/Data-Text.html
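
The idiom that warning describes looks like this in Python (a small 
sketch; the sample string and variable names are invented for 
illustration). The search is O(n) either way, but with a fixed-width 
string the start of the slice is then located directly from the offset 
rather than by a second scan.

```python
text = "key=value; other=stuff"
sep = "="

offset = text.find(sep)  # O(n) search for the separator
if offset != -1:
    key = text[:offset]               # everything before the separator
    rest = text[offset + len(sep):]   # everything after it
    print(key, rest)

# str.partition does the search and the split in one call.
head, found, tail = text.partition(sep)
```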



-- 
Steven


