unicode() vs. s.decode()

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Fri Aug 7 03:25:44 EDT 2009


On Fri, 07 Aug 2009 08:04:51 +0100, Mark Lawrence wrote:

> I believe that the comment "these benchmarks are meaningless" refers to
> the length of the strings being used in the tests.  Surely something
> involving thousands or millions of characters is more meaningful? Or to
> go the other way, you are unlikely to write for c in 'äöüÄÖÜß':
>      u = unicode(c, 'utf-8')
>      ...
> Yes?

There are all sorts of potential use-cases. A day or two ago, somebody 
posted a question involving tens of thousands of lines, each tens of 
thousands of characters long (don't quote me, I'm going by memory). On 
the other hand, it doesn't take much imagination to think of a use-case 
with millions of lines of a dozen or so characters each, which you want 
to process line by line:


noun: cat
noun: dog
verb: café
...


As always, before optimizing, you should profile to be sure you are 
actually optimizing and not wasting your time.
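
For what it's worth, here is a rough sketch of how one might time the two 
spellings on many short lines. Everything in it is made up for 
illustration: the sample data, the function names, and the to_unicode 
shim, which is only there so the same script also runs on Python 3 (where 
str(bytes, encoding) plays the role of unicode). Treat it as a starting 
point under those assumptions, not as a definitive benchmark.

```python
import timeit

try:
    to_unicode = unicode   # Python 2: the builtin under discussion
except NameError:
    to_unicode = str       # Python 3 stand-in: str(bytes, encoding)

# Hypothetical sample data: many short UTF-8 encoded lines.
LINES = [
    u"noun: cat".encode("utf-8"),
    u"noun: dog".encode("utf-8"),
    u"verb: caf\xe9".encode("utf-8"),
] * 1000


def with_constructor(lines=LINES):
    """Decode each line by calling the constructor with an encoding."""
    return [to_unicode(line, "utf-8") for line in lines]


def with_decode(lines=LINES):
    """Decode each line by calling its .decode() method."""
    return [line.decode("utf-8") for line in lines]


def compare(repeat=3, number=10):
    """Return the best (minimum) time in seconds for each approach."""
    return {
        "constructor": min(timeit.repeat(with_constructor,
                                         repeat=repeat, number=number)),
        "decode": min(timeit.repeat(with_decode,
                                    repeat=repeat, number=number)),
    }


if __name__ == "__main__":
    for name, seconds in sorted(compare().items()):
        print("%-12s %.4f s" % (name, seconds))
```

The numbers will depend on your Python version, your machine, and the 
shape of your data, so rerun it with input that resembles the real thing 
(a few huge strings, or millions of tiny ones) before drawing conclusions.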



-- 
Steven


