unicode() vs. s.decode()
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Fri Aug 7 03:25:44 EDT 2009
On Fri, 07 Aug 2009 08:04:51 +0100, Mark Lawrence wrote:
> I believe that the comment "these benchmarks are meaningless" refers to
> the length of the strings being used in the tests. Surely something
> involving thousands or millions of characters is more meaningful? Or to
> go the other way, you are unlikely to write
>
>     for c in 'äöüÄÖÜß':
>         u = unicode(c, 'utf-8')
>         ...
>
> Yes?
There are all sorts of potential use-cases. A day or two ago, somebody
posted a question involving tens of thousands of lines of tens of
thousands of characters each (don't quote me, I'm going by memory). On
the other hand, it doesn't require much imagination to think of a
use-case where there are millions of lines, each of a dozen or so
characters, and you want to process them line by line:
noun: cat
noun: dog
verb: café
...
As always, before optimizing, you should profile to be sure you are
actually optimizing and not wasting your time.
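
For what it's worth, here is a rough sketch of how one might time the
"many short lines" case. It is written for Python 3, where
str(b, 'utf-8') stands in for unicode(s, 'utf-8') and bytes.decode()
for s.decode(). The sample data, the function names and the repeat
counts are all my own assumptions for illustration, so treat the
numbers it prints as a hint and nothing more:

```python
import timeit

# Hypothetical sample data: many short UTF-8 encoded lines,
# modelled on the noun/verb example above.
LINES = ["noun: cat", "noun: dog", "verb: café"]
DATA = [line.encode("utf-8") for line in LINES] * 1000


def decode_with_constructor(lines):
    # Python 3 counterpart of unicode(s, 'utf-8').
    return [str(b, "utf-8") for b in lines]


def decode_with_method(lines):
    # Python 3 counterpart of s.decode('utf-8').
    return [b.decode("utf-8") for b in lines]


def time_both(lines, repeat=5, number=10):
    # Return the best (minimum) total time in seconds for each approach.
    results = {}
    for func in (decode_with_constructor, decode_with_method):
        timings = timeit.repeat(lambda: func(lines),
                                repeat=repeat, number=number)
        results[func.__name__] = min(timings)
    return results


if __name__ == "__main__":
    for name, seconds in time_both(DATA).items():
        print(name, seconds)
```

Taking the minimum of several repeats is the usual way to reduce noise
from whatever else the machine is doing. Whether any difference matters
depends on how much of your program's total run time is spent decoding,
which is exactly what profiling the real program will tell you.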
--
Steven