unicode() vs. s.decode()

Mark Lawrence breamoreboy at yahoo.co.uk
Fri Aug 7 03:04:51 EDT 2009


Michael Ströder wrote:
> Thorsten Kampe wrote:
>> * Michael Ströder (Thu, 06 Aug 2009 18:26:09 +0200)
>>> >>> timeit.Timer("unicode('äöüÄÖÜß','utf-8')").timeit(10000000)
>>> 17.23644495010376
>>> >>> timeit.Timer("'äöüÄÖÜß'.decode('utf8')").timeit(10000000)
>>> 72.087096929550171
>>>
>>> That is significant! So the winner is:
>>>
>>> unicode('äöüÄÖÜß','utf-8')
>> Unless you are planning to write a loop that decodes "äöüÄÖÜß" one 
>> million times, these benchmarks are meaningless.
> 
> Well, I can tell you I would not have posted this here and checked it if it
> would be meaningless for me. You don't have to read and answer this thread if
> it's meaningless to you.
> 
> Ciao, Michael.
I believe the comment "these benchmarks are meaningless" refers to the
length of the strings used in the tests. Surely something involving
thousands or millions of characters would be more meaningful? Or, to go
the other way, you are unlikely to write:
for c in 'äöüÄÖÜß':
    u = unicode(c, 'utf-8')
    ...
Yes?
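
To make the point about string length concrete, here is a rough sketch of how the two spellings could be timed on both a short and a long input. It is written for Python 3, where unicode() no longer exists and str(data, 'utf-8') is the closest equivalent, so it is not the benchmark from the thread. The helper name time_decoders, the repeat counts, and the 10000x multiplier are my own assumptions, and the numbers will vary by machine and interpreter.

```python
import timeit


def time_decoders(data, number=200):
    """Return average seconds per call for the two decoding spellings.

    Hypothetical helper for illustration: times str(data, 'utf-8')
    against data.decode('utf-8') on the same bytes object.
    """
    t_ctor = timeit.timeit(lambda: str(data, "utf-8"), number=number) / number
    t_method = timeit.timeit(lambda: data.decode("utf-8"), number=number) / number
    return t_ctor, t_method


# The 7-character string from the thread, encoded to UTF-8 bytes.
short = "äöüÄÖÜß".encode("utf-8")
# An assumed "long" input: the same bytes repeated 10000 times.
long_ = short * 10000

if __name__ == "__main__":
    for label, data in (("short", short), ("long", long_)):
        t_ctor, t_method = time_decoders(data)
        print("%-5s (%7d bytes): str() %.3e s/call, .decode() %.3e s/call"
              % (label, len(data), t_ctor, t_method))
```

With the short input, any difference mostly reflects per-call overhead rather than decoding itself; with the long input the decoding work should dominate, which is the case that matters if you are processing real data.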

-- 
Kindest regards.

Mark Lawrence.



