unicode() vs. s.decode()

Thu Aug 6 15:17:30 EDT 2009

On Thu, 06 Aug 2009 20:05:52 +0200, Thorsten Kampe wrote:

> > That is significant! So the winner is:
> > 
> > unicode('äöüÄÖÜß','utf-8')
> 
> Unless you are planning to write a loop that decodes "äöüÄÖÜß" one
> million times, these benchmarks are meaningless.

What if you're writing a loop which takes one million different lines of 
text and decodes them once each?

>>> setup = 'L = ["abc"*(n%100) for n in xrange(1000000)]'
>>> t1 = timeit.Timer('for line in L: line.decode("utf-8")', setup)
>>> t2 = timeit.Timer('for line in L: unicode(line, "utf-8")', setup)
>>> t1.timeit(number=1)
5.6751680374145508
>>> t2.timeit(number=1)
2.6822888851165771

Seems like a pretty meaningful difference to me.

-- 
Steven