unicode() vs. s.decode()

Thu Aug 6 14:05:52 EDT 2009

* Michael Ströder (Thu, 06 Aug 2009 18:26:09 +0200)
> Thorsten Kampe wrote:
> > * Michael Ströder (Wed, 05 Aug 2009 16:43:09 +0200)
> > I don't think any measurable speed increase will be noticeable
> > between those two.
> 
> Well, that does not seem to be true. Try it yourself. I did (my console uses UTF-8 as its charset):
> 
> Python 2.6 (r26:66714, Feb  3 2009, 20:52:03)
> [GCC 4.3.2 [gcc-4_3-branch revision 141291]] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import timeit
> >>> timeit.Timer("'äöüÄÖÜß'.decode('utf-8')").timeit(1000000)
> 7.2721178531646729
> >>> timeit.Timer("'äöüÄÖÜß'.decode('utf8')").timeit(1000000)
> 7.1302499771118164
> >>> timeit.Timer("unicode('äöüÄÖÜß','utf8')").timeit(1000000)
> 8.3726329803466797
> >>> timeit.Timer("unicode('äöüÄÖÜß','utf-8')").timeit(1000000)
> 1.8622009754180908
> >>> timeit.Timer("unicode('äöüÄÖÜß','utf8')").timeit(1000000)
> 8.651669979095459
> >>>
> 
> Comparing the two best combinations again:
> 
> >>> timeit.Timer("unicode('äöüÄÖÜß','utf-8')").timeit(10000000)
> 17.23644495010376
> >>> timeit.Timer("'äöüÄÖÜß'.decode('utf8')").timeit(10000000)
> 72.087096929550171
> 
> That is significant! So the winner is:
> 
> unicode('äöüÄÖÜß','utf-8')

Unless you are planning to write a loop that decodes "äöüÄÖÜß" one 
million times, these benchmarks are meaningless.
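
To put numbers on that: 7.13 seconds for one million calls is about 7 
microseconds per call, and the gap in the ten-million run (72.09 s vs. 
17.24 s) works out to roughly 5.5 microseconds per call. If you want to 
see the per-call cost on your own machine, here is a minimal sketch. 
It assumes Python 3.5 or later, where str(data, 'utf-8') is the nearest 
equivalent of unicode(s, 'utf-8'); the names DATA, per_call_us and 
compare are made up for this example, and the figures will differ from 
the Python 2.6 numbers quoted above.

```python
import timeit

# The sample string from the thread, encoded as UTF-8 bytes.
DATA = "äöüÄÖÜß".encode("utf-8")


def per_call_us(stmt, number=100000):
    """Return the best-of-3 average cost of one execution of stmt, in microseconds."""
    timer = timeit.Timer(stmt, globals={"DATA": DATA})
    best = min(timer.repeat(repeat=3, number=number))
    return best / number * 1e6


def compare(number=100000):
    """Time both spellings and return {statement: microseconds per call}."""
    statements = ["DATA.decode('utf-8')", "str(DATA, 'utf-8')"]
    return {stmt: per_call_us(stmt, number) for stmt in statements}


if __name__ == "__main__":
    for stmt, us in compare().items():
        print("%-24s %.3f us per call" % (stmt, us))
```

Reporting microseconds per call instead of seconds per million calls 
makes it easier to judge whether the difference matters next to 
whatever else the program does with the decoded string.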

Thorsten