unicode() vs. s.decode()

Thu Aug 6 10:15:41 EDT 2009

On Thu, 2009-08-06 at 01:31 +0000, John Machin wrote:
> Faster by an enormous margin; attributing this to the cost of attribute lookup
> seems implausible.

Ok, fair point.  I don't think the time difference fully registered when
I composed that message.

Testing a global access (LOAD_GLOBAL) versus an attribute access on a
global object (LOAD_GLOBAL + LOAD_ATTR) shows that the latter is about
40% slower than the former.  So that certainly doesn't account for the
difference.

> Suggested further avenues of investigation:
> 
> (1) Try the timing again with "cp1252" and "utf8" and "utf_8"
> 
> (2) grep "utf-8" <Python2.X_source_code>/Objects/unicodeobject.c

Very pedagogical of you. :)  Indeed, it looks like bigger player in the
performance difference is the fact that the code path for unicode(s,
enc) short-circuits the codec registry for common encodings (which
includes 'utf-8' specifically), whereas s.decode('utf-8') necessarily
consults the codec registry.

Cheers,
Jason.