[Python-Dev] Split unicodeobject.c into subfiles

Thu Oct 25 11:18:53 CEST 2012

On Thu, Oct 25, 2012 at 8:57 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 25.10.2012 08:42, Nick Coghlan wrote:
>>> Why are any of these codecs here in unicodeobjectland in the first
>>> place?  Sure, they're needed so that Python can find its own stuff,
>>> but in principle *any* codec could be needed.  Is it just an heuristic
>>> that the codecs needed for 99% of the world are here, and other codecs
>>> live in separate modules?
>>
>> I believe it's a combination of history and whether or not they're
>> needed by the interpreter during the bootstrapping process before the
>> encodings namespace is importable.
>
> They are in unicodeobject.c so that the compilers can inline the
> code in the various other places where they are used in the Unicode
> implementation directly as necessary and because the codecs use
> a lot of functions from the Unicode API (obviously), so the other
> direction of inlining (Unicode API in codecs) is needed as well.

I'm sorry to interrupt, but have you actually measured? What effect
the lack of said inlining has on *any* benchmark is definitely beyond
my ability to guess and I suspect is beyond the ability to guess of
anyone else on this list.

I challenge you to find a benchmark that is being significantly
affected (>15%) with the split proposed by Victor. It does not even
have to be a real-world one, although that would definitely buy it
more credibility.

Cheers,
fijal