[Patches] [ python-Patches-1624059 ] fast subclasses of builtin types

SourceForge.net noreply at sourceforge.net
Sun Feb 25 20:50:53 CET 2007


Patches item #1624059, was opened at 2006-12-28 22:01
Message generated for change (Comment added) made by nnorwitz
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1624059&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Core (C code)
Group: Python 2.6
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Neal Norwitz (nnorwitz)
Assigned to: Guido van Rossum (gvanrossum)
Summary: fast subclasses of builtin types

Initial Comment:
This is similar to a patch posted on python-dev a few months ago (or more).  I modified it to also handle subclassing exceptions, which should speed up exception handling a bit.  (This was proposed by Guido based on the original patch.)  I also dropped an extra bit that was going to indicate whether a type was a builtin type or a subclass of a builtin type.

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2007-02-25 11:50

Message:
Logged In: YES 
user_id=33168
Originator: YES

Committed rev 53911.

Hopefully the checkin comment explains most of what's going on.  I
simplified the patch as much as possible; I like to start with less code.
If the speed can be improved further, that optimization can come later.  I
didn't measure the small variations.  A long time ago I did measure that
the patch made a real difference in speed when using an int subclass.

This should help a fair amount for exceptions too.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2007-01-06 06:54

Message:
Logged In: YES 
user_id=21627
Originator: NO

File Added: a.c

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2007-01-06 06:24

Message:
Logged In: YES 
user_id=21627
Originator: NO

I made a couple of assembler experiments (see attached a.c), with gcc 4.1
on x86.

A "bit mask enumeration" test (f) compiles into four instructions:

        movl    8(%eax), %eax
        andl    $-268435456, %eax
        cmpl    $1879048192, %eax
        je      .L18
(fall-through being the else case)

A single bit test of a flag (g) compiles to two instructions:

        testl   $-1073741824, 8(%eax)
        je      .L9
(fall-through being the if case)

Adding an identity test (comparison with the address of a global),
followed by a bit mask test (h), compiles into six instructions:

        cmpl    $int_type, %eax
        je      .L2
        movl    8(%eax), %eax
        andl    $-268435456, %eax
        cmpl    $1879048192, %eax
        je      .L2
(fall-through being the else case)

In the common case, only two of these instructions are executed.
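
The attachment a.c is not reproduced in this thread.  As a rough sketch of
what the three tests might look like in C, assuming a toy struct with the
flags word at byte offset 8 and constants read off the listings above (the
struct, its field names and the macro names are assumptions, not the
contents of a.c):

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a type object: two 32-bit words, then the flags word,
   so that flags sits at byte offset 8 as in the listings. */
struct toy_type {
    uint32_t refcnt;
    uint32_t type;
    uint32_t flags;
};

#define SUBCLASS_MASK 0xF0000000u  /* -268435456 as a signed 32-bit value */
#define INT_SUBCLASS  0x70000000u  /* 1879048192 */
#define FLAG_MASK     0xC0000000u  /* -1073741824, the immediate used in (g) */

struct toy_type int_type = { 1, 0, INT_SUBCLASS };

/* (f) enumeration test: mask out the field, then compare with a value. */
int f(const struct toy_type *t) {
    return (t->flags & SUBCLASS_MASK) == INT_SUBCLASS;
}

/* (g) flag test: true if any bit under the mask is set. */
int g(const struct toy_type *t) {
    return (t->flags & FLAG_MASK) != 0;
}

/* (h) identity test first; the enumeration test runs only on a miss. */
int h(const struct toy_type *t) {
    return t == &int_type || f(t);
}

/* Small usage example. */
void check_examples(void) {
    struct toy_type sub = { 1, 0, INT_SUBCLASS };
    struct toy_type other = { 1, 0, 0x10000000u };
    assert(f(&sub) && !f(&other));
    assert(g(&sub) && !g(&other));
    assert(h(&int_type) && h(&sub) && !h(&other));
}
```

Compiling something like this with optimization and looking at the
assembler output is one way to repeat the comparison; the exact
instructions will depend on compiler version and target.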

So all-in-all, I would agree with Guido that adding bit flags is more
efficient. However, existing bits cannot be recycled: in existing
binary extension modules, these flags are set, so if the modules don't
get recompiled, the type check would believe that the types are 
subtypes.
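
To make the recompilation hazard concrete, here is a minimal sketch.  All
names and bit positions in it are hypothetical and chosen only for
illustration; the point is that a flags word baked into an old binary
keeps whatever bits it was compiled with.

```c
#include <assert.h>

/* Hypothetical flag values, for illustration only. */
#define OLD_FEATURE_BIT    (1UL << 3)  /* a bit old binaries already set */
#define OLD_DEFAULT_FLAGS  ((1UL << 0) | OLD_FEATURE_BIT)

/* Suppose a newer interpreter recycled bit 3 to mean "int subclass". */
#define RECYCLED_INT_SUBCLASS (1UL << 3)
/* A bit that no old binary sets (the position is an assumption). */
#define FRESH_INT_SUBCLASS    (1UL << 23)

struct toy_type { unsigned long flags; };

int is_int_subclass_recycled(const struct toy_type *t) {
    return (t->flags & RECYCLED_INT_SUBCLASS) != 0;
}

int is_int_subclass_fresh(const struct toy_type *t) {
    return (t->flags & FRESH_INT_SUBCLASS) != 0;
}

void check_examples(void) {
    /* Flags word as compiled into an old extension module. */
    struct toy_type old_ext = { OLD_DEFAULT_FLAGS };
    assert(is_int_subclass_recycled(&old_ext));  /* false positive */
    assert(!is_int_subclass_fresh(&old_ext));    /* correct answer */
}
```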



----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2007-01-03 19:59

Message:
Logged In: YES 
user_id=6380
Originator: NO

This looks fine, but I have some questions about alternative
implementations:

- Why does the typical PyFoo_Check() macro first call PyFoo_CheckExact()
before calling the fast bit checking macro?  Did you measure that this is
in fact faster?  True, it means always a pointer deref, so maybe it is --
but OTOH it is more instructions.

- Why not have a separate bit for each type?  Then you could make the fast
macro test for (flags & mask) != 0 instead of testing for (flags & mask) ==
value.  It would use up all the remaining bits, but I suspect there are
some unused (or reusable) bits in lower positions: 1L<<2 is unused (was
GC), and 1L<<11 also seems unused.  And bits 18 through 23!  And I'm
guessing that INPLACEOPS (1L<<3) isn't all that interesting any more, since
they were introduced in 2.0...  So it really looks like you have plenty of
bits.  Of course I don't know if it matters; it would perhaps be worth
looking at the machine code.

- Oops, it looks like your comment is off.  You claim to be using bits
24-27, leaving 28-31 free, but in fact you're using bits 28-31!
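
For reference, a small sketch of the two encodings and of how to check
which bit positions a mask really covers.  The macro names, the value 7
for int and the choice of bit 23 are assumptions made for the example,
not taken from the patch.

```c
#include <assert.h>

/* Index of the lowest set bit in a non-zero mask. */
int lowest_bit(unsigned long mask) {
    int i = 0;
    while (!(mask & 1UL)) { mask >>= 1; i++; }
    return i;
}

/* Index of the highest set bit in a non-zero mask. */
int highest_bit(unsigned long mask) {
    int i = 0;
    while (mask >>= 1) i++;
    return i;
}

/* Enumerated field: one 4-bit value per type, shifted into place. */
#define FIELD_SHIFT 28
#define FIELD_MASK  (0xFUL << FIELD_SHIFT)
#define INT_VALUE   (7UL << FIELD_SHIFT)   /* hypothetical value for int */

/* One bit per type, using a position listed above as apparently free. */
#define INT_BIT     (1UL << 23)            /* hypothetical choice */

int is_int_enum(unsigned long flags) {
    return (flags & FIELD_MASK) == INT_VALUE;
}

int is_int_bit(unsigned long flags) {
    return (flags & INT_BIT) != 0;
}

void check_examples(void) {
    /* A 4-bit field shifted by 28 occupies bits 28-31, not 24-27. */
    assert(lowest_bit(FIELD_MASK) == 28 && highest_bit(FIELD_MASK) == 31);
    assert(is_int_enum(INT_VALUE) && !is_int_enum(6UL << FIELD_SHIFT));
    assert(is_int_bit(INT_BIT | 1UL) && !is_int_bit(1UL));
}
```

The enumerated field packs up to 15 non-zero type codes into four bits;
the one-bit-per-type form spends a bit per type but needs no compare
against a value.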

BTW You're introducing quite a few lines over 80 chars.  Perhaps cut back a
bit?


----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-12-28 22:04

Message:
Logged In: YES 
user_id=33168
Originator: YES

I forgot to mention this patch works by using unused bits in tp_flags. 
This saves a function call when checking for a subclass of a builtin type.
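
A minimal sketch of the idea, not the actual patch: the toy struct, the
bit position and the helper names below are assumptions, and the slow
path is simplified to a single-inheritance walk.

```c
#include <assert.h>
#include <stddef.h>

#define INT_SUBCLASS_BIT (1UL << 23)  /* hypothetical bit position */

struct toy_type {
    const struct toy_type *base;  /* single inheritance only */
    unsigned long flags;
};

/* Slow path: walk the base chain.  Out of line, so a real function call. */
int is_subtype(const struct toy_type *t, const struct toy_type *target) {
    for (; t != NULL; t = t->base)
        if (t == target)
            return 1;
    return 0;
}

/* Fast path: a macro that tests one bit in the flags word. */
#define FAST_INT_CHECK(t) (((t)->flags & INT_SUBCLASS_BIT) != 0)

/* Assumption of this sketch: a subtype copies the marker bit from its
   base when it is created, so the bit is set on every subclass. */
struct toy_type make_subtype(const struct toy_type *base) {
    struct toy_type t = { base, base->flags & INT_SUBCLASS_BIT };
    return t;
}

void check_examples(void) {
    struct toy_type object_type = { NULL, 0 };
    struct toy_type int_type = { &object_type, INT_SUBCLASS_BIT };
    struct toy_type my_int = make_subtype(&int_type);
    struct toy_type my_obj = make_subtype(&object_type);
    assert(is_subtype(&my_int, &int_type) && FAST_INT_CHECK(&my_int));
    assert(!is_subtype(&my_obj, &int_type) && !FAST_INT_CHECK(&my_obj));
}
```

The fast check only gives the right answer if the bit is set on the base
type itself and carried over to every subclass.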

There's one funky thing about this patch: the change to
Objects/exceptions.c.  I didn't investigate why it was necessary, or more
likely I did know why when I added it and have since forgotten.  I do know
that without adding BASE_EXC_SUBCLASS to tp_flags, test_exceptions fails.



----------------------------------------------------------------------
