HTMLParser problems.

Alex Martelli aleax at aleax.it
Sat Nov 1 05:42:11 EST 2003


Terry Reedy wrote:

>> >  Back in the day in pascal you could do stuff like
>> > "with self begin do_stuff(member_variable); end;"
>> > which was extremely useful for large 'records.'
> 
> There have been proposals something like that, but they do not seem to
> fit Python too well.

No, but, for the record: just last week in python-dev Guido rejected
a syntax proposal using a leading dot to strop a variable name in some
circumstances (writing '.var' rather than 'var' in those cases) for
the stated reason that, and I quote:
"I want to reserve .var for the "with" statement (a la VB)."

So, something like "with self: dostuff(.member_variable)" MIGHT be
in Python's future (the leading dot, like in VB, at least does make
things more explicit than leaving it implied like Pascal does).


>>     mv = self.member_variable
>>     do_stuff(mv)
> 
> In case you think this a hack, it is not.  Copying things into the
> local variable space (from builtins, globals, attributes) is a fairly
> common idiom.  When a value is used repeatedly (like in a loop), the
> copying is paid for by faster repeated access.

Sure, good point.  It _is_ a hack by some definitions of the word,
but that's not necessarily a bad thing.  A quibble: the optimization
may be worth it when the NAME is used repeatedly -- repeated uses
of the VALUE, e.g. within the body of do_stuff, accessing the value
through another name [e.g. the parameter name for do_stuff] do not
count, because what you're optimizing is specifically name lookup.

E.g., one silly example:

[alex at lancelot bo]$ timeit.py -c -s'x=range(999)' 'for i in range(999): x[i]=id(x[i])'
1000 loops, best of 3: 590 usec per loop

[alex at lancelot bo]$ timeit.py -c -s'x=range(999)' -s'lid=id' 'for i in range(999): x[i]=lid(x[i])'
1000 loops, best of 3: 490 usec per loop

The repeated lookups of builtin name 'id' in the first case accounted
for almost 17% of the CPU time, so the simple optimization leading to
the second case may be worth it if this code is in a bottleneck.  In a
way, this is a special case of the general principle that Python does
NOT get into the thorny business of hoisting constant subexpressions
(requiring it to prove that something IS constant...), so, when you're
looking at a major bottleneck, you have to consider doing such hoisting
yourself manually.  Name lookup for anything but local bare names IS
"a subexpression", so if you KNOW it's a constant subex. in some case
where every cycle matters, you can hoist it.
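
For readers who want to try the same hoisting inside functions rather than
from the timeit command line, here is a minimal sketch.  The function names
(ids_global_lookup, ids_hoisted_lookup) are made up for illustration, and no
timings are claimed here, since they depend on the Python version and the
machine.

```python
import timeit


def ids_global_lookup(items):
    # 'id' is not a local name here, so each iteration looks it up in the
    # module globals and then in the builtins.
    out = list(items)
    for i in range(len(out)):
        out[i] = id(out[i])
    return out


def ids_hoisted_lookup(items):
    # Hoist the lookup by hand: bind the builtin to a local name once,
    # before the loop, so the loop body only does a local name access.
    lid = id
    out = list(items)
    for i in range(len(out)):
        out[i] = lid(out[i])
    return out


if __name__ == "__main__":
    data = list(range(999))
    for fn in (ids_global_lookup, ids_hoisted_lookup):
        # Best of 3 runs of 200 calls each; compare the two numbers locally.
        best = min(timeit.repeat(lambda: fn(data), number=200, repeat=3))
        print(fn.__name__, best)
```

Both functions return the same list; only the way the name 'id' is reached
inside the loop differs, which is exactly the part being optimized.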

Or, you can use psyco, which among many other things can do the
hoisting on your behalf:

[alex at lancelot bo]$ timeit.py -c -s'x=range(999)' -s'import psyco; psyco.full()' 'for i in range(999): x[i]=id(x[i])'
10000 loops, best of 3: 43 usec per loop

[alex at lancelot bo]$ timeit.py -c -s'x=range(999); lid=id' -s'import psyco; psyco.full()' 'for i in range(999): x[i]=lid(x[i])'
10000 loops, best of 3: 43 usec per loop

As you can see, with psyco, this manual hoisting gives no further
benefit -- so you can use whatever construct you find clearer and
not worry about performance effects, just enjoying the order-of-
magnitude speedup that psyco achieves in this case either way:-).


Alex




