Code that ought to run fast, but can't due to Python limitations.

John Nagle nagle at animats.com
Sat Jul 4 19:35:08 EDT 2009


Paul Rubin wrote:
> John Nagle <nagle at animats.com> writes:
>> Python doesn't have a "switch" or "case" statement, and when
>> you need a state machine with many states, that makes for painful,
>> slow code.  ...
>> There's a comment in the code that it would be useful
>> to run a few billion lines of HTML through an instrumented version
>> of the parser to decide in which order the IF statements should be
>> executed.  You shouldn't have to do that.
> 
> In that particular program it would probably be better to change those
> if/elif/elif/else constructs to dictionary lookups.  I see the program
> already does that for some large tables.

    A dictionary lookup (actually, several of them) for every
input character is rather expensive. Tokenizers usually index into
a table of character classes, then use the character class index in
a switch statement.

    This is an issue that comes up whenever you have to parse some formal
structure, from XML/HTML to Pickle to JPEG images to program source.

    If Python could figure out what's a constant and what isn't during
compilation, this sort of thing could be much more efficient.  In fact,
you don't even need a switch statement at the source level, if the
language is such that the compiler can figure out when "elif" clauses
are mutually exclusive.

    The temptation is to write tokenizers in C, but that's an admission
of language design failure.

    (A general problem with Python is "hidden dynamism".  That is,
changes to variables that can't be found by examining the source.
This is a killer for optimizations. One could take the position that any
module variable with exactly one visible assignment to it is, in fact,
only assigned in one place, and if the right hand side is a constant,
the variable is a constant.  This would break some programs doing funny
stuff with "eval", or using some of the backdoor ways to modify variables,
but that's very rare in practice.  In return, you get the ability to
hard-compile more of Python into fast code.  I'm thinking Shed Skin here,
not yet another attempt at a JIT system.)

    On the other hand, trying to do this in Perl, where you can't even index
strings, is far worse.
	
				John Nagle



More information about the Python-list mailing list