Code that ought to run fast, but can't due to Python limitations.

John Nagle nagle at animats.com
Tue Jul 7 02:21:02 EDT 2009


Steven D'Aprano wrote:
> On Sun, 05 Jul 2009 01:58:13 -0700, Paul Rubin wrote:
> 
>> Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au> writes:
>>> Okay, we get it. Parsing HTML 5 is a bitch. What's your point? I don't
>>> see how a case statement would help you here: you're not dispatching on
>>> a value, but running through a series of tests until one passes.
>> A case statement switch(x):... into a bunch of constant case labels
>> would be able to use x as an index into a jump vector, and/or do an
>> unrolled logarithmic (bisection-like) search through the tests, instead
>> of a linear search.
> 
> Yes, I'm aware of that, but that's not what John's code is doing -- he's 
> doing a series of if expr ... elif expr tests. I don't think a case 
> statement can do much to optimize that.
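
For illustration only, the distinction argued above can be sketched in Python. This is not html5lib's code; the token names and both function names are hypothetical, invented for this example. The first function is a linear if/elif chain whose tests are arbitrary expressions. The second sends the constant-valued cases through a dict lookup (roughly what a jump table would do) and falls back to predicate tests for the rest, which is the part a case statement could not speed up.

```python
# Hypothetical classifier, invented for illustration; not taken from html5lib.

def classify_chain(ch):
    """Linear if/elif chain: tests are tried in order until one passes."""
    if ch == "<":
        return "tag-open"
    elif ch == "&":
        return "entity"
    elif ch.isspace():
        return "space"
    elif ch.isalpha():
        return "letter"
    else:
        return "other"


# Constant case labels can be resolved with a single hash lookup.
_CONSTANT_CASES = {"<": "tag-open", "&": "entity"}


def classify_table(ch):
    """Dict dispatch for constant labels, predicate tests for everything else."""
    kind = _CONSTANT_CASES.get(ch)
    if kind is not None:
        return kind
    # These tests are expressions, not constants, so they stay sequential.
    if ch.isspace():
        return "space"
    if ch.isalpha():
        return "letter"
    return "other"
```

Both functions return the same result for every input; only the constant comparisons benefit from the table, which is the point both posters are making from different sides.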

(I didn't write that code; it's from http://code.google.com/p/html5lib/,
which is a general-purpose HTML 5 parser written in Python.  It's compatible
with ElementTree and/or BeautifulSoup.  I currently use a modified
BeautifulSoup for parsing real-world HTML in a small-scale crawler, and
I'm looking at html5lib as an HTML 5-compatible replacement.)

					John Nagle


