Python and regexp efficiency.. again.. :)

M.-A. Lemburg mal at lemburg.com
Mon Dec 13 08:59:33 EST 1999


Markus Stenberg wrote:
> 
> "M.-A. Lemburg" <mal at lemburg.com> writes:
> >> .. snipped my own comment
> > Hmm, have you tried mxTextTools? As an example, the HTML parser
> > that ships with it can easily handle 900kB of HTML per second on a
> > K6/266 machine.
> 
> Yes; but the tagging language seemed to be somewhat limited compared to
> regexps. Hmm, I wonder if it's possible to write a regexp->tag definition
> converter.. :)

I would argue that it can do far more than regular regexp engines.
What makes me think so is that you can code your own matching
functions and place them right next to the fast low-level builtin
ones. Also, features like callbacks provide a pretty wide range of
applications, e.g. you can write event-driven parsers just as well
as DOM-style ones.

Note that you can even add regexp matching functions to the
Tagging Engine.
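A regexp can be wrapped so that it plugs into the same kind of table as hand-written matchers. The sketch below uses a hypothetical (text, pos) -> end-or-None convention in plain Python rather than the Tagging Engine's actual interface; `regex_matcher` is an illustrative name:

```python
import re
from typing import Callable, Optional


def regex_matcher(pattern: str) -> Callable[[str, int], Optional[int]]:
    """Wrap a regular expression as a (text, pos) -> end-or-None matcher.

    Plain-Python illustration only; not the Tagging Engine's own interface.
    """
    compiled = re.compile(pattern)

    def match(text: str, pos: int) -> Optional[int]:
        # Pattern.match(string, pos) anchors the match at pos without
        # slicing (copying) the string. Note that '^' in the pattern still
        # refers to the real start of the string, not to pos.
        m = compiled.match(text, pos)
        return m.end() if m else None

    return match
```

The wrapped pattern is compiled once and can then sit in a matcher table next to hand-coded functions.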
 
> > There are some nice tools available to help build the needed
> > Tagging Tables. More info is available on my Python Pages,
> > including pointers to those tools.
> 
> The meta-language link didn't work and EBNF is a bit too low-level for my liking
> - rewriting 150+ (and rapidly growing) definitions of "interesting" log lines
> in EBNF is not my idea of fun :P. Of course, I could just do some really
> ugly m4 hacking to do that, but I'd prefer to avoid that.

Note that the package includes a tag-table writer which handles
this case (you will probably have to adapt it, but the basic
idea should be clear):

def word_in_list(l):

    """ Creates a lookup table that matches the words in l 
    """
    ...

(You can find it in TextTools.py)
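The body of `word_in_list` is not shown here, so the following is only a plain-Python illustration of the same idea, not the TextTools.py code. It builds a character trie from the word list and matches the longest listed word at a given position. `build_word_trie` and `match_word` are hypothetical names, and no word-boundary check is made (so "error" would also match the start of "errors"):

```python
from typing import Dict, Iterable, Optional

# Key marking "a word ends at this node". Real characters are always
# length 1, so the empty string cannot collide with them.
_END = ""


def build_word_trie(words: Iterable[str]) -> Dict:
    """Build a character trie (nested dicts) from a list of words."""
    root: Dict = {}
    for word in words:
        if not word:
            continue  # an empty word would match everywhere
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = word
    return root


def match_word(trie: Dict, text: str, pos: int) -> Optional[str]:
    """Return the longest trie word starting at text[pos], or None."""
    node = trie
    best = None
    i = pos
    while True:
        if _END in node:
            best = node[_END]  # remember the longest complete word so far
        if i >= len(text) or text[i] not in node:
            return best
        node = node[text[i]]
        i += 1
```

Each character of the input is examined at most once per match attempt, however many words are in the list.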

Given that your parsing requirements are rather simple, this
approach should be ideal for your case.
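For log lines, the lookup-table idea can be sketched in plain Python as one dictionary lookup per line, instead of trying 150+ patterns one after another. The log format (the first token names the kind of line), the handler table and `classify` are all assumptions for illustration:

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical log format for this sketch: the first whitespace-separated
# token of a line identifies its kind, e.g. "LOGIN alice from 10.0.0.1".
Handler = Callable[[List[str]], Tuple]


def classify(lines: Iterable[str], handlers: Dict[str, Handler]) -> List[Tuple]:
    """Dispatch each line on its first token with a single dict lookup.

    Lines that are blank or whose first token has no handler are ignored.
    """
    results = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        handler = handlers.get(fields[0])
        if handler is not None:
            results.append(handler(fields[1:]))
    return results
```

Adding a new kind of "interesting" line then means adding one entry to the handler dictionary rather than another pattern to try.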

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000:                                                    18 days left
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/





