Caching compiled regexps across sessions (was Re: Regular Expressions - Python vs Perl)

Ville Vainio ville at spammers.com
Sat Apr 23 11:04:25 EDT 2005


>>>>> "Ilpo" == Ilpo Nyyssönen <iny> writes:

    >> so you picked the wrong file format for the task, and the slowest

    Ilpo> What would you recommend instead?

    Ilpo> I have searched for alternatives, but somehow I still find
    Ilpo> XML the best there is. It is a standard format with a
    Ilpo> standard programming API.

    Ilpo> I don't want to lose my calendar data. XML as a standard
    Ilpo> format makes it easier to convert later to some other
    Ilpo> format. As a textual format it is also readable raw, and
    Ilpo> this eases debugging.

Use pickle, perhaps, for optimal speed and code non-ugliness. You can
always use xml as import/export format, perhaps even dumping the db to
xml at the end of each day.
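
A rough sketch of what I mean, assuming ElementTree is importable as
xml.etree.ElementTree (as in current Python). The event records, the
element names and the three function names are all made up for
illustration; your real data will look different:

```python
import pickle
import xml.etree.ElementTree as ET

# Hypothetical calendar data: a list of plain dicts.
events = [
    {"date": "2005-04-23", "title": "Reply to python-list"},
    {"date": "2005-04-24", "title": "Dump db to xml"},
]

def save_db(events, path):
    """Store the working copy as a pickle (fast to load and save)."""
    with open(path, "wb") as f:
        pickle.dump(events, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_db(path):
    """Load the working copy back from the pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)

def export_xml(events, path):
    """Write an XML export, e.g. once a day, as the long-term format."""
    root = ET.Element("calendar")
    for ev in events:
        node = ET.SubElement(root, "event", date=ev["date"])
        node.text = ev["title"]
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
```

The pickle is only the day-to-day store; the XML export is what you
would keep around for conversion to other formats later. Only unpickle
files you wrote yourself, since loading a pickle can execute arbitrary
code.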

    Ilpo> And my point is that the regular expression compilation can
    Ilpo> be a problem in python. The current regular expression
    Ilpo> engine is just unusably slow in short-lived programs with a
    Ilpo> somewhat larger number of regexps. And fixing it should not be
    Ilpo> that hard: an easy improvement would be to add some kind of
    Ilpo> storing mechanism for the compiled regexps. Are there any
    Ilpo> reasons not to do this?

It should start life as a third-party module (perhaps written by you,
who knows :-). If it is deemed useful and clean enough, it could be
integrated w/ python proper. This is clearly something that should not
be in the python core, because the regexps themselves aren't there
either.
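
One caveat for anyone attempting such a module: pickling a compiled
pattern from the standard re module stores only the pattern string and
flags, and recompiles on load, so pickle alone does not give you an
on-disk cache. A different and much simpler trick (not the storing
mechanism proposed above) is to defer compilation until a pattern is
first used, so a short-lived program only pays for the regexps it
actually touches. A minimal sketch, where LazyPattern and DATE are
hypothetical names of my own:

```python
import re

class LazyPattern:
    """Hypothetical helper: compile the regexp on first use, not at import."""

    def __init__(self, pattern, flags=0):
        self.pattern = pattern
        self.flags = flags
        self._compiled = None

    def _get(self):
        # Compile once, then reuse the compiled pattern object.
        if self._compiled is None:
            self._compiled = re.compile(self.pattern, self.flags)
        return self._compiled

    def __getattr__(self, name):
        # Only reached for attributes not found normally (match, search,
        # sub, ...). Private names are refused to avoid recursion.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._get(), name)

# Defining the pattern costs almost nothing; compilation happens on
# the first call to DATE.match(), DATE.search(), etc.
DATE = LazyPattern(r"(\d{4})-(\d{2})-(\d{2})")
```

Whether this helps depends on how many of the regexps a typical run
really uses; if every run uses all of them, nothing is saved.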

    >> python has shipped with a fast XML parser since 2.1, or so.

    Ilpo> With what features? validation? I really want a validating
    Ilpo> parser with a DOM interface. (Or something better than DOM,
    Ilpo> must be object oriented.)

Check out (coincidentally) Fredrik's elementtree:

http://effbot.org/zone/element-index.htm
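
To give a feel for the API, here is a small sketch using the
ElementTree that now ships in the standard library as
xml.etree.ElementTree. The sample document and its element names are
invented for illustration. Note that this parser checks
well-formedness only; it does not validate against a DTD or schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical calendar document, invented for illustration.
SAMPLE = """<calendar>
  <event date="2005-04-23">Reply to python-list</event>
  <event date="2005-04-24">Dump db to xml</event>
</calendar>"""

# Parse the string into a tree of Element objects.
root = ET.fromstring(SAMPLE)

# Elements behave like lists of children with a dict of attributes.
events = [(ev.get("date"), ev.text) for ev in root.findall("event")]
```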

    Ilpo> I don't want to make my programs ugly (read: use some more
    Ilpo> low level interface) and error prone (read: no validation)
    Ilpo> to make them fast.

Why don't you use external validation on the created xml? Validating
it every time sounds way too much like Javaic B&D to be fun
anymore. Pickle should serve you well, and would probably remove about
half of your code. "Do the simplest thing that could possibly work"
and all that.

-- 
Ville Vainio   http://tinyurl.com/2prnb


