Caching compiled regexps across sessions (was Re: Regular Expressions - Python vs Perl)
Ville Vainio
ville at spammers.com
Sat Apr 23 11:04:25 EDT 2005
>>>>> "Ilpo" == Ilpo Nyyssönen <iny> writes:
>> so you picked the wrong file format for the task, and the slowest
Ilpo> What would you recommend instead?
Ilpo> I have searched for alternatives, but somehow I still find XML
Ilpo> the best there is. It is a standard format with a standard
Ilpo> programming API.
Ilpo> I don't want to lose my calendar data. XML as a standard
Ilpo> format makes it easier to convert later to some other
Ilpo> format. As a textual format it is also readable in raw form,
Ilpo> which eases debugging.
Use pickle, perhaps, for optimal speed and code non-ugliness. You can
always use xml as an import/export format, perhaps even dumping the
db to xml at the end of each day.
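A rough sketch of what I mean, assuming the calendar is just a list of
dicts with "date" and "title" keys (those names, the element names and
the function names are made up for illustration, not taken from your
program):

```python
import pickle
import xml.etree.ElementTree as ET


def save_events(events, path):
    """Store the working copy of the calendar as a pickle."""
    with open(path, "wb") as f:
        pickle.dump(events, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_events(path):
    """Load the working copy back; no XML parsing involved."""
    with open(path, "rb") as f:
        return pickle.load(f)


def export_xml(events, path):
    """Dump the same data as XML, e.g. once a day, as a backup format."""
    root = ET.Element("calendar")            # hypothetical element names
    for ev in events:
        node = ET.SubElement(root, "event", date=ev["date"])
        node.text = ev["title"]
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
```

The pickle is what the program reads and writes on every run; the XML
file only exists so the data stays convertible later.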
Ilpo> And my point is that the regular expression compilation can
Ilpo> be a problem in python. The current regular expression
Ilpo> engine is just unusably slow in short-lived programs with a
Ilpo> somewhat larger number of regexps. And fixing it should not be
Ilpo> that hard: an easy improvement would be to add some kind of
Ilpo> storing mechanism for the compiled regexps. Are there any
Ilpo> reasons not to do this?
It should start life as a third-party module (perhaps written by you,
who knows :-). If it is deemed useful and clean enough, it could be
integrated w/ python proper. This is clearly something that should not
be in the python core, because the regexps themselves aren't there
either.
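To give an idea of how small such a module could start out, here is a
sketch. Note that it is not an on-disk cache: as far as I know, pickling
a compiled pattern in CPython stores only the pattern string and flags,
so it would be recompiled on load anyway. What this does instead is
delay compilation until a regexp is first used, which helps a
short-lived program that defines many regexps but touches only a few
per run. LazyPattern is a made-up name, not an existing API:

```python
import re


class LazyPattern:
    """Hold a regexp as text and compile it only on first use."""

    def __init__(self, pattern, flags=0):
        self.pattern = pattern
        self.flags = flags
        self._compiled = None

    def _get(self):
        if self._compiled is None:
            self._compiled = re.compile(self.pattern, self.flags)
        return self._compiled

    def __getattr__(self, name):
        # Only called for attributes not found normally. Private names
        # are refused so that copy/pickle machinery cannot recurse here.
        if name.startswith("_"):
            raise AttributeError(name)
        # Delegate match, search, sub, ... to the compiled pattern.
        return getattr(self._get(), name)


# Defining the pattern costs nothing; compilation happens in match().
DATE = LazyPattern(r"(\d{4})-(\d{2})-(\d{2})")
year, month, day = DATE.match("2005-04-23").groups()
```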
>> python has shipped with a fast XML parser since 2.1, or so.
Ilpo> With what features? validation? I really want a validating
Ilpo> parser with a DOM interface. (Or something better than DOM,
Ilpo> must be object oriented.)
Check out (coincidentally) Fredrik's elementtree:
http://effbot.org/zone/element-index.htm
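A minimal sketch of reading a calendar with it. The import below uses
the copy that later shipped in the standard library as
xml.etree.ElementTree (Python 2.5 and up); the sample document and its
element names are invented for the example:

```python
import xml.etree.ElementTree as ET

# Hypothetical calendar file contents.
SAMPLE = """<calendar>
  <event date="2005-04-23">Reply to python-list</event>
  <event date="2005-04-24">Convert calendar data</event>
</calendar>"""


def read_events(xml_text):
    """Return (date, title) pairs for every <event> element."""
    root = ET.fromstring(xml_text)
    return [(ev.get("date"), ev.text) for ev in root.iter("event")]


events = read_events(SAMPLE)
```

Elements behave like lists of children with a dict of attributes, which
is about as object oriented as one could wish for without the DOM
ceremony.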
Ilpo> I don't want to make my programs ugly (read: use some more
Ilpo> low level interface) and error prone (read: no validation)
Ilpo> to make them fast.
Why don't you use external validation on the created xml? Validating
it every time sounds way too much like Javaic B&D to be fun anymore.
Pickle should serve you well, and would probably remove about half of
your code. "Do the simplest thing that could possibly work" and all
that.
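By external validation I mean a separate step run on the exported file,
not something on the program's normal load path. The sketch below is a
hand-written stand-in for a real DTD or schema validator (ElementTree
itself does not validate), and the calendar/event/date names are
assumptions for the example:

```python
import xml.etree.ElementTree as ET


def check_calendar(xml_text):
    """Return a list of problems; an empty list means the export looks sane."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return ["not well-formed: %s" % exc]
    problems = []
    if root.tag != "calendar":
        problems.append("root element is <%s>, expected <calendar>" % root.tag)
    for i, ev in enumerate(root.iter("event")):
        if ev.get("date") is None:
            problems.append("event %d has no date attribute" % i)
    return problems
```

Run it after each export (or from a nightly job) and the everyday code
never pays for validation.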
--
Ville Vainio http://tinyurl.com/2prnb