Most efficient method to search text?
Michael Hudson
mwh at python.net
Thu Oct 17 07:14:09 EDT 2002
Tim Peters <tim.one at comcast.net> writes:
> Especially for purposes of building lexers, it might be useful if the re
> package could recognize when a DFA approach was sufficient and practical,
> and switch to a different scheme entirely then. Or it might not. Build
> code to try it both ways, and let us know how it turns out ...
Indeed, my code canes re when the wordlist gets long. Here are some
numbers (do_comp(n) builds a list of n words, each composed of ten
random ASCII characters, and searches the Python standard library for
them using first my code, then a re-based approach):
>>> import robin
>>> robin.do_comp(10)
2.275832057
0.317215919495
>>> robin.do_comp(100)
2.2694439888
0.980538964272
>>> robin.do_comp(1000)
2.36677598953
8.53031408787
>>> robin.do_comp(2000)
2.34253299236
16.9222760201
So roughly linear behaviour from re, n-independent behaviour from my
code. Isn't it nice when things do what you expect?
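The original "robin" code isn't shown in the post, but the n-independent behaviour it reports is what you'd expect from a trie (or DFA) walk: match time depends on the text and the length of the longest word, not on how many words are in the list. A minimal sketch of that idea, using nested dicts as trie nodes (names like `build_trie` and `scan` are mine, not from the post):

```python
def build_trie(words):
    """Build a nested-dict trie; the key None marks end-of-word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = word  # terminal marker storing the matched word
    return root

def scan(text, root):
    """Yield (position, word) for every occurrence of any trie word.

    At each starting position we walk the trie as far as the text
    allows; cost per position is bounded by the longest word, so the
    total cost does not grow with the number of words.
    """
    n = len(text)
    for i in range(n):
        node = root
        j = i
        while j < n and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:
                yield i, node[None]

# Example:
# list(scan("spam and eggs", build_trie(["spam", "eggs"])))
# -> [(0, 'spam'), (9, 'eggs')]
```

By contrast, `re.compile("|".join(words))` with Python's backtracking engine tries the alternatives more or less in turn, which is where the roughly linear growth in n comes from.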
On the other hand, the "compiler" code I posted yesterday consumes
vast gobs of memory -- trying to compile a 10000 long list of random
words got the OOM killer into action. A version using lists seems
slightly better in this respect, though slower in execution.
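One plausible shape for the list-based variant (again a guess, since that code isn't shown either): keep all nodes in one flat list, each node a small list of `(char, child_index)` pairs plus an optional terminal word. Small lists avoid the per-node overhead of a dict, at the price of a linear scan over each node's edges during matching:

```python
def build_list_trie(words):
    """Build a trie as a flat list of nodes.

    nodes[i] = [terminal_word_or_None, (ch, child_index), ...]
    """
    nodes = [[None]]
    for word in words:
        cur = 0
        for ch in word:
            for pair in nodes[cur][1:]:       # linear edge search
                if pair[0] == ch:
                    cur = pair[1]
                    break
            else:                             # no edge: add a node
                nodes.append([None])
                nodes[cur].append((ch, len(nodes) - 1))
                cur = len(nodes) - 1
        nodes[cur][0] = word
    return nodes

def scan_list(text, nodes):
    """Yield (position, word) for every trie-word occurrence in text."""
    n = len(text)
    for i in range(n):
        cur = 0
        j = i
        while j < n:
            for pair in nodes[cur][1:]:
                if pair[0] == text[j]:
                    cur = pair[1]
                    break
            else:
                break
            j += 1
            if nodes[cur][0] is not None:
                yield i, nodes[cur][0]
```

The trade-off matches the post's observation: denser storage, but each transition costs a loop over the node's edges instead of a dict lookup.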
Cheers,
M.
--
Have you considered downgrading your arrogance to a reasonable level?
-- Erik Naggum, comp.lang.lisp, to yet another C++-using troll