Bottleneck? More efficient regular expression?

Robin Becker robin at jessikat.fsnet.co.uk
Fri Sep 26 03:44:46 EDT 2003


In article <%nNcb.3635$RW4.47 at newsread4.news.pas.earthlink.net>, Andrew
Dalke <adalke at mindspring.com> writes
>Tina Li:
>> The lag is *perceivable* (this is what I meant; sorry) by a human user so
>> it's slower.
>
>Yup, that's what I meant.  Too many people make theoretical
>arguments for why to choose one (complicated) approach
>over a simpler one on the basis of performance, when it turns
>out performance isn't the issue.  My appreciation goes out to you
>for doing it the right way.
>
>You may also want to look at pyRXP from ReportLab.
>However, there seem to be some drastic problems on their
>site -- links on reportlab.com fail and reportlab.org goes
>to pair.com's site placeholder page.


yup we're reassembling everything again .... sigh :(

New more dynamic confusion ......

>
>It's a very fast XML parser for Python.
>
>> I in fact tried that before but the over-limit error still happened. So
>> it's not just the non-greedy .*? that's causing the problem. Hmm.
>
>No, I don't think it is.  The stack space increases by one for
>each ambiguity and the .*? should only produce one ambiguity.
>Usually there's a stack problem only if you have an ambiguity
>or empty match inside a repeat, and I didn't see that in your
>pattern.
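
A small sketch of the two pattern shapes being contrasted above, using only the standard re module. The tag names and sample strings are made up for illustration and are not the pattern from this thread. It does not try to reproduce the over-limit error; it only shows which construct is which.

```python
import re

# Shape 1: a single non-greedy wildcard between fixed delimiters.
# (<b>...</b> is a hypothetical example, not the original poster's pattern.)
single_lazy = re.compile(r"<b>(.*?)</b>")

# Shape 2: a repeat whose body can itself match the empty string.
# The inner a* may match nothing, so the outer * can split the input many ways.
nested_repeat = re.compile(r"(a*)*b")

lazy_hit = single_lazy.match("<b>bold</b> and <b>more</b>")
nested_hit = nested_repeat.match("aaab")

print(lazy_hit.group(1))    # text captured by the non-greedy group
print(nested_hit.group(0))  # whole match for the nested repeat
```

When hunting for the cause of a stack problem, the second shape is the kind of thing to look for first.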
>
>If you get really interested in tracking this down, you might look
>around for some of the GUI regexp debugging tools.  There's
>one in ActiveState's product, as I recall.  Err, but it's based on
>Perl's regexp parser and won't handle Python's (?P<>) named groups.
>
>(I do have an experimental pure-Python regexp engine that
>I would offer for debugging, but it doesn't handle .*? yet and
>needs a rewrite before it does.)
>
>> It only handles tags without space because all tags are
>> guaranteed to be generated without space.
>
>Sure.  All I was saying was that if you're going to code for
>a specific layout then you don't need to be as general.
>
>You might even consider using "([^\n]*\n){5}" if you just
>want to skip 5 lines.
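
A minimal sketch of the skip-five-lines idea, assuming the input is one string with "\n" line endings. The sample text is invented for illustration. Note that the repetition has to apply to the whole "rest of line plus newline" unit; written without grouping, the {5} binds only to the \n and asks for five consecutive newlines.

```python
import re

# Hypothetical sample input: seven short lines.
text = "".join("line %d\n" % i for i in range(1, 8))

# Group one line (any non-newline characters, then the newline) and repeat it.
skip_five = re.compile(r"(?:[^\n]*\n){5}")

m = skip_five.match(text)
rest = text[m.end():]  # everything after the first five lines

# Without the group, {5} applies to "\n" alone, so this finds no match here.
ungrouped = re.match(r"[^\n]*\n{5}", text)

print(repr(rest))
print(ungrouped)
```

If the skipped lines never need to be inspected, splitting with text.split("\n", 5) is another option, but the regexp form drops into a larger pattern more easily.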
>
>                    Andrew
>                    dalke at dalkescientific.com
>P.S.
>  If you are doing anything open-sourceish, or using
>open source in bioinformatics, structural biology, and
>related fields, and will be at ISMB in Edinburgh next
>year, you might consider attending the Bioinformatics
>Open Source Conference.
>
>

-- 
Robin Becker



