regexp

johnzenger at gmail.com
Tue Dec 19 22:42:44 EST 2006


It's not just Python: most regex engines work this way, because * is
greedy by default.  You want a ? after your *, as in <!--(.*?)-->, if
you want it to stop at the first available "-->".
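
A minimal sketch of the difference, in case it helps.  The sample string
and the variable names are made up for illustration, and re.DOTALL is my
own addition so that comments spanning several lines get matched too:

```python
import re

# Hypothetical sample document with two comments, one spanning two lines.
htmldata = "<p>one</p><!-- first --><p>two</p><!-- second\ncomment --><p>three</p>"

# Greedy: .* runs on to the LAST "-->", so the text between the
# two comments is swallowed along with them.
greedy = re.sub(r"<!--.*-->", "", htmldata, flags=re.DOTALL)

# Non-greedy: .*? stops at the FIRST "-->" after each "<!--",
# and re.sub repeats that for every comment it finds.
lazy = re.sub(r"<!--.*?-->", "", htmldata, flags=re.DOTALL)

print(greedy)  # <p>one</p><p>three</p>
print(lazy)    # <p>one</p><p>two</p><p>three</p>
```

Without re.DOTALL the dot does not match newlines, so a comment that
spans more than one line would be left in place.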

At this point in your adventure, you might be wondering whether regular
expressions are more trouble than they are worth.  They are.  There are
two libraries you need to take a look at, and soon:  BeautifulSoup for
parsing HTML, and PyParsing for parsing everything else.  Take the time
you were planning to spend on deciphering regexes like
"(\d{1,3}\.){3}\d{1,3}" and spend it learning the basics of those
libraries instead -- you will not regret it.

On Dec 19, 4:39 pm, vertigo <s... at spam.pl> wrote:
> Hello
>
> Thanx for help, i have one more question:
>
> i noticed that while matching regexp python tries to match as wide as it's
> possible,
> for example:
> re.sub("<!--.*-->","",htmldata)
> would cut out everything between the first "<!--" and the last "-->" in the
> document.
> Can i force re to match as narrow as possible ?
> (to match first "<!--" with the first "-->" after the "<!--" and to repeat
> this procedure while mentioned pattern is still found) ?
> 
> Thanx



