f*cking re module

Diez B. Roggisch deets at web.de
Mon Jul 4 05:08:52 EDT 2005


jwaixs wrote:
> arg... I've lost 1.5 hours of my precious time trying to get re to work
> correctly. There's really not a single good re tutorial or documentation
> I could find! There are only references, and if you don't know how a
> module works you won't learn it from a reference!
> 
> This is the problem:
> 
> 
> >>> import re
> >>> str = "blabla<python>Re modules sucks!</python>blabla"
> >>> re.search("(<python>)(/python>)", str).group()
> 
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> AttributeError: 'NoneType' object has no attribute 'group'
> 
> the only thing I want are the number of places blabla, Re modules
> sucks! and blabla are.

Others gave you advice on how to deal with regexes. I'm going to add
that regexes aren't the way to go for this - use HTMLParser. With your
regex, you won't be able to correctly handle this either:

<foo>some text</foo><foo>some other text</foo>

as the greedy match runs from the first <foo> to the last </foo>, so you
get the whole string, not just the first element. You can switch off this
so-called longest-match (greedy) behaviour, but then

<foo>some outer text <foo>some inner text</foo> some more outer text</foo>


won't work, because the match now stops at the first </foo>, which
belongs to the inner element.
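
To make both failure modes concrete, here is a small sketch using the two
sample strings above. The patterns and the variable names (greedy, lazy,
lazy_nested) are my own illustration, not the pattern from the original
question:

```python
import re

text = "<foo>some text</foo><foo>some other text</foo>"

# Greedy: .* grabs as much as possible, so the match runs from the
# first <foo> to the LAST </foo>.
greedy = re.search(r"<foo>(.*)</foo>", text).group(1)
print(greedy)       # some text</foo><foo>some other text

# Non-greedy: .*? stops at the first </foo> it can, which is what we
# want for sibling elements.
lazy = re.findall(r"<foo>(.*?)</foo>", text)
print(lazy)         # ['some text', 'some other text']

nested = ("<foo>some outer text <foo>some inner text</foo> "
          "some more outer text</foo>")

# Non-greedy on nested input: the first </foo> closes the INNER
# element, so the outer element is cut short.
lazy_nested = re.search(r"<foo>(.*?)</foo>", nested).group(1)
print(lazy_nested)  # some outer text <foo>some inner text
```

Neither quantifier can pair each opening tag with its own closing tag,
which is why nesting needs something that keeps track of depth.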


Try not to use regexps for this. Or at least tokenize the text first and
then sweep over the tokens, collecting the data you need yourself (but
that's basically rewriting the HTML parsers that are already out there).
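
A minimal sketch of the parser approach, assuming a current Python 3,
where the HTMLParser class lives in the html.parser module. The class
name TagTextCollector and its attributes (depth, inside, outside) are
made up for this example. It simply counts how deep we are inside the
tag of interest and sorts the text accordingly:

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Hypothetical helper: separate text inside a given tag from the rest."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag      # tag name to watch (the parser lowercases names)
        self.depth = 0      # how many watched tags are currently open
        self.inside = []    # text chunks seen inside the tag
        self.outside = []   # text chunks seen outside the tag

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # depth > 0 means we are somewhere inside the watched tag
        (self.inside if self.depth else self.outside).append(data)


collector = TagTextCollector("python")
collector.feed("blabla<python>Re modules sucks!</python>blabla")
collector.close()   # flush any text still buffered by the parser

print("".join(collector.inside))    # Re modules sucks!
print("".join(collector.outside))   # blablablabla
```

Because the depth counter goes up and down with each start and end tag,
the nested <foo> example above ends up entirely in "inside" instead of
being cut off at the first </foo>.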

Diez


