f*cking re module

Mon Jul 4 05:08:52 EDT 2005

jwaixs wrote:
> arg... I've lost 1.5 hours of my precious time trying to get re to work
> correctly. There's really not a single good re tutorial or documentation
> I could find! There are only references, and if you don't know how a
> module works you won't learn it from a reference!
> 
> This is the problem:
> 
> 
> >>> import re
> >>> str = "blabla<python>Re modules sucks!</python>blabla"
> >>> re.search("(<python>)(/python>)", str).group()
> 
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> AttributeError: 'NoneType' object has no attribute 'group'
> 
> the only thing I want are the number of places blabla, Re modules
> sucks! and blabla are.

Others gave you advice on how to deal with regexes. I'm going to add
that regexes aren't the way to go for this; use HTMLParser instead. With
a regex, you won't be able to correctly handle input like this

<foo>some text</foo><foo>some other text</foo>

as you will get the whole string, not just the first element. You can
switch off the so-called longest-match (greedy) behaviour, but then

<foo>some outer text <foo>some inner text</foo> some more outer text</foo>

won't work, because the match then stops at the first closing tag it
finds, which belongs to the inner element.
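
To make the two failure modes concrete, here is a minimal sketch. The
patterns GREEDY and LAZY are my own illustration (the original post does
not contain a working regex), so treat them as assumptions about what
such an attempt would probably look like.

```python
import re

# Hypothetical patterns, not taken from the original post.
GREEDY = re.compile(r"<foo>(.*)</foo>")   # longest match
LAZY = re.compile(r"<foo>(.*?)</foo>")    # shortest match

siblings = "<foo>some text</foo><foo>some other text</foo>"
nested = ("<foo>some outer text <foo>some inner text</foo>"
          " some more outer text</foo>")

# Greedy: runs on to the LAST </foo>, swallowing both elements.
greedy_siblings = GREEDY.search(siblings).group(1)

# Non-greedy: fixes the sibling case ...
lazy_siblings = LAZY.search(siblings).group(1)

# ... but on nested input it stops at the FIRST </foo>, which closes
# the inner element, so the outer element is cut short.
lazy_nested = LAZY.search(nested).group(1)

print(repr(greedy_siblings))
print(repr(lazy_siblings))
print(repr(lazy_nested))
```

Neither quantifier can count opening and closing tags, which is why no
choice between the two handles both inputs.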

Try not to use regexps for this. Or at least tokenize the text first, so
that you can then sweep over the tokens and collect the data you need
yourself (but that's basically rewriting the HTML parsers out there).
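
A rough sketch of the parser-based approach, assuming a current Python
where the class lives in html.parser (in 2005 it was the top-level
HTMLParser module). TextCollector and collect are hypothetical names I
made up for illustration; only HTMLParser and its handle_* callbacks
come from the standard library.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: records each text run with its open tags."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open tags, outermost first
        self.pieces = []   # (tuple of open tags, text) pairs

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open tag.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        self.pieces.append((tuple(self.stack), data))


def collect(text):
    """Parse text and return the recorded (open tags, text) pairs."""
    parser = TextCollector()
    parser.feed(text)
    parser.close()   # flush any text still buffered
    return parser.pieces


flat = collect("blabla<python>Re modules sucks!</python>blabla")
nested = collect("<foo>some outer text <foo>some inner text</foo>"
                 " some more outer text</foo>")

for open_tags, text in flat + nested:
    print(open_tags, repr(text))
```

Because the parser keeps a stack of open tags, the nested case falls out
naturally: every text run comes with the tags that enclose it.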

Diez