f*cking re module
Diez B. Roggisch
deets at web.de
Mon Jul 4 05:08:52 EDT 2005
jwaixs wrote:
> arg... I've lost 1.5 hours of my precious time trying to get re to work
> correctly. There's really not a single good re tutorial or documentation
> I could find! There are only references, and if you don't know how a
> module works you won't learn it from a reference!
>
> This is the problem:
>
>
> >>> import re
> >>> str = "blabla<python>Re modules sucks!</python>blabla"
> >>> re.search("(<python>)(/python>)", str).group()
>
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> AttributeError: 'NoneType' object has no attribute 'group'
>
> the only thing I want are the number of places blabla, Re modules
> sucks! and blabla are.
Others gave you advice on how to deal with regexes. I'm going to add
that regexes aren't the way to go for this - use HTMLParser. With your
regex, you won't be able to handle this correctly either:
<foo>some text</foo><foo>some other text</foo>
as you will get the whole string, not just the first match. You can
switch off the so-called longest-match (greedy) behaviour with a
non-greedy quantifier such as .*?, but then
<foo>some outer text <foo>some inner text</foo> some more outer text</foo>
won't work, because the match ends at the first </foo>, which belongs
to the inner element.
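
To make the two failure modes concrete, here is a small sketch. The
patterns and variable names are my own illustration, not the original
poster's code; it only assumes the standard re module.

```python
import re

flat = "<foo>some text</foo><foo>some other text</foo>"
nested = ("<foo>some outer text <foo>some inner text</foo>"
          " some more outer text</foo>")

# Greedy: .* grabs as much as possible, then backtracks to the LAST </foo>.
greedy = re.compile(r"<foo>(.*)</foo>")
# Non-greedy: .*? stops at the FIRST </foo> it can reach.
lazy = re.compile(r"<foo>(.*?)</foo>")

greedy_flat = greedy.search(flat).group(1)
# 'some text</foo><foo>some other text'  (both elements swallowed)

lazy_flat = lazy.search(flat).group(1)
# 'some text'  (fine for the flat case)

lazy_nested = lazy.search(nested).group(1)
# 'some outer text <foo>some inner text'  (cut off at the inner </foo>)
```

Neither quantifier can count nesting depth, so one of the two inputs
always comes out wrong.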
Try not to use regexps. Or at least tokenize the text first and then
sweep over the tokens, collecting the data you need yourself (but
that's basically rewriting the HTML parsers out there).
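
A minimal sketch of the parser route, assuming a current Python 3 where
the class lives in html.parser (in older Pythons it was the HTMLParser
module). TextCollector and collect are hypothetical names made up for
this example. It records each text chunk together with how many tags
were open when the chunk was seen.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text chunks along with the current nesting depth."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open tag names
        self.chunks = []      # list of (depth, text) pairs

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open tag.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        self.chunks.append((len(self.open_tags), data))


def collect(text):
    parser = TextCollector()
    parser.feed(text)
    parser.close()  # flush any text still buffered
    return parser.chunks


chunks = collect("blabla<python>Re modules sucks!</python>blabla")
# [(0, 'blabla'), (1, 'Re modules sucks!'), (0, 'blabla')]

nested_chunks = collect(
    "<foo>some outer text <foo>some inner text</foo>"
    " some more outer text</foo>"
)
# [(1, 'some outer text '), (2, 'some inner text'),
#  (1, ' some more outer text')]
```

The stack is what the regex lacks: the nested input comes out with the
inner text at depth 2 and both outer pieces at depth 1.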
Diez