RegEx issues

Sat Jan 24 13:24:52 EST 2009

"Sean Brown" <sbrown.home@[spammy] gmail.com> wrote in message 
news:glflaj$qrf$2 at nntp.motzarella.org...
> Using python 2.4.4 on OpenSolaris 2008.11
>
> I have the following string created by opening a url that has the
> following string in it:
>
> td[ct] = [[ ... ]];\r\n
>
> The ...  above is what I'm interested in extracting which is really a
> whole bunch of text. So I think the regex \[\[(.*)\]\]; should do it.
> The problem is it appears that python is escaping the \ in the regex
> because I see this:
> >>> reg = '\[\[(.*)\]\];'
> >>> reg
> '\\[\\[(.*)\\]\\];'
>
> Now to me that looks like it would match the string - \[\[ ... \]\];

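You are viewing the repr of the string, which doubles each backslash for
display. print shows the characters the string actually contains: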

>>> reg='\[\[(.*)\]\];'
>>> reg
'\\[\\[(.*)\\]\\];'
>>> print reg
\[\[(.*)\]\];        <== these are the chars passed to regex

The backslashes tell the regex engine that the [ and ] are literal
characters rather than the start and end of a character class.
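
To make the repr-versus-contents point concrete, here is a small sketch.
It assumes a current Python rather than the 2.4.4 in the original post, and
the r'' raw-string prefix and the names "shown" and "actual" are my
additions for illustration, not something from the original session. The
raw string holds the same characters as the pattern above.

```python
# Raw string: each backslash is stored exactly as typed.
reg = r'\[\[(.*)\]\];'

shown = repr(reg)   # what the interactive prompt echoes back
actual = str(reg)   # what print displays, and what the re module receives

print(shown)    # backslashes appear doubled, with surrounding quotes
print(actual)   # single backslashes only

# The string is 13 characters long and holds 4 backslashes, not 8.
print(len(reg), reg.count('\\'))
```

Counting the characters is a quick way to confirm that nothing was added
to the pattern; only the way it is displayed differs.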

>
> Which obviously doesn't match anything because there are no literal \ in
> the above string. Leaving the \ out of the \[\[ above has re.compile
> throw an error because [ is a special regex character. Which is why it
> needs to be escaped in the first place.
>
> I am either doing something really wrong, which is very possible, or I've
> missed something obvious. Either way, I thought I'd ask why this isn't
> working and why it seems to be changing my regex to something else.

Did you try it?

>>> s='td[ct] = [[blah blah]];\r\n'
>>> re.search(reg,s).group(1)
'blah blah'
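
The same check as a standalone script, in case it is easier to run than
to type at the prompt. This is only a sketch: the sample string is the
made-up one from above, not the real page contents, and the raw-string
prefix and the None fallback (with the names "match" and "captured") are
my assumptions.

```python
import re

# Same pattern as in the thread, written as a raw string.
reg = r'\[\[(.*)\]\];'

# Stand-in for the text read from the URL.
s = 'td[ct] = [[blah blah]];\r\n'

# re.search returns None when nothing matches, so guard before .group().
match = re.search(reg, s)
captured = match.group(1) if match else None

print(captured)   # blah blah
```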

-Mark