re.match and non-alphanumeric characters

MRAB google at mrabarnett.plus.com
Sun Nov 16 12:01:43 EST 2008


On Nov 16, 4:33 pm, The Web President <mattia.land... at gmail.com>
wrote:
> Dear all,
>
> this is really driving me nuts and any help would be extremely
> appreciated.
>
> I have a string that contains some numeric data. I want to isolate
> these data using re.match, as follows.
>
> bogus = "IFC(35m)"
> data = re.match(r'(\d+)',bogus)
> print data.group(1)
>
> I would expect to have "35" printed out to screen, but instead I get
> an error that the regular expression did not match:
>
> Traceback (most recent call last):
>   File "C:\Documents and Settings\Mattia\Desktop\Neeltje\read.py",
> line 20, in <module>
>     print data.group(1)
> AttributeError: 'NoneType' object has no attribute 'group'
>
> Note that the same holds if I look for "35" straight, instead of
> "\d+". If instead I look for "IFC" it works fine. That is, apparently
> re.match will match only up to the first non-alphanumeric character
> and ignore anything after a "(", "_", "[" and god knows what else.
>
> I am using Python 2.6 (r26:66721, latest stable version). Am I missing
> something very big and very important?

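re.match() only matches at the start of the string, and "IFC(35m)"
does not start with a digit, so it returns None. What you need is
re.search(), which scans the whole string for the first match. It's
all in the documentation! :-)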
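
A minimal sketch of the difference, using the string from the original
post. It uses Python 3 print() calls, which is an assumption (the
original post is on Python 2.6), and the names match_result,
search_result, number and anchored are only for illustration.

```python
import re

bogus = "IFC(35m)"

# re.match() only tries the pattern at position 0. The string starts
# with "I", not a digit, so there is no match and None is returned.
match_result = re.match(r'(\d+)', bogus)

# re.search() scans forward and returns the first match found anywhere
# in the string.
search_result = re.search(r'(\d+)', bogus)

# Either call can return None, so check before calling .group().
number = search_result.group(1) if search_result else None

# re.match() can still be used if the pattern itself consumes the
# leading non-digit characters.
anchored = re.match(r'\D*(\d+)', bogus)

print(match_result)
print(number)
print(anchored.group(1) if anchored else None)
```

Checking the result for None before calling .group() avoids the
AttributeError shown in the traceback when nothing matches.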


