re.match and non-alphanumeric characters

Sun Nov 16 12:00:49 EST 2008

On Nov 16, 10:33 am, The Web President <mattia.land... at gmail.com>
wrote:
> Dear all,
>
> this is really driving me nuts and any help would be extremely
> appreciated.
>
> I have a string that contains some numeric data. I want to isolate
> these data using re.match, as follows.
>
> bogus = "IFC(35m)"
> data = re.match(r'(\d+)',bogus)
> print data.group(1)
>
> I would expect to have "35" printed out to screen, but instead I get
> an error that the regular expression did not match:
>
> Traceback (most recent call last):
>   File "C:\Documents and Settings\Mattia\Desktop\Neeltje\read.py",
> line 20, in <module>
>     print data.group(1)
> AttributeError: 'NoneType' object has no attribute 'group'
>
> Note that the same holds if I look for "35" straight, instead of
> "\d+". If instead I look for "IFC" it works fine. That is, apparently
> re.match will match only up to the first non-alphanumeric character
> and ignore anything after a "(", "_", "[" and god knows what else.
>
> I am using Python 2.6 (r26:66721, latest stable version). Am I missing
> something very big and very important?

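Try re.search or re.findall instead.
re.match only matches at the beginning of the string. That is why "IFC"
works and "35" does not; the "(" has nothing to do with it.
I almost never use re.match.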
>>> import re
>>> bogus = "IFC(35m)"
>>> re.search(r'(\d+)', bogus).group()
'35'
>>> re.search(r'(\d+)', bogus).span()
(4, 6)
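
For comparison, here is a small sketch that runs all three functions on the
same string. The variable names (match_result, search_result, all_numbers,
first_number) are made up for illustration, and the None check is just one
way to avoid the AttributeError from the original traceback.

```python
import re

bogus = "IFC(35m)"

# re.match only tries the pattern at position 0, where "I" is not a digit.
match_result = re.match(r'(\d+)', bogus)

# re.search scans forward and returns the first match anywhere in the string.
search_result = re.search(r'(\d+)', bogus)

# re.findall returns every non-overlapping match as a list of strings.
all_numbers = re.findall(r'\d+', bogus)

# A failed match returns None, so check before calling .group().
if search_result is not None:
    first_number = search_result.group(1)
else:
    first_number = None

print(match_result)   # None
print(first_number)   # 35
print(all_numbers)    # ['35']
```

If the string can contain several numbers, re.findall is probably the more
convenient of the two, since it returns them all at once.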