regex question

Sat Jun 25 14:20:45 EDT 2005

"Felix Schwarz" <felix.schwarz at web.de> wrote:

> Hi all,
>
> I'm experiencing problems with a regular expression and I can't figure
> out which words to use when googling. I have read the Python documentation
> for the re module multiple times now but still have no idea what I'm doing wrong.
>
> What I want to do:
> - Extract all digits (\d) in a string.
> - Digits are separated by whitespace (\s)
>
> What my program does:
> - It extracts only the last digit.
>
> Here is my program:
> import re
> line = ' 1 2 3'
> regex = '^' + r'(?:\s+(\d))*' + '$'
> match = re.match(regex, line)
> print "lastindex is: ",match.lastindex
> print "matches: ",match.group(1)
>
>
> Obviously I do not understand how (?:\s+(\d))* works in conjunction with
> ^ and $.
>
> Does anybody know how to transform this regex to get the result I want
> to have?
>
> fs
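Why only the last digit comes back: a capturing group that sits inside a repeated construct such as (?:...)* is overwritten on every repetition, so after the match it holds only its most recent capture, and match.group(1) is '3'. The following is a minimal Python 3 sketch (not from the original post); `extract_digits` is a hypothetical helper that keeps the anchored pattern for validating the whole line and collects every digit in a separate findall pass.

```python
import re

line = ' 1 2 3'

# The original pattern: one capturing group (\d) inside a repeated (?:...)*.
pattern = re.compile(r'^(?:\s+(\d))*$')
m = pattern.match(line)
# Each repetition overwrites group 1, so only the last capture survives.
print(m.lastindex, m.group(1))  # 1 3

def extract_digits(text):
    """Hypothetical helper: return all digits if `text` looks like ' d d d...', else None."""
    # Use the anchored pattern only to check the overall shape of the line.
    if pattern.match(text) is None:
        return None
    # findall with one group returns every captured digit, not just the last one.
    return re.findall(r'\s+(\d)', text)

print(extract_digits(line))      # ['1', '2', '3']
print(extract_digits(' 1 a 3'))  # None
```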

Here are three ways:

- If your strings consist only of whitespace and single digits, as in
your example, the simplest way is split():
>>> ' 1   2     3'.split()
['1', '2', '3']

- Otherwise use re.findall:
>>> import re
>>> digit = re.compile(r'\d')
>>> digit.findall('1   ab 34b 6')
['1', '3', '4', '6']

- Finally, in the special case where you are searching for single characters
(such as digits), perhaps the fastest way is to use string.translate:

>>> import string
>>> allchars = string.maketrans('','')     # identity table of all 256 chars
>>> nondigits = allchars.translate(allchars, string.digits)
>>> '1   ab 34 6'.translate(allchars, nondigits)
'1346'

Note that the result is a string of the matched characters, not a list;
you can simply turn it into a list with list('1346').
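string.maketrans exists only in Python 2. A rough Python 3 counterpart of the same "delete every non-digit" trick is sketched below using bytes.translate(None, delete); it assumes the input fits in Latin-1 (text.encode('latin-1') raises UnicodeEncodeError otherwise), and `keep_digits` is a hypothetical helper name, not part of any library.

```python
import string

# Table of all 256 byte values, mirroring `allchars` above.
ALL_BYTES = bytes(range(256))
# Every byte except b'0'..b'9', mirroring `nondigits` above.
NON_DIGITS = ALL_BYTES.translate(None, string.digits.encode('ascii'))

def keep_digits(text):
    """Hypothetical helper: return only the ASCII digits of `text` (Latin-1 input assumed)."""
    data = text.encode('latin-1')
    # table=None means no remapping; every byte in NON_DIGITS is deleted.
    return data.translate(None, NON_DIGITS).decode('ascii')

print(keep_digits('1   ab 34 6'))        # 1346
print(list(keep_digits('1   ab 34 6')))  # ['1', '3', '4', '6']
```

As in the Python 2 version, the result is a string rather than a list.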

Hope this helps,

George