extracting numbers with decimal places from a string

Thomas 'PointedEars' Lahn PointedEars at web.de
Sun Jan 11 20:00:19 EST 2015


Mark Lawrence wrote:

> On 11/01/2015 23:07, Thomas 'PointedEars' Lahn wrote:
>> I thought I had more than a fair grasp of regular expressions, but I am
>> puzzled by
>>
>> | $ python3
>> | Python 3.4.2 (default, Dec 27 2014, 13:16:08)
>> | [GCC 4.9.2] on linux
>> | >>> from re import findall
>> | >>> s = '1.32, 5.32, 4.4, 3.78'
>> | >>> findall(r'-?\d+(\.\d+)?', s)
>> | ['.32', '.32', '.4', '.78']
>>
>> Why does this more flexible pattern not work as I expected in Python 3.x,
>> when it does virtually everywhere else?
>>
>> And why this?
>>
>> | >>> findall(r'-?\d+\.\d+', s)
>> | ['1.32', '5.32', '4.4', '3.78']
>> | >>> findall(r'-?\d+(\.\d+)', s)
>> | ['.32', '.32', '.4', '.78']
>>
>> Feature?  Bug?
> 
> I can't tell you

I know now why I get this result.  It is a feature, not a bug:

<https://docs.python.org/3/library/re.html?highlight=findall#re.findall>
|
| re.findall(pattern, string, flags=0)
|   Return all non-overlapping matches of pattern in string, as a list of 
|   strings. The string is scanned left-to-right, and matches are returned 
|   in the order found. If one or more groups are present in the pattern,
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|   return a list of groups; this will be a list of tuples if the pattern
    ^^^^^^^^^^^^^^^^^^^^^^^   
|   has more than one group. Empty matches are included in the result unless 
|   they touch the beginning of another match.
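
To make the quoted rule concrete, here is a small sketch of how the number
of capturing groups changes what findall() returns.  The variable names and
the two-group pattern are mine, added only for illustration:

```python
import re

s = '1.32, 5.32, 4.4, 3.78'

# No groups: findall() returns the full text of each match.
no_groups = re.findall(r'-?\d+\.\d+', s)

# One capturing group: findall() returns only that group's text.
one_group = re.findall(r'-?\d+(\.\d+)?', s)

# Two capturing groups: findall() returns one tuple per match.
two_groups = re.findall(r'(-?\d+)(\.\d+)?', s)

print(no_groups)   # ['1.32', '5.32', '4.4', '3.78']
print(one_group)   # ['.32', '.32', '.4', '.78']
print(two_groups)  # [('1', '.32'), ('5', '.32'), ('4', '.4'), ('3', '.78')]
```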

A solution is to use non-capturing parentheses:

| >>> findall(r'-?\d+(?:\.\d+)?', '1.32, 5.32, 4.4, 3.78')
| ['1.32', '5.32', '4.4', '3.78']
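
If the capturing group has to stay (say, because the fraction is needed
elsewhere), another option, as far as I can tell, is finditer(): group(0)
of a match object is the whole match no matter how many groups the pattern
has.  A minimal sketch; the names and the second test string are only
illustrative:

```python
import re

s = '1.32, 5.32, 4.4, 3.78'

# The capturing group is deliberately left in place here.
pattern = re.compile(r'-?\d+(\.\d+)?')

# group(0) is the entire match, independent of any groups in the pattern.
whole_matches = [m.group(0) for m in pattern.finditer(s)]

# Convert the matched text to numbers.
numbers = [float(text) for text in whole_matches]

# The optional fraction and the sign still work as intended.
mixed = [m.group(0) for m in pattern.finditer('7, -2.5')]

print(whole_matches)  # ['1.32', '5.32', '4.4', '3.78']
print(numbers)        # [1.32, 5.32, 4.4, 3.78]
print(mixed)          # ['7', '-2.5']
```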

> as I avoid regexes like I avoid the plague.

You should reconsider.  Regular expressions are a powerful tool.

> Having said that I do know that there are loads of old bugs on the bug 
> tracker, many of which are fixed in the "new" regex module that's 
> available here https://pypi.python.org/pypi/regex/

Interesting, thank you.

Please trim your quotes to the relevant minimum next time, though.

-- 
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.


