extracting numbers with decimal places from a string
Thomas 'PointedEars' Lahn
PointedEars at web.de
Sun Jan 11 20:00:19 EST 2015
Mark Lawrence wrote:
> On 11/01/2015 23:07, Thomas 'PointedEars' Lahn wrote:
>> I thought I had more than a fair grasp of regular expressions, but I am
>> puzzled by
>>
>> | $ python3
>> | Python 3.4.2 (default, Dec 27 2014, 13:16:08)
>> | [GCC 4.9.2] on linux
>> | >>> from re import findall
>> | >>> s = '1.32, 5.32, 4.4, 3.78'
>> | >>> findall(r'-?\d+(\.\d+)?', s)
>> | ['.32', '.32', '.4', '.78']
>>
>> Why does this more flexible pattern not work as I expected in Python 3.x,
>> but virtually everywhere else?
>>
>> And why this?
>>
>> | >>> findall(r'-?\d+\.\d+', s)
>> | ['1.32', '5.32', '4.4', '3.78']
>> | >>> findall(r'-?\d+(\.\d+)', s)
>> | ['.32', '.32', '.4', '.78']
>>
>> Feature? Bug?
>
> I can't tell you
I know now why I get this result. It is a feature, not a bug:
<https://docs.python.org/3/library/re.html?highlight=findall#re.findall>
|
| re.findall(pattern, string, flags=0)
| Return all non-overlapping matches of pattern in string, as a list of
| strings. The string is scanned left-to-right, and matches are returned
| in the order found. If one or more groups are present in the pattern,
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| return a list of groups; this will be a list of tuples if the pattern
  ^^^^^^^^^^^^^^^^^^^^^^^
| has more than one group. Empty matches are included in the result unless
| they touch the beginning of another match.
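To illustrate the "list of tuples" case from the documentation quoted above, here is a minimal sketch. The pattern with two capturing groups and the second test string are my own, chosen only for demonstration; they are not from the original question.

```python
import re

s = '1.32, 5.32, 4.4, 3.78'

# Two capturing groups: findall returns one tuple per match,
# with one element per group (not the whole match).
pairs = re.findall(r'(-?\d+)(\.\d+)?', s)
print(pairs)

# When the optional group does not participate in a match,
# findall fills its slot with an empty string.
mixed = re.findall(r'(-?\d+)(\.\d+)?', '7, 4.4')
print(mixed)
```

So with groups present, what comes back is the group contents only; the text of the overall match is not included unless a group spans it.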
A solution is to use non-capturing parentheses:
| >>> findall(r'-?\d+(?:\.\d+)?', '1.32, 5.32, 4.4, 3.78')
| ['1.32', '5.32', '4.4', '3.78']
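For comparison, a short sketch that puts the capturing and the non-capturing variant side by side. The re.finditer() variant is an alternative I am adding here, not something discussed earlier in the thread; it is useful when the capturing group has to stay in the pattern for other reasons.

```python
import re

s = '1.32, 5.32, 4.4, 3.78'

# One capturing group: findall returns only that group's text.
captured = re.findall(r'-?\d+(\.\d+)?', s)

# Non-capturing group (?:...): findall returns the whole match.
whole = re.findall(r'-?\d+(?:\.\d+)?', s)

# Alternative: keep the capturing group, iterate over match objects,
# and take group(0), which is always the whole match.
via_finditer = [m.group(0) for m in re.finditer(r'-?\d+(\.\d+)?', s)]

print(captured)
print(whole)
print(via_finditer)
```

The last two lists are identical; only the first one loses the integer part.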
> as I avoid regexes like I avoid the plague.
You should reconsider. Regular expressions are a powerful tool.
> Having said that I do know that there are loads of old bugs on the bug
> tracker, many of which are fixed in the "new" regex module that's
> available here https://pypi.python.org/pypi/regex/
Interesting, thank you.
Please trim your quotes to the relevant minimum next time, though.
--
PointedEars
Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.