extracting numbers with decimal places from a string

MRAB python at mrabarnett.plus.com
Sun Jan 11 19:59:12 EST 2015


On 2015-01-12 00:04, Mark Lawrence wrote:
> On 11/01/2015 23:07, Thomas 'PointedEars' Lahn wrote:
>> Store Makhzan wrote:
>>
>>> I have this script which can calculate the total of numbers given in a
>>> string […]
>>> total = 0
>>> for c in '0123456789':
>>>     total += int(c)
>>> print total
>>>
>>> […]
>>> How should I modify this script to find the total if the numbers given
>>> in the string form have decimal places? That is, how do I need to modify
>>> this line: […]
>>>
>>> for c in '1.32, 5.32, 4.4, 3.78':
>>>
>>> […] to find the total of these given numbers.
>>
>> The original script already does not do what it advertises.  Instead, it
>> iterates over the characters of the string, attempts to convert each to an
>> integer and then computes the sum.  That is _not_ “calculate the total of
>> numbers given in a string”.
>>
>> A solution has been presented, but it is not very pythonic because the
>> original code was not; that should have been
>>
>> ### Ahh, Gauß ;-)
>> print(sum(map(lambda x: int(x), list('0123456789'))))
>> ### --------------------------------------------------------------------
>>
>> Also, it cannot handle non-numeric strings well.  Consider this instead:
>>
>> ### --------------------------------------------------------------------
>> from re import findall
>>
>> s = '1.32, 5.32, 4.4, 3.78'
>> print(sum(map(lambda x: float(x), findall(r'-?\d+\.\d+', s))))
>> ### --------------------------------------------------------------------
>>
>> But if you are sure that except for the comma separator there are only
>> numeric strings, it is more efficient to use re.split() instead of
>> re.findall() here.
>>
>>
>> Aside:
>>
>> I thought I had more than a fair grasp of regular expressions, but I am
>> puzzled by
>>
>> | $ python3
>> | Python 3.4.2 (default, Dec 27 2014, 13:16:08)
>> | [GCC 4.9.2] on linux
>> | >>> from re import findall
>> | >>> s = '1.32, 5.32, 4.4, 3.78'
>> | >>> findall(r'-?\d+(\.\d+)?', s)
>> | ['.32', '.32', '.4', '.78']
>>
>> Why does this more flexible pattern not work as I expected in Python 3.x,
>> but virtually everywhere else?
>>
>> And why this?
>>
>> | >>> findall(r'-?\d+\.\d+', s)
>> | ['1.32', '5.32', '4.4', '3.78']
>> | >>> findall(r'-?\d+(\.\d+)', s)
>> | ['.32', '.32', '.4', '.78']
>>
>> Feature?  Bug?
>>
>
> I can't tell you as I avoid regexes like I avoid the plague.  Having
> said that I do know that there are loads of old bugs on the bug tracker,
> many of which are fixed in the "new" regex module that's available here
> https://pypi.python.org/pypi/regex/
>
It's not a bug.

re.findall returns the capture groups, if present (a list of strings
for one group, a list of tuples for several), or the entire match if
there are no capture groups.
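
A quick sketch of how the return value of re.findall changes with the
number of capture groups in the pattern. The variable names are only
for illustration, and the patterns are simplified versions of the ones
quoted above:

```python
import re

s = '1.32, 5.32, 4.4, 3.78'

# No capture group: each item is the entire match.
no_group = re.findall(r'\d+\.\d+', s)

# One capture group: each item is only what that group matched.
one_group = re.findall(r'\d+(\.\d+)', s)

# Two capture groups: each item is a tuple with one string per group.
two_groups = re.findall(r'(\d+)\.(\d+)', s)

print(no_group)    # ['1.32', '5.32', '4.4', '3.78']
print(one_group)   # ['.32', '.32', '.4', '.78']
print(two_groups)  # [('1', '32'), ('5', '32'), ('4', '4'), ('3', '78')]
```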

In this instance, it's better to use a non-capturing group:

findall(r'-?\d+(?:\.\d+)', s)

It's all in the docs! :-)
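
For completeness, here is a minimal sketch that applies this to the
original question of totalling the numbers. It assumes the fractional
part should stay optional, as in the "more flexible" pattern quoted
above, so the trailing ? is kept. The re.finditer variant is just an
alternative that I am adding for comparison: it yields match objects,
so group(0) is the whole match whatever groups the pattern contains.

```python
import re

s = '1.32, 5.32, 4.4, 3.78'

# Non-capturing group with an optional fractional part:
# findall now returns the whole of each match.
numbers = re.findall(r'-?\d+(?:\.\d+)?', s)

# Alternative: keep the capturing group and ask each match object
# for group 0, which is always the entire match.
via_finditer = [m.group(0) for m in re.finditer(r'-?\d+(\.\d+)?', s)]

total = sum(float(x) for x in numbers)

print(numbers)          # ['1.32', '5.32', '4.4', '3.78']
print(round(total, 2))  # 14.82
```

The same pattern also picks up plain integers such as '7', which the
version without the ? would skip.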


