extracting numbers with decimal places from a string

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sun Jan 11 20:13:21 EST 2015


Thomas 'PointedEars' Lahn wrote:

> The original script already does not do what it advertises.  Instead, it
> iterates over the characters of the string, attempts to convert each to an
> integer and then computes the sum.  That is _not_ “calculate the total of
> numbers given in a string”.

Yes, and the second piece of code the Original Poster provided is even
worse:

#Check if a perfect cube
total = 0
for c in ('1.23', '2.4', '3.123'):
   print float(c)
   total += float(c)
print total


I don't see how adding up some numbers checks whether it is a perfect cube.
I guess this is a good example of this:

    At Resolver we've found it useful to short-circuit any doubt 
    and just refer to comments in code as 'lies'.

http://import-that.dreamwidth.org/956.html



> A solution has been presented, but it is not very pythonic because the
> original code was not; that should have been
> 
> ### Ahh, Gauß ;-)
> print(sum(map(lambda x: int(x), list('0123456789'))))

That can be simplified to:

sum(map(int, '0123456789'))

which can then be passed to print() if required.
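
To spell out why, here is a small sketch (the variable names are mine, purely for illustration). A string is already iterable character by character, so list() adds nothing, and int is already a callable, so wrapping it in a lambda adds nothing either:

```python
digits = '0123456789'

# The quoted version: a redundant lambda and a redundant list().
verbose = sum(map(lambda x: int(x), list(digits)))

# The simplified version: map() iterates over the string directly
# and calls int on each character.
simple = sum(map(int, digits))

print(simple)
```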



> Also, it cannot handle non-numeric strings well.  Consider this instead:

The OP hasn't specified whether or not he has to deal with non-numeric
strings, or how he wants to deal with them. But my guess is that he
actually doesn't want strings at all, and needs to be taught how to work
with lists of floats and/or ints.
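
If that guess is right, something like the following sketch may be all he needs. It assumes the values are known up front and can be written as float literals; the name "values" is mine, and the numbers are the ones from his second snippet:

```python
# Start with actual numbers rather than strings that look like numbers.
values = [1.23, 2.4, 3.123]

# No per-item conversion is needed, so the loop collapses to sum().
total = sum(values)

print(total)
```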


> ### --------------------------------------------------------------------
> from re import findall
> 
> s = '1.32, 5.32, 4.4, 3.78'
> print(sum(map(lambda x: float(x), findall(r'-?\d+\.\d+', s))))
> ### --------------------------------------------------------------------

Consider this:

py> s = '123^%#@1.2abc, %#$@2.1&*%^'
py> print(sum(map(lambda x: float(x), findall(r'-?\d+\.\d+', s))))
3.3

If your aim is just to hide the fact that you have bad data, then the regex
solution "works". Many beginners think that their job as a programmer is to
stop the program from raising an exception no matter what. But I suggest
that Postel's Law:

    Be conservative in what you emit, and liberal in what you accept.

shouldn't apply here. I don't think any reasonable person would expect that
the string "123^%#@1.2abc" should be treated as 1.2.
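
Here is a rough sketch of the stricter alternative. It assumes the input is meant to be a comma-separated list of numbers, which the OP never actually said, and parse_numbers is a name I made up for the example. Rather than fishing out whatever happens to look like a number, it lets float() raise ValueError on junk:

```python
def parse_numbers(text):
    """Convert a comma-separated string of numbers to a list of floats.

    Hypothetical helper: assumes commas separate the fields. Any field
    that float() cannot convert raises ValueError instead of being
    silently skipped. Whitespace around a field is fine, since float()
    ignores it.
    """
    return [float(field) for field in text.split(',')]


# Clean data converts without fuss.
good = parse_numbers('1.32, 5.32, 4.4, 3.78')
total = sum(good)

# Bad data is reported, not hidden.
try:
    parse_numbers('123^%#@1.2abc, %#$@2.1&*%^')
except ValueError as err:
    error_message = str(err)
else:
    error_message = None

print(total)
print(error_message)
```

Whether the caller should then skip the bad record, log it, or give up entirely is a design decision for the OP, but at least the decision is made in the open.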


> Aside:
> 
> I thought I had more than a fair grasp of regular expressions, but I am
> puzzled by
> 
> | $ python3
> | Python 3.4.2 (default, Dec 27 2014, 13:16:08)
> | [GCC 4.9.2] on linux
> | >>> from re import findall
> | >>> s = '1.32, 5.32, 4.4, 3.78'
> | >>> findall(r'-?\d+(\.\d+)?', s)
> | ['.32', '.32', '.4', '.78']
> 
> Why does this more flexible pattern not work as I expected in Python 3.x,
> but virtually everywhere else?

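This is documented in the docstring for findall. Your pattern contains a
group, so findall returns what the group matched rather than the whole match: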


py> help(findall)
Help on function findall in module re:

findall(pattern, string, flags=0)
    Return a list of all non-overlapping matches in the string.

    If one or more groups are present in the pattern, return a
    list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result.
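
A short sketch of two ways to get the whole match back, using the same string as in your session: make the group non-capturing with (?:...), or keep the group and ask each match object from finditer for group 0. The variable names are mine:

```python
import re

s = '1.32, 5.32, 4.4, 3.78'

# Capturing group: findall returns only the text matched by the group.
captured = re.findall(r'-?\d+(\.\d+)?', s)

# Non-capturing group (?:...): findall returns each whole match.
whole = re.findall(r'-?\d+(?:\.\d+)?', s)

# Alternatively, keep the capturing group and take group 0 (the whole
# match) from each match object.
via_finditer = [m.group(0) for m in re.finditer(r'-?\d+(\.\d+)?', s)]

print(captured)
print(whole)
print(via_finditer)
```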



-- 
Steven



