Convert '165.0' to int

Steven D'Aprano steve+comp.lang.python at pearwood.info
Mon Jul 25 05:48:38 EDT 2011


On Mon, 25 Jul 2011 10:07 am Billy Mays wrote:

> On 7/24/2011 2:27 PM, SigmundV wrote:

>> list_of_integers = map(string_to_int, list_of_strings)
>>
>> Of course, this will be horribly slow if you have thousands of
>> strings. In such a case you should use an iterator (assuming you use
>> python 2.7):
>>
>> import itertools as it
>> iterator = it.imap(string_to_int, list_of_strings)

 
> if the goal is speed, then you should use generator expressions:
> 
> list_of_integers = (int(float(s)) for s in list_of_strings)


I'm not intending to pick on Billy or Sigmund here, but for the beginners
out there, there are a lot of myths about the relative speed of map, list
comprehensions, generator expressions, etc.

The usual optimization rules apply:

    We should forget about small efficiencies, say about 97% of 
    the time: premature optimization is the root of all evil.
    -- Donald Knuth

    More computing sins are committed in the name of efficiency 
    (without necessarily achieving it) than for any other single 
    reason - including blind stupidity. -- W.A. Wulf

and of course:

    If you haven't measured it, you're only guessing whether it is 
    faster or slower. 

(And unless you're named Raymond Hettinger, I give little or no credibility
to your guesses except for the most obvious cases. *wink*)
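
For anyone who wants to measure rather than guess, here is a minimal sketch using the standard timeit module. It is written for Python 3, and the sample data, the helper string_to_int and the repeat counts are my own assumptions for illustration, not anything taken from the thread's real data.

```python
import timeit

# Hypothetical sample data, shaped like the thread's '165.0' strings.
list_of_strings = ['%d.0' % n for n in range(1000)]

def string_to_int(s):
    # Convert a string such as '165.0' to the integer 165.
    return int(float(s))

# Each candidate builds the complete list, so all three do the same work.
candidates = {
    'map': lambda: list(map(string_to_int, list_of_strings)),
    'list comp': lambda: [int(float(s)) for s in list_of_strings],
    'gen expr': lambda: list(int(float(s)) for s in list_of_strings),
}

def measure(func, number=50, repeat=3):
    # Keep the minimum of several runs: it is the one least disturbed
    # by whatever else the machine happens to be doing.
    return min(timeit.repeat(func, number=number, repeat=repeat))

timings = {name: measure(func) for name, func in candidates.items()}
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print('%-10s %.4f s' % (name, seconds))
```

The ordering you get depends on the Python version, the machine and the size of the data, which is rather the point.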

Generators (including itertools.imap) carry some overhead which list
comprehensions don't have (at least in some versions of Python). So for
small sets of data, setting up the generator may take longer than running
it all the way through.

For large sets of data, that overhead is insignificant, but in *total*
generators aren't any faster than creating the list up front. They can't
be. They end up doing the same amount of work: if you have to process one
million strings, then whether you use a list comp or a gen expression, you
still end up processing one million strings. The only advantage to the
generator expression (and it is a HUGE advantage, don't get me wrong!) is
that you can do the processing lazily, on demand, rather than all up front,
possibly bailing out early if necessary.
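
The "bailing out early" case can be sketched like this. The helper noisy_int and the sample strings are hypothetical; the helper exists only to record which strings actually get converted.

```python
def noisy_int(s, log):
    # Record each string that really gets converted, then convert it.
    log.append(s)
    return int(float(s))

list_of_strings = ['1.0', '2.0', '165.0', '4.0', '5.0']

# List comprehension: every string is converted before the search starts.
eager_log = []
eager = [noisy_int(s, eager_log) for s in list_of_strings]
first_big_eager = next(n for n in eager if n > 100)

# Generator expression: conversion stops as soon as a match turns up.
lazy_log = []
lazy = (noisy_int(s, lazy_log) for s in list_of_strings)
first_big_lazy = next(n for n in lazy if n > 100)

print(first_big_eager, first_big_lazy)   # 165 165
print(len(eager_log), len(lazy_log))     # 5 3
```

Both versions find the same answer, but the generator expression never touches the last two strings. If you go on to consume every item, that saving disappears.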

But if you end up pre-processing the entire data set, there is no advantage
to using a gen expression rather than a list comp, or map. So which is
faster depends on how you end up using the data.

One other important proviso: if your map function is a wrapper around a
Python expression:

map(lambda x: x+1, data)
[x+1 for x in data]

then the list comp will be much faster, due to the overhead of the function
call. List comps and gen exprs inline the expression x+1 in the loop body,
so no Python-level function call is made for each item.

But if you're calling a function in both cases:

map(int, data)
[int(x) for x in data]

then the overhead of the function call is the same for both the map and the
list comp, and they should be about equally fast. Or slow, as the case may be.
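
A rough way to check both cases above, again with timeit and written for Python 3 (hence the list() around map, which is lazy there). The data sizes and repeat counts are assumptions, and I use a separate list of strings for the int() case so the conversion has something realistic to work on.

```python
import timeit

data = list(range(1000))             # hypothetical integers for the x+1 case
strings = [str(n) for n in data]     # hypothetical strings for the int() case

cases = {
    'map(lambda x: x+1, data)': lambda: list(map(lambda x: x + 1, data)),
    '[x+1 for x in data]': lambda: [x + 1 for x in data],
    'map(int, strings)': lambda: list(map(int, strings)),
    '[int(x) for x in strings]': lambda: [int(x) for x in strings],
}

results = {}
for label, func in cases.items():
    # Minimum of a few runs, to reduce noise from the rest of the system.
    results[label] = min(timeit.repeat(func, number=200, repeat=3))
    print('%-28s %.4f s' % (label, results[label]))
```

I would expect the lambda version to come out slowest of the four, but that is a guess until you run it on your own interpreter.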

But don't take my word on this! Measure, measure, measure! Performance is
subject to change without notice. I could be mistaken.

(And don't forget that everything changes in Python 3. Whatever you think
you know about speed in Python 2, it will be different in Python 3.
Generator expressions become more efficient; itertools.imap disappears; the
built-in map becomes a lazy iterator rather than returning a list.)
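
A quick way to see the Python 3 behaviour of map for yourself. This sketch assumes it is run under Python 3, and the sample strings are made up.

```python
import itertools

list_of_strings = ['1.0', '165.0', '3.0']

result = map(float, list_of_strings)

is_list = isinstance(result, list)       # map no longer returns a list
has_imap = hasattr(itertools, 'imap')    # imap is gone from itertools
first = next(result)                     # items are produced on demand
rest = list(result)                      # whatever has not been consumed yet
again = list(result)                     # the iterator is now exhausted

print(is_list, has_imap)   # False False
print(first, rest, again)  # 1.0 [165.0, 3.0] []
```

The last line is the one that bites people porting Python 2 code: a map object can be consumed only once.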


-- 
Steven



