Getting values out of a CSV

Fri Jul 13 09:18:38 EDT 2007

On Fri, 13 Jul 2007 15:05:29 +0300, Daniel wrote:

>>> > Note that every time you see [x for x in ...] with no condition, you can
>>> > write list(...) instead - more clear, and faster.
>>> >
>>> > data = list(csv.reader(open('some.csv', 'rb')))
>>>
>>> Faster? No. List Comprehensions are faster.
>>
>> kelvie at valour pdfps $ python -m timeit -c 'data = list(open("make.ps"))'
>> 100 loops, best of 3: 7.5 msec per loop
>> kelvie at valour pdfps $ python -m timeit -c 'data = [line for line in
>> open("make.ps")]'
>> 100 loops, best of 3: 9.2 msec per loop
>>
>> On my system just putting into a list is faster.  I think this is
>> because you don't need to assign each line to the variable 'line' each
>> time in the former case.
>>
>> I, too, think it's faster to just use list() instead of 'line for line
>> in iterable', as it seems kind of redundant.
>>
> 
> $ python -m timeit -c 'import csv; data = list(csv.reader(open("some.csv",  
> "rb")))'
> 10000 loops, best of 3: 44 usec per loop
> $ python -m timeit -c 'import csv; data = [row for row in  
> csv.reader(open("some.csv", "rb"))]'
> 10000 loops, best of 3: 37 usec per loop
> 
> I don't know why there seems to be a difference, but I know that list comps
> in Python are very heavily optimised.

Does the machine use power-saving features like SpeedStep or something
similar?  That is, does the processor always run at 100% speed, or is it
stepped up dynamically when there's load on it?  And do both tests always
read the data from cache, or did the very first loop have to fetch the CSV
file from disk?

$ python -m timeit -n 1000 -c 'import csv; data = [row for row in
csv.reader(open("test.csv", "rb"))]'
1000 loops, best of 3: 1.27 msec per loop

$ python -m timeit -n 1000 -c 'import csv; data =
list(csv.reader(open("test.csv", "rb")))'
1000 loops, best of 3: 1.25 msec per loop
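
To rule out the cache and clock-speed effects asked about above, both
variants can be timed in the same process after one warm-up read.  The
following is only a minimal sketch under some assumptions: it targets
Python 3 (so the file is opened in text mode with newline="" rather than
"rb"), and the helper names make_sample_csv, read_with_list,
read_with_comprehension and compare, as well as the generated test data,
are made up for illustration and appear nowhere in this thread.

```python
import csv
import os
import tempfile
import timeit


def make_sample_csv(rows=1000, cols=5):
    """Write a throwaway CSV file and return its path (made-up test data)."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as handle:
        writer = csv.writer(handle)
        for i in range(rows):
            writer.writerow([i * cols + j for j in range(cols)])
    return path


def read_with_list(path):
    """Collect all rows by passing the reader straight to list()."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


def read_with_comprehension(path):
    """Collect all rows with a condition-free list comprehension."""
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle)]


def compare(path, number=50, repeat=5):
    """Return the best per-call time in seconds for each variant."""
    # Read once up front so neither variant pays for the first disk fetch.
    read_with_list(path)
    results = {}
    for func in (read_with_list, read_with_comprehension):
        timings = timeit.repeat(lambda: func(path), number=number, repeat=repeat)
        # Taking the minimum mirrors the "best of N" figure timeit reports.
        results[func.__name__] = min(timings) / number
    return results


if __name__ == "__main__":
    sample = make_sample_csv()
    try:
        for name, seconds in compare(sample).items():
            print(f"{name}: {seconds * 1e6:.1f} usec per call")
    finally:
        os.remove(sample)
```

Running it a few times, and swapping the order of the two functions in
compare(), should give a feel for how much of any difference is just
noise.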

Ciao,
	Marc 'BlackJack' Rintsch