Separate Rows in reader

Tim Chase python.list at tim.thechases.com
Sun Mar 24 14:28:44 EDT 2013


On 2013-03-24 08:57, rusi wrote:
> On Mar 24, 6:49 pm, Tim Chase <python.l... at tim.thechases.com> wrote:
> After doing:
> 
> >>> import csv
> >>> original = file('friends.csv', 'rU')
> >>> reader = csv.reader(original, delimiter='\t')
> 
> 
> Stripping of the first line is:
> >>> list(reader)[1:]
> >>> [tuple(row) for row in list(reader)[1:]]
> >>> map(tuple,list(reader)[1:])

This works for small sources, but slurps all the data into memory.
Because csv.reader is an iterator, it can process huge CSV files
that wouldn't otherwise fit in memory.  Calling r.next() (or
"next(r)" in Python 2.6 and later) fetches one record from the
reader, to be discarded/stored as appropriate.
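
As a rough sketch of that header-skipping idiom (written with the
Python 3 spelling, and assuming a tab-delimited source like the one
quoted above; the function name rows_without_header and the sample
data are made up for illustration, with io.StringIO standing in for
the real file):

```python
import csv
import io

def rows_without_header(source, delimiter='\t'):
    """Yield each data row as a tuple, skipping the first (header) line."""
    reader = csv.reader(source, delimiter=delimiter)
    # Fetch and discard one record.  The None default keeps an empty
    # source from raising StopIteration inside this generator.
    next(reader, None)
    for row in reader:
        yield tuple(row)

# Hypothetical stand-in for open('friends.csv', newline='')
sample = io.StringIO("name\tphone\nAlice\t555-0100\nBob\t555-0101\n")
rows = list(rows_without_header(sample))
```

Only the final list() call pulls everything into memory here; looping
over rows_without_header(...) directly would handle one row at a time.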


> Then you can of course make your code more performant thus:
> >>> reader.next()
> >>> (tuple(row) for row in reader)
> 
> In the majority of cases this optimization is not worth it

If the CSV file is large, using the iterator version is usually worth
the small performance penalty, as you don't have to keep the whole
file in memory.  As somebody who regularly deals with 0.5-1GB CSV
files from cellular providers, I speak from experience of having my
machine choke when reading the whole thing in.
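
To make the memory point concrete, here is a minimal sketch of
consuming the lazy generator expression in a running total, so that
only one row is held at a time.  The column names and values are
invented, and io.StringIO again stands in for a large file on disk:

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for a large tab-delimited file.
sample = io.StringIO(
    "carrier\tminutes\n"
    "A\t10\n"
    "B\t5\n"
    "A\t7\n"
)

reader = csv.reader(sample, delimiter='\t')
next(reader, None)                      # skip the header row
rows = (tuple(row) for row in reader)   # lazy: no data rows read yet

minutes_by_carrier = Counter()
for carrier, minutes in rows:           # one row in memory at a time
    minutes_by_carrier[carrier] += int(minutes)
```

Swapping the generator expression for list(reader)[1:] would give the
same totals, but at the cost of materializing every row first.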

> In any case, strewing prints all over the code is a bad habit
> (except for debugging).

Sorry if my print-statements were misinterpreted--I meant them as a
"do what you want with the data here" stand-in (thus the ellipsis).

-tkc





