finding out the number of rows in a CSV file [Resolved]

SimonPalmer simon.palmer at gmail.com
Wed Aug 27 08:28:45 EDT 2008


On Aug 27, 1:15 pm, John Machin <sjmac... at lexicon.net> wrote:
> On Aug 27, 9:54 pm, SimonPalmer <simon.pal... at gmail.com> wrote:
>
>
>
> > On Aug 27, 12:50 pm, SimonPalmer <simon.pal... at gmail.com> wrote:
>
> > > On Aug 27, 12:41 pm, Jon Clements <jon... at googlemail.com> wrote:
>
> > > > On Aug 27, 12:29 pm, "Simon Brunning" <si... at brunningonline.net>
> > > > wrote:
>
> > > > > 2008/8/27 SimonPalmer <simon.pal... at gmail.com>:
>
> > > > > > anyone know how I would find out how many rows are in a csv file?
>
> > > > > > I can't find a method which does this on csv.reader.
>
> > > > > len(list(csv.reader(open('my.csv'))))
>
> > > > > --
> > > > > Cheers,
> > > > > Simon B.
> > > > > si... at brunningonline.net
> > > > > http://www.brunningonline.net/simon/blog/
>
> > > > Not the best of ideas if the row size or number of rows is large!
> > > > Manufacture a list, then discard to get its length -- ouch!
>
> > > Thanks to everyone for their suggestions.
>
> > > In my case the number of rows is never going to be that large (<200)
> > > so it is a practical if slightly inelegant solution
>
> > actually not resolved...
>
> > after reading the file through the csv.reader for the length I cannot
> > iterate over the rows.
>
> OK, I'll bite: Why do you think you need to know the number of rows in
> advance?
>
> > How do I reset the row iterator?
>
> You don't. You throw it away and get another one. You need to seek to
> the beginning of the file first. E.g.:
>
> C:\junk>type foo.csv
> blah,blah
> waffle
> q,w,e,r,t,y
>
> C:\junk>type csv2iters.py
> import csv
> f = open('foo.csv', 'rb')
> rdr = csv.reader(f)
> n = 0
> for row in rdr:
>     n += 1
> print n, f.tell()
> f.seek(0)
> rdr = csv.reader(f)
> for row in rdr:
>     print row
>
> C:\junk>csv2iters.py
> 3 32
> ['blah', 'blah']
> ['waffle']
> ['q', 'w', 'e', 'r', 't', 'y']
>
> HTH,
> John

This is all good, and thanks for your time.  I need the number of rows
because of the nature of the data and what I do with it on reading.  I
need to initialise some data structures and that is *much* more
efficient if I know in advance the number of rows of data.  The cost
of reading the file is probably less than incrementally extending my
internal structures because of their complexity.
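For reference, here is a minimal Python 3 sketch of the count-then-re-read approach quoted above, with the row count used to pre-size a container before the second pass. The function name count_then_read and the plain list standing in for the "internal structures" are assumptions made for illustration; the quoted script is Python 2 and opens the file in 'rb' mode, whereas Python 3's csv module expects a text-mode file opened with newline=''.

```python
import csv


def count_then_read(path):
    """Count the rows of a CSV file, then re-read it into a pre-sized list.

    Hypothetical helper: the list stands in for whatever structure
    benefits from knowing the row count in advance.
    """
    # newline='' lets the csv module handle line endings itself (Python 3).
    with open(path, newline='') as f:
        # First pass: count rows without keeping them in memory.
        n = sum(1 for _ in csv.reader(f))

        # The reader is now exhausted; rewind the file and make a new one.
        f.seek(0)

        # Pre-size the container using the known row count.
        rows = [None] * n

        # Second pass: fill the pre-sized container.
        for i, row in enumerate(csv.reader(f)):
            rows[i] = row
    return n, rows
```

With a file this small (under 200 rows), reading everything once with list(csv.reader(f)) and taking len() of the result would likely do just as well and avoids the second pass.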

To be honest these are all good solutions and I think I have a view
of csv reading that comes from different technologies plus lack of
experience with Python which just means that I don't know where to
look for answers.

Very happy that I can now proceed.


