Where did csv.parser() go?

Peter Otten __peter__ at web.de
Tue Jan 2 12:43:30 EST 2018


jason at apkudo.com wrote:

> I need to record the starting offsets of csv rows in a database for fast
> seeking later. Unfortunately, using any csv.reader() (or DictReader) tries
> to cache, which means:
>
> example_Data = '''data
> 0123456789ABCDE
> 1123456789ABCDE
> 2123456789ABCDE
> 3123456789ABCDE
> ...
> '''
> 
> for line in reader:
>     offsets[row] = f.tell()
> 
> is not possible. With my example data, offsets are reported as [0, 260,
> 260, 260...]; they should be [0x00, 0x05, 0x15, 0x25, ...] (sample data is
> 16 byte rows after a 5 byte header (just for now))
> 
> I saw in one of PEP-305's references a mention of csv.parser() which won't
> return a row until parsing is complete. This is ideal since some lines
> will have quoted text containing commas and new lines.  I don't want to
> re-write the parser, since later usage will use csv.DictReader, so we need
> to identically parse rows. How can I do that with the Python 2.7 csv
> module?
> 
> Or how can I accomplish this task through other means?

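It's not the reader that performs the caching; it's the iteration over the
file, which reads ahead in large chunks, so tell() reports the end of the
chunk rather than the end of the current line: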

$ python -c 'f = open("tmp.csv")
for line in f: print f.tell()
'
73
73
73
73
73
73

You can work around that by using the file's readline() method:

$ python -c 'f = open("tmp.csv")
for line in iter(f.readline, ""): print f.tell()
'
5
21
37
53
69
73

Combined with csv.reader():

$ python -c 'import csv; f = open("tmp.csv")
for row in csv.reader(iter(f.readline, "")): print f.tell(), row
'
5 ['data']
21 ['0123456789ABCDE']
37 ['1123456789ABCDE']
53 ['2123456789ABCDE']
69 ['3123456789ABCDE']
73 ['...']
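
The one-liners above print the offset *after* each row. Here is a minimal sketch of the same readline() idea that records the *starting* offset of each row instead. It is an assumption-laden illustration, not part of the csv module: the names row_offsets() and read_row_at() are hypothetical, it targets Python 3 (where csv.reader() wants text, so the file is opened in binary mode and each line is decoded), and it assumes an ASCII-compatible encoding with "\n" or "\r\n" line endings.

```python
import csv


def row_offsets(path, encoding="utf-8"):
    """Yield (start_offset, row) for every csv row in the file at `path`.

    Hypothetical helper: offsets are byte positions suitable for f.seek()
    on a file opened in binary mode.
    """
    with open(path, "rb") as f:
        # Feed csv.reader() one line at a time via readline(), so the file
        # position never runs ahead of what the parser has consumed.
        lines = (raw.decode(encoding) for raw in iter(f.readline, b""))
        start = 0
        for row in csv.reader(lines):
            yield start, row
            # The reader has consumed exactly the lines of the row just
            # returned (several, if a quoted field contains newlines), so
            # the current position is where the next row begins.
            start = f.tell()


def read_row_at(path, offset, encoding="utf-8"):
    """Seek to a previously recorded offset and parse the single row there."""
    with open(path, "rb") as f:
        f.seek(offset)
        lines = (raw.decode(encoding) for raw in iter(f.readline, b""))
        return next(csv.reader(lines))
```

With the sample file from the question, list(row_offsets("tmp.csv")) should start with offsets 0, 5, 21, 37, and read_row_at("tmp.csv", 21) should return the second data row. Because both functions go through csv.reader(), a row is only returned once its quoted fields are complete, so rows are split the same way a later csv.DictReader pass would split them, provided the same dialect options are used.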




