readlines() and "binary" files

Jeff Davis jdavis at empires.org
Tue Sep 24 21:37:09 EDT 2002


I think readlines() is just a shortcut for a very common task: splitting
ordinary text into lines. Since your task isn't quite that common, I think
it would be a better idea to open the file in binary mode, use read() to
read the whole thing, and split the records yourself on the 0x0d 0x0a
pair (CR NL).
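
A minimal sketch of the read-and-split approach, written for a current
Python rather than the 2.x syntax in the quoted code. The function names
split_records and read_records are made up for illustration, and the
plain split on ';' assumes no field contains a quoted semicolon:

```python
def split_records(data, record_sep=b'\r\n', field_sep=b';'):
    """Split raw bytes into records on CR+NL, then each record into fields."""
    records = data.split(record_sep)
    # A file that ends with CR+NL leaves one empty trailing piece; drop it.
    if records and records[-1] == b'':
        records.pop()
    return [record.split(field_sep) for record in records]


def read_records(path):
    """Read the whole file at path and return a list of field lists."""
    # 'rb' returns the bytes exactly as stored, so a lone CR is never
    # treated as a line ending and stays inside its field.
    with open(path, 'rb') as f:
        return split_records(f.read())
```

With this, read_records('testdata.csv') should give one list per record,
and len() of each list can be compared to check the field count.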

If it's a really large amount of data you can try to process it in chunks,
carrying any incomplete record at the end of one chunk over to the next.
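
One possible way to do the chunked version, again only a sketch: the name
iter_records and the 64 KB default chunk size are arbitrary choices, not
anything from the standard library. The last piece of every split is held
back, because it may be an incomplete record or end in a CR whose NL has
not been read yet:

```python
def iter_records(f, record_sep=b'\r\n', chunk_size=64 * 1024):
    """Yield CR+NL delimited records from a file object opened in binary mode."""
    buf = b''
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break  # end of file
        buf += chunk
        pieces = buf.split(record_sep)
        # The final piece may be cut off mid-record (or mid-separator),
        # so keep it in the buffer until more data arrives.
        buf = pieces.pop()
        for piece in pieces:
            yield piece
    if buf:
        # Last record of a file that does not end with CR+NL.
        yield buf
```

Each record it yields can then be split on ';' as before, for example
"for record in iter_records(open('testdata.csv', 'rb')): ...".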

Regards,
        jeff davis

Justin wrote:

> Hi,
> 
> I have excel data with occasional multi-line fields,
> which when dumped to CSV translates to embedded CR's
> within a line, whereas the records/lines themselves
> are delimited by the CR+NL pair (this is MS-land).
> What I'd like to do is read those files and split every
> line apart on the semi-colon field separator. But it
> seems that whether the file is opened as text or not,
> (x)readlines() still considers the lone CR as a line
> delimiter and so not all my lines end up with the same
> number of fields as they should. Is there a way to handle
> this, or is readlines just not meant to work with anything
> but proper text files?
> 
>     f = open('testdata.csv','rb')
>     for line in f.xreadlines():
>         fields = line.split(';')
>         print len(fields) # should always be the same value



