slow loop?

Peter Hansen peter at engcorp.com
Thu Jan 16 15:04:00 EST 2003


maney at pobox.com wrote:
> 
> Peter Hansen <peter at engcorp.com> wrote:
> > maney at pobox.com wrote:
> >> for l in someFile.xreadlines():
> >>     fields = csv.split(l)
> >>     ...
> >
> > I haven't reviewed the code, but with the above line driving it, it
> > doesn't seem likely it can handle any lines with embedded newlines,
> > which can easily occur in a CSV file (at least, those written by Excel
> > for one).
> 
> Interesting.  Because most of what I'm parsing comes from Excel
> spreadsheets.  Perhaps it's because I use OpenOffice to dump them to
> CSV - at least IME it does a better job of not dumping unwanted empty
> columns.  Oh well, I know where I can get a more Excel-compatible
> parser if I need it, though having to get that installed everywhere
> will be a nuisance.

You just happen to be parsing data which has no embedded newlines in
any of the cells, which, I might add, is probably the case for 95% of
Excel spreadsheets.  (I don't think this is unique to Excel at all,
but maybe some spreadsheets can't even do this.)

Try hitting Alt-Enter while in the middle of a cell, and save the
spreadsheet as CSV.  You'll see that the regular lines end with 
CR/LF, while the embedded newline shows up as a simple LF character.
That difference won't help you on Windows if you open the file
in text mode though, and not in any case if you use readlines().
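
A minimal sketch of that difference, assuming a made-up two-record
sample (the bytes below are illustrative, not taken from a real Excel
export).  In binary the CR/LF record endings can still be told apart
from the bare LF inside the cell; once the newlines have been
translated the way text mode does it, they all look the same:

```python
import io

# Hypothetical sample: LF inside the quoted cell, CR/LF after each record.
raw = b'a,"first\nsecond",c\r\nd,e,f\r\n'

# Binary view: splitting on CR/LF keeps the embedded LF inside its record.
binary_records = raw.split(b"\r\n")

# Text view with newline translation (newline=None): every CR/LF becomes
# a plain LF, so the two kinds of line break can no longer be told apart.
text = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None).read()
text_lines = text.splitlines()
```

Here binary_records starts with the whole first record, while
text_lines has the first record cut in two at the embedded newline.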

Technically, the only thing that would help is to notice that the
LF occurred inside quotation marks and keep reading, which is
why the simple readlines() call is not enough.
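
The quote tracking described above could be sketched roughly like
this.  The name logical_records and the sample data are hypothetical,
and the sketch assumes quotes inside a cell are escaped by doubling
them; it only glues physical lines back into records and does not
split the fields:

```python
def logical_records(lines):
    """Yield whole CSV records, joining physical lines while a quote is open."""
    pending = ""
    for line in lines:
        pending += line
        # A doubled quote ("") adds two to the count, so an odd total
        # still means we are inside a quoted field.
        if pending.count('"') % 2 == 0:
            yield pending
            pending = ""
    if pending:
        # Unbalanced quote at end of input: hand back what was read.
        yield pending


# Made-up sample: LF inside the quoted cell, CR/LF ending each record.
sample = 'a,"first\nsecond",c\r\nd,e,f\r\n'
records = list(logical_records(sample.splitlines(keepends=True)))
```

Fed the three physical lines of the sample, this gives back two
records, the first of which still contains the embedded newline.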

> > Might be great for some people though.  Nice and simple.
> 
> That's me.  Why, I invented "do the simplest thing" *decades* before
> those gen-X folks came along!  <really big grin>

So where were you all those years when I needed that?! :-)

-Peter



