Best way to parse file into db-type layout?

Michael Hoffman cam.ac.uk at mh391.invalid
Sat Apr 30 09:31:08 EDT 2005


John Machin wrote:

>>>>That's nice. Well I agree with you, if the OP is concerned about embedded
>>>>CRs, LFs and ^Zs in his data (and he is using Windows in the latter case),
>>>>then he *definitely* shouldn't use fileinput.
>>>
>>>And if the OP is naive enough not to be concerned, then it's OK, is
>>>it?
>>
>>It simply isn't a problem in some real-world problem domains. And if there
>>are control characters the OP didn't expect in the input, and csv loads it
>>without complaint, I would say that he is likely to have other problems once
>>he's processing it.
> 
> Presuming for the moment that the reason for csv not complaining is
> that the data meets the csv non-spec and that the csv module is
> checking that: then at least he's got his data in the structural
> format he's expecting; if he doesn't do any/enough validation on the
> data, we can't save him from that.

What if the input is UTF-16? Your solution won't work for that. And there
are certainly UTF-16 CSV files out in the wild.
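
As a rough illustration of the UTF-16 point, here is a minimal sketch assuming present-day Python 3, which postdates this thread. The helper name read_utf16_csv and the sample data are made up for the example. It shows that UTF-16 bytes are full of NULs, so anything that treats the file as single-byte text will not see the structure it expects, and that decoding first lets the csv module parse the rows:

```python
import csv
import io


def read_utf16_csv(raw_bytes):
    """Decode UTF-16 bytes (with a BOM) and parse the resulting text as CSV.

    Hypothetical helper for illustration only.
    """
    text = raw_bytes.decode("utf-16")
    # newline="" leaves line endings untouched so csv can handle them itself.
    return list(csv.reader(io.StringIO(text, newline="")))


# Made-up sample data, encoded the way a UTF-16 CSV file would be stored.
sample = "name,city\r\nZoë,Köln\r\n".encode("utf-16")

# Each ASCII character is paired with a NUL byte in UTF-16.
has_nul_bytes = b"\x00" in sample

rows = read_utf16_csv(sample)
```

Whether the extra decoding step is worth writing depends on the problem domain.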

I think at some point you have to decide that certain kinds of data
are not sensible input to your program, and that the extra hassle in
programming around them is not worth the benefit.

> There is also an "on principle" element to it as well -- with
> fileinput one has to use the awkish methods like filelineno() and
> nextfile(); strikes me as a tricksy and inverted way of doing things.

Yes, indeed. I never use those, and if I needed that kind of per-file
control I would probably do something akin to what you are suggesting. I
simply enjoy the no-hassle simplicity of fileinput.input(), which spares me
from worrying about whether my data will be piped in or come from file(s)
specified on the command line.
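
The convenience described above can be sketched roughly as follows, assuming Python 3.2 or later (where fileinput.input() works as a context manager). The function name read_rows is hypothetical, and this is only one way of combining fileinput with the csv module, not code from the thread:

```python
import csv
import fileinput


def read_rows(files=None):
    """Yield CSV rows from the named files, or from standard input.

    With files=None, fileinput falls back to the names in sys.argv[1:],
    and to sys.stdin if that list is empty.
    """
    with fileinput.input(files=files) as lines:
        # csv.reader accepts any iterable of lines, including FileInput.
        for row in csv.reader(lines):
            yield row
```

Calling read_rows() with no arguments in a script gives the behaviour discussed here: the same loop works for "script.py a.csv b.csv" and for data piped in on standard input.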
-- 
Michael Hoffman


