Best way to parse file into db-type layout?

Michael Hoffman cam.ac.uk at mh391.invalid
Sun May 1 05:05:23 EDT 2005


John Machin wrote:
>[Michael Hoffman]:
>>What if the input is UTF-16? Your solution won't work for that. And there
>>are certainly UTF-16 CSV files out in the wild.
> 
> The csv module docs do say that Unicode is not supported.
> 
> This does appear to work, however, at least for data that could in
> fact be encoded as ASCII:

And for data that can't be expressed as ASCII? It doesn't work.

So throw out csv, just like fileinput. After all, despite its utility, 
and the fact that you obviously suspect the OP will never have to deal 
with UTF-16 (otherwise you would have suggested this without prompting), 
it won't work for *every* conceivable case.

> The usual trick to smuggle righteous data past the heathen (recode as
> UTF-8, cross the border, decode) should work.

True, but that's a lot of trouble to go to for something that you expect 
will never happen, and for a script that may only be run by the 
programmer, who can certainly deal with the exceptions when they happen.
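
For anyone who does need it, here is a minimal sketch of the 
decode-before-parsing idea. It assumes Python 3, where the csv module 
accepts text directly, so the UTF-8 round trip collapses into a single 
decode step. The helper name read_utf16_csv and the sample data are made 
up for illustration, not taken from the thread.

```python
import csv
import io


def read_utf16_csv(raw_bytes):
    """Decode UTF-16 bytes and parse them as CSV rows (hypothetical helper)."""
    # Wrap the bytes in a text layer that decodes UTF-16 (the BOM is consumed).
    # newline="" leaves line-ending handling to the csv module.
    text = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding="utf-16", newline="")
    return list(csv.reader(text))


# Made-up sample containing characters that cannot be expressed as ASCII.
sample = "name,city\r\nZoë,Kraków\r\n".encode("utf-16")
rows = read_utf16_csv(sample)
for row in rows:
    print(row)
```

For a file on disk, the same effect should come from 
open(path, encoding="utf-16", newline="") passed straight to csv.reader.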

The range of sensible input is something to be determined by a 
specification, or by the programmer if no spec exists. Not by kibitzers[1] 
speaking from on high on c.l.p. <wink>
-- 
Michael Hoffman

[1] Yes, I include myself in that category.


