Trying to fix Invalid CSV File

Roel Schroeven rschroev_nospam_ml at fastmail.fm
Wed Aug 6 05:21:22 EDT 2008


Ryan Rosario wrote:

> Next time I am going to be much more careful. Tab delimited is
> probably better for my purpose, but I can definitely see there being
> issues with invisible tab characters and other weirdness.

No matter which delimiter you use, there will always be data that 
includes that delimiter, and you need some way to deal with it.

I prefer the approach that esr suggests in "The Art of Unix Programming" 
(http://www.catb.org/~esr/writings/taoup/html/ch05s02.html): define a 
delimiter (preferably, but not necessarily, one that doesn't occur 
frequently in your data) and an escape character. On output, escape all 
occurrences of the delimiter and the escape character in your data. On 
input, you can trivially and unambiguously distinguish delimiters in the 
data from delimiters between data, and unescape everything.
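
A minimal sketch of that scheme in Python, to make it concrete. The choice 
of '|' as delimiter and backslash as escape character is arbitrary, and the 
function names are made up for illustration. It assumes one record per line 
and does not handle newlines embedded in the data, which would need an 
escape of their own.

```python
DELIM = "|"   # assumed delimiter, any single character works
ESC = "\\"    # assumed escape character

def escape_field(field):
    # Escape the escape character first, then the delimiter, so the
    # backslashes added for delimiters are not doubled again.
    return field.replace(ESC, ESC + ESC).replace(DELIM, ESC + DELIM)

def join_record(fields):
    # Output side: escape every field, then join with bare delimiters.
    return DELIM.join(escape_field(f) for f in fields)

def split_record(line):
    # Input side: a delimiter preceded by the escape character is data,
    # a bare delimiter separates fields.
    fields = []
    current = []
    chars = iter(line)
    for ch in chars:
        if ch == ESC:
            # Take the next character literally, whatever it is.
            current.append(next(chars, ""))
        elif ch == DELIM:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields

row = ["a|b", "c\\d", "", "plain"]
line = join_record(row)
assert split_record(line) == row
```

The round trip works for any field content because every delimiter or 
escape character in the data is always preceded by an escape character, so 
the reader never has to guess.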

Cheers,
Roel

-- 
The saddest aspect of life right now is that science gathers knowledge
faster than society gathers wisdom.
   -- Isaac Asimov

Roel Schroeven


