* separated values

Cliff Wells logiplexsoftware at earthlink.net
Tue Jan 15 17:40:25 EST 2002


On Tue, 15 Jan 2002 22:22:56 +0000 (UTC)
Magnus Lie Hetland wrote:

> In article <...>, Cliff Wells wrote:
> [snip]
> >The line breaks inside quoted fields is an absolute requirement.  I would
> >guess that a huge number of CSV files are generated by MS Excel or Access
> >and they will put newlines inside quotes.  I doubt that allowing spaces
> >between the quote and the separator is a good idea because then it becomes
> >somewhat ambiguous whether the space should be included as part of the data
> >or if it should be ignored.  At the very least, it causes a differentiation
> >between how to handle quoted data versus unquoted data (spaces allowed
> >around quoted data, not allowed around unquoted data).
> 
> Absolutely. What I reacted to was the following statement (from the
> web page):
> 
>   The parser will raise a csv.Error exception under any of the
>   following circumstances: 
> 
>     * If the closing " on a quoted field is not immediately followed
>       by either end of line, or a field separator. 
> 
>     * If an end of line is encountered which is not at the end
>       of string 
> 
> It is obvious that we agree that the second is unreasonable, and it
> seems you may agree that the first is unreasonable too? (I'm only
> talking about allowing space outside quoted fields here.)
> 
> OTOH: Should one simply disallow all space surrounding an unquoted
> value? I guess any reasonable package generating a value surrounded by
> space (which should be preserved) would quote that field (including
> the space)...

I wouldn't put this restriction on it.  The line

this, is a, test    <--- 4 extra spaces after "test"

should result in 

['this', ' is a', ' test    ']

I don't think quoting spaces is considered a requirement and IIRC, Excel
doesn't do it (it only quotes fields that contain separators, newlines or
embedded quotes).  Whether Excel can be considered "reasonable" is arguable,
but supporting its conventions is important.
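
For illustration only, here is how that convention looks with the csv
module that ships in the standard library of current Python releases.  That
module is an assumption on my part for the sake of a runnable example; it is
not necessarily the parser whose web page is quoted above.  With default
settings it keeps the spaces around unquoted data and accepts a newline
inside a quoted field:

```python
import csv
import io

# Unquoted fields: leading and trailing spaces are kept as part of the data.
row = next(csv.reader(["this, is a, test    "]))
print(row)

# Excel-style record: the quoted middle field contains a newline.
# newline="" stops StringIO from translating the line endings.
text = 'a,"line one\nline two",c\r\n'
rows = list(csv.reader(io.StringIO(text, newline="")))
print(rows)
```

The second call reads two physical lines but yields a single three-field
record, which is the behaviour files written by Excel or Access need.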

Since we have to allow spaces around unquoted data, we shouldn't allow
spaces outside the quotes on quoted data, since for the unquoted data the
spaces would have meaning (they are part of the data), whereas around
quoted data they are mere noise.
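
A rough sketch of that rule, in case it helps.  The name split_line and the
choice of ValueError are mine, purely hypothetical, and this is not the code
of any existing parser.  It assumes a doubled quote ("") is how a quote is
embedded in a quoted field, and it handles one record at a time:

```python
def split_line(line, sep=","):
    """Split one record. Hypothetical helper, for illustration only."""
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == '"':
            # Quoted field: read up to the closing quote.
            i += 1
            chars = []
            while True:
                if i >= n:
                    raise ValueError("unterminated quoted field")
                c = line[i]
                if c == '"':
                    if i + 1 < n and line[i + 1] == '"':
                        # A doubled quote stands for one literal quote.
                        chars.append('"')
                        i += 2
                        continue
                    i += 1  # step past the closing quote
                    break
                chars.append(c)
                i += 1
            # Nothing, not even a space, may sit between the closing
            # quote and the separator or the end of the record.
            if i < n and line[i] != sep:
                raise ValueError("text after closing quote at column %d" % i)
            fields.append("".join(chars))
        else:
            # Unquoted field: everything up to the separator is data,
            # spaces included.
            j = line.find(sep, i)
            if j == -1:
                j = n
            fields.append(line[i:j])
            i = j
        if i >= n:
            return fields
        i += 1  # skip the separator


print(split_line("this, is a, test    "))
print(split_line('"a b",c'))
```

With this sketch a record such as "a b" ,c (space after the closing quote)
raises ValueError, while a field that merely starts with a space is taken
as unquoted data, quote characters and all.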

> Oh, well. Whatever the standard, it should be clearly spelled out in
> the docs, and preferably allow a lot of switches/flags to be supplied
> so one can get the behaviour one needs...

As many switches as ls?

-- 
Cliff Wells
Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308
(800) 735-0555 x308