[Csv] Re: csv bugs

Magnus Lie Hetland magnus at hetland.org
Tue Mar 2 18:24:46 CET 2004


> (A better place for this discussion would probably be
> csv at mail.mojam.com.  I'm adding it to the cc list.)

Ah -- sorry. I wasn't aware of the list. I've subscribed now.

[snip]

> That may be, however development of the csv module's parser was
> driven by how Microsoft Excel behaves.

But wasn't allowing "full" customization also a driving force?

> The assumption was (rightly I think) that Excel reads or writes more
> CSV files than anything else. I don't believe it does anything with
> backslashes.

I'm sure you're right. The point is that the csv module supports
escape characters, and I believe the thing I pointed out is a missing
piece of functionality for those.

In other words: The Excel dialect uses quoting to deal with in-field
separators, quotes and newlines. The passwd dialect uses escapes to
deal with these. *However*, the csv module only supports escaping
separators and escape characters with the escape character (quotes
are a non-issue, of course), not newlines. So if you choose to use an
escape character rather than quotes, you can't have newlines in your
fields.

Almost, anyway. The fact is, as far as I can see, that you *can*
escape newlines, but in that special case, the escape character
*isn't* removed (as it is when you escape separators or escape
characters). This seems inconsistent, and it has nothing to do with
backslashes in particular, just with how escape characters should behave.
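
To make the comparison concrete, here is a minimal sketch (not taken
from the original message) that parses escaped input with the csv
module of a current Python 3. The helper name parse_escaped, the ':'
delimiter and the backslash escape character are assumptions chosen
for illustration. The escaped-newline result is only printed, since
that is the case in question and it may differ between versions of
the module.

```python
import csv
import io


def parse_escaped(text, delimiter=":", escapechar="\\"):
    """Parse text that uses escapes instead of quotes; return a list of rows."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=delimiter,
        escapechar=escapechar,
        quoting=csv.QUOTE_NONE,
    )
    return list(reader)


# Escaped delimiter: the escape character is dropped, the ':' is kept.
escaped_delimiter = parse_escaped("a\\:b:c\n")

# Escaped escape character: a single backslash survives in the field.
escaped_escape = parse_escaped("a\\\\b:c\n")

# Escaped newline: the case under discussion. Printed, not asserted,
# because the result depends on the version of the csv module.
escaped_newline = parse_escaped("a\\\nb:c\n")

print(escaped_delimiter)
print(escaped_escape)
print(escaped_newline)
```

Comparing the third result with the first two shows whether the
escape character is treated the same way in all three cases.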

[snip]
> You're welcome to submit a patch.  I don't have time for it.

OK -- I guess I'm mainly looking for some feedback about whether this
seems like a reasonable behavior. (I'm quite thoroughly convinced that
it is, but I may very well be wrong :)

I haven't looked at the C implementation, so no promises about a patch
there... :/

> > And another thing: Perhaps a 'passwd' dialect could be added
> > alongside 'excel'? Something like:
[snip]
> I'll take a look at that.

Not sure about setting the quote character to '?' here, but since it
doesn't matter and you need to have one, it seemed like a natural
choice. (None wasn't allowed.)
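
For readers who want something to experiment with, the following is a
hypothetical sketch of such a dialect. It is not the snipped
definition from the earlier message. The class name PasswdLike, the
registered name "passwd-like", the ':' delimiter, the backslash
escape character and the "\n" line terminator are all assumptions;
only the '?' placeholder quote character and QUOTE_NONE come from the
discussion above.

```python
import csv
import io


class PasswdLike(csv.Dialect):
    """Hypothetical passwd-style dialect: escapes instead of quotes."""
    delimiter = ":"          # assumption: colon, as in /etc/passwd
    escapechar = "\\"        # assumption: backslash as the escape character
    quoting = csv.QUOTE_NONE
    quotechar = "?"          # placeholder only; never used for quoting here
    doublequote = False
    skipinitialspace = False
    lineterminator = "\n"


# "passwd-like" is a made-up name, not a dialect shipped with the module.
csv.register_dialect("passwd-like", PasswdLike)


def write_rows(rows):
    """Serialize rows with the sketch dialect and return the text."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, dialect="passwd-like").writerows(rows)
    return buffer.getvalue()


def read_rows(text):
    """Parse text written with the sketch dialect back into rows."""
    return list(csv.reader(io.StringIO(text, newline=""), dialect="passwd-like"))


rows = [["root", "x", "0", "0", "Super: User", "/root", "/bin/sh"]]
text = write_rows(rows)
print(text, end="")
print(read_rows(text) == rows)
```

Note that under QUOTE_NONE the writer also escapes the placeholder
quote character if it appears in a field, so '?' is not completely
inert as a choice.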

> > For some reason you *have* to supply a quotechar, even if you
> > set QUOTE_NONE... I guess that's a bug too, in my book.
>
> Maybe.  Maybe just a feature.

Well, maybe ;)

But if you don't need an escape character when you're using quotes, I
don't think you should need a quote character when you're using an
escape character.

Then again: I guess you do use an escape character (i.e. a double
quote) in the quoted mode as well, which may be what's complicating
the semantics and confusing me. Not sure how

  "foo "
  bar"

should be interpreted, for example. In this case removing the quote
may not make sense.
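
As a small illustration of the doubled quote acting as an escape in
quoted mode, here is a sketch using the default (excel) dialect of a
current Python 3 csv module. The sample field is made up, and the
sketch does not attempt to settle how the multi-line example above
should be read.

```python
import csv
import io

# Reading: inside a quoted field, "" is taken as one literal quote.
parsed = next(csv.reader(['"say ""hi""",x']))

# Writing: the default (excel) dialect doubles embedded quotes.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow(['say "hi"', "x"])
written = buffer.getvalue()

print(parsed)
print(written, end="")
```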

And... Adding another switch (or something) dictating the behavior of
the escape character doesn't seem good...

> Skip

-- 
Magnus Lie Hetland           "The mind is not a vessel to be filled,
http://hetland.org            but a fire to be lighted."  [Plutarch]

