split CSV fields

John Machin sjmachin at lexicon.net
Thu Nov 16 06:23:50 EST 2006


John Machin wrote:
> John Machin wrote:
> > Fredrik Lundh wrote:
> > > robert wrote:
> > >
> > > > What is the simplest expression for splitting a CSV line
> > > > with "-protected fields?
> > > >
> > > > s='"123","a,b,\"c\"",5.640'
> > >
> > > import csv
> > >
> > > the preferred way is to read the file using that module.  if you insist
> > > on processing a single line, you can do
> > >
> > >      cols = list(csv.reader([string]))
> > >
> > > </F>
> >
> > Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit
> > (Intel)] on win32
> > | >>> import csv
> > | >>> s='"123","a,b,\"c\"",5.640'
> > | >>> cols = list(csv.reader([s]))
> > | >>> cols
> > [['123', 'a,b,c""', '5.640']]
> > # maybe we need a bit more:
> > | >>> cols = list(csv.reader([s]))[0]
> > | >>> cols
> > ['123', 'a,b,c""', '5.640']
> >
> > I'd guess that the OP is expecting 'a,b,"c"' for the second field.
> >
> > Twiddling with the knobs doesn't appear to help:
> >
> > | >>> list(csv.reader([s], escapechar='\\'))[0]
> > ['123', 'a,b,c""', '5.640']
> > | >>> list(csv.reader([s], escapechar='\\', doublequote=False))[0]
> > ['123', 'a,b,c""', '5.640']
> >
> > Looks like a bug to me; AFAICT from the docs, the last attempt should
> > have worked.
>
> Given Peter Otten's post, looks like
> (1) there's a bug in the "fmtparam" mechanism -- it's ignoring the
> escapechar in my first twiddle, which should give the same result as
> Peter's.
> (2)
> | >>> csv.excel.doublequote
> True
> According to my reading of the docs:
> """
> doublequote
> Controls how instances of quotechar appearing inside a field should
> themselves be quoted. When True, the character is doubled. When False,
> the escapechar is used as a prefix to the quotechar. It defaults to
> True.
> """
> Peter's example should not have worked.

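Doh. The OP's string was meant to be a raw string: in an ordinary literal,
\" collapses to a plain ", so the backslashes never reached csv.reader.
I need some sleep. Scrap bug #1!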

| >>> s=r'"123","a,b,\"c\"",5.640'
| >>> list(csv.reader([s]))[0]
['123', 'a,b,\\c\\""', '5.640']
# What's that???
| >>> list(csv.reader([s], escapechar='\\'))[0]
['123', 'a,b,"c"', '5.640']
| >>> list(csv.reader([s], escapechar='\\', doublequote=False))[0]
['123', 'a,b,"c"', '5.640']
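A minimal sketch of the raw vs. non-raw difference, assuming a current
Python 3 (the sessions above were run on 2.5, but as far as I can tell the
parser treats these inputs the same way). The helper name split_csv_line is
made up for illustration; it is not part of the csv module.

```python
import csv

def split_csv_line(line, **fmtparams):
    """Parse a single CSV line and return its fields as a list.

    Hypothetical helper: wraps csv.reader, which expects an iterable of lines.
    """
    return next(csv.reader([line], **fmtparams))

# Ordinary literal: \" is just an escaped quote, so no backslash is stored.
plain = '"123","a,b,\"c\"",5.640'
# Raw literal: the backslashes are part of the data handed to csv.reader.
raw = r'"123","a,b,\"c\"",5.640'

print(split_csv_line(plain))                   # ['123', 'a,b,c""', '5.640']
print(split_csv_line(raw))                     # ['123', 'a,b,\\c\\""', '5.640']
print(split_csv_line(raw, escapechar='\\'))    # ['123', 'a,b,"c"', '5.640']
```

Only the last call gives what the OP presumably wants, and it needs both
the raw string and an explicit escapechar.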

And there's still the problem with doublequote ....
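For what it's worth, here is a sketch of where doublequote visibly changes
the result: on the writing side. This assumes Python 3 and uses io.StringIO
as the output file; write_row is a hypothetical helper, not a csv function.
It does not settle whether the reader ought to behave differently, it only
shows the two output styles the docs paragraph describes.

```python
import csv
import io

def write_row(row, **fmtparams):
    """Format one row as CSV text (hypothetical helper for illustration)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator='\n', **fmtparams).writerow(row)
    return buf.getvalue()

row = ['123', 'a,b,"c"', '5.640']

# Default dialect: doublequote=True, so an embedded quotechar is doubled.
doubled = write_row(row)
# doublequote=False: the embedded quotechar is prefixed with the escapechar.
escaped = write_row(row, doublequote=False, escapechar='\\')

print(doubled)
print(escaped)

# Each style reads back to the original row with the matching reader setup.
assert next(csv.reader([doubled])) == row
assert next(csv.reader([escaped], escapechar='\\')) == row
```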

Goodnight ...



