Where is the syntax for the dict() constructor ?!

John Machin sjmachin at lexicon.net
Sat Jul 7 20:13:48 EDT 2007


On Jul 7, 4:58 pm, Marc 'BlackJack' Rintsch <bj_... at gmx.net> wrote:
> On Sat, 07 Jul 2007 08:32:52 +0200, Hendrik van Rooyen wrote:
> >> erik,viking,"ham, spam and eggs","He said ""Ni!""","line one
> >> line two"
>
> >> That's 5 elements:
>
> >> 1: erik
> >> 2: viking
> >> 3: ham, spam and eggs
> >> 4: He said "Ni!"
> >> 5: line one
> >>    line two
>
> > Also true - What can I say - I can only wriggle and mutter...
>
> > I see that you escaped the quotes by doubling them up -
>
> That's how Excel and the `csv` module do it.
>
> > What would the following parse to?:
>
> >  erik,viking,ham, spam and eggs,He said "Ni!",line one
> >  line two
>
> Why don't you try yourself?  The `csv` module returns two records, the
> first has six items:
>
> 1: erik
> 2: viking
> 3: ham
> 4:  spam and eggs
> 5: He said "Ni!"
> 6: line one
>
> 'line two' is the only item in the next record then.
>

The rules for quoting when writing can be expressed as:
def outrow(inrow, quotechar='"', delimiter=','):
  out = []
  for field in inrow:
    if quotechar in field:
      field = quotechar + field.replace(quotechar, quotechar*2) + quotechar
    elif delimiter in field or '\n' in field:
      # See note below.
      field = quotechar + field + quotechar
    out.append(field)
  return delimiter.join(out)

Note: characters other than delimiter and \n can be included in the
"to be quoted" list.

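For comparison, here is a small sketch of the same rules as applied by the csv module's own writer. It assumes Python 3 and the default "excel" dialect (QUOTE_MINIMAL, '\r\n' line terminator); the row is the five-field example quoted above, and the names row, buf, text and round_trip are only illustrative.

```python
import csv
import io

# The five fields from the example quoted earlier in the thread.
row = ['erik', 'viking', 'ham, spam and eggs', 'He said "Ni!"',
       'line one\nline two']

# newline='' keeps the StringIO from translating line endings.
buf = io.StringIO(newline='')
csv.writer(buf).writerow(row)  # default dialect: QUOTE_MINIMAL
text = buf.getvalue()
print(repr(text))

# A reader fed the writer's output should recover the original fields.
round_trip = list(csv.reader(io.StringIO(text, newline='')))
print(round_trip == [row])
```

Only the fields containing the delimiter, the quote character or a newline come out quoted, and embedded quotes are doubled.
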
What readers do with data that can *not* have been produced by a
writer following the rules can get worse than BlackJack's example.

Consider this: file nihao1.csv contains the following single line:
'Is the "," a mistake in "Ni, hao!"?\r\n'

OpenOffice.org's Calc 2.1 shows the equivalent of
['Is the "', ' a mistake in Ni', ' hao!"?\n'] in a Text Import window,
but then silently produces nothing. A file with two such lines causes
5 fields to be shown in the window -- it apparently thinks the
newlines are inside quoted fields!

Gnumeric 1.7.6 silently produces the equivalent of
result = ['Is the "', ' a mistake in ', 'hao!"?']
map(len, result) -> [8, 14, 6]
What happened to Ni?
Multiple such lines produce multiple rows.

Excel 11.0 (2003) silently produces in effect
result = ['Is the "', ' a mistake in Ni', ' hao!"?']
map(len, result) -> [8, 16, 7]
Multiple such lines produce multiple rows.

The csv module does what Excel does.
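
A minimal sketch for reproducing this, assuming Python 3 and the default dialect (the variable names are only illustrative). It also tries the reader's strict option, which is part of the csv module but was not mentioned above; with it, the reader should refuse this line instead of guessing.

```python
import csv

line = 'Is the "," a mistake in "Ni, hao!"?\r\n'

# Default (lenient) parsing: the reader quietly makes the best of it.
lenient = next(csv.reader([line]))
lengths = [len(field) for field in lenient]
print(lenient, lengths)

# strict=True asks the reader to raise csv.Error on malformed input.
try:
    next(csv.reader([line], strict=True))
    strict_error = None
except csv.Error as exc:
    strict_error = exc
print(strict_error)
```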

Consumers of csv files are exhorted to apply whatever sanity checks
they can. Examples:
(1) If the csv file was produced as a result of a database query, the
number of columns should be known and used as a check on the length of
each row received.
(2) A field containing an odd number of " characters (or more
generally, not meeting whatever quoting convention might be expected
in the underlying data) should be treated with suspicion.
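
The two checks above could be sketched roughly as follows, assuming Python 3. The function name suspicious_rows, its parameters and the sample lines are made up for illustration; a real check would follow whatever quoting convention the underlying data is expected to obey.

```python
import csv

def suspicious_rows(lines, expected_columns, quotechar='"'):
    """Yield (row_number, reason) for each parsed row that fails a check."""
    reader = csv.reader(lines, quotechar=quotechar)
    for row_number, row in enumerate(reader, start=1):
        # Check (1): the number of columns is known in advance.
        if len(row) != expected_columns:
            yield row_number, 'expected %d fields, got %d' % (
                expected_columns, len(row))
            continue
        # Check (2): a parsed field holding an odd number of quote chars.
        odd = [i for i, field in enumerate(row)
               if field.count(quotechar) % 2]
        if odd:
            yield row_number, 'odd quote count in field(s) %r' % (odd,)

lines = [
    'erik,viking,"He said ""Ni!"""\r\n',        # well formed
    'Is the "," a mistake in "Ni, hao!"?\r\n',  # stray quotes
    'erik,viking,ham, spam and eggs\r\n',       # unquoted delimiter
]
problems = list(suspicious_rows(lines, expected_columns=3))
for row_number, reason in problems:
    print(row_number, reason)
```

Neither check proves a row is good; they only flag rows that deserve a closer look.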

Cheers,
John




