[Csv] escapechar confusion
John Machin
sjmachin at lexicon.net
Sun Feb 16 23:32:09 CET 2003
Docstring:
" csv.QUOTE_NONE means that quotes are never placed around
fields.\n"
" * escapechar - specifies a one-character string used to escape \n"
" the delimiter when quoting is set to QUOTE_NONE.\n"
===
libcsv.tex [note especially the alleged treatment of escapechar when
doublequote == False]:
\begin{memberdesc}[boolean]{doublequote}
Controls how instances of \var{quotechar} appearing inside a field should
themselves be quoted. When \constant{True}, the character is doubled.
When \constant{False}, the \var{escapechar} must be a one-character string
which is used as a prefix to the \var{quotechar}. It defaults to
\constant{True}.
\end{memberdesc}
\begin{memberdesc}{escapechar}
A one-character string used to escape the \var{delimiter} if \var{quoting}
is set to \constant{QUOTE_NONE}. It defaults to \constant{None}.
\end{memberdesc}
===
My attempt at clarifying the requirements for fiddling with the contents of
each field being written:
[The examples use escapechar = '~' (to avoid backslashorrhea) and assume
delimiter = ',' and quotechar = '"'.]
if quoting == QUOTE_NONE and escapechar is not None:
    escape the delimiter, lineterminator(s), and the escapechar itself
        Level 3, Macackie Mansions -> Level 3~, Macackie Mansions
        Level 3, "Macackie Mansions" -> Level 3~, "Macackie Mansions"
        Can~on Grando -> Can~~on Grando
    # This scheme is plausible, unambiguous and in fact more efficient than
    # the "standard" doubling-of-quotes scheme.
elif quoting != QUOTE_NONE and not doublequote:
    if escapechar is None:
        raise "..."
    escape the quotechar and the escapechar itself
    Note: there is no *need* to escape the delimiter or line terminators, as
    they are "covered" by the quoting.
        Level 3, Macackie Mansions -> "Level 3, Macackie Mansions"
        Level 3, "Macackie Mansions" -> "Level 3, ~"Macackie Mansions~""
        Can~on Grando -> "Can~~on Grando"
    # This scheme is bizarre (like some other CSV mutants) but at least it
    # doesn't cause ambiguity on input.
    # What software does this? Who sponsored its inclusion?
    # Does it need option(s) to cater for (redundantly) escaping (a)
    # delimiter (b) line terminator(s)?
    # And it hasn't been implemented on output -- see below
else:
    escapechar is not used
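The rules above can be sketched as runnable Python. This is only an illustration of the requirements as I have stated them, not what _csv.c does: escape_field is a hypothetical helper name, the second scheme always wraps the field in quotes (as in the examples), and the final branch returns the field untouched because ordinary quoting and doubling are outside the scope of the sketch.

```python
import csv


def escape_field(field, *, quoting, doublequote=True, escapechar=None,
                 delimiter=",", quotechar='"', lineterminator="\r\n"):
    """Hypothetical helper: apply the field-level escaping rules listed above."""

    def escape(text, specials):
        # Prefix every character found in `specials` with the escapechar.
        return "".join(escapechar + ch if ch in specials else ch for ch in text)

    if quoting == csv.QUOTE_NONE and escapechar is not None:
        # Scheme 1: no quotes at all; escape the delimiter, the line
        # terminator character(s) and the escapechar itself.
        return escape(field, delimiter + lineterminator + escapechar)
    if quoting != csv.QUOTE_NONE and not doublequote:
        if escapechar is None:
            raise ValueError("escapechar is required when doublequote is false")
        # Scheme 2: the quotes "cover" delimiters and line terminators, so
        # only the quotechar and the escapechar need escaping.
        # Assumption: the field is always quoted, as in the examples.
        return quotechar + escape(field, quotechar + escapechar) + quotechar
    # Otherwise escapechar is not used (normal quoting is not modelled here).
    return field


print(escape_field('Level 3, "Macackie Mansions"',
                   quoting=csv.QUOTE_NONE, escapechar="~"))
print(escape_field('Level 3, "Macackie Mansions"',
                   quoting=csv.QUOTE_MINIMAL, doublequote=False, escapechar="~"))
```

The two print calls reproduce the second example line of each scheme.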
===
What _csv.c does on output:
>>> source = [123456, 'aaa,bbb', 'ccc,"ddd"', '"eee",fff', 9876.5]
>>> csv.writer(sys.stdout, escapechar="~", quoting=csv.QUOTE_NONE,
...            doublequote=False).writerow(source)
123456,aaa~,bbb,ccc~,"ddd","eee"~,fff,9876.5
# as expected
>>> csv.writer(sys.stdout, escapechar="~", quoting=csv.QUOTE_MINIMAL,
...            doublequote=False).writerow(source)
123456,"aaa,bbb","ccc,"ddd"",""eee",fff",9876.5
# No escaping done
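To repeat the output experiment as a script instead of an interactive session, something like the following could be used. It is a sketch only: roundtrip is a hypothetical helper name, not part of the csv module, and the text it prints depends on the version of the module in use, so no output is shown here. The check is that whatever the writer emits, a reader given the same parameters should hand back the same fields (as strings). With unescaped output like that shown above, the check would presumably report a mismatch.

```python
import csv
import io


def roundtrip(row, **fmtparams):
    """Hypothetical helper: write one row, then read it back with the same parameters."""
    buf = io.StringIO(newline="")
    csv.writer(buf, **fmtparams).writerow(row)
    text = buf.getvalue()
    rows = list(csv.reader(io.StringIO(text, newline=""), **fmtparams))
    return text, rows


source = [123456, 'aaa,bbb', 'ccc,"ddd"', '"eee",fff', 9876.5]
text, rows = roundtrip(source, escapechar="~",
                       quoting=csv.QUOTE_MINIMAL, doublequote=False)
print(text, end="")                          # what the writer produced
print(rows == [[str(x) for x in source]])    # did the reader recover the fields?
```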
===
What _csv.c does on input:
Firstly, the simple escape scheme:
>>> indata1 = ['123456,aaa~,bbb,ccc~,"ddd","eee"~,fff,9876.5']
>>> [x for x in csv.reader(indata1, escapechar="~", quoting=csv.QUOTE_NONE,
...                        doublequote=True)]
[['123456', 'aaa,bbb', 'ccc,"ddd"', 'eee~', 'fff', '9876.5']]
# wrong or confusing, QUOTE_NONE but still testing for quotechar at start
# of field
>>> [x for x in csv.reader(indata1, escapechar="~", quoting=csv.QUOTE_NONE,
...                        doublequote=False)]
[['123456', 'aaa,bbb', 'ccc,"ddd"', 'eee,fff', '9876.5']]
# wrong or confusing, QUOTE_NONE but still testing for quotechar at start
# of field
>>> [x for x in csv.reader(indata1, escapechar="~", quoting=csv.QUOTE_NONE,
...                        doublequote=False, quotechar=None)]
TypeError: bad argument type for built-in operation
# already grumbled about this
>>> [x for x in csv.reader(indata1, escapechar="~", quoting=csv.QUOTE_NONE,
...                        doublequote=False, quotechar="!")]
[['123456', 'aaa,bbb', 'ccc,"ddd"', '"eee",fff', '9876.5']]
# actual == expected
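For comparison, here is a minimal sketch of what I would expect a reader to do for the simple escape scheme when quoting is QUOTE_NONE: the quotechar gets no special treatment at all. split_escaped is a hypothetical name, it handles a single line only (no line terminators), and a trailing escapechar with nothing after it is silently dropped; those are simplifications for the sketch, not claims about _csv.c.

```python
def split_escaped(line, delimiter=",", escapechar="~"):
    """Hypothetical helper: split one line under the simple escape scheme."""
    fields, current, escaped = [], [], False
    for ch in line:
        if escaped:
            # The character after an escapechar is always taken literally.
            current.append(ch)
            escaped = False
        elif ch == escapechar:
            escaped = True
        elif ch == delimiter:
            # An unescaped delimiter ends the current field.
            fields.append("".join(current))
            current = []
        else:
            # Everything else, including '"', is ordinary data.
            current.append(ch)
    fields.append("".join(current))
    return fields


print(split_escaped('123456,aaa~,bbb,ccc~,"ddd","eee"~,fff,9876.5'))
```

This yields the same five fields as the quotechar="!" workaround above.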
Secondly, the bizarre scheme (escaping the quotechar):
>>> indata2 = ['123456,aaa~,bbb,ccc~,"ddd","eee"~,fff,"ggg,~"hhh~"",iii-~"jjj~",9876.5']
>>> [x for x in csv.reader(indata2, escapechar="~",
...                        quoting=csv.QUOTE_MINIMAL, doublequote=False, quotechar='"')]
[['123456', 'aaa,bbb', 'ccc,"ddd"', 'eee,fff', 'ggg,"hhh"', 'iii-"jjj"',
'9876.5']]
# bizarre + options; this is assuming that the writer was escaping
# delimiters