[Csv] escapechar confusion

John Machin sjmachin at lexicon.net
Sun Feb 16 23:32:09 CET 2003


Docstring:

"        csv.QUOTE_NONE means that quotes are never placed around fields.\n"
"    * escapechar - specifies a one-character string used to escape \n"
"        the delimiter when quoting is set to QUOTE_NONE.\n"
===
libcsv.tex [note especially the alleged treatment of escapechar when 
doublequote == False]:

\begin{memberdesc}[boolean]{doublequote}
Controls how instances of \var{quotechar} appearing inside a field should be
themselves be quoted.  When \constant{True}, the character is doubled.
When \constant{False}, the \var{escapechar} must be a one-character string
which is used as a prefix to the \var{quotechar}.  It defaults to
\constant{True}.
\end{memberdesc}

\begin{memberdesc}{escapechar}
A one-character string used to escape the \var{delimiter} if \var{quoting}
is set to \constant{QUOTE_NONE}.  It defaults to \constant{None}.
\end{memberdesc}
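
A small sketch of how the two descriptions above can be probed. It assumes a present-day CPython csv module rather than the _csv.c under discussion here, and write_fails is a hypothetical helper name, not part of the module. It only reports whether the writer refuses a row when an escape would be needed and no escapechar is set.

```python
import csv
import io


def write_fails(row, **fmtparams):
    """Return True if csv.writer refuses to write `row` with these options."""
    try:
        csv.writer(io.StringIO(), **fmtparams).writerow(row)
    except csv.Error:
        return True
    return False


# A quotechar inside a field needs an escapechar once doubling is turned off.
print(write_fails(['say "hi"'], doublequote=False))
print(write_fails(['say "hi"'], doublequote=False, escapechar="~"))

# A delimiter inside a field needs an escapechar under QUOTE_NONE.
print(write_fails(["a,b"], quoting=csv.QUOTE_NONE))
print(write_fails(["a,b"], quoting=csv.QUOTE_NONE, escapechar="~"))
```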
===
My attempt at clarifying the requirements for fiddling with the contents of
each field being written:
[the examples use escapechar = '~' (to avoid backslashorrhea) and assume
delimiter = ',' and quotechar = '"']

if quoting == QUOTE_NONE and escapechar is not None:
   escape the delimiter, lineterminator(s), and the escapechar itself
   Level 3, Macackie Mansions -> Level 3~, Macackie Mansions
   Level 3, "Macackie Mansions" -> Level 3~, "Macackie Mansions"
   Can~on Grando -> Can~~on Grando
   # This scheme is plausible, unambiguous and in fact more efficient than
   # the "standard" doubling-of-quotes scheme.
elif quoting != QUOTE_NONE and not doublequote:
   if escapechar is None:
      raise "..."
   escape the quotechar and the escapechar itself
   Note: there is no *need* to escape the delimiter or line terminators, as
   they are "covered" by the quoting.
   Level 3, Macackie Mansions -> "Level 3, Macackie Mansions"
   Level 3, "Macackie Mansions" -> "Level 3, ~"Macackie Mansions~""
   Can~on Grando -> "Can~~on Grando"
   # This scheme is bizarre (like some other CSV mutants) but at least it
   # doesn't cause ambiguity on input.
   # What software does this? Who sponsored its inclusion?
   # Does it need option(s) to cater for (redundantly) escaping (a)
   # delimiter (b) line terminator(s)?
   # And it hasn't been implemented on output -- see below
else:
   escapechar is not used
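
The rules above can be sketched as runnable Python. This is only an illustration of the proposal, not what _csv.c does. escape_field is a hypothetical helper, the ValueError and its message are my own stand-in for the unspecified raise, and two simplifications are assumed: the second branch always wraps the field in quotes (as the examples do), and the final branch falls back to plain quote doubling.

```python
import csv


def escape_field(field, quoting=csv.QUOTE_MINIMAL, doublequote=True,
                 escapechar=None, delimiter=",", quotechar='"',
                 lineterminator="\r\n"):
    """Hypothetical helper: apply the proposed field-fiddling rules."""

    def prefix(chars):
        # Put escapechar in front of every character of `field` found in `chars`.
        return "".join(escapechar + c if c in chars else c for c in field)

    if quoting == csv.QUOTE_NONE and escapechar is not None:
        # Escape the delimiter, line terminator(s) and the escapechar itself.
        return prefix(delimiter + lineterminator + escapechar)

    if quoting != csv.QUOTE_NONE and not doublequote:
        if escapechar is None:
            raise ValueError("doublequote=False requires an escapechar")
        # Escape the quotechar and the escapechar; the quoting covers the rest.
        return quotechar + prefix(quotechar + escapechar) + quotechar

    # escapechar is not used
    if quoting == csv.QUOTE_NONE:
        return field
    return quotechar + field.replace(quotechar, quotechar * 2) + quotechar


print(escape_field("Can~on Grando", quoting=csv.QUOTE_NONE, escapechar="~"))
print(escape_field('Level 3, "Macackie Mansions"', doublequote=False,
                   escapechar="~"))
```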
===
What _csv.c does on output:

>>> source = [123456, 'aaa,bbb', 'ccc,"ddd"', '"eee",fff', 9876.5]
>>> csv.writer(sys.stdout, escapechar="~", quoting=csv.QUOTE_NONE,
...            doublequote=False).writerow(source)
123456,aaa~,bbb,ccc~,"ddd","eee"~,fff,9876.5
# as expected
>>> csv.writer(sys.stdout, escapechar="~", quoting=csv.QUOTE_MINIMAL,
...            doublequote=False).writerow(source)
123456,"aaa,bbb","ccc,"ddd"",""eee",fff",9876.5
# No escaping done
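
One way to check whether a given csv module escapes consistently on output is a round trip: write the row, read it back with the same options, and compare. The sketch below assumes a present-day CPython csv module (its raw output may differ from the lines shown above), and round_trip is a hypothetical helper name.

```python
import csv
import io

source = [123456, 'aaa,bbb', 'ccc,"ddd"', '"eee",fff', 9876.5]


def round_trip(row, **fmtparams):
    """Write `row` with the given options, then read it back with the same ones."""
    buf = io.StringIO(newline="")
    csv.writer(buf, **fmtparams).writerow(row)
    text = buf.getvalue()
    parsed = next(csv.reader(io.StringIO(text, newline=""), **fmtparams))
    return text, parsed


for quoting in (csv.QUOTE_NONE, csv.QUOTE_MINIMAL):
    text, parsed = round_trip(source, escapechar="~", quoting=quoting,
                              doublequote=False)
    # The reader returns strings, so compare against the stringified source.
    print(repr(text), parsed == [str(v) for v in source])
```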
===
What _csv.c does on input:

Firstly, the simple escape scheme:

>>> indata1 = ['123456,aaa~,bbb,ccc~,"ddd","eee"~,fff,9876.5']

>>> [x for x in csv.reader(indata1, escapechar="~", quoting=csv.QUOTE_NONE,
...     doublequote=True)]
[['123456', 'aaa,bbb', 'ccc,"ddd"', 'eee~', 'fff', '9876.5']]
# wrong or confusing, QUOTE_NONE but still testing for quotechar at start
# of field

>>> [x for x in csv.reader(indata1, escapechar="~", quoting=csv.QUOTE_NONE,
...     doublequote=False)]
[['123456', 'aaa,bbb', 'ccc,"ddd"', 'eee,fff', '9876.5']]
# wrong or confusing, QUOTE_NONE but still testing for quotechar at start
# of field

>>> [x for x in csv.reader(indata1, escapechar="~", quoting=csv.QUOTE_NONE,
...     doublequote=False, quotechar=None)]
TypeError: bad argument type for built-in operation
# already grumbled about this

>>> [x for x in csv.reader(indata1, escapechar="~", quoting=csv.QUOTE_NONE,
...     doublequote=False, quotechar="!")]
[['123456', 'aaa,bbb', 'ccc,"ddd"', '"eee",fff', '9876.5']]
# actual == expected
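
For comparison, a sketch of reading the simple scheme without the dummy-quotechar workaround. It assumes a present-day CPython csv module, where as far as I know the reader does not look for a quotechar when quoting is QUOTE_NONE. parse_unquoted is a hypothetical helper name.

```python
import csv

indata1 = ['123456,aaa~,bbb,ccc~,"ddd","eee"~,fff,9876.5']


def parse_unquoted(lines, escapechar="~", delimiter=","):
    """Parse lines written with the simple scheme: no quoting, escaped delimiters."""
    reader = csv.reader(lines, quoting=csv.QUOTE_NONE,
                        escapechar=escapechar, delimiter=delimiter)
    return list(reader)


rows = parse_unquoted(indata1)
print(rows)
```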

Secondly, the bizarre scheme (escaping the quotechar):

>>> indata2 = ['123456,aaa~,bbb,ccc~,"ddd","eee"~,fff,"ggg,~"hhh~"",iii-~"jjj~",9876.5']

>>> [x for x in csv.reader(indata2, escapechar="~",
...     quoting=csv.QUOTE_MINIMAL, doublequote=False, quotechar='"')]
[['123456', 'aaa,bbb', 'ccc,"ddd"', 'eee,fff', 'ggg,"hhh"', 'iii-"jjj"',
  '9876.5']]
# bizarre + options; this is assuming that the writer was escaping
# delimiters
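
The same input can be walked through field by field as a script, which makes it easier to see which raw field produced which value. This is a sketch assuming a present-day CPython csv module. The variable names are illustrative only.

```python
import csv

# The line mixes escaped delimiters, escaped quotechars and quoted fields.
indata2 = ['123456,aaa~,bbb,ccc~,"ddd","eee"~,fff,"ggg,~"hhh~"",iii-~"jjj~",9876.5']

reader = csv.reader(indata2, escapechar="~", quoting=csv.QUOTE_MINIMAL,
                    doublequote=False, quotechar='"')
fields = next(reader)

# Print one parsed field per line, numbered from zero.
for index, value in enumerate(fields):
    print(index, value)
```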