Andrew Dalke's space example (was Re: [Csv] csv)

Sat Feb 15 19:14:04 CET 2003

On Sat, 15 Feb 2003 01:24:45 -0700, Andrew Dalke <adalke at mindspring.com> 
wrote:

> Anyway, my file formats are either space delimited (no quotes --
> the following works: infile.readline().split(' ')) or tab delimited.
> (Note, btw, that that is not split(), and two adjacent spaces mean
> there is an empty field.)
>
> I wanted to make a "space" dialect.  I thought the following would
> work, but it didn't.
>
> >>> class Space(csv.Dialect):
> ...     delimiter = " "
> ...     quotechar = False
> ...     escapechar = False

These should be one-character strings or None, not booleans.

> ...     doublequote = False
> ...     skipinitialspace = False
> ...     lineterminator = "\n"
> ...     quoting = csv.QUOTE_NONE
> ...
> >>> Space()
> <__main__.Space instance at 0x162ff8>
> >>> csv.register_dialect("space", Space)
> >>> csv.reader(open("/home/mug/test.smi"))

You need to tell the reader factory which dialect to use if you don't want
the default ("excel"):

    csv.reader(open("/home/mug/test.smi"), dialect="space")
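
For reference, here is a minimal sketch of a working "space" dialect. It assumes the csv module as it ships in current Python 3, which may differ from the pre-release module discussed in this thread (for example, it accepts quotechar = None together with QUOTE_NONE). The sample data and the io.StringIO object standing in for /home/mug/test.smi are made up for illustration.

```python
import csv
import io


class Space(csv.Dialect):
    """Space-delimited, no quoting; adjacent spaces yield an empty field."""
    delimiter = " "
    quotechar = None      # None rather than False
    escapechar = None     # None rather than False
    doublequote = False
    skipinitialspace = False
    lineterminator = "\n"
    quoting = csv.QUOTE_NONE


csv.register_dialect("space", Space)

# Hypothetical sample standing in for the .smi file; note the double space.
sample = io.StringIO("C1=CC=CC=C1 benzene\nCCO  ethanol\n")

# Name the dialect explicitly; otherwise the reader falls back to "excel".
rows = list(csv.reader(sample, dialect="space"))
for row in rows:
    print(row)
```

The second row should come back with an empty middle field, matching the split(' ') behaviour Andrew describes.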

>
> Also, suppose for my own project I have a "SpaceDialect".
> The current API requires a global registry for that dialect.
> I don't like the chance of clobbering, though I know it to be
> rare.  Would the ability to pass
>
> dialect = SpaceDialect
>
> (that is, a Dialect subclass) rather than the name be
> an appropriate addition to the API?
>

Registration is not persistent. What is the use case for registering a
dialect in one module and using it in a csv.reader() or writer() call in
another module? If there is no use case, then registration is pointless, and
the class could be passed as the dialect argument.
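
As a sketch of what passing the class directly looks like: the csv module in current Python 3 (an assumption here; this is not necessarily the API as it stood when this thread was written) accepts a Dialect subclass as the dialect argument, so no registry entry is needed. SpaceDialect and the sample line are illustrative names and data.

```python
import csv
import io


class SpaceDialect(csv.Dialect):
    delimiter = " "
    quotechar = None
    escapechar = None
    doublequote = False
    skipinitialspace = False
    lineterminator = "\n"
    quoting = csv.QUOTE_NONE


# No csv.register_dialect() call: the class itself is the dialect argument.
reader = csv.reader(io.StringIO("a b  c\n"), dialect=SpaceDialect)
fields = next(reader)
print(fields)

# The module-wide registry is left untouched, so nothing can be clobbered.
print("SpaceDialect" in csv.list_dialects())
```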

There are various problems brought out by Andrew's example; see attached 
file dalke.py

These are:
(1) A very obscure error message,
   "TypeError: bad argument type for built-in operation",
caused by using quotechar = False instead of quotechar = None.
Also, this is raised by the reader() call, not the register_dialect()
call!!!
*IF* there is a valid use case for registration, then the dialect should be
validated at registration time, not when it is used.
(2) The module says it needs quotechar != None even when quoting=QUOTE_NONE.
(3) The "quoting" argument is honoured only by writers, not by readers --
i.e. in general you can't reliably read back a file that you've created, and
in particular, to read Andrew D's files you need to set quotechar to some
char that you hope is not in the input -- maybe '\0'.
(4) Maybe the whole dialect thing is a bit too baroque and Byzantine -- see
example 5 in dalke.py. The **dict_of_arguments gadget offers the "don't
need to type a long list of arguments" advantage claimed for dialect classes,
and you get the same obscure error message if you stuff up the type of an
argument (see example 6) -- all of this without writing all that
register/validate/etc code.
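
The **dict_of_arguments idea in point (4) can be sketched roughly as follows. This is not the code from dalke.py; it assumes the csv module in current Python 3, where the formatting parameters are accepted as keyword arguments. The names space_args and bad_args are hypothetical.

```python
import csv
import io

# Hypothetical argument bundle playing the role of a dialect class.
space_args = {"delimiter": " ", "quotechar": None, "quoting": csv.QUOTE_NONE}

row = next(csv.reader(io.StringIO("x  y\n"), **space_args))
print(row)

# A wrongly typed argument (False instead of None) is rejected when the
# reader is constructed; the exact exception text varies between versions.
bad_args = dict(space_args, quotechar=False)
try:
    csv.reader(io.StringIO("x  y\n"), **bad_args)
except (TypeError, csv.Error) as exc:
    error = exc
else:
    error = None
print(type(error).__name__, error)
```

The dictionary can be defined once and reused for both reader and writer calls, which is the convenience being claimed for dialect classes.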

Maybe if we jump in quickly we could get an improved error message in the
Python core for 2.3: at least identify which arg has the problem and, if
we're lucky, get it to say e.g. "expected <type x>, given <type y>". And hey,
let's go for broke: how about saying which function is being called, and even
stop confusing the punters by calling functions in extension modules
"built-in"? This would benefit all Python users, not just csv users.

Cheers,
John
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dalke.py
Type: application/octet-stream
Size: 2742 bytes
Desc: not available
Url : http://mail.python.org/pipermail/csv/attachments/20030216/deb84579/attachment.obj