From skip at pobox.com Tue Apr 8 17:12:44 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue, 8 Apr 2003 10:12:44 -0500
Subject: [Csv] PEP305 csv package: from csv import csv? (fwd)
Message-ID: <16018.59116.101278.751277@montanaro.dyndns.org>

Passing this along... I have no argument against what Hamish asks. Any
thoughts from this crowd?

Skip

-------------- next part --------------
An embedded message was scrubbed...
From: Hamish Lawson
Subject: PEP305 csv package: from csv import csv?
Date: Tue, 08 Apr 2003 15:44:02 +0100
Size: 4469
Url: http://mail.python.org/pipermail/csv/attachments/20030408/8cce5300/attachment.mht

From LogiplexSoftware at earthlink.net Tue Apr 8 16:13:51 2003
From: LogiplexSoftware at earthlink.net (Cliff Wells)
Date: 08 Apr 2003 07:13:51 -0700
Subject: [Csv] PEP305 csv package: from csv import csv? (fwd)
In-Reply-To: <16018.59116.101278.751277@montanaro.dyndns.org>
References: <16018.59116.101278.751277@montanaro.dyndns.org>
Message-ID: <1049811231.3721.3.camel@software1.logiplex.internal>

On Tue, 2003-04-08 at 08:12, Skip Montanaro wrote:
> Passing this along... I have no argument against what Hamish asks. Any
> thoughts from this crowd?

None here. I just didn't know how to do it =)

BTW, I know you're new here, but please don't top-post

>
> ______________________________________________________________________
>
> From: Hamish Lawson
> To: python-list at zope.org
> Subject: PEP305 csv package: from csv import csv?
> Date: 08 Apr 2003 15:44:02 +0100
>
> According to the documentation in progress at
>
> http://www.python.org/dev/doc/devel/whatsnew/node14.html
>
> use of the forthcoming csv module (as described in PEP305) requires it to
> be imported from the csv package:
>
>     from csv import csv
>
>     input = open('datafile', 'rb')
>     reader = csv.reader(input)
>     for line in reader:
>         print line
>
> Is there some reason why the csv package's __init__.py doesn't import the
> required names from csv.py, so allowing the shorter form below?
>
>     import csv
>
>     input = open('datafile', 'rb')
>     reader = csv.reader(input)
>     for line in reader:
>         print line
>
>
> Hamish Lawson

--
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308  (800) 735-0555 x308

From skip at pobox.com Tue Apr 8 18:08:26 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue, 8 Apr 2003 11:08:26 -0500
Subject: [Csv] PEP305 csv package: from csv import csv? (fwd)
In-Reply-To: <1049811231.3721.3.camel@software1.logiplex.internal>
References: <16018.59116.101278.751277@montanaro.dyndns.org> <1049811231.3721.3.camel@software1.logiplex.internal>
Message-ID: <16018.62458.366683.959122@montanaro.dyndns.org>

    Cliff> BTW, I know you're new here, but please don't top-post

I didn't top post. I attached an entire email message. Attachments are
generally added at the end.

Skip

From skip at pobox.com Wed Apr 9 15:43:11 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed, 9 Apr 2003 08:43:11 -0500
Subject: [Csv] Re: [Python-Dev] PEP305 csv package: from csv import csv?
In-Reply-To: <5.2.0.9.0.20030409143148.01d0d620@spey.st-andrews.ac.uk>
References: <5.2.0.9.0.20030409143148.01d0d620@spey.st-andrews.ac.uk>
Message-ID: <16020.9071.801846.936864@montanaro.dyndns.org>

>>>>> "Hamish" == Hamish Lawson writes:

    Hamish> [Please excuse my posting this message here after initially
    Hamish> posting it to python-list, but I realised afterwards that this
    Hamish> might be the more appropriate forum (it hasn't so far had any
    Hamish> responses on python-list anyway).]
    ...

Actually, I forwarded your note to the csv mailing list: csv at
mail.mojam.com. That'd be the best place to discuss the topic. ;-)

I'll probably get around to changing things in the next day or two, but
please feel free to submit a patch so I don't forget.
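The shorter spelling Hamish asks for, a plain `import csv` followed by `csv.reader(...)`, is how the module is used once its contents live at the top level. A minimal sketch in present-day Python 3 (an assumption: this is not the 2003 code), with an in-memory string standing in for Hamish's 'datafile'; `read_rows` is a hypothetical helper name:

```python
import csv
import io

def read_rows(text):
    """Parse CSV text using only the top-level names of the csv module.

    Python 3 sketch: a real file would be opened in text mode with
    open(path, newline="") rather than the 'rb' mode used in 2003.
    """
    # csv.reader is reachable directly after "import csv";
    # no "from csv import csv" indirection is needed.
    reader = csv.reader(io.StringIO(text))
    return [row for row in reader]

for line in read_rows("name,count\nspam,3\neggs,12\n"):
    print(line)
```

Using `io.StringIO` keeps the example self-contained; any iterable of lines, including an open file object, works the same way.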
Skip

From noah at noah.org Thu Apr 10 08:05:22 2003
From: noah at noah.org (Noah Spurrier)
Date: Wed, 09 Apr 2003 23:05:22 -0700
Subject: [Csv] PEP 305
Message-ID: <3E9509A2.90308@noah.org>

This is great. This has my vote. Probably half of my projects have a CSV
parser somewhere. For better or worse, I use CSV files far more than I use
XML. A built-in CSV parser just makes sense.

Yours,
Noah

From skip at pobox.com Thu Apr 10 15:33:44 2003
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 10 Apr 2003 08:33:44 -0500
Subject: [Csv] PEP 305
In-Reply-To: <3E9509A2.90308@noah.org>
References: <3E9509A2.90308@noah.org>
Message-ID: <16021.29368.91341.721810@montanaro.dyndns.org>

    Noah> This is great. This has my vote. Probably half of my projects have
    Noah> a CSV parser somewhere. For better or worse, I use CSV files far
    Noah> more than I use XML. A built-in CSV parser just makes sense.

Thanks for the vote of confidence. The csv code is now in the Python CVS
repository. I need to check in one itty bitty change (to hoist the
contents of the csv.csv module to the top level) and then I think the API
is set. Barring a highly unlikely change of heart by the BDFL, the csv
package will be in 2.3.

Skip

From jeremy at zope.com Thu Apr 10 19:12:47 2003
From: jeremy at zope.com (Jeremy Hylton)
Date: 10 Apr 2003 13:12:47 -0400
Subject: [Csv] csv needs to be gc-aware?
Message-ID: <1049994014.4473.91.camel@slothrop.zope.com>

I've been reviewing extension modules looking for C types that should
participate in garbage collection. I think the csv ReaderObj and
WriterObj should participate.

The ReaderObj contains a reference to input_iter, which could be an
arbitrary Python object. The iterator object could well participate in a
cycle that refers to the ReaderObj. The WriterObj has a reference to a
writeline callable, which could well be a method of an object that also
points to the WriterObj.

The Dialect object appears to be safe, because the only PyObject * it
refers to should be a string.
Safe until someone creates an insane string subclass <0.4 wink>.

Also, an unrelated comment about the code: the lineterminator of the
Dialect is managed by a collection of little helper functions like
get_string, set_string, etc. This code appears to be excessively general;
since they're called only once, it seems clearer to inline the logic
directly in the get/set methods for the lineterminator.

Jeremy

From djc at object-craft.com.au Fri Apr 11 02:30:08 2003
From: djc at object-craft.com.au (Dave Cole)
Date: 11 Apr 2003 10:30:08 +1000
Subject: [Csv] PEP 305
In-Reply-To: <16021.29368.91341.721810@montanaro.dyndns.org>
References: <3E9509A2.90308@noah.org> <16021.29368.91341.721810@montanaro.dyndns.org>
Message-ID:

>>>>> "Skip" == Skip Montanaro writes:

Noah> This is great. This has my vote. Probably half of my projects
Noah> have a CSV parser somewhere. For better or worse, I use CSV
Noah> files far more than I use XML. A built-in CSV parser just makes
Noah> sense.

Skip> Thanks for the vote of confidence. The csv code is now in the
Skip> Python CVS repository. I need to check in one itty bitty change
Skip> (to hoist the contents of the csv.csv module to the top level)
Skip> and then I think the API is set. Barring a highly unlikely
Skip> change of heart by the BDFL the csv package will be in 2.3.

I would just like to thank Skip for going the distance and making this
happen. It takes a special type of doggedness to take something that
is 90% complete and perform the remaining 90% of work.

A job well done.

- Dave

--
http://www.object-craft.com.au

From andrewm at object-craft.com.au Fri Apr 11 03:27:12 2003
From: andrewm at object-craft.com.au (Andrew McNamara)
Date: Fri, 11 Apr 2003 11:27:12 +1000
Subject: [Csv] PEP 305
In-Reply-To: Message from Dave Cole
References: <3E9509A2.90308@noah.org> <16021.29368.91341.721810@montanaro.dyndns.org>
Message-ID: <20030411012712.CAB113C458@coffee.object-craft.com.au>

>Skip> Thanks for the vote of confidence.
>Skip> The csv code is now in the Python CVS repository. I need to
>Skip> check in one itty bitty change (to hoist the contents of the
>Skip> csv.csv module to the top level) and then I think the API is
>Skip> set. Barring a highly unlikely change of heart by the BDFL the
>Skip> csv package will be in 2.3.
>
>I would just like to thank Skip for going the distance and making this
>happen. It takes a special type of doggedness to take something that
>is 90% complete and perform the remaining 90% of work.
>
>A job well done.

I'd second that!

Now, if only I could find the time to address the gc issues Jeremy
highlighted, and check through the points raised by Neal... 8-(

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

From skip at pobox.com Fri Apr 11 03:59:43 2003
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 10 Apr 2003 20:59:43 -0500
Subject: [Csv] PEP 305
In-Reply-To: <20030411012712.CAB113C458@coffee.object-craft.com.au>
References: <3E9509A2.90308@noah.org> <16021.29368.91341.721810@montanaro.dyndns.org> <20030411012712.CAB113C458@coffee.object-craft.com.au>
Message-ID: <16022.8591.742545.118246@montanaro.dyndns.org>

    Andrew> Now, if only I could find the time to address the gc issues
    Andrew> Jeremy highlighted, and check through the points raised by
    Andrew> Neal... 8-(

Let's split them up if we can. I'm about to knock off for the (Thursday)
evening, but if nothing's been posted by tomorrow, I'll take a crack at
dividing things up in a reasonable fashion. (The docs need another pass
as well.)
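The writer-side cycle Jeremy describes is easy to build from pure Python: a file-like object hands its bound `write` method to the writer while also holding the writer. A minimal sketch, assuming current CPython, where the csv reader and writer types do take part in cyclic garbage collection; `Sink` and `cycle_is_collected` are hypothetical names, and `write` stands in for the writeline callable of the 2003 code:

```python
import csv
import gc
import weakref

class Sink:
    """Hypothetical file-like object that keeps a reference to its own writer."""

    def __init__(self):
        self.lines = []
        # csv.writer keeps the bound method self.write, which refers back
        # to this object, while self.writer refers to the writer: a cycle.
        self.writer = csv.writer(self)

    def write(self, text):
        self.lines.append(text)
        return len(text)

def cycle_is_collected():
    sink = Sink()
    sink.writer.writerow(["spam", "eggs"])
    probe = weakref.ref(sink)
    del sink      # reference counting alone cannot free the cycle
    gc.collect()  # the cyclic collector can, because the writer is GC-aware
    return probe() is None

print(cycle_is_collected())
```

Without the traverse/clear support Jeremy is asking for, the collector could not see the writer's reference to `write`, and the cycle would leak.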
Skip

From LogiplexSoftware at earthlink.net Fri Apr 11 17:16:49 2003
From: LogiplexSoftware at earthlink.net (Cliff Wells)
Date: 11 Apr 2003 08:16:49 -0700
Subject: [Csv] PEP 305
In-Reply-To:
References: <3E9509A2.90308@noah.org> <16021.29368.91341.721810@montanaro.dyndns.org>
Message-ID: <1050074208.13005.129.camel@software1.logiplex.internal>

On Thu, 2003-04-10 at 17:30, Dave Cole wrote:
> >>>>> "Skip" == Skip Montanaro writes:
>
> Noah> This is great. This has my vote. Probably half of my projects
> Noah> have a CSV parser somewhere. For better or worse, I use CSV
> Noah> files far more than I use XML. A built-in CSV parser just makes
> Noah> sense.
>
> Skip> Thanks for the vote of confidence. The csv code is now in the
> Skip> Python CVS repository. I need to check in one itty bitty change
> Skip> (to hoist the contents of the csv.csv module to the top level)
> Skip> and then I think the API is set. Barring a highly unlikely
> Skip> change of heart by the BDFL the csv package will be in 2.3.
>
> I would just like to thank Skip for going the distance and making this
> happen. It takes a special type of doggedness to take something that
> is 90% complete and perform the remaining 90% of work.
>
> A job well done.

Second that, and also thanks to Kevin Altis for getting the ball rolling.

--
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308  (800) 735-0555 x308

From skip at pobox.com Sat Apr 12 01:15:43 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri, 11 Apr 2003 18:15:43 -0500
Subject: [Csv] csv to-do
Message-ID: <16023.19615.516868.740823@montanaro.dyndns.org>

To do list:

    * catch docs up to the current code (Skip)

    * make csv module gc-aware (Jeremy - said he'd do this)

    * Neal's code review feedback:

      - remove TODO comment at top of file--it's empty (done I think)

      - is CSV going to be maintained outside the python tree? If not,
        remove the 2.2 compatibility macros for: PyDoc_STR, PyDoc_STRVAR,
        PyMODINIT_FUNC, etc.
        (rationale for leaving them in explained already)

      - inline the following functions since they are used only in one
        place: get_string, set_string, get_nullchar_as_None,
        set_nullchar_as_None, join_reset (maybe) (Dave or Andrew)

      - rather than use PyErr_BadArgument, should you use assert? (first
        example, Dialect_set_quoting, line 218) (Dave or Andrew)

      - is it necessary to have Dialect_methods, can you use 0 for
        tp_methods?

      - remove commented out code (PyMem_DEL) on line 261 (done)

      - Have you used valgrind on the test to find memory
        overwrites/leaks? (Dave or Andrew)

      - PyString_AsString()[0] on line 331 could return NULL in which case
        you are dereferencing a NULL pointer (Skip)

      - not sure why there are casts on 0 pointers lines 383-393, 733-743,
        1144-1154, 1164-1165 (I think this refers to the various static
        PyTypeObjects. I believe the convention normally is to apply the
        casts.)

      - Reader_getiter() can be removed and use PyObject_SelfIter() (Dave
        or Andrew)

      - I think you need PyErr_NoMemory() before returning on line 768,
        1178 (Dave or Andrew)

      - is PyString_AsString(self->dialect->lineterminator) on line 994
        guaranteed not to return NULL? If not, it could crash by passing
        to memmove. (Dave or Andrew)

      - PyString_AsString() can return NULL on line 1048 and 1063, the
        result is passed to join_append() (Dave or Andrew)

      - iteratable should be iterable? (line 1088) (done)

      - why doesn't csv_writerows() have a docstring?
        csv_writerow does (done - still to check in)

      - any PyUnicode_* methods should be protected with #ifdef
        Py_USING_UNICODE (Skip)

      - csv_unregister_dialect, csv_get_dialect could use METH_O so you
        don't need to use PyArg_ParseTuple (Skip)

      - in init_csv, recommend using PyModule_AddIntConstant and
        PyModule_AddStringConstant where appropriate (Skip)

From skip at pobox.com Sun Apr 13 02:42:32 2003
From: skip at pobox.com (Skip Montanaro)
Date: Sat, 12 Apr 2003 19:42:32 -0500
Subject: [Csv] csv to-do (fwd)
Message-ID: <16024.45688.254920.908136@montanaro.dyndns.org>

I believe I took care of these items from the to-do list:

      - PyString_AsString()[0] on line 331 could return NULL in which case
        you are dereferencing a NULL pointer

      - PyString_AsString() can return NULL on line 1048 and 1063, the
        result is passed to join_append()

      - iteratable should be iterable? (line 1088)

      - why doesn't csv_writerows() have a docstring? csv_writerow does

      - any PyUnicode_* methods should be protected with #ifdef
        Py_USING_UNICODE

      - csv_unregister_dialect, csv_get_dialect could use METH_O so you
        don't need to use PyArg_ParseTuple

      - in init_csv, recommend using PyModule_AddIntConstant and
        PyModule_AddStringConstant where appropriate

Skip

From skip at pobox.com Sun Apr 13 03:46:35 2003
From: skip at pobox.com (Skip Montanaro)
Date: Sat, 12 Apr 2003 20:46:35 -0500
Subject: [Csv] csv.utils.Sniffer notes
Message-ID: <16024.49531.827290.369310@montanaro.dyndns.org>

I guess this is mostly for Cliff, but everyone should feel free to chime
in.

I went to write a subsection describing the utils.Sniffer class and began
to wonder about a few things.

    * It's not clear to me that passing a file object to Sniffer.sniff() is
      the correct way to give it data to operate on. First, because you can
      perform multiple operations (sniff, hasHeaders), it requires the file
      object to be rewindable. Second, it doesn't seem to me that setting
      self.fileobj in sniff() is the right thing.
      What if all the user is interested in is whether the CSV file has
      headers? I think it makes more sense to simply pass in a chunk of
      data to the constructor to use as the sample. The caller can then
      worry about rewindability in his own code.

    * The mixture of camelCase and underscore separators in the method
      names. I believe it's more usual (especially in the Python core) to
      use an underscore to separate words in attribute names.

    * The use of eval(). I think the only things we can reasonably have in
      CSV files are strings, ints and floats, so code to determine types
      can look like:

          try:
              thisType = type(int(row[col]))
          except ValueError:
              try:
                  thisType = type(float(row[col]))
              except ValueError:
                  thisType = str

      OverflowError doesn't need to be considered in 2.3 because int()
      silently coerces to longs:

          >>> int(6e23)
          600000000000000016777216L

      2.2 and earlier probably still require the OverflowError check.

    * I don't think the sniffer needs to offer a register_dialect() method.
      The sniff() method returns a dialect. The programmer can then call
      the normal dialect registration function if need be.

Attached is an untested version of sniffer.py which implements the various
changes except for the eval() stuff. The logic there was complex enough
that I didn't want to risk screwing it up.

Skip

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sniffer.diff
Type: application/octet-stream
Size: 8725 bytes
Desc: sniffer diff
Url : http://mail.python.org/pipermail/csv/attachments/20030412/63b3b2da/attachment.obj

From skip at pobox.com Wed Apr 16 19:16:58 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed, 16 Apr 2003 12:16:58 -0500
Subject: [Csv] [Python-Dev] 2.3b1 release (fwd)
Message-ID: <16029.36874.280010.311491@montanaro.dyndns.org>

Folks,

Guido wants to make a 2.3b1 release in the next week or so (see attached
message). Any chance of taking care of some/most/all of the remaining
to-do list items in that timeframe?
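Several of the remaining to-do items (METH_O for csv_get_dialect and csv_unregister_dialect, PyModule_AddIntConstant in init_csv) change only the C side. From Python they surface as the dialect-registry functions, each taking a dialect name, and as module-level integer constants. A small sketch against the Python 3 csv module, using a hypothetical "pipes" dialect and a hypothetical helper name:

```python
import csv

def dialect_round_trip(name="pipes"):
    """Register, fetch and unregister a dialect (hypothetical helper)."""
    csv.register_dialect(name, delimiter="|", quoting=csv.QUOTE_MINIMAL)
    try:
        # get_dialect() and unregister_dialect() each take just the name.
        dialect = csv.get_dialect(name)
        return dialect.delimiter, dialect.quoting
    finally:
        csv.unregister_dialect(name)

print(dialect_round_trip())              # ('|', 0)
print(csv.QUOTE_MINIMAL, csv.QUOTE_ALL)  # 0 1
```

How the C functions parse their arguments is not visible at this level; the sketch only shows the single-name calling convention and that the quoting constants are plain ints.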
Skip

-------------- next part --------------
An embedded message was scrubbed...
From: Guido van Rossum
Subject: [Python-Dev] 2.3b1 release
Date: Wed, 16 Apr 2003 11:52:10 -0400
Size: 4015
Url: http://mail.python.org/pipermail/csv/attachments/20030416/d751ad9d/attachment.mht

From skip at pobox.com Thu Apr 24 23:13:29 2003
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 24 Apr 2003 16:13:29 -0500
Subject: [Csv] csv.utils.Sniffer notes
Message-ID: <16040.21369.878701.510760@montanaro.dyndns.org>

Sorry for the late notice on this. The 2.3b1 release snuck up on me.

I sent this back on the 12th. It's in my outgoing mail archive, but I
didn't see it in the mailing list archives and never received any
responses. Maybe my mailman installation is broken. The last message
archived appears on the 11th.

Note also that I just checked in a change recommended by the PythonLabs
folks - it's once again a csv module (no longer a package). Cliff's
sniffer class is now csv.Sniffer. 2.3b1 is scheduled to be frozen
tomorrow at noon. After that, the API can't change. If I don't hear from
anyone about this real soon I'll go ahead and implement the change.

Skip

----------------------------------------------------------------------

I guess this is mostly for Cliff, but everyone should feel free to chime
in.

I went to write a subsection describing the Sniffer class and began to
wonder about a few things.

    * It's not clear to me that passing a file object to Sniffer.sniff() is
      the correct way to give it data to operate on. First, because you can
      perform multiple operations (sniff, hasHeaders), it requires the file
      object to be rewindable. Second, it doesn't seem to me that setting
      self.fileobj in sniff() is the right thing. What if all the user is
      interested in is whether the CSV file has headers? I think it makes
      more sense to simply pass in a chunk of data to the constructor to
      use as the sample. The caller can then worry about rewindability in
      his own code.

    * The mixture of camelCase and underscore separators in the method
      names. I believe it's more usual (especially in the Python core) to
      use an underscore to separate words in attribute names.

    * The use of eval(). I think the only things we can reasonably have in
      CSV files are strings, ints and floats, so code to determine types
      can look like:

          try:
              thisType = type(int(row[col]))
          except ValueError:
              try:
                  thisType = type(float(row[col]))
              except ValueError:
                  thisType = str

      OverflowError doesn't need to be considered in 2.3 because int()
      silently coerces to longs:

          >>> int(6e23)
          600000000000000016777216L

      2.2 and earlier probably still require the OverflowError check.

    * I don't think the sniffer needs to offer a register_dialect() method.
      The sniff() method returns a dialect. The programmer can then call
      the normal dialect registration function if need be.

Attached is a context diff against the current CVS version of Lib/csv.py
and Lib/test/test_csv.py which implements the various changes except for
the eval() stuff and adds a couple of simple sniffer tests. The logic for
the eval() stuff was complex enough that I didn't want to risk screwing it
up at this point.

Skip

From skip at mojam.com Thu Apr 24 23:17:01 2003
From: skip at mojam.com (Skip Montanaro)
Date: Thu, 24 Apr 2003 16:17:01 -0500 (CDT)
Subject: [Csv] test
Message-ID: <200304242117.h3OLH1ZO015947@montanaro.dyndns.org>

test

From skip at pobox.com Thu Apr 24 23:26:39 2003
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 24 Apr 2003 16:26:39 -0500
Subject: [Csv] my apologies
Message-ID: <16040.22159.742406.540582@montanaro.dyndns.org>

My apologies, folks. I had a bit of a screwup in the mailman 2.1 install
on my server which didn't show up until the server was rebooted. Got
things straightened out now I think.
Skip

From skip at pobox.com Thu Apr 24 23:28:44 2003
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 24 Apr 2003 16:28:44 -0500
Subject: [Csv] csv.utils.Sniffer notes
In-Reply-To: <16040.21369.878701.510760@montanaro.dyndns.org>
References: <16040.21369.878701.510760@montanaro.dyndns.org>
Message-ID: <16040.22284.966171.929840@montanaro.dyndns.org>

    Skip> Attached is a context diff against the current CVS version of
    Skip> Lib/csv.py and Lib/test/test_csv.py ...

Duh... Here's the diff.

Skip

-------------- next part --------------
A non-text attachment was scrubbed...
Name: csv.diff
Type: application/octet-stream
Size: 9433 bytes
Desc: not available
Url : http://mail.python.org/pipermail/csv/attachments/20030424/eb682b20/attachment.obj

From LogiplexSoftware at earthlink.net Sat Apr 26 00:03:18 2003
From: LogiplexSoftware at earthlink.net (Cliff Wells)
Date: 25 Apr 2003 15:03:18 -0700
Subject: [Csv] csv.utils.Sniffer notes
In-Reply-To: <16040.21369.878701.510760@montanaro.dyndns.org>
References: <16040.21369.878701.510760@montanaro.dyndns.org>
Message-ID: <1051308198.2880.11.camel@dhcppc2>

On Thu, 2003-04-24 at 14:13, Skip Montanaro wrote:
> Sorry for the late notice on this. The 2.3b1 release snuck up on me.
>
> I sent this back on the 12th. It's in my outgoing mail archive, but I
> didn't see it in the mailing list archives and never received any
> responses. Maybe my mailman installation is broken. The last message
> archived appears on the 11th.
>
> Note also that I just checked in a change recommended by the PythonLabs
> folks - it's once again a csv module (no longer a package). Cliff's sniffer
> class is now csv.Sniffer. 2.3b1 is scheduled to be frozen tomorrow at noon.
> After that, the API can't change. If I don't hear from anyone about this
> real soon I'll go ahead and implement the change.
>
> Skip
>
> ----------------------------------------------------------------------
> I guess this is mostly for Cliff, but everyone should feel free to chime in.
>
> I went to write a subsection describing the Sniffer class and began to
> wonder about a few things.

Sorry I've been out of action. We moved our office and I've been offline
for a few days. Oddly, I had the LAN installed at the new location two
days ago, everything plugged in and ready to go, but didn't get AC power
until about an hour ago =)

> * It's not clear to me that passing a file object to Sniffer.sniff() is
>   the correct way to give it data to operate on. First, because you can
>   perform multiple operations (sniff, hasHeaders), it requires the file
>   object to be rewindable. Second, it doesn't seem to me that setting
>   self.fileobj in sniff() is the right thing. What if all the user is
>   interested in is whether the CSV file has headers? I think it makes
>   more sense to simply pass in a chunk of data to the constructor to use
>   as the sample. The caller can then worry about rewindability in his own
>   code.

I've been thinking the same thing myself. Rewindability is an issue.
Originally DSV just used a chunk of data, so switching back to that
shouldn't be a problem.

> * The mixture of camelCase and underscore separators in the method names.
>   I believe it's more usual (especially in the Python core) to use an
>   underscore to separate words in attribute names.
>
> * The use of eval(). I think the only things we can reasonably have in
>   CSV files are strings, ints and floats, so code to determine types can
>   look like:
>
>       try:
>           thisType = type(int(row[col]))
>       except ValueError:
>           try:
>               thisType = type(float(row[col]))
>           except ValueError:
>               thisType = str

Seems reasonable.

>   OverflowError doesn't need to be considered in 2.3 because int()
>   silently coerces to longs:
>
>       >>> int(6e23)
>       600000000000000016777216L
>
>   2.2 and earlier probably still require the OverflowError check.
>
> * I don't think the sniffer needs to offer a register_dialect() method.
>   The sniff() method returns a dialect.
>   The programmer can then call the normal dialect registration function
>   if need be.

Okay.

> Attached is a context diff against the current CVS version of Lib/csv.py and
> Lib/test/test_csv.py which implements the various changes except for the
> eval() stuff and adds a couple of simple sniffer tests. The logic for the
> eval() stuff was complex enough that I didn't want to risk screwing it up at
> this point.

You're saying my code isn't beautiful and easy to follow?

--
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308  (800) 735-0555 x308
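Both points Cliff agrees to here can be seen in the Python 3 csv module. There, Sniffer.sniff() and Sniffer.has_header() each take a sample string, and a sniffed dialect is registered with the ordinary csv.register_dialect(). The sketch below assumes that API. infer_type() and sniff_sample() are hypothetical helpers: the first follows Skip's int-then-float-then-str proposal, not the module's own has_header() logic.

```python
import csv
import io

def infer_type(value):
    """Classify one CSV field as int, float or str without eval().

    Hypothetical helper following the int -> float -> str order
    proposed in the thread.
    """
    try:
        int(value)
        return int
    except ValueError:
        pass
    try:
        float(value)
        return float
    except ValueError:
        return str

def sniff_sample(sample, name="sniffed"):
    """Sniff a dialect from an in-memory sample and register it under name."""
    sniffer = csv.Sniffer()
    # The caller passes a string, so no rewindable file object is needed,
    # and sniff() and has_header() can both look at the same sample.
    dialect = sniffer.sniff(sample, delimiters=",;|\t")
    has_header = sniffer.has_header(sample)
    # No Sniffer-specific registration method: use the module function.
    csv.register_dialect(name, dialect)
    return dialect, has_header

sample = "name;age;city\nalice;30;paris\nbob;25;rome\n"
dialect, has_header = sniff_sample(sample)
rows = list(csv.reader(io.StringIO(sample), dialect="sniffed"))
print(dialect.delimiter, has_header)             # ; True
print([infer_type(field) for field in rows[1]])  # [<class 'str'>, <class 'int'>, <class 'str'>]
```

Passing the sample explicitly leaves reading (and any rewinding) of the file to the caller, as Skip suggested. Restricting `delimiters` simply narrows the sniffer's guesses.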