From skip at pobox.com  Thu May 15 22:15:19 2003
From: skip at pobox.com (Skip Montanaro)
Date: Thu, 15 May 2003 15:15:19 -0500
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV module
In-Reply-To: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
References: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
Message-ID: <16067.62807.842732.949436@montanaro.dyndns.org>

I'm replying on c.l.py, but note that for future reference this thread
belongs on csv at mail.mojam.com (on the cc: list).

    Bernard> The CSV module only allows a single character as delimiter,
    Bernard> which does not (easily) allow one to write generic code that
    Bernard> would not be at the mercy of whatever the current locale is of
    Bernard> the user who sends you a csv file.  Fortunately the Sniffer
    Bernard> class is provided for guessing the most likely delimiter, and
    Bernard> seems to work fine, from my limited tests.

I'll leave Dave and Andrew to comment on the possibility of admitting a
multiple-character delimiter string, as that will affect their C code.

    Bernard> There's an error in the documentation of Sniffer().sniff(x),
    Bernard> though: its x argument is documented as a file object, whereas
    Bernard> the code actually expects a sample buffer.

Thanks, I'll fix the docs.  They didn't quite catch up to the last-minute
changes I made to the code.
    Bernard> I feel though, that this unfortunately forces one to write more
    Bernard> code than is really needed, typically in the following form:

    Bernard> sample = file( 'data.csv' ).read( 8192 )
    Bernard> dialect = csv.Sniffer().sniff( sample )
    Bernard> infile = file( 'data.csv' )
    Bernard> for fields in csv.reader( infile, dialect ):
    Bernard>     # do something with fields

    Bernard> That's a tad ugly, having to open the same file twice in
    Bernard> particular.

I recognize the issue you raise.  As originally written, the Sniffer class
also took a file-like object; however, it relied on being able to rewind
the stream.  This would, for example, prevent you from feeding sys.stdin to
the sniffer.  I also felt the decision of rewinding the stream belonged
with the caller.  I decided to change it to accept a small data sample
instead.  You can avoid multiple opens by rewinding the stream yourself (in
the common case where the stream can be rewound):

    infile = file('data.csv')
    sample = infile.read(8192)
    infile.seek(0)
    dialect = csv.Sniffer().sniff( sample )
    for fields in csv.reader( infile, dialect ):
        # do something with fields

Note that after the sniffer does its thing you should check that it
returned reasonable values.

    Bernard> (2)
    Bernard> for fields in csv.reader( infile, dialect='sniff' ):
    Bernard>     # do something with fields

Do you mean to imply that the csv.reader object should call the sniffer
implicitly and use the values it returns?  That's an interesting idea, but
the sniffer isn't guaranteed to always guess right.

Skip

From bdelmee at advalvas.be  Thu May 15 19:34:30 2003
From: bdelmee at advalvas.be (Bernard Delmée)
Date: Thu, 15 May 2003 19:34:30 +0200
Subject: [Csv] [PEP305] Python 2.3: a small change request in CSV module
Message-ID: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>

(Passing along from comp.lang.python... -skip)

I may be a bit late to the ball with the beta already out, but I'd like to
request a little change/addition to the otherwise very neat new CSV module.

The field separator m$excel uses depends on the user locale (Windows
control panel, regional settings, list separator).  I for one very often
see either a comma (the default for the csv module) or a semi-colon being
used.  The CSV module only allows a single character as delimiter, which
does not (easily) allow one to write generic code that would not be at the
mercy of whatever the current locale is of the user who sends you a csv
file.  Fortunately the Sniffer class is provided for guessing the most
likely delimiter, and seems to work fine, from my limited tests.

There's an error in the documentation of Sniffer().sniff(x), though: its x
argument is documented as a file object, whereas the code actually expects
a sample buffer.  Once you feed it appropriately, this works fine and deals
nicely with the above mentioned problem of choosing the right delimiter.

I feel though, that this unfortunately forces one to write more code than
is really needed, typically in the following form:

    sample = file( 'data.csv' ).read( 8192 )
    dialect = csv.Sniffer().sniff( sample )
    infile = file( 'data.csv' )
    for fields in csv.reader( infile, dialect ):
        # do something with fields

That's a tad ugly, having to open the same file twice in particular.
What I would like to see instead is either:

(1)
    for fields in csv.reader( infile, dialect='excel', delimiter=',|;' ):
        # do something with fields

*or* probably more realistically:

(2)
    for fields in csv.reader( infile, dialect='sniff' ):
        # do something with fields

I guess allowing multi-character or regular expressions as delimiters would
be too much of a change, especially since the real data splitting seems to
occur in a C module.  But solution (2) is very easy to implement in plain
Python, and just needs to use a Sniffer to guess the correct Dialect
instead of forcing the user to "hard choose" one.

Sorry for the longish explanation for a fairly simple change request,
really.  If this is not the appropriate place for posting, please let me
know.  Thanks for reading this far; if you've looked at Python 2.3 you'll
agree that it looks like another very promising piece of Dutch technology
;-)

Cheers,

Bernard.

-- 
http://mail.python.org/mailman/listinfo/python-list

From skip at pobox.com  Fri May 16 21:08:16 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri, 16 May 2003 14:08:16 -0500
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV module
In-Reply-To: <3ec52fe3$0$6529$afc38c87@sisyphus.news.be.easynet.net>
References: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
	<3ec52fe3$0$6529$afc38c87@sisyphus.news.be.easynet.net>
Message-ID: <16069.14112.477663.5928@montanaro.dyndns.org>

    >> I'll leave Dave and Andrew to comment on the possibility of admitting
    >> a multiple-character delimiter string, as that will affect their C
    >> code.

    Bernard> Are they monitoring this ng as well, or should I repost
    Bernard> elsewhere?  Notice I am not asking for a multichar delimiter
    Bernard> but for multiple alternate single-char separators.

As I mentioned in my original note, the best place for this discussion is
csv at mail.mojam.com.  I'm sure Dave and Andrew are there.  I don't know
how regularly they monitor c.l.py.

    Bernard> for fields in csv.reader( infile, dialect='sniff' ):
    Bernard>     # do something with fields

    >> Do you mean to imply that the csv.reader object should call the
    >> sniffer implicitly and use the values it returns?  That's an
    >> interesting idea, but the sniffer isn't guaranteed to always guess
    >> right.

    Bernard> Yes, that's exactly my suggestion.

I'm not sure we have that much confidence in the sniffer at this point.

    Bernard> Also, if this was supported directly in reader(), the
    Bernard> file-like argument would not necessarily have to be seekable;
    Bernard> it could conceivably just use the first read data chunk for
    Bernard> the guess-work as well as for further parsing of the first
    Bernard> rows.

Not necessarily.  It depends on how the file is accessed.  I believe it's
treated as an iterator, in which case you wind up having to read several
records, pass them off to the sniffer, set your dialect, reprocess the
lines you've already read, then process the remaining unread lines in the
file.  This would be more tedious from C than from Python.

    Bernard> I hope this could be deemed a common enough usage to grant
    Bernard> inclusion in the standard module.

I have my own special interests (mostly reading and writing multi-megabyte
CSV files), but I don't think I've ever not known what the delimiter was.
Still, that may just be because I live in the bully country of which Texas
is a part.
:-(

Skip

From LogiplexSoftware at earthlink.net  Fri May 16 22:13:12 2003
From: LogiplexSoftware at earthlink.net (Cliff Wells)
Date: 16 May 2003 13:13:12 -0700
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV module
In-Reply-To: <16067.62807.842732.949436@montanaro.dyndns.org>
References: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
	<16067.62807.842732.949436@montanaro.dyndns.org>
Message-ID: <1053115991.1448.96.camel@software1.logiplex.internal>

On Thu, 2003-05-15 at 13:15, Skip Montanaro wrote:

>     Bernard> The CSV module only allows a single character as delimiter,
>     Bernard> which does not (easily) allow one to write generic code that
>     Bernard> would not be at the mercy of whatever the current locale is
>     Bernard> of the user who sends you a csv file.  Fortunately the
>     Bernard> Sniffer class is provided for guessing the most likely
>     Bernard> delimiter, and seems to work fine, from my limited tests.

As Skip mentioned, the sniffer isn't guaranteed to determine the dialect.
Given reasonably sane CSV files, my confidence is good that it will do the
right thing.  Feed it something bizarre and you might get bit.  There are
even a couple of reasonable cases that might toss it.  Feed
"01/01/2003?10:10:56?10:15:02?hello, dolly" to it and see what you get.
As you can see, it isn't certain what the delimiter might be, even though
the data is well-formed.

That bit of doubt, no matter how small, is enough to warrant human
intervention/confirmation prior to parsing and importing a couple of MB of
garbage into your SQL server.  You might feel confident in *your* data, but
we don't want to encourage other people to blindly trust the sniffer.

Come to think of it, perhaps the sniffer should be raising an exception
rather than returning None when it fails...

>     Bernard> I feel though, that this unfortunately forces one to write
>     Bernard> more code than is really needed, typically in the following
>     Bernard> form:
>
>     Bernard> sample = file( 'data.csv' ).read( 8192 )
>     Bernard> dialect = csv.Sniffer().sniff( sample )
>     Bernard> infile = file( 'data.csv' )
>     Bernard> for fields in csv.reader( infile, dialect ):
>     Bernard>     # do something with fields
>
>     Bernard> That's a tad ugly, having to open the same file twice in
>     Bernard> particular.
>
> I recognize the issue you raise.  As originally written, the Sniffer
> class also took a file-like object; however, it relied on being able to
> rewind the stream.  This would, for example, prevent you from feeding
> sys.stdin to the sniffer.  I also felt the decision of rewinding the
> stream belonged with the caller.  I decided to change it to accept a
> small data sample instead.  You can avoid multiple opens by rewinding the
> stream yourself (in the common case where the stream can be rewound):
>
>     infile = file('data.csv')
>     sample = infile.read(8192)
>     infile.seek(0)
>     dialect = csv.Sniffer().sniff( sample )
>     for fields in csv.reader( infile, dialect ):
>         # do something with fields

    infile = file('data.csv')
    sample = infile.read(8192)
    infile.seek(0)
    dialect = csv.Sniffer().sniff( sample )
    for fields in csv.reader( infile, dialect ):
        # do something with fields

Or even:

    infile = file('data.csv')
    dialect = csv.Sniffer().sniff( infile.read(8192) )
    if dialect:
        infile.seek(0)
        for fields in csv.reader( infile, dialect ):
            # do something with fields

Doesn't seem too bad.  There really doesn't seem to be a universal solution
to this.  If you use the sniffer you're forced to rewind.
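[A minimal sketch of acting on the doubt described above: treat a failed
sniff explicitly instead of parsing with a bad guess.  It assumes, as in
the version discussed in this thread, that sniff() returns None when it
cannot decide; the fallback to the plain 'excel' dialect and the file name
are only illustrative.]

    import csv

    infile = file('data.csv', 'rb')
    dialect = csv.Sniffer().sniff(infile.read(8192))
    infile.seek(0)
    if dialect is None:
        # Could not guess; fall back to the default dialect rather than
        # proceeding silently with a wrong delimiter.
        dialect = 'excel'
    for fields in csv.reader(infile, dialect):
        # do something with fields
        pass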
>     Bernard> (2)
>     Bernard> for fields in csv.reader( infile, dialect='sniff' ):
>     Bernard>     # do something with fields
>
> Do you mean to imply that the csv.reader object should call the sniffer
> implicitly and use the values it returns?  That's an interesting idea,
> but the sniffer isn't guaranteed to always guess right.

Yes.  It looks elegant, but it's far too dangerous, especially just to save
a couple of lines of code.

You might also take a look at http://python-dsv.sf.net.  The code for the
sniffer was derived to a great extent from that code.  I'm planning (some
dreamy day) to rewrite DSV to take advantage of the Python CSV module.

The point is that this is the sort of thing the sniffer was meant to help
with: giving the user a preview of the data that they can *confirm* is
correct before actual importing and destruction of your existing data
begins.

Regards,

-- 
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308  (800) 735-0555 x308

From bdelmee at advalvas.be  Fri May 16 23:09:59 2003
From: bdelmee at advalvas.be (Bernard Delmée)
Date: Fri, 16 May 2003 23:09:59 +0200
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV module
References: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
	<16067.62807.842732.949436@montanaro.dyndns.org>
	<1053115991.1448.96.camel@software1.logiplex.internal>
Message-ID: <000801c31bef$86be0680$6702a8c0@duracuire>

> As Skip mentioned, the sniffer isn't guaranteed to determine the dialect.
> Given reasonably sane CSV files, my confidence is good that it will do
> the right thing.  Feed it something bizarre and you might get bit.  There
> are even a couple of reasonable cases that might toss it.  Feed
> "01/01/2003?10:10:56?10:15:02?hello, dolly" to it and see what you get.
> As you can see, it isn't certain what the delimiter might be, even though
> the data is well-formed.

Sure.  Maybe a second, optional arg to Sniffer().sniff(sample, seplist)
could restrict the set of allowed/expected delimiters?  As I originally
mentioned, I'm only actually seeing ',;' in practice.

[...]

> Or even:
>
>     infile = file('data.csv')
>     dialect = csv.Sniffer().sniff( infile.read(8192) )
>     if dialect:
>         infile.seek(0)
>         for fields in csv.reader( infile, dialect ):
>             # do something with fields
>
> Doesn't seem too bad.  There really doesn't seem to be a universal
> solution to this.  If you use the sniffer you're forced to rewind.

No big deal indeed, especially once wrapped in a generator as I did in a
c.l.py post.  I sure don't want to nitpick, as I believe the CSV module is
a very neat addition to the stdlib.  For example, if your Excel sheet has
multi-line values, the CSV file ends up holding newlines (or carriage
returns, sorry I don't recall) _within_ fields.  If you open the csv file
in text mode, there's no way to distinguish (on Windows) between those
single NLs and the CR-NL pairs at the end of lines/records.  In such a
case, you need to open the file as binary and split explicitly on "\r\n".
You can wrap it all in a generator, but that gets unwieldy.  The seemingly
simplistic csv module nicely hides all this.

Thanks again,

Bernard.
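[A minimal sketch of the generator wrapper Bernard mentions above: open the
file once, sniff a sample, rewind, then yield parsed rows.  It assumes a
seekable, binary-mode file; the function name and the 8192-byte sample size
are only illustrative.]

    import csv

    def sniffed_rows(path, sample_size=8192):
        # Sniff the dialect from an initial sample, then rewind and parse
        # the whole file with the guessed dialect.
        infile = file(path, 'rb')
        dialect = csv.Sniffer().sniff(infile.read(sample_size))
        infile.seek(0)
        for fields in csv.reader(infile, dialect):
            yield fields

    for fields in sniffed_rows('data.csv'):
        # do something with fields
        pass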
From djc at object-craft.com.au  Sat May 17 15:31:47 2003
From: djc at object-craft.com.au (Dave Cole)
Date: 17 May 2003 23:31:47 +1000
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV module
In-Reply-To: <16069.14112.477663.5928@montanaro.dyndns.org>
References: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
	<3ec52fe3$0$6529$afc38c87@sisyphus.news.be.easynet.net>
	<16069.14112.477663.5928@montanaro.dyndns.org>
Message-ID: 

>>>>> "Skip" == Skip Montanaro writes:

    >>> I'll leave Dave and Andrew to comment on the possibility of
    >>> admitting a multiple-character delimiter string, as that will
    >>> affect their C code.

    Bernard> Are they monitoring this ng as well, or should I repost
    Bernard> elsewhere?  Notice I am not asking for a multichar delimiter
    Bernard> but for multiple alternate single-char separators.

    Skip> As I mentioned in my original note, the best place for this
    Skip> discussion is csv at mail.mojam.com.  I'm sure Dave and Andrew
    Skip> are there.  I don't know how regularly they monitor c.l.py.

I usually read c.l.py every day (at least skim over the subjects).
Haven't done it for over a week since our ISP's news server died.  Dunno
why they are taking so long to fix it...

    Bernard> Also, if this was supported directly in reader(), the
    Bernard> file-like argument would not necessarily have to be seekable;
    Bernard> it could conceivably just use the first read data chunk for
    Bernard> the guess-work as well as for further parsing of the first
    Bernard> rows.

One of the suggestions I made early on in the csv development was to allow
the sniffer and reader to operate on iterable data sources.  Turns out that
you don't really need the sniffer to use an iterable for input.

With the following (completely untested) you could sniff and read an input
source while only reading it once:

    class SniffedInput:
        def __init__(self, fp):
            self.fp = fp
            self.sample = []
            self.end_of_input = 0
            for i in range(20):
                line = fp.readline()
                if not line:
                    self.end_of_input = 1
                    break
                self.sample.append(line)
            self.dialect = csv.Sniffer().sniff(''.join(self.sample))

        def __iter__(self):
            return self

        def next(self):
            if self.sample:
                line = self.sample[0]
                del self.sample[0]
                return line
            if self.end_of_input:
                raise StopIteration
            line = self.fp.readline()
            if not line:
                raise StopIteration
            return line

    inp = SniffedInput(sys.stdin)
    for rec in csv.reader(inp, dialect=inp.dialect):
        process(rec)

    Skip> Not necessarily.  It depends on how the file is accessed.  I
    Skip> believe it's treated as an iterator, in which case you wind up
    Skip> having to read several records, pass them off to the sniffer,
    Skip> set your dialect, reprocess the lines you've already read, then
    Skip> process the remaining unread lines in the file.  This would be
    Skip> more tedious from C than from Python.

    Bernard> I hope this could be deemed a common enough usage to grant
    Bernard> inclusion in the standard module.

Does the above satisfy your needs?  Should something like that be placed
into the csv module?
- Dave

-- 
http://www.object-craft.com.au

From bdelmee at advalvas.be  Sat May 17 20:50:24 2003
From: bdelmee at advalvas.be (Bernard Delmée)
Date: Sat, 17 May 2003 20:50:24 +0200
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV module
References: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
	<3ec52fe3$0$6529$afc38c87@sisyphus.news.be.easynet.net>
	<16069.14112.477663.5928@montanaro.dyndns.org>
Message-ID: <000801c31ca5$30241b00$6702a8c0@duracuire>

> With the following (completely untested) you could sniff and read an
> input source while only reading it once.
>
> class SniffedInput:
>     [implementation omitted]
>
> Does the above satisfy your needs?

It does, thanks Dave (give or take a few typos trivial to fix).  So I now
have three working solutions:

(1) let the sniffer detect the dialect, reset the input, then iterate
(2) essentially as (1), except wrapped in a generator
(3) your iterator-based suggestion (SniffedInput), with the advantage of
    not requiring a seek on the file-like data source

I tested them against a file holding 115,000 lines of 56 fields, and the
respective runtimes are:

(1) 5.5s
(2) 6.5s
(3) 6.9s

I think 2 and 3 add overhead to every readline(), if only an extra Python
function call (iterator/generator), and these accumulate to a perceptible
-albeit little- slowdown.

> Should something like that be placed into the csv module?

I dunno, really.  Given the above results, the overhead would probably only
go away if this was supported by the C reader() code, with usage close to
my original suggestion.  That's probably too much to ask, certainly if I've
been the sole user to ask for it.

*now* there's something else Skip got me thinking about (maybe this should
be a separate post).  He rightly underlined that there's no guarantee that
the sniffer will guess right.  For example, if most of your fields are
"dd/mm/yy" dates, the sniffer may decide (untried) that '/' is the most
likely delimiter.  Hence let me re-iterate my suggestion to tip the sniffer
off by adding a second argument to Sniffer().sniff(): an optional string
holding the allowed or expected delimiters.  Short of direct support for
multiple separators, which may be too rarely needed to move to the C
implementation, it would be *very* useful to have a means to assist the
sniffer in guessing right.

Thanks for your attention,

Bernard.

PS: do I have to subscribe somewhere to follow csv at mail.mojam.com?

From skip at pobox.com  Sun May 18 00:21:38 2003
From: skip at pobox.com (Skip Montanaro)
Date: Sat, 17 May 2003 17:21:38 -0500
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV module
In-Reply-To: <000801c31ca5$30241b00$6702a8c0@duracuire>
References: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
	<3ec52fe3$0$6529$afc38c87@sisyphus.news.be.easynet.net>
	<16069.14112.477663.5928@montanaro.dyndns.org>
	<000801c31ca5$30241b00$6702a8c0@duracuire>
Message-ID: <16070.46578.53650.639146@montanaro.dyndns.org>

    Bernard> *now* there's something else Skip got me thinking about (maybe
    Bernard> this should be a separate post).  He rightly underlined that
    Bernard> there's no guarantee that the sniffer will guess right.  For
    Bernard> example, if most of your fields are "dd/mm/yy" dates, the
    Bernard> sniffer may decide (untried) that '/' is the most likely
    Bernard> delimiter.  Hence let me re-iterate my suggestion to tip the
    Bernard> sniffer off by adding a second argument to Sniffer().sniff():
    Bernard> an optional string holding the allowed or expected delimiters.
    Bernard> Short of direct support for multiple separators, which may be
    Bernard> too rarely needed to move to the C implementation, it would be
    Bernard> *very* useful to have a means to assist the sniffer in
    Bernard> guessing right.

Please try the attached context diff.  It seems to work as I interpreted
your request.  Note the new test_delimiters method.  When I first wrote it
I guessed wrong what the sniffer would come up with as an unguided
delimiter.  It picked '0' instead of '/' as I thought.  With the delimiters
parameter it correctly picks from the string passed in.

Skip

-------------- next part --------------
A non-text attachment was scrubbed...
Name: csv.diff
Type: application/octet-stream
Size: 5232 bytes
Desc: not available
Url : http://mail.python.org/pipermail/csv/attachments/20030517/c2daffea/attachment.obj

From skip at pobox.com  Sun May 18 00:22:46 2003
From: skip at pobox.com (Skip Montanaro)
Date: Sat, 17 May 2003 17:22:46 -0500
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV module
In-Reply-To: <000801c31ca5$30241b00$6702a8c0@duracuire>
References: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
	<3ec52fe3$0$6529$afc38c87@sisyphus.news.be.easynet.net>
	<16069.14112.477663.5928@montanaro.dyndns.org>
	<000801c31ca5$30241b00$6702a8c0@duracuire>
Message-ID: <16070.46646.419951.995394@montanaro.dyndns.org>

    Bernard> PS: do I have to subscribe somewhere to follow
    Bernard> csv at mail.mojam.com?

Yes, if you'd like to not always rely on someone cc'ing you, the signup
form is at http://manatee.mojam.com/mailman/listinfo/csv

Skip

From bdelmee at advalvas.be  Sun May 18 11:14:26 2003
From: bdelmee at advalvas.be (Bernard Delmée)
Date: Sun, 18 May 2003 11:14:26 +0200
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV module
References: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
	<3ec52fe3$0$6529$afc38c87@sisyphus.news.be.easynet.net>
	<16069.14112.477663.5928@montanaro.dyndns.org>
	<000801c31ca5$30241b00$6702a8c0@duracuire>
	<16070.46578.53650.639146@montanaro.dyndns.org>
Message-ID: <005201c31d1d$ebeed1e0$6702a8c0@duracuire>

Skip Montanaro wrote:

> Please try the attached context diff.  It seems to work as I
> interpreted your request.  Note the new test_delimiters method.  When
> I first wrote it I guessed wrong what the sniffer would come up with
> as an unguided delimiter.  It picked '0' instead of '/' as I thought.
> With the delimiters parameter it correctly picks from the string
> passed in.

Sorry Skip, it's been a while since I used patch.  I tried:

    patch -c csv.py csv.diff

but got:

    patching file csv.py
    can't find file to patch at input line 118
    Perhaps you should have used the -p or --strip option?
    The text leading up to this was:
    --------------------------
    |Index: Lib/test/test_csv.py
    |===================================================================
    |RCS file: /cvsroot/python/python/dist/src/Lib/test/test_csv.py,v
    |retrieving revision 1.7
    |diff -c -r1.7 test_csv.py
    |*** Lib/test/test_csv.py       6 May 2003 15:56:05 -0000      1.7
    |--- Lib/test/test_csv.py       17 May 2003 22:19:12 -0000
    --------------------------
    File to patch: : No such file or directory
    Skip this patch? [y]
    Skipping patch.
    2 out of 2 hunks ignored

Apparently the patch to csv.py was applied correctly, but not the one to
test_csv.py.  Looking at the code, it seems to indeed restrict the
returned delimiter to the set of allowed values.  And it works with my
previous test, no problem.
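[A short illustration of the behaviour Bernard confirms above, using the
new optional delimiters argument to restrict the sniffer to the separators
seen in practice.  The ';,' choice and the file name are only examples,
not part of the patch itself.]

    import csv

    sample = file('data.csv', 'rb').read(8192)
    # Only ';' and ',' are considered as candidate delimiters.
    dialect = csv.Sniffer().sniff(sample, delimiters=';,')
    print dialect.delimiter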
One thing I didn't understand, though, is that given input consisting of
lines such as

    1/2/3;2/3/4;3/4/5;4/5/6;5/6/7

the sniffer (correctly) returns ';' (not '/') as the delimiter, with or
without the additional hint!  Same thing if I replace '/' with ':'.  On
the other hand, using '/' as the additional param does force it to be
picked as delimiter, as expected.  Maybe there already was some heuristic
weighting 'likely' separators in the sniffer, after all?  Well, checking
the implementation, there's indeed the Sniffer.preferred list of
separators which sets sensible defaults (including ',' and ';', with which
I was concerned in the first place).

So... in the end I think I raised a false alarm, and should have checked
and tested more after you warned me -fair enough- that the sniffer can't
always be right.  The new parameter works, but will probably very rarely
be needed given the reasonable defaults.  Someone with a *really*
untypical input will always be able to explicitly set the delimiter.  So
it's up to you to decide whether the additional control level is worth
keeping, or just adds to the confusion.  What I'd suggest, though, is that
the documentation for the sniffer should explicitly show the set of
separators it favors (',', '\t', ';', ' ', ':').

Sorry for the noise, cheers,

Bernard.

From skip at pobox.com  Sun May 18 13:12:05 2003
From: skip at pobox.com (Skip Montanaro)
Date: Sun, 18 May 2003 06:12:05 -0500
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV module
In-Reply-To: <005201c31d1d$ebeed1e0$6702a8c0@duracuire>
References: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
	<3ec52fe3$0$6529$afc38c87@sisyphus.news.be.easynet.net>
	<16069.14112.477663.5928@montanaro.dyndns.org>
	<000801c31ca5$30241b00$6702a8c0@duracuire>
	<16070.46578.53650.639146@montanaro.dyndns.org>
	<005201c31d1d$ebeed1e0$6702a8c0@duracuire>
Message-ID: <16071.27269.594139.214571@montanaro.dyndns.org>

    Bernard> Sorry Skip, it's been a while since I used patch.

Cd to the top of your Python source tree and execute

    patch -p 0 < csv.diff

    Bernard> Maybe there already was some heuristic weighting 'likely'
    Bernard> separators in the sniffer, after all?  Well, checking the
    Bernard> implementation, there's indeed the Sniffer.preferred list of
    Bernard> separators which sets sensible defaults (including ',' and
    Bernard> ';', with which I was concerned in the first place).

There are two _guess functions, _guess_quote_and_delimiter and
_guess_delimiter.  Here are their doc strings:

    """
    Looks for text enclosed between two identical quotes (the probable
    quotechar) which are preceded and followed by the same character (the
    probable delimiter).  For example:

        ,'some text',

    The quote with the most wins, same with the delimiter.  If there is no
    quotechar the delimiter can't be determined this way.
    """

    """
    The delimiter /should/ occur the same number of times on each row.
    However, due to malformed data, it may not.  We don't want an all or
    nothing approach, so we allow for small variations in this number.

    1) build a table of the frequency of each character on every line.
    2) build a table of frequencies of this frequency (meta-frequency?),
       e.g.
       'x occurred 5 times in 10 rows, 6 times in 1000 rows,
       7 times in 2 rows'
    3) use the mode of the meta-frequency to determine the /expected/
       frequency for that character
    4) find out how often the character actually meets that goal
    5) the character that best meets its goal is the delimiter

    For performance reasons, the data is evaluated in chunks, so it can
    try and evaluate the smallest portion of the data possible, evaluating
    additional chunks as necessary.
    """

First the q_and_d version is called.  If that fails, the less restrictive
one is called.

    Bernard> So... in the end I think I raised a false alarm, and should
    Bernard> have checked and tested more after you warned me -fair
    Bernard> enough- that the sniffer can't always be right.

    Bernard> The new parameter works, but will probably very rarely be
    Bernard> needed given the reasonable defaults.  Someone with a
    Bernard> *really* untypical input will always be able to explicitly
    Bernard> set the delimiter.

It's probably worth having nonetheless, just because we can construct
"reasonable" CSV files on which it guesses wrong.

    Bernard> So it's up to you to decide whether the additional control
    Bernard> level is worth keeping, or just adds to the confusion.  What
    Bernard> I'd suggest, though, is that the documentation for the
    Bernard> sniffer should explicitly show the set of separators it
    Bernard> favors (',', '\t', ';', ' ', ':').

I'm not sure it favors any delimiters.  I think it depends on frequency
and regularity.  I don't know the delimiter guessing code well and am
disinclined to guess about what it favors.

Skip

From sjmachin at LEXICON.NET  Mon May 19 01:24:19 2003
From: sjmachin at LEXICON.NET (John Machin)
Date: Mon, 19 May 2003 09:24:19 +1000
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV module
Message-ID: <200305182323.CQK30392@titan.izone.net.au>

Perhaps the sniffer could have a built-in but over-ridable list of
characters called
delimiters_used_in_files_created_by_people_not_totally_out_of_their_trees
for use as a default.  This would exclude '0' and all other alphanumeric
characters, and '/-$.'\"`(){}[]\\'.

---- Original message ----

>Date: Sat, 17 May 2003 17:21:38 -0500
>From: Skip Montanaro
>Subject: Re: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV
> module
>To: Bernard Delmée
>Cc: csv at mail.mojam.com
>
>Please try the attached context diff.  It seems to work as I interpreted
>your request.  Note the new test_delimiters method.  When I first wrote it
>I guessed wrong what the sniffer would come up with as an unguided
>delimiter.  It picked '0' instead of '/' as I thought.  With the
>delimiters parameter it correctly picks from the string passed in.
>
>Skip

From skip at pobox.com  Mon May 19 17:35:30 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon, 19 May 2003 10:35:30 -0500
Subject: [Csv] optional Sniffer.sniff() delimiters arg added
Message-ID: <16072.63938.547985.870948@montanaro.dyndns.org>

I just checked in a change to csv.Sniffer.sniff() which adds an optional
delimiters arg.  It is a string which limits the characters which will be
considered as possible field delimiters.
Skip

From LogiplexSoftware at earthlink.net  Mon May 19 10:11:20 2003
From: LogiplexSoftware at earthlink.net (Cliff Wells)
Date: 19 May 2003 01:11:20 -0700
Subject: [Csv] Re: [PEP305] Python 2.3: a small change request in CSV module
In-Reply-To: <16071.27269.594139.214571@montanaro.dyndns.org>
References: <3ec3d01d$0$6528$afc38c87@sisyphus.news.be.easynet.net>
	<3ec52fe3$0$6529$afc38c87@sisyphus.news.be.easynet.net>
	<16069.14112.477663.5928@montanaro.dyndns.org>
	<000801c31ca5$30241b00$6702a8c0@duracuire>
	<16070.46578.53650.639146@montanaro.dyndns.org>
	<005201c31d1d$ebeed1e0$6702a8c0@duracuire>
	<16071.27269.594139.214571@montanaro.dyndns.org>
Message-ID: <1053331880.1449.123.camel@software1.logiplex.internal>

On Sun, 2003-05-18 at 04:12, Skip Montanaro wrote:

>     Bernard> So it's up to you to decide whether the additional control
>     Bernard> level is worth keeping, or just adds to the confusion.  What
>     Bernard> I'd suggest, though, is that the documentation for the
>     Bernard> sniffer should explicitly show the set of separators it
>     Bernard> favors (',', '\t', ';', ' ', ':').
>
> I'm not sure it favors any delimiters.  I think it depends on frequency
> and regularity.  I don't know the delimiter guessing code well and am
> disinclined to guess about what it favors.

Bernard is correct.  If the sniffer comes up with two equally likely
candidates, it falls back to a preferred list (if one of the two
candidates occurs higher in the list then it is deemed to be the
delimiter).  I'm not fond of this (and I *think* there actually may be a
way to solve this problem algorithmically), but it seems to work in
practical use.

-- 
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308  (800) 735-0555 x308

From Andreas.Trawoeger at wgkk.sozvers.at  Wed May 21 14:50:43 2003
From: Andreas.Trawoeger at wgkk.sozvers.at (Andreas.Trawoeger at wgkk.sozvers.at)
Date: Wed, 21 May 2003 14:50:43 +0200
Subject: [Csv] Problems with CSV Module
Message-ID: 

Hi!

I am testing Python 2.3b1 and have found a couple of problems with the CSV
module:

1. Documentation:

What's a row?  (The word row means a list or a tuple.)  How do DictReader
and DictWriter work?  Having a couple of examples would help ;-))

2. Locale:

The CSV module doesn't use locale.  The default delimiter for Austria
(+Germany) in Windows is a semicolon ';', not a comma ','.  This has the
result that you can't import a list generated by csv.writer() into Excel
without changing your regional settings, or using csv.writer(delimiter=';').

It would be nice if the CSV module would adapt to the language settings.
This could be really simple to implement using the locale module.  But I
took a short look at the locale module and it seems like there is no way
to get the list separator sign (probably it's not POSIX compliant).
Another possibility would be to have a dialect like 'excel_ger' with the
correct settings.

3. There is no .close()

There is no way to close a file, resulting in problems with file locking.
The only way around it is to do it by hand:

    import csv

    FILE_CSV = r"C:\csvtest.csv"

    f = file(FILE_CSV, 'w')
    w = csv.writer(f, dialect='excel', delimiter=';')
    w.writerow((1, 5, 10, 25, 100, 250, 500, 1000, 1500))
    f.close()

    f = file(FILE_CSV, 'r')
    r = csv.reader(file(FILE_CSV, 'r'), dialect='excel', delimiter=';')
    print r.next()
    f.close()

4. There is no .readrow()

This should be just another name for .next().  It's more intuitive if you
write a row via .writerow() and read it via .readrow().
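[A small sketch of how points 3 and 4 above could be handled in user code
today: a thin wrapper that owns the file, exposes close(), and aliases
readrow() to next().  The class name is invented for illustration; this is
not part of the csv module.]

    import csv

    class ClosingReader:
        def __init__(self, filename, **fmtparams):
            # Own the file object so it can be closed explicitly.
            self.fp = file(filename, 'rb')
            self.reader = csv.reader(self.fp, **fmtparams)

        def __iter__(self):
            return self

        def next(self):
            return self.reader.next()

        # .readrow() as a more intuitive alias for .next()
        readrow = next

        def close(self):
            self.fp.close()

    r = ClosingReader(r"C:\csvtest.csv", dialect='excel', delimiter=';')
    print r.readrow()
    r.close()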
Mit freundlichen Grüssen,

Andreas Trawöger
Netzwerk / Systemadministration
Wiener Gebietskrankenkasse
Tel.: +43(1) 60122-3664
Fax: +43(1) 60122-2182

From skip at pobox.com  Wed May 21 16:28:29 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed, 21 May 2003 09:28:29 -0500
Subject: [Csv] Problems with CSV Module
In-Reply-To: 
References: 
Message-ID: <16075.36109.559279.602298@montanaro.dyndns.org>

    Andreas> 1. Documentation:

    Andreas> What's a row?  (The word row means a list or a tuple.)  How
    Andreas> do DictReader and DictWriter work?  Having a couple of
    Andreas> examples would help ;-))

Thanks, I'll add a couple of examples and better define "row".  DictReader
works pretty much like dict cursors in the various Python database
packages, returning a dictionary instead of a tuple for each row of data.

Here's an example of using csv.DictReader.  This particular snippet parses
CSV files dumped by Checkpoint Software's Firewall-1 product:

    class fw1dialect(csv.Dialect):
        lineterminator = '\n'
        escapechar = '\\'
        skipinitialspace = False
        quotechar = '"'
        quoting = csv.QUOTE_ALL
        delimiter = ';'
        doublequote = True

    csv.register_dialect("fw1", fw1dialect)

    fieldnames = ("num;date;time;orig;type;action;alert;i/f_name;"
                  "i/f_dir;product;src;s_port;dst;service;proto;"
                  "rule;th_flags;message_info;icmp-type;icmp-code;"
                  "sys_msgs;cp_message;sys_message").split(';')

    rdr = csv.DictReader(f, fieldnames=fieldnames, dialect="fw1")
    for row in rdr:
        if row["num"] is None:
            continue
        nrows += 1
        if action is not None and row["action"] != action:
            continue
        source = row.get("src", "unknown")
        ...

Note that instead of returning a tuple for each row, a dictionary is
returned.  Its keys are the elements of the fieldnames parameter of the
constructor.

    Andreas> 2. Locale:

    Andreas> The CSV module doesn't use locale.  The default delimiter for
    Andreas> Austria (+Germany) in Windows is a semicolon ';', not a comma
    Andreas> ','.  This has the result that you can't import a list
    Andreas> generated by csv.writer() into Excel without changing your
    Andreas> regional settings, or using csv.writer(delimiter=';').

    Andreas> It would be nice if the CSV module would adapt to the
    Andreas> language settings.

How can I get that from Python, or do I have to know that if the locale is
de the default Excel delimiter is a semicolon?  What other locales have a
semicolon as the default?  I suspect if we have to enumerate them all it
may not get done.  Also, note that the

    Andreas> This could be really simple to implement using the locale
    Andreas> module.  But I took a short look at the locale module and it
    Andreas> seems like there is no way to get the list separator sign
    Andreas> (probably it's not POSIX compliant).

That would make it difficult to do.

    Andreas> Another possibility would be to have a dialect like
    Andreas> 'excel_ger' with the correct settings.

But what about all the other locales which must use a semicolon as the
default delimiter?  How about this in your code:

    class excel(csv.excel):
        delimiter = ';'

    csv.register_dialect("excel", excel)

    Andreas> 3. There is no .close()

Note that the "file-like object" can be any object which supports the
iterator protocol, so it need not have a close() method.
In the test code we often use lists, e.g.:

    def test_read_with_blanks(self):
        reader = csv.DictReader(["1,2,abc,4,5,6\r\n",
                                 "\r\n",
                                 "1,2,abc,4,5,6\r\n"],
                                fieldnames="1 2 3 4 5 6".split())
        self.assertEqual(reader.next(), {"1": '1', "2": '2', "3": 'abc',
                                         "4": '4', "5": '5', "6": '6'})
        self.assertEqual(reader.next(), {"1": '1', "2": '2', "3": 'abc',
                                         "4": '4', "5": '5', "6": '6'})

    Andreas> f = file(FILE_CSV, 'w')
    Andreas> w = csv.writer(f, dialect='excel', delimiter=';')
    Andreas> w.writerow((1, 5, 10, 25, 100, 250, 500, 1000, 1500))
    Andreas> f.close()

    Andreas> f = file(FILE_CSV, 'r')
    Andreas> r = csv.reader(file(FILE_CSV, 'r'), dialect='excel',
    Andreas>                delimiter=';')
    Andreas> print r.next()
    Andreas> f.close()

Yes, this is what you'll have to do, though note that if you reuse f the
first call to f.close() is unnecessary.

    Andreas> 4. There is no .readrow()

    Andreas> This should be just another name for .next().  It's more
    Andreas> intuitive if you write a row via .writerow() and read it via
    Andreas> .readrow().

I think we can probably squeeze this in.

Skip

From neal at metaslash.com  Thu May 22 19:12:48 2003
From: neal at metaslash.com (Neal Norwitz)
Date: Thu, 22 May 2003 17:12:48 -0000
Subject: [Csv] memory leaks
Message-ID: <20030522170709.GW26970@epoch.metaslash.com>

Included is a patch which corrects memory leaks in the CSV module.  The
patch was produced from the current version in Python CVS.  I'm not sure
if all of these are correct, but the patch corrects the leaks reported by
valgrind.

Neal
--

Index: Modules/_csv.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Modules/_csv.c,v
retrieving revision 1.11
diff -w -u -r1.11 _csv.c
--- Modules/_csv.c	14 Apr 2003 02:20:55 -0000	1.11
+++ Modules/_csv.c	22 May 2003 17:03:34 -0000
@@ -465,6 +465,8 @@
 {
 	if (self->field_size == 0) {
 		self->field_size = 4096;
+		if (self->field != NULL)
+			PyMem_Free(self->field);
 		self->field = PyMem_Malloc(self->field_size);
 	}
 	else {
@@ -739,6 +741,8 @@
 	Py_XDECREF(self->dialect);
 	Py_XDECREF(self->input_iter);
 	Py_XDECREF(self->fields);
+	if (self->field != NULL)
+		PyMem_Free(self->field);
 	PyObject_GC_Del(self);
 }
@@ -1002,6 +1006,8 @@
 	if (rec_len > self->rec_size) {
 		if (self->rec_size == 0) {
 			self->rec_size = (rec_len / MEM_INCR + 1) * MEM_INCR;
+			if (self->rec != NULL)
+				PyMem_Free(self->rec);
 			self->rec = PyMem_Malloc(self->rec_size);
 		}
 		else {
@@ -1191,6 +1197,8 @@
 {
 	Py_XDECREF(self->dialect);
 	Py_XDECREF(self->writeline);
+	if (self->rec != NULL)
+		PyMem_Free(self->rec);
 	PyObject_GC_Del(self);
 }