Need help converting text to csv format

r0g aioe.org at technicalbloke.com
Fri Nov 21 14:03:57 EST 2008


Chuck Connors wrote:
> Hey guys.  I'm working on a little program to help my wife catalog her/
> our coupons.  I found a good resource but need help formatting the
> text data so that I can import it into a mysql database. Here's the
> data format:
> 
> 409220000003 Life Fitness Products $1 (12-13-08) (CVS)
> 546500181141 Oust Air Sanitizer, any B1G1F up to $3.49 (1-17-09) .35
> each
> 518000159258 Pillsbury Crescent Dinner Rolls, any .25 (2-14-09)
> 518000550406 Pillsbury Frozen Grands Biscuits, Cinnamon Rolls, Mini
> Cinnamon Rolls, etc. .40 (2-14-09)
> 
> The first value (large number) is the UPC, the next element is the
> coupon description, followed by a date in parenthesis.  Those are the
> only three elements I am concerned with.  Can someone help me in
> reformatting this:
> 
> 409220000003 Life Fitness Products $1 (12-13-08) (CVS)
> 546500181141 Oust Air Sanitizer, any B1G1F up to $3.49 (1-17-09) .35
> each
> 518000159258 Pillsbury Crescent Dinner Rolls, any .25 (2-14-09)
> 518000550406 Pillsbury Frozen Grands Biscuits, Cinnamon Rolls, Mini
> Cinnamon Rolls, etc. .40 (2-14-09)
> 
> into something like this:
> 
> "409220000003","Life Fitness Products $1","12-13-08"
> "546500181141","Oust Air Sanitizer, any B1G1F up to $3.49","1-17-09"
> "518000159258","Pillsbury Crescent Dinner Rolls, any .25","2-14-09"
> "518000550406","Pillsbury Frozen Grands Biscuits, Cinnamon Rolls, Mini
> Cinnamon Rolls, etc. .40","2-14-09"
> 
> Any help, pseudo code, or whatever push in the right direction would
> be most appreciated.  I am a novice Python programmer but I do have a
> good bit of PHP programming experience.
> 
> Thanks for your time....


Hi Chuck,

Don't be put off, not everyone here is rude and what you are asking
isn't so trivial as to justify that flaming RTFM horsecrap. You are
moving from php to python - this is cause for celebration! Welcome :-)

Anyway I can see two ways of attacking this, you could either write a
finite state machine or use regular expressions. My preference would be
regular expressions.

The following should do what you want although I haven't tested it much
and you might want to lookout for irregularities in your source text
like linebreaks etc...

import re
regex = re.compile("([0-9]{3,13}) ([^(]*) \(([^)]*)")

for each in regex.findall(source_data):
  print '"'+each[0]+'","'+each[1]+'","'+each[2]+'"'


I don't know if you already know regular expressions from some other
language so I won't bother explaining this one other than to say you may
note Pythons regex syntax is generally more liberal than many other
languages implementations. If this is all new to you and you do need
this explaining just let me know and I will happily run you through it.


Also, you say your eventual target is a SQL database. If the database is
 local (or reachable via a network) then you are probably going to be
better off using python to connect to it directly using the MySQLdb
module...

import MySQLdb

db = MySQLdb.connect(host="localhost",
                     user="uname",
                     passwd="pwd",
                     db="dbname")

cursor = db.cursor()
cursor.execute("INSERT YOU SQL QUERY HERE")
db.commit ()


Hope this helps,

Roger.



More information about the Python-list mailing list