[Tutor] correctly format and insert html block using python into mysql table
Norman Khine
norman at khine.net
Tue Mar 2 15:21:53 CET 2010
hello,
I have this code:
>>> import re
>>> import MySQLdb, csv, sys
>>> conn = MySQLdb.connect(host="localhost", user="usr", passwd="pass", db="databasename")
>>> c = conn.cursor()
>>> file = open('Data/asdsp-lao-farmers-et-batieng-products.html', 'r')
>>> data = file.read()
>>> get_records = re.compile(r"""<div id=\"flexicontent\" class=\"flexicontent\">(.*)<\/div>""", re.DOTALL).findall
>>> get_titles = re.compile(r"""<h3>(.*)<\/h3>""").findall
>>> get_description = re.compile(r"""<div class=\"description\">(.*)<\/div>""", re.DOTALL).findall
>>> block_record = []
>>> block_url = []
>>> records = get_records(data)
>>> for record in records:
...     description = get_description(record)  # findall returns a list of matches
...     print(description)  # see http://paste.lisp.org/+21XF for output
...     c.execute("INSERT INTO a (description) VALUES (%s)", description)
...
>>> conn.commit()  # commit() belongs to the connection, not the cursor
>>> c.close()
>>> conn.close()
The problem is that the HTML comes out like this:
http://paste.lisp.org/+21XF
Is there a way to format the output so that it does not include the
\n\t\t sequences and has the correct encoding?
thanks
norman
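
A possible direction for the question above, as a minimal sketch only. It
assumes the \n\t\t sequences are ordinary layout whitespace from the HTML
file, shown as escapes because a list (the result of findall) is being
printed rather than a string. The helper name clean_fragment and the sample
input are invented for illustration and are not part of the original code.

```python
import re

# Hypothetical helper, not from the original post.
_BETWEEN_TAGS = re.compile(r">\s+<")   # whitespace separating two tags
_WHITESPACE_RUN = re.compile(r"\s+")   # any remaining run of whitespace


def clean_fragment(matches):
    """Join the list returned by findall() and collapse layout whitespace."""
    html = "".join(matches)
    html = _BETWEEN_TAGS.sub("><", html)    # drop newlines/tabs between tags
    html = _WHITESPACE_RUN.sub(" ", html)   # collapse the rest to one space
    return html.strip()


# Made-up sample standing in for one get_description(record) result.
sample = ["\n\t\t<p>first\n\t\tline</p>\n\t\t<p>second</p>\n\t"]
cleaned = clean_fragment(sample)
print(cleaned)
```

With the session above, the cleaned string could then be passed as a
one-element tuple, for example c.execute("INSERT INTO a (description)
VALUES (%s)", (cleaned,)). Collapsing whitespace this way would also alter
the content of any <pre> blocks, so it only suits markup where whitespace is
not significant. On encoding, one option (assuming the file and the table
are both UTF-8, which the post does not state) is to decode the file when
reading it and to pass charset="utf8" and use_unicode=True to
MySQLdb.connect.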