[Tutor] correctly format and insert html block using python into mysql table

Norman Khine norman at khine.net
Tue Mar 2 15:21:53 CET 2010


hello,
I have this code:

>>> import re
>>> import MySQLdb, csv, sys
>>> conn = MySQLdb.connect (host = "localhost",user = "usr", passwd= "pass",db = "databasename")
>>> c = conn.cursor()
>>> file = open('Data/asdsp-lao-farmers-et-batieng-products.html', 'r')
>>> data = file.read()
>>> get_records = re.compile(r"""<div id=\"flexicontent\" class=\"flexicontent\">(.*)<\/div>""", re.DOTALL).findall
>>> get_titles = re.compile(r"""<h3>(.*)<\/h3>""").findall
>>> get_description = re.compile(r"""<div class=\"description\">(.*)<\/div>""", re.DOTALL).findall

>>> block_record = []
>>> block_url = []
>>> records = get_records(data)
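>>> for record in records:
...     # findall returns a list of matches, so description is a list of strings
...     description = get_description(record)
...     print description # see http://paste.lisp.org/+21XF for output
...     c.execute("INSERT INTO a (description) VALUES (%s)", description)
...
>>> conn.commit()  # commit() belongs to the connection object
>>> c.close()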

the problem is that the printed HTML comes out like this:

http://paste.lisp.org/+21XF

is there a way to format the output so that it does not include the
\n\t\t sequences and has the correct encoding?

thanks
norman
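
For reference, a minimal sketch of one way the matched text could be
tidied before the insert. The \n\t\t sequences show up because printing a
list displays the repr of each string in it; the strings themselves hold
real newlines and tabs. The helper name clean_fragment, the utf-8 default
and the sample string are assumptions made for illustration, not part of
the code above, and the real file may use a different encoding.

```python
import re

def clean_fragment(fragment, encoding="utf-8"):
    """Decode a matched HTML fragment and collapse runs of whitespace.

    The encoding default is an assumption; use whatever the file is
    actually saved in.
    """
    if isinstance(fragment, bytes):
        fragment = fragment.decode(encoding)
    # Replace every run of newlines, tabs and spaces with a single space.
    return re.sub(r"\s+", " ", fragment).strip()

# Made-up sample standing in for what get_description(record) returns.
matches = ["\n\t\t<p>first line\n\t\tsecond line</p>\n\t"]
cleaned = [clean_fragment(m) for m in matches]
print(cleaned[0])

# With an open cursor c, each cleaned string would then be passed on its own:
#     c.execute("INSERT INTO a (description) VALUES (%s)", (cleaned[0],))
```

Collapsing whitespace this way also alters text inside elements such as
<pre>, so it only suits fragments where the layout whitespace carries no
meaning.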

