Unicode & mx.ODBC module
vincent wehren
vincent at visualtrans.de
Thu Mar 4 02:12:11 EST 2004
"Chuck Bearden" <cbearden at hal-pc.org> wrote in message
news:40466a21$0$7052$a726171b at news.hal-pc.org...
| I'm having a tough time understanding how to manage Unicode when loading
| data into an MS SQL server. <snipped for brevity>
...
| html_f = open(sys.argv[1], 'r')
| htmldata = html_f.read()
| html_f.close()
|
| #-- make statement string and insert values tuple, and execute
| stmnt = """
| INSERT INTO pmLinkHTML
| (PMID, Ord, HTML, HTMLlen)
| VALUES
| (?, ?, ?, ?)
| """
| val_t = (549, 0, htmldata, len(htmldata))
| cur.execute(stmnt, val_t)
|
| cur.close()
| con.close()
| --------------------------end snippet--------------------------
|
| For my pains I am rewarded with:
|
| Traceback (most recent call last):
| File "./unitest.py", line 27, in ?
| cur.execute(stmnt, val_t)
| UnicodeDecodeError: 'utf8' codec can't decode byte 0xbe in position
| 45662: unexpected code byte
|
| Byte 45662 of the HTML file is indeed "\xBE". I don't think that should
| be a problem.
|
| What am I doing wrong?
What happens if you decode htmldata first by using
enc = "iso-8859-1"  # change to whatever the input file's encoding is
htmldata = unicode(htmldata, enc)
?
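
As a rough illustration of why an explicit decode might help: the traceback suggests the byte string is being decoded as UTF-8 somewhere inside the execute call, and a lone 0xBE byte is not valid UTF-8. The sketch below uses a made-up sample string (raw is hypothetical, not taken from the HTML file) and assumes the data is really ISO-8859-1. It is written with raw.decode(enc), which does the same job as unicode(raw, enc) and also runs on current Python versions.

```python
# Hypothetical sample bytes containing 0xBE, standing in for the HTML data.
raw = b"3\xbe cups"

# Decoding as UTF-8 fails, because 0xBE cannot start a UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Assumption: the file is ISO-8859-1; change this to the real encoding.
enc = "iso-8859-1"
text = raw.decode(enc)  # same effect as unicode(raw, enc) in Python 2

# ISO-8859-1 maps each byte to one code point, so the length is unchanged
# and 0xBE becomes U+00BE (VULGAR FRACTION THREE QUARTERS).
assert not utf8_ok
assert text == u"3\u00be cups"
```

Passing the decoded text rather than the raw bytes would mean no implicit UTF-8 decoding has to happen on the way to the driver. Whether the HTMLlen value should then count characters or bytes depends on the column definition, which is not shown here.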
Vincent Wehren
| I have spent a fair bit of time googling the
| ng in various ways, and consulting Python in a Nutshell and the online
| standard library docs at python.org. It may be something quite
| obvious to a better-informed coder, but I am prepared to learn.
|
| Many thanks in advance.
| Chuck Bearden
|
|