Unicode & mx.ODBC module

Thu Mar 4 02:12:11 EST 2004

"Chuck Bearden" <cbearden at hal-pc.org> wrote in message
news:40466a21$0$7052$a726171b at news.hal-pc.org...
| I'm having a tough time understanding how to manage Unicode when loading
| data into an MS SQL server. <snipped for brevity>

...

| html_f = open(sys.argv[1], 'r')
| htmldata = html_f.read()
| html_f.close()
|
| #-- make statement string and insert values tuple, and execute
| stmnt = """
|   INSERT INTO pmLinkHTML
|   (PMID, Ord, HTML, HTMLlen)
|   VALUES
|   (?, ?, ?, ?)
| """
| val_t = (549, 0, htmldata, len(htmldata))
| cur.execute(stmnt, val_t)
|
| cur.close()
| con.close()
| --------------------------end snippet--------------------------
|
| For my pains I am rewarded with:
|
|   Traceback (most recent call last):
|     File "./unitest.py", line 27, in ?
|       cur.execute(stmnt, val_t)
|   UnicodeDecodeError: 'utf8' codec can't decode byte 0xbe in position
|   45662: unexpected code byte
|
| Byte 45662 of the HTML file is indeed "\xBE".  I don't think that should
| be a problem.
|
| What am I doing wrong?

What happens if you decode htmldata first by using

enc = "iso-8859-1" #change to whatever the input file's encoding is
htmldata = unicode(htmldata, enc)

?
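
A minimal sketch of why that should help, assuming the file really is ISO-8859-1 (Latin-1). The sample bytes are made up for illustration, and the snippet uses Python 3 syntax (bytes literals, .decode()); under Python 2 the decode line is the unicode(htmldata, enc) call shown above. Byte 0xBE is a continuation byte in UTF-8, so it cannot stand on its own, whereas ISO-8859-1 maps every byte to exactly one character:

```python
# Hypothetical sample standing in for the HTML file's contents;
# it contains the same 0xBE byte that the traceback complains about.
raw = b"price: \xbe cup"

# Decoding as UTF-8 fails: 0xBE may only appear inside a multi-byte
# sequence, never as the first byte of a character.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the (assumed) real encoding succeeds.
enc = "iso-8859-1"  # change to whatever the input file's encoding is
text = raw.decode(enc)  # Python 2 spelling: unicode(raw, enc)
```

If the decode succeeds, the resulting Unicode string (rather than the raw byte string) is what would go into the values tuple. Whether the driver then accepts it depends on how the connection is configured, which this sketch does not cover.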

Vincent Wehren

| I have spent a fair bit of time googling the
| ng in various ways, and consulting Python in a Nutshell and the online
| standard library docs at python.org.  It may be something quite
| obvious to a better-informed coder, but I am prepared to learn.
|
| Many thanks in advance.
| Chuck Bearden