Unicode & mx.ODBC module
Chuck Bearden
cbearden at hal-pc.org
Thu Mar 4 10:57:22 EST 2004
On 2004-03-04, vincent wehren <vincent at visualtrans.de> wrote:
>
> "Chuck Bearden" <cbearden at hal-pc.org> schrieb im Newsbeitrag
> news:40466a21$0$7052$a726171b at news.hal-pc.org...
>| I'm having a tough time understanding how to manage Unicode when loading
>| data into an MS SQL server. <snipped for brevity>
>
> ...
>
>| html_f = open(sys.argv[1], 'r')
>| htmldata = html_f.read()
>| html_f.close()
>|
>| #-- make statement string and insert values tuple, and execute
>| stmnt = """
>| INSERT INTO pmLinkHTML
>| (PMID, Ord, HTML, HTMLlen)
>| VALUES
>| (?, ?, ?, ?)
>| """
>| val_t = (549, 0, htmldata, len(htmldata))
>| cur.execute(stmnt, val_t)
>|
>| cur.close()
>| con.close()
>| --------------------------end snippet--------------------------
>|
>| For my pains I am rewarded with:
>|
>| Traceback (most recent call last):
>| File "./unitest.py", line 27, in ?
>| cur.execute(stmnt, val_t)
>| UnicodeDecodeError: 'utf8' codec can't decode byte 0xbe in position
>| 45662: unexpected code byte
>|
>| Byte 45662 of the HTML file is indeed "\xBE". I don't think that should
>| be a problem.
>|
>| What am I doing wrong?
>
> What happens if you decode htmldata first by using
>
> enc = "iso-8859-1" #change to whatever the input file's encoding is
> htmldata = unicode(htmldata, enc)
>
> ?
Thanks. That was simple. It feels so good when you stop beating your
head against a brick wall. After using your tip to make my
simplified code above work, I was able to figure out how to apply it
to my more complex real project.
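For the archive, here is a minimal sketch of that fix in present-day Python 3 syntax, where bytes.decode() plays the role of unicode(htmldata, enc) from the suggestion above. The sample bytes and the ISO-8859-1 encoding are assumptions for illustration only; the real file's encoding has to be known or guessed, and the database call is left out.

```python
# Hypothetical stand-in for the file contents: a byte string containing 0xBE.
raw = b"<p>\xbe cup of sugar</p>"

# Decoding as UTF-8 fails: 0xBE is a continuation byte with no lead byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decode explicitly with the (assumed) real encoding of the input.
enc = "iso-8859-1"  # change to whatever the input file's encoding is
htmldata = raw.decode(enc)

# The length is now counted in characters rather than bytes.
html_len = len(htmldata)
```

With a single-byte encoding such as ISO-8859-1 the character count equals the byte count, but that would not hold for UTF-8 input.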
I think I'm still not entirely clear on when Unicode encoding &
decoding happen in Python and for what reasons. In my searching on this
problem I kept my eye open for a nice, systematic treatment of Unicode
in Python, but I haven't found anything yet.
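One common rule of thumb, offered here as a sketch rather than a systematic treatment: decode bytes to text as soon as they enter the program, work with text internally, and encode back to bytes only when writing out. The helper names read_text and write_text below are made up for illustration, and the example uses Python 3 syntax with in-memory streams instead of real files.

```python
import io


def read_text(stream, encoding):
    """Decode at the input boundary: bytes in, text out."""
    return stream.read().decode(encoding)


def write_text(stream, text, encoding):
    """Encode at the output boundary: text in, bytes out."""
    stream.write(text.encode(encoding))


# Hypothetical input: the word "café" stored as ISO-8859-1 bytes.
source = io.BytesIO(b"caf\xe9")
text = read_text(source, "iso-8859-1")

# The same text written back out as UTF-8 takes one more byte.
sink = io.BytesIO()
write_text(sink, text, "utf-8")
utf8_bytes = sink.getvalue()
```

Trouble like the traceback above tends to appear when a byte string is handed to code that decodes it implicitly with a codec that does not match the data.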
Again, many thanks for your response.
Best wishes,
Chuck