Unicode & mx.ODBC module

Thu Mar 4 10:57:22 EST 2004

On 2004-03-04, vincent wehren <vincent at visualtrans.de> wrote:
>
> "Chuck Bearden" <cbearden at hal-pc.org> wrote in news message
> news:40466a21$0$7052$a726171b at news.hal-pc.org...
>| I'm having a tough time understanding how to manage Unicode when loading
>| data into an MS SQL server. <snipped for brevity>
>
> ...
>
>| html_f = open(sys.argv[1], 'r')
>| htmldata = html_f.read()
>| html_f.close()
>|
>| #-- make statement string and insert values tuple, and execute
>| stmnt = """
>|   INSERT INTO pmLinkHTML
>|   (PMID, Ord, HTML, HTMLlen)
>|   VALUES
>|   (?, ?, ?, ?)
>| """
>| val_t = (549, 0, htmldata, len(htmldata))
>| cur.execute(stmnt, val_t)
>|
>| cur.close()
>| con.close()
>| --------------------------end snippet--------------------------
>|
>| For my pains I am rewarded with:
>|
>|   Traceback (most recent call last):
>|     File "./unitest.py", line 27, in ?
>|       cur.execute(stmnt, val_t)
>|   UnicodeDecodeError: 'utf8' codec can't decode byte 0xbe in position
>|   45662: unexpected code byte
>|
>| Byte 45662 of the HTML file is indeed "\xBE".  I don't think that should
>| be a problem.
>|
>| What am I doing wrong?
>
> What happens if you decode htmldata first by using
>
> enc = "iso-8859-1" #change to whatever the input file's encoding is
> htmldata = unicode(htmldata, enc)
>
> ?
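The traceback arises because byte 0xBE (binary 10111110) is a UTF-8 continuation byte, so it can never start a character on its own, while in ISO-8859-1 every byte from 0x00 to 0xFF maps directly to one character. The sketch below shows the difference. It uses Python 3, where `bytes.decode(enc)` does the job of Python 2's `unicode(data, enc)`. The sample bytes and the helper `try_decode` are hypothetical, and the mx.ODBC insert itself is left out.

```python
# Python 3 sketch: bytes.decode(enc) plays the role of Python 2's unicode(data, enc).
raw = b"price: \xbe off"  # hypothetical sample; 0xBE is U+00BE (three quarters) in ISO-8859-1


def try_decode(data, enc):
    """Return `data` decoded with `enc`, or None if the bytes are invalid in that codec."""
    try:
        return data.decode(enc)
    except UnicodeDecodeError:
        return None


# 0xBE has the bit pattern 10xxxxxx, a UTF-8 continuation byte, so it cannot
# stand alone; decoding as UTF-8 fails just as in the traceback above.
as_utf8 = try_decode(raw, "utf-8")

# ISO-8859-1 maps every byte 0x00-0xFF to a code point, so this always succeeds.
enc = "iso-8859-1"  # change to whatever the input file's encoding is
as_latin1 = try_decode(raw, enc)

print(as_utf8)                            # None
print(as_latin1 == "price: \u00be off")   # True
```

Because ISO-8859-1 accepts any byte sequence, a successful decode does not prove it is the right codec; it only avoids the error. The file's actual encoding still has to be known or checked.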

Thanks.  That was simple.  It feels so good when you stop beating your
head against a brick wall.  After using your tip to make my
simplified code above work, I was able to figure out how to apply it 
to my more complex real project.

I think I'm still not entirely clear on when and why Unicode encoding and
decoding happen in Python.  While searching on this problem I kept an eye
out for a clear, systematic treatment of Unicode in Python, but I haven't
found one yet.
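On the question of when decoding and encoding happen, one common convention is to decode bytes to text once, where data enters the program, work only with text internally, and encode back to bytes once, where data leaves. The Python 3 sketch below follows that pattern; the input encoding, the UTF-8 output encoding, and the temporary sample file standing in for the HTML file are all assumptions for illustration.

```python
import os
import tempfile

enc_in = "iso-8859-1"  # assumed encoding of the input file
enc_out = "utf-8"      # hypothetical target encoding for output

# Write a small sample input file as raw bytes (stands in for the HTML file).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"<p>\xbe cup</p>")

# 1. Decode once, at the input boundary: bytes -> str.
with open(path, "r", encoding=enc_in) as f:
    text = f.read()
os.remove(path)

# 2. Work only with str in the middle of the program.
length = len(text)  # counts characters, not bytes

# 3. Encode once, at the output boundary: str -> bytes.
out_bytes = text.encode(enc_out)

print(length)          # 12
print(len(out_bytes))  # 13, because U+00BE takes two bytes in UTF-8
```

One consequence worth noting: once `htmldata` is decoded, `len()` counts characters rather than bytes, which may matter for a length column such as `HTMLlen` in the original snippet.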

Again, many thanks for your response.
Best wishes,
Chuck