Unicode / cx_Oracle problem

John Machin sjmachin at lexicon.net
Sun Sep 10 18:27:17 EDT 2006


Richard Schulman wrote:
> On Sun, 10 Sep 2006 11:42:26 +0200, "Diez B. Roggisch"
> <deets at nospam.web.de> wrote:
>
> >What does print repr(mean) give you?
>
> That is a useful suggestion.
>
> For context, I reproduce the source code:
>
> in_file = codecs.open("c:\\pythonapps\\mean.my",encoding="utf_16_LE")
> connection = cx_Oracle.connect("username", "password")
> cursor = connection.cursor()
> for row in in_file:
>     id = row[0]
>     mean = row[1]
>     print "Value of row is ", repr(row)                    #debug line
>     print "Value of the variable 'id' is ", repr(id)       #debug line
>     print "Value of the variable 'mean' is ", repr(mean)   #debug line
>     cursor.execute("""INSERT INTO mean (mean_id,mean_eng_txt)
>         VALUES (:id,:mean)""",id=id,mean=mean)
>
> Here is the result from the print repr() statements:
>
> Value of row is  u"\ufeff(3,'sadness, lament; sympathize with,
> pity')\r\n"
> Value of the variable 'id' is  u'\ufeff'
> Value of the variable 'mean' is  u'('
>
> Clearly, the values loaded into the 'id' and 'mean' variables are not
> satisfactory but are picking up the BOM.

Well of course they're "unsatisfactory" and this is absolutely nothing
to do with Oracle and cx_Oracle.

row is a string of characters. row[0] is the BOM. Read my lips (from a
previous thread):

"""
Use utf_16 -- it will strip off the BOM for you.
"""
and again:
"""
| >>> codecs.open('guff.utf16le', 'r', encoding='utf_16').read()
| u'abc\n\rdef\n\rghi' ######### Look, Mom, no BOM!
"""

row[1] is the first ***character*** of what looks suspiciously like the
Python representation of a tuple:

"""(3,'sadness, lament; sympathize with, pity')"""

Who wrote that like that??? If it is at all under your control, do it
like this:
Encode each Unicode text field in UTF-8. Write the file as a CSV file
using Python's csv module. Read the CSV file using the same module.
Decode the text fields from UTF-8.

You need to parse the incoming line into column values (the csv module
does this for you) and then convert each column value from
string/Unicode to a Python type that is compatible with the Oracle type
for that column.

My guess (not having used cx_Oracle) is that the error is happening
because the column "id" has a numeric type and you are trying to jam a
Unicode string into it. IOW, nothing to do with the "mean" column
(yet!).

BTW, I've managed to decode that "eng" means English not engineering
and "mean" means meaning i.e. not average and not stingy. Holy
obfuscation, Batman!

HTH,
John




More information about the Python-list mailing list