mx odbc result strings with special characters?

Jason Orendorff jason at jorendorff.com
Mon Mar 4 01:06:26 EST 2002


> Hmm, ok, can you tell me how I can convert such a string back to the
> ISO8859-1 character set so that I can write it to a text file?

It appears that it already *is* in the right character set.
Unfortunately, you seem to be converting it to a different
representation sometimes, quite unintentionally, when you
output it.

Try this and see how it works.

  import mx.ODBC.Windows as odbc

  outfile = open("myfile.txt", 'w')

  con = odbc.connect(...)
  c = con.cursor()
  c.execute(...)

  record = c.fetchone()    # record is a tuple
  my_string = record[1]   # the field from the row, not the cursor
  print my_string   # print the string, not the tuple
  outfile.write(my_string + '\n')   # write the string, not the tuple

  c.close()
  con.close()
  outfile.close()
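For readers on Python 3, here is a minimal sketch of the same idea that needs no database. The tuple `record` is a hypothetical stand-in for what `c.fetchone()` returns. The ISO-8859-1 byte string plays the role of a Python 2 `str` column value. The temp-directory file path is also only an assumption for illustration.

```python
import os
import tempfile

# Hypothetical stand-in for c.fetchone(): an ID column and a text column
# holding ISO-8859-1 bytes (0xD6 is 'Ö' in that character set).
record = (1, b'\xd6sterreich')

# Take the field out of the tuple, then decode the bytes to text.
my_string = record[1].decode('iso-8859-1')

# Write the text back out as ISO-8859-1. newline='\n' disables platform
# newline translation, so the file holds exactly these bytes.
path = os.path.join(tempfile.gettempdir(), 'myfile.txt')
with open(path, 'w', encoding='iso-8859-1', newline='\n') as outfile:
    outfile.write(my_string + '\n')

# Read the raw bytes back: the umlaut is still the single byte 0xD6.
with open(path, 'rb') as infile:
    raw = infile.read()
print(raw)   # b'\xd6sterreich\n'
```

The explicit `encoding=` argument is the Python 3 way to say "write ISO-8859-1". Without it, Python 3 would use the platform's default encoding.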

> The strange thing is, that when I read umlauts from a textfile and
> print them to standard out, they remain intact. Only when they come
> from the database driver are they converted to this hex
> representation.

It's not the database or mxODBC that's doing it.

The "hex representation" is just Python source code.
Nothing too weird about it; that's how Python programmers put
strings into programs.
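A small Python 3 check of that point follows. It uses `bytes`, the nearest equivalent of a Python 2 `str`. An escape such as `\xd6` or `\n` is only notation in the source, and each one produces a single byte in memory.

```python
# Each escape sequence in a literal becomes exactly one byte in memory.
umlaut = b'\xd6'
newline = b'\n'

print(len(umlaut), umlaut[0])    # 1 214
print(len(newline), newline[0])  # 1 10

# The same byte built from a number instead of an escape compares equal.
print(umlaut == bytes([0xD6]))   # True
```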

Suppose you type this.

  >>> x = '123\n\txyz\n'

Now in memory there's a string that has 9 characters in it,
not 12.  Each '\n' has been converted to a single byte
with the value 10 (a newline character), and the '\t' has been
converted to a single byte with the value 9 (a tab character).

  >>> print len(x)
  9
  >>> print ord(x[3])  # the value of the '\n' character
  10
  >>> print ord(x[4])  # the value of the '\t' character
  9
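The same session as a Python 3 sketch is shown below, with `print` as a function. A raw string (`r'...'`) keeps the backslashes literally, which makes the 12 typed characters visible alongside the 9 stored ones.

```python
x = '123\n\txyz\n'
# Raw string: backslashes are kept as-is, so these are the 12 characters
# typed between the quotes in the source code.
typed = r'123\n\txyz\n'

print(len(typed))  # 12
print(len(x))      # 9
print(ord(x[3]))   # 10 (newline)
print(ord(x[4]))   # 9  (tab)
```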

Now, suppose we print this string two different ways.

  >>> print x
  123
        xyz

  >>> print repr(x)
  '123\n\txyz\n'

See the difference?  When you do "print x", Python dumps those
characters to your console.  The newline characters, which you
entered as '\n', are now displayed as line breaks, and the tab
character, which you entered as '\t', is displayed as an
indentation.  Likewise, depending on your console, the
character you could enter as '\xd6' might be displayed as a
Latin capital letter O with diaeresis (aka umlaut).
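What the console draws for the byte 0xD6 depends on the character set it assumes. The following Python 3 sketch assumes the ISO-8859-1 table from the original question and shows which character that byte stands for.

```python
import unicodedata

# Interpret the single byte 0xD6 using the ISO-8859-1 (Latin-1) table.
ch = b'\xd6'.decode('iso-8859-1')

print(unicodedata.name(ch))  # LATIN CAPITAL LETTER O WITH DIAERESIS
print(ord(ch))               # 214: the same number, now read as a character
```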

When you do "print repr(x)", Python converts x *back* into
Python-source-code form and displays the result.
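One way to see that `repr()` really produces source code is to parse its output back. The sketch below assumes Python 3 and uses the standard library's `ast.literal_eval`.

```python
import ast

x = '123\n\txyz\n'
source_form = repr(x)

# repr() gives back source-code text: the 12 typed characters plus 2 quotes.
print(source_form)        # '123\n\txyz\n'
print(len(source_form))   # 14

# Parsing that text as a literal recovers the original 9-character string.
print(ast.literal_eval(source_form) == x)  # True
```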

Which of these is the true internal representation of x?
Neither.  The internal representation is 9 bytes of data:
a bunch of numbers, really, not characters or pixels or
source code.
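To look at those numbers directly, here is a Python 3 sketch. In Python 3, text strings hold code points rather than bytes, so it encodes the string first (ASCII is enough for these characters) and lists the 9 values.

```python
x = '123\n\txyz\n'

# Encode to bytes and list the raw values: digits, newline, tab, letters.
data = x.encode('ascii')
print(list(data))  # [49, 50, 51, 10, 9, 120, 121, 122, 10]
```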

When you print a tuple, Python displays it in
Python-source-code format, which basically means calling
repr() on each of its elements.  Not what you want.

In order to display a string usefully, you must print the
string itself, not the tuple that contains it.
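The claim that a tuple prints via `repr()` of its elements can be checked directly. This Python 3 sketch uses a hypothetical two-field row whose second field holds ISO-8859-1 bytes.

```python
# Hypothetical row, shaped like what a fetchone() call might return;
# the second field is ISO-8859-1 bytes for 'Öl'.
record = (1, b'\xd6l')

# The tuple's text form is built from repr() of each element...
as_tuple = str(record)
print(as_tuple)  # (1, b'\xd6l')

# ...which matches joining the element reprs by hand.
by_hand = '(' + ', '.join(repr(v) for v in record) + ')'
print(as_tuple == by_hand)  # True

# The field on its own, decoded, is plain text with no escapes.
field = record[1].decode('iso-8859-1')
print(len(field), field == '\u00d6l')  # 2 True
```

The hand-built version only matches for tuples of two or more elements, because a one-element tuple also gets a trailing comma.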

Sorry to write at such length...  :(

## Jason Orendorff    http://www.jorendorff.com/





More information about the Python-list mailing list