mx odbc result strings with special characters?

Alexander Jerusalem ajeru at gmx.net
Sun Mar 3 19:02:46 EST 2002


Thanks a lot. Now it works and, even better, I think I understand what's
going on :-) I was aware that control characters like \n are stored as a
single number. What I missed is that I was printing a tuple object rather
than the strings inside it, and that printing a tuple calls repr() on its
contents, which converts the strings back to Python source code, as you
point out.

Thanks,
Alexander



At 00:06 04.03.2002 -0600, Jason Orendorff wrote:
> > Hmm, ok, can you tell me how I can convert such a string back to the
> > ISO8859-1 character set so that I can write it to a text file?
>
>It appears that it already *is* in the right character set.
>Unfortunately, you seem to be converting it to a different
>representation sometimes, quite unintentionally, when you
>output it.
>
>Try this and see how it works.
>
>   import mx.ODBC.Windows as odbc
>
>   outfile = open("myfile.txt", 'w')
>
>   con = odbc.connect(...)
>   c = con.cursor()
>   c.execute(...)
>
>   record = c.fetchone()    # record is a tuple
>   my_string = record[1]    # index the fetched tuple, not the cursor
>   print my_string   # print the string, not the tuple
>   outfile.write(my_string + '\n')   # write the string, not the tuple
>
>   c.close()
>   con.close()
>   outfile.close()
>
> > The strange thing is, that when I read umlauts from a textfile and
> > print them to standard out, they remain intact. Only when they come
> > from the database driver are they converted to this hex
> > representation.
>
>It's not the database or mxODBC that's doing it.
>
>The "hex representation" is just Python source code.
>Nothing too weird about it; that's how Python programmers put
>strings into programs.
>
>Suppose you type this.
>
>   >>> x = '123\n\txyz\n'
>
>Now in memory there's a string that has 9 characters in it,
>not 12.  Your '\n' has been converted to a single byte,
>with the value 10 (a newline character).  The '\t' has been
>converted to a single byte with the value 9 (a tab character).
>
>   >>> print len(x)
>   9
>   >>> print ord(x[3])  # the value of the '\n' character
>   10
>   >>> print ord(x[4])  # the value of the '\t' character
>   9
>
>Now, suppose we print this string two different ways.
>
>   >>> print x
>   123
>         xyz
>
>   >>> print repr(x)
>   '123\n\txyz\n'
>
>See the difference?  When you do "print x", Python dumps those
>characters to your console.  The newline characters, which you
>entered as '\n', are now displayed as line breaks, and the tab
>character, which you entered as '\t', is displayed as an
>indentation.  Likewise, depending on your console, the
>character you could enter as '\xd6' might be displayed as a
>Latin capital letter O with diaeresis (aka umlaut).
>
>When you do "print repr(x)", Python converts x *back* into
>Python-source-code form and displays the result.
>
>Which of these is the true internal representation of x?
>Neither.  The internal representation is 9 bytes of data--
>a bunch of numbers, really, not characters or pixels or
>source code.
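
A small sketch of the "bunch of numbers" point, written so that it should run under Python 3 as well; the names codes and source_len are only illustrative:

```python
x = '123\n\txyz\n'

# The in-memory string, viewed as the numbers it is made of.
codes = [ord(ch) for ch in x]

# Length of the source-code form between the quotes (repr() adds two quotes).
source_len = len(repr(x)) - 2

print(codes)        # [49, 50, 51, 10, 9, 120, 121, 122, 10]
print(len(x))       # 9
print(source_len)   # 12
```

The 9 is the length of the data and the 12 is the length of the source text that describes it.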
>
>When you print a tuple, it tries to display itself in
>Python-source-code format.  This means, basically, calling
>repr() on its contents.  Not what you want.
>
>In order to display a string usefully, you must print the
>string itself, not the tuple that contains it.
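
The tuple-versus-string difference can be sketched without a database. This is an illustration in Python 3 syntax, where a bytes object stands in for the Python 2 string of this thread. The row contents and the temporary file location are made up, not taken from mxODBC:

```python
import os
import tempfile

# Hypothetical row, shaped like what fetchone() returns: a tuple.
# b"\xd6" is capital O with diaeresis in ISO-8859-1.
record = (1, b"K\xd6LN")

tuple_form = repr(record)   # source-code form, what printing the tuple shows
value = record[1]           # the raw ISO-8859-1 bytes themselves

# Write the string, not the tuple. Binary mode keeps the bytes unchanged.
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
with open(path, "wb") as outfile:
    outfile.write(value + b"\n")

with open(path, "rb") as infile:
    written = infile.read()

print(tuple_form)      # (1, b'K\xd6LN')
print(len(written))    # 5: four characters plus the newline
```

The file ends up holding one byte for the umlaut, while the tuple's repr() spells the same byte out as the four characters \xd6.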
>
>Sorry to write at such length...  :(
>
>## Jason Orendorff    http://www.jorendorff.com/




