mx odbc result strings with special characters?
Alexander Jerusalem
ajeru at gmx.net
Sun Mar 3 19:02:46 EST 2002
Thanks a lot, now it works, and even better, I think I understand what's
going on :-) I was aware that control characters like \n are
represented in binary as a single number. What I missed is that I was
printing a tuple instead of the strings inside it, and printing a tuple
calls repr() on each element, which converts the strings back to Python
source code, as you point out.
Thanks,
Alexander
At 00:06 04.03.2002 -0600, Jason Orendorff wrote:
> > Hmm, ok, can you tell me how I can convert such a string back to the
> > ISO8859-1 character set so that I can write it to a text file?
>
>It appears that it already *is* in the right character set.
>Unfortunately, you seem to be converting it to a different
>representation sometimes, quite unintentionally, when you
>output it.
>
>Try this and see how it works.
>
> import mx.ODBC.Windows as odbc
>
> outfile = open("myfile.txt", 'w')
>
> con = odbc.connect(...)
> c = con.cursor()
> c.execute(...)
>
> record = c.fetchone() # record is a tuple
> my_string = record[1] # the string in the second column of the row
> print my_string # print the string, not the tuple
> outfile.write(my_string + '\n') # write the string, not the tuple
>
> c.close()
> con.close()
> outfile.close()
>
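mxODBC needs a configured Windows data source, so the example above cannot be run as-is. As a rough stand-in, here is the same fetch-then-write pattern using Python 3's built-in `sqlite3` module, swapped in for mxODBC (both follow the DB-API, where `fetchone()` returns a tuple). The table, column and file names are hypothetical, chosen only for illustration.

```python
import os
import sqlite3
import tempfile

# Hypothetical in-memory table standing in for the real ODBC data source.
con = sqlite3.connect(":memory:")
c = con.cursor()
c.execute("CREATE TABLE people (id INTEGER, name TEXT)")
c.execute("INSERT INTO people VALUES (?, ?)", (1, "Jörg Müller"))

c.execute("SELECT id, name FROM people")
record = c.fetchone()   # record is a tuple: (1, 'Jörg Müller')
my_string = record[1]   # pick the string out of the tuple

# Write the string itself (not the tuple), encoded as ISO-8859-1 (Latin-1).
# newline="\n" disables platform newline translation so the bytes are predictable.
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
with open(path, "w", encoding="iso-8859-1", newline="\n") as outfile:
    outfile.write(my_string + "\n")

c.close()
con.close()

# Read the raw bytes back: each umlaut is a single Latin-1 byte.
with open(path, "rb") as f:
    raw = f.read()
print(raw)  # b'J\xf6rg M\xfcller\n'
```

The file ends up with plain ISO-8859-1 bytes; no escape sequences are written because the string, not its repr(), is what gets output.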
> > The strange thing is that when I read umlauts from a text file and
> > print them to standard out, they remain intact. Only when they come
> > from the database driver are they converted to this hex
> > representation.
>
>It's not the database or mxODBC that's doing it.
>
>The "hex representation" is just Python source code.
>Nothing too weird about it; that's how Python programmers put
>strings into programs.
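To check that the escape form and the character are one and the same string, you can compare them directly. This is a small sketch in modern Python 3 (not the Python 2 of this thread), where `chr(0xD6)` is the code point for Ö in both Latin-1 and Unicode:

```python
# The escape sequence is only a way of *typing* the character in source code.
escaped = '\xd6'       # escape form, as Python 2's repr() displayed it
typed = 'Ö'            # the same character typed directly
from_code = chr(0xD6)  # built from its numeric code point

print(escaped == typed == from_code)  # True
print(len(escaped))                   # 1: four source characters, one in memory
```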
>
>Suppose you type this.
>
> >>> x = '123\n\txyz\n'
>
>Now in memory there's a string that has 9 characters in it,
>not 12. Your '\n' has been converted to a single byte,
>with the value 10 (a newline character). The '\t' has been
>converted to a single byte with the value 9 (a tab character).
>
> >>> print len(x)
> 9
> >>> print ord(x[3]) # the value of the '\n' character
> 10
> >>> print ord(x[4]) # the value of the '\t' character
> 9
>
>Now, suppose we print this string two different ways.
>
> >>> print x
> 123
>         xyz
>
> >>> print repr(x)
> '123\n\txyz\n'
>
>See the difference? When you do "print x", Python dumps those
>characters to your console. The newline characters, which you
>entered as '\n', are now displayed as line breaks, and the tab
>character, which you entered as '\t', is displayed as an
>indentation. Likewise, depending on your console, the
>character you could enter as '\xd6' might be displayed as a
>Latin capital letter O with diaeresis (aka umlaut).
>
>When you do "print repr(x)", Python converts x *back* into
>Python-source-code form and displays the result.
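The `print x` versus `print repr(x)` contrast can be checked the same way in Python 3. One caveat: Python 3's `repr()` leaves printable non-ASCII characters such as Ö unescaped, so the built-in `ascii()` is used below to get the `\xd6`-style escape that Python 2's `repr()` produced. `ast.literal_eval` turns the source-code form back into the original string.

```python
import ast

x = '123\n\txyz\n'
s = str(x)   # what "print x" outputs: the characters themselves
r = repr(x)  # Python source code for the string

print(len(s), len(r))            # 9 14
print(r)                         # '123\n\txyz\n'
print(ast.literal_eval(r) == x)  # True: the source form evaluates back to x

print(ascii('Ö'))                # '\xd6'
```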
>
>Which of these is the true internal representation of x?
>Neither. The internal representation is 9 bytes of data:
>a bunch of numbers, really, not characters or pixels or
>source code.
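What sits underneath really is just numbers. Encoding a string as ISO-8859-1 (the character set asked about earlier in the thread) gives one byte per character, and the umlaut is simply the number 0xD6 = 214. A minimal Python 3 sketch:

```python
x = '123\n\txyz\n'
data = x.encode('iso-8859-1')  # one byte per character
print(list(data))              # [49, 50, 51, 10, 9, 120, 121, 122, 10]

umlaut = 'Ö'.encode('iso-8859-1')
print(list(umlaut))                 # [214]
print(umlaut.decode('iso-8859-1'))  # Ö
```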
>
>When you print a tuple, it displays itself in
>Python-source-code format, which basically means calling
>repr() on each of its elements. That is not what you want.
>
>In order to display a string usefully, you must print the
>string itself, not the tuple that contains it.
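The tuple-versus-string difference is easy to reproduce. In this Python 3 sketch, `record` is a made-up stand-in for a row returned by `fetchone()`. Formatting the whole tuple calls `repr()` on each element, so the newline and tab come out as backslash escapes; formatting the element alone keeps the real characters.

```python
record = (1, 'line one\nline two\tend')  # hypothetical fetched row

whole = str(record)     # what "print record" shows
field = str(record[1])  # what "print record[1]" shows

print(whole)  # (1, 'line one\nline two\tend')
print(field)  # real newline and tab are output as-is
print('\n' in whole, '\n' in field)  # False True
```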
>
>Sorry to write at such length... :(
>
>## Jason Orendorff http://www.jorendorff.com/