unicode mystery/problem

Fri Sep 22 07:53:42 EDT 2006

John, thanks for your extensive answer.
>> Hi,
>> I am using Python 2.4.3 on Fedora Core4 and the "Eric3" Python IDE.
>> The code below works fine in the Eric3 environment. When I try
>> to start it from the command line, it returns:
>>
>> Traceback (most recent call last):
>>   File "pokus_1.py", line 5, in ?
>>     print str(a)
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xc1' in
>> position 6: ordinal not in range(128)

JM> So print a works, but print str(a) crashes.

JM> Instead, insert this:
JM>    import sys
JM>    print "default", sys.getdefaultencoding()
JM>    print "stdout", sys.stdout.encoding
JM> and run your script at the command line. It should print:
JM>     default ascii
JM>     stdout x
****  in the command line it prints:  *****
default ascii
stdout UTF-8
JM> here, and crash at the later use of str(a).
JM> Step 2: run your script under Eric3. It will print:
JM>     default y
JM>     stdout z

****  in the Eric3 it prints:  ****
If the # -*- Encoding: utf_8 -*- line is set, then it prints:

default utf_8
stdout
unhandled AttributeError, "AsyncFile instance has no attribute
'encoding' "

If the encoding is not set, then it prints:

DeprecationWarning: Non-ASCII character '\xc3' in file
/root/eric/analyza_dat_TPC/pokus_1.py on line 26, but no encoding
declared; see http://www.python.org/peps/pep-0263.html for details
  execfile(sys.argv[0], self.debugMod.__dict__)

default latin-1
stdout
unhandled AttributeError, "AsyncFile instance has no attribute
'encoding' "
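The AttributeError above suggests that the object Eric3 installs as sys.stdout does not provide an "encoding" attribute. A minimal sketch of a diagnostic that tolerates this, assuming only that the replacement stream is some file-like object (FakeAsyncFile and describe_encodings are hypothetical names, not Eric3's actual classes or API):

```python
import sys


def describe_encodings(stream=None):
    """Return (default encoding, stream encoding or None if not available)."""
    if stream is None:
        stream = sys.stdout
    # getattr with a default avoids the AttributeError on stream objects
    # that have no 'encoding' attribute.
    return sys.getdefaultencoding(), getattr(stream, 'encoding', None)


class FakeAsyncFile(object):
    """Hypothetical stand-in for an IDE stream that lacks 'encoding'."""

    def write(self, data):
        pass


default_enc, stdout_enc = describe_encodings()
print("default %s" % default_enc)
print("stdout %s" % stdout_enc)

# A stream without the attribute reports None instead of raising.
missing_enc = describe_encodings(FakeAsyncFile())[1]
print("fake stream %s" % missing_enc)
```

With this form the diagnostic prints a value in both environments, so the two runs can be compared line by line.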

JM> and then should work properly. It is probable that x == y == z ==
JM> 'utf-8'
JM> Step 3: see below.

>>
>> ========== 8< =============
>> #!/usr/bin/python
>> # -*- Encoding: utf_8 -*-

JM> There is no UTF8-encoded text in this short test script. Is the above
JM> encoding comment merely a carry-over from your real script, or do you
JM> believe it is necessary or useful in this test script?
Generally, I am working with strings like u'DISKOV\xc1 POLE' (I am
getting them from the database).

My intention in using # -*- Encoding: utf_8 -*- was to suppress the
DeprecationWarning when I use utf_8 in the code (like u'DISKOV\xc1 POLE').

>>
>> a= u'DISKOV\xc1 POLE'
>> print a
>> print str(a)
>> ========== 8< =============
>>
>> Even though it looks strange, I have to use the str(a) syntax even
>> though I know the "a" variable is a string.

JM> Some concepts you need to understand:
JM> (a) "a" is not a string, it is a reference to a string.
JM> (b) It is a reference to a unicode object (an implementation of a
JM> conceptual Unicode string) ...
JM> (c) which must be distinguished from a str object, which represents a
JM> conceptual string of bytes.
JM> (d) str(a) is trying to produce a str object from a unicode object. Not
JM> being told what encoding to use, it uses the default encoding
JM> (typically ascii) and naturally this will crash if there are non-ascii
JM> characters in the unicode object.
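Points (b) to (d) can be sketched as follows. This is an illustration only: it calls .encode() explicitly instead of str(a), so the same lines behave the same way on Python 2 and Python 3, and the 'ascii' case stands in for what str(a) attempts when the default encoding is ascii.

```python
a = u'DISKOV\xc1 POLE'          # a unicode object: a sequence of characters

utf8_bytes = a.encode('utf-8')  # a byte string: the characters in one encoding

char_count = len(a)             # counts characters
byte_count = len(utf8_bytes)    # counts bytes; the accented letter takes two

# Encoding with the ascii codec fails because u'\xc1' has no ASCII form.
try:
    a.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(char_count)
print(byte_count)
print(ascii_ok)
```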

>> I am trying to use ChartDirector for Python (charts for Python) and the
>> method "layer.addDataSet()" needs the above-mentioned syntax; otherwise it
>> returns an error.

JM> Care to tell us which error???
You can see the error description and the author's comments here:
http://tinyurl.com/ezohe

>>
>> layer.addDataSet(data, colour, str(dataName))
I have tried to experiment with the code a bit.
This is the simplest code with which I can demonstrate my problem:
#!/usr/bin/python
import sys
print "default", sys.getdefaultencoding()
print "stdout", sys.stdout.encoding

a=['P\xc5\x99\xc3\xad','Petr Jake\xc5\xa1']
b="my nice try %s" % ''.join(a).encode("utf-8")
print b

When I run it from the command line, I get:
sys:1: DeprecationWarning: Non-ASCII character '\xc3' in file pokus_1.py on line 26, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

default ascii
stdout UTF-8

Traceback (most recent call last):
  File "pokus_1.py", line 8, in ?
    b="my nice try %s" % ''.join(a).encode("utf-8")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1: ordinal not in range(128)
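A note on this traceback: in Python 2, calling .encode() on a byte string first decodes it implicitly with the default encoding (ascii here), which is why a UnicodeDecodeError appears even though the code asks for an encode. The sketch below assumes the bytes in "a" are UTF-8 (they decode cleanly as such) and decodes them explicitly. The byte strings are built from unicode literals only so that the example runs unchanged on Python 2 and Python 3.

```python
# The same bytes as 'P\xc5\x99\xc3\xad' and 'Petr Jake\xc5\xa1' in the script.
parts = [u'P\u0159\xed'.encode('utf-8'), u'Petr Jake\u0161'.encode('utf-8')]
raw = parts[0] + parts[1]

# This is the implicit step that fails: the bytes are not valid ASCII.
try:
    raw.decode('ascii')
    ascii_decode_ok = True
except UnicodeDecodeError:
    ascii_decode_ok = False

# Decode explicitly, format as text, and encode once at the output boundary.
text = raw.decode('utf-8')
message = u"my nice try %s" % text
output_bytes = message.encode('utf-8')

print(ascii_decode_ok)
print(repr(output_bytes))
```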

JM> The method presumably expects a str object (8-bit string). What does
JM> its documentation say? Again, what error message do you get if you feed
JM> it a unicode object with non-ascii characters?

JM> [Step 3] For foo in set(['x', 'y', 'z']):
JM>     Change str(dataName) to dataName.encode(foo). Change any debugging
JM> display to use repr(a) instead of str(a). Test it with both Eric3 and
JM> the command line.
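Step 3 can be sketched as a loop. The candidate list below is an assumption: it uses the encoding names reported earlier in this message in place of x, y and z. data_name is a hypothetical stand-in for dataName. Nothing here calls ChartDirector itself.

```python
data_name = u'DISKOV\xc1 POLE'   # hypothetical stand-in for dataName

results = {}
for foo in ['ascii', 'latin-1', 'utf-8']:
    try:
        results[foo] = data_name.encode(foo)
    except UnicodeEncodeError:
        # This codec cannot represent every character in the string.
        results[foo] = None
    # repr() shows the exact bytes without depending on the terminal.
    print("%s -> %r" % (foo, results[foo]))
```

Each non-None value is a byte string that could be passed where str(dataName) is used now; which one displays correctly depends on what the receiving method expects.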

JM> [Aside: it's entirely possible that your problem will go away if you
JM> remove the letter u from the line a= u'DISKOV\xc1 POLE' -- however if
JM> you want to understand what is happening generally, I suggest you don't
JM> do that]

JM> HTH,
JM> John