'ascii' codec can't encode character u'\xf3'

Tue Aug 17 04:21:27 EDT 2004

thank you for the reply, great info! it helped me understand this better;
but of course, some additional questions have arisen.

some of these questions/comments may seem stupid (i.e. obvious), but
i'm new to python and i want to make sure i get it right; thanks for your
patience.

> There is an alternative, if the print is a debug print:
> 
> - print a repr() of the unicode object instead of
>   the unicode object itself. This will work on all
>   terminals, and show hex escapes of non-ASCII characters.

just to make sure:

override the object's __repr__(self) method with something like:

class my_string(str):
    def __repr__(self):
        tmp = unicode(self.attribute1 + " " + self.attribute2)
        return tmp

and use the 'my_string' class without any worries instead of a plain
string?
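
for reference, a minimal sketch of how i read the repr() advice, without
overriding anything: turn every non-ASCII character into an escape before
printing. debug_repr is a name i made up for illustration, and the u''
literal assumes an interpreter that accepts it; this may not be exactly
what was meant above.

```python
def debug_repr(text):
    """Return an ASCII-only rendering of a unicode string for debug output."""
    # backslashreplace turns each non-ASCII character into an escape such
    # as \xf3, so the result is printable on any terminal.
    return text.encode("ascii", "backslashreplace").decode("ascii")


title = u"M\xf3nica"      # hypothetical value containing u'\xf3'
print(debug_repr(title))  # prints M\xf3nica
```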

> 
> No. unicode(text) uses the system default encoding
> (sys.getdefaultencoding()) which normally is ASCII.
> 
> Printing a Unicode string to a terminal should work fine if the terminal
> is properly configured. What that means depends on your operating
> system.

my system is debian GNU/Linux stable. i have been using it for a very,
very long time, though i have not changed any terminal settings beyond
the very basics.  My locales are properly set; i'm using the LC_*
environment variables to set the default locale to a czech environment
with the ISO-8859-2 charset.  The terminal is capable of displaying 8-bit
charsets; i'm not sure about unicode charsets -- never tried, never
needed.  All other locale-sensitive programs are satisfied (e.g. the java
interpreter -- this should be much like python :)

i guess in germany it is much the same, except that ISO-8859-1 is
probably preferred.

example output from my system:

>>> import locale
>>> loc = locale.getdefaultlocale()
>>> loc
['cs_CZ', 'ISO8859-2']

so i guess this is ok.
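
a small sketch to look at the relevant settings side by side, assuming
only the standard sys and locale modules. describe_encodings is a made-up
helper name, and which of these values python actually consults for a
given print is my assumption, not something confirmed above.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python may use for conversions and output."""
    return {
        # interpreter-wide default (normally ASCII on the Python discussed here)
        "default": sys.getdefaultencoding(),
        # what the locale settings (LC_* variables) ask for
        "locale": locale.getpreferredencoding(),
        # what Python detected for the terminal; may be None when piped
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for name, value in sorted(describe_encodings().items()):
    print("%s: %s" % (name, value))
```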

but the problem may be in my 'site.py', where setting the encoding
according to my locale is done in code like this:

if 0:
    # Enable to support locale aware default string encodings.
    import locale
    loc = locale.getdefaultlocale()
    if loc[1]:
        encoding = loc[1]

so i guess it is never done :(

did you change it yourself? do you think this is the 'portable
solution'? i guess not -- another system, another locale; maybe staying
with ascii is best.

> 
> >
> >	* why is that behaviour? -- if you search google you get
> >thousands of errors like this -- with no proper solutions i must add
> 
> There is a proper solution. Unfortunately, very similar yet different
> problems cause the same error message, and each problem has a different
> proper solution:
> 

well, if information like what you gave me were contained in the
standard python documentation, there would probably be less
misunderstanding about this issue.

> - A Unicode error is raised when trying to combine a Unicode string
>   and a byte string, if the byte string contains non-ASCII characters,
>   e.g.
> 
>    u"Martin v. " + "Löwis"
> 
>   The proper solution is to convert the second string into a Unicode
>   object, e.g. through
> 
>            unicode("Löwis", "iso-8859-1")
> 

if i use 
#! /usr/bin/env python
# -*- coding: UTF-8 -*-
at the beginning of every script, does the example above still have to
be converted -- because of the iso-8859-1 you use in "Löwis"?

what would change if i use
#! /usr/bin/env python
# -*- coding: ISO-8859-1 -*-
?

can i omit the conversion (i.e. is it done automatically for me, as if
i write
u"Martin v. " + unicode("Löwis", "ISO-8859-1")
)?
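
a minimal sketch of the explicit conversion, written with escapes so it
does not depend on the source file's coding line. raw stands for bytes
that arrived from somewhere as ISO-8859-1 (an assumption for this
example), and the b''/u'' literals assume an interpreter that accepts
both.

```python
raw = b"L\xf6wis"  # the surname with o-umlaut, as ISO-8859-1 bytes

# decoding with the right charset gives a unicode object that combines cleanly
name = u"Martin v. " + raw.decode("iso-8859-1")

# the same bytes are not valid ASCII, which is where the error comes from
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

as far as i understand, the coding line only tells python how to read
the literals in the source file; bytes that come from outside still need
an explicit decode like the one above.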

> - A unicode error is raised when a Unicode string is printed to
>   a terminal. The proper solution is that the system administrator
>   or the user should properly administer the locale, so that Python
>   knows what characters the terminal can print. For characters that
>   are then still non-printable, repr() is the proper solution.

see above for comments on my settings.  if you have done such a
customization (and it differs from mine) and you have experience with
linux, may i ask you for recommendations?

> 
> - A unicode error is raised when a library does not support Unicode
>   for some reason. The proper solution is to fix the library. A
>   proper work-around is to explicitly convert Unicode strings into
>   the encoding that the library expects.
> 

i don't understand -- which library? do you mean, for example, the ogg
vorbis c-library when used with python bindings? in that case, what can
i do as a developer? find out which encoding is used and do the
tricky things i did -- now properly understood:

1. convert from "unknown" to unicode 
tmp = unicode("string", "library-charset-specification")

2. print it like
print tmp.encode("my-terminal-charset-specification")

question: 

can library-charset-specification be omitted if i specify it in a
comment at the very beginning of a script (as i guessed above) -- or can
my-terminal-charset-specification be omitted if specified in the comment
-- or can i omit both if they are equal?

if i'm going to use the __repr__(self) method, i would do the conversion
inside that method and return tmp, as i tried above, right?
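
the two steps above can be sketched as one helper. to_terminal_bytes and
the charset names in the usage line are assumptions for illustration
(i have not checked what any particular library really returns), and
"replace" is my own choice so that characters missing from the terminal
charset become "?" instead of raising an error.

```python
def to_terminal_bytes(raw, library_charset, terminal_charset):
    """Decode bytes from a library, then re-encode them for the terminal."""
    # step 1: from the library's charset to a unicode object
    text = raw.decode(library_charset)
    # step 2: from unicode to the charset the terminal displays
    return text.encode(terminal_charset, "replace")


# hypothetical case: the library hands back UTF-8, the terminal shows ISO-8859-2
converted = to_terminal_bytes(b"M\xc3\xb3nica", "utf-8", "iso-8859-2")
```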

> 
> >	* i was looking in site.py and there is deleted the
> >sys.setdefaultencoding() function, but from the comments i do
> >not know why -- you know it? why is user not allowed to change the
> >default encoding? it seems reasonable to me if he/she could do that.
> 
> Yes, but that would not be a proper solution. It would mean that your
> script now only works on your system, and fails on a system where
> the default encoding has not been changed, or has been changed to
> something else. Users should use a proper solution instead.

i thought that every programmer could call
sys.setdefaultencoding() at the start of the script to set it to
whatever he needs. it should work on every system that has the proper
encoding files. (though site.py has a comment about MS Windows -- it
breaks that rule :)
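
one alternative that needs no sys.setdefaultencoding() at all is to wrap
the output stream once with an explicit encoder. a sketch only:
encoding_writer is a made-up name, io.BytesIO stands in for a
byte-oriented terminal stream, and ISO-8859-2 is assumed as the terminal
charset.

```python
import codecs
import io


def encoding_writer(stream, charset):
    """Return a writer that encodes unicode text to `charset` before writing."""
    # codecs.getwriter() looks up the StreamWriter class for the charset
    return codecs.getwriter(charset)(stream, errors="replace")


buf = io.BytesIO()  # stand-in for the terminal's byte stream
out = encoding_writer(buf, "iso-8859-2")
out.write(u"M\xf3nica\n")
```

the script then states its output charset in one place, instead of
relying on how the interpreter on a given machine was configured.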

> 
> Regards,
> Martin

once again, thank you a lot.

Regards,
Martin (also :)