Yet another unicode WTF

Paul Boddie paul at boddie.org.uk
Fri Jun 5 07:06:50 EDT 2009


On 5 Jun, 11:51, Ben Finney <ben+pyt... at benfinney.id.au> wrote:
>
> Actually strings in Python 2.4 or later have the ‘encode’ method, with
> no need for importing extra modules:
>
> =====
> $ python -c 'import sys; sys.stdout.write(u"\u03bb\n".encode("utf-8"))'
> λ
>
> $ python -c 'import sys; sys.stdout.write(u"\u03bb\n".encode("utf-8"))' > foo ; cat foo
> λ
> =====

Those are Unicode objects, not traditional Python strings. Plain
strings do have decode and encode methods, even in Python 2.3, but the
two behave differently. The former is shorthand for constructing a
Unicode object from the string using the stated encoding. The latter
first has to create a Unicode object by implicitly decoding the string
with the default encoding (normally ASCII), which is error-prone, and
only then encodes the result - in effect, recoding the string.
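
To make the two steps concrete, here is a minimal sketch that performs
the decoding explicitly instead of leaving it to the default encoding.
The helper name recode and the sample bytes are only illustrative
assumptions, not anything from this thread; the code is written so
that it should run unchanged on Python 2.6+ and Python 3.3+.

```python
def recode(data, source_encoding, target_encoding):
    """Hypothetical helper: decode bytes with a stated encoding, then re-encode."""
    text = data.decode(source_encoding)    # byte string -> Unicode object
    return text.encode(target_encoding)    # Unicode object -> byte string

# A single e-acute character (U+00E9) as ISO-8859-1 bytes.
latin1_bytes = b"\xe9"

# Stating the source encoding avoids any reliance on the default encoding.
utf8_bytes = recode(latin1_bytes, "iso-8859-1", "utf-8")
```

Calling encode directly on such a byte string in Python 2 would instead
attempt the decoding step with the default encoding and, with ASCII as
that default, fail on the non-ASCII byte.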

As I noted, if one wants to remain sane and not think about encoding
everything everywhere, wrapping a stream using a codecs module
function or class (codecs.open or codecs.getwriter, for example) gives
an object which accepts Unicode objects directly and takes care of
the encoding itself.
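
As a rough sketch of that approach, the following wraps a byte stream
in a UTF-8 writer obtained from codecs.getwriter. The io.BytesIO object
is only a stand-in assumed here so that the result can be inspected; in
a real program the wrapped stream would be a file or standard output.

```python
import codecs
import io

raw = io.BytesIO()                     # stand-in for a byte-oriented stream
out = codecs.getwriter("utf-8")(raw)   # StreamWriter that encodes on write

# Unicode objects are written directly; no explicit .encode() call is needed.
out.write(u"\u03bb\n")

encoded = raw.getvalue()               # the bytes that reached the stream
```

In Python 2 the same wrapper is commonly applied to sys.stdout itself,
after which Unicode objects can be written or printed without encoding
each one by hand.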

Paul


