[Tutor] Newbie self-introduction + little encoding problem

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Mon Dec 29 16:58:32 EST 2003


> > [31, 'Janvier', 28, 'F\xe9vrier', 31, 'Mars', 30, 'Avril', 31, 'Mai', 30,
> > 'Juin', 31, 'Juillet', 31, 'Ao\xfbt', 30, 'Septembre', 31, 'Octobre', 30,
> > 'Novembre', 31, 'D\xe9cembre']
>
> (that's a repost; it seems the first e-mail got lost).
>
> But that's right.  Did you try to print such a string?
>
> > As you can see, the accents go completely wrong...
>
> They don't.  You simple see the intern representation of the strings.


Hi Twilly,

Yes, what we are seeing when we print a list is the 'repr()' of each
element in our list.

    http://www.python.org/doc/lib/built-in-funcs.html#l2h-59

repr() is another string-converting function that's similar to str()  ---
but it's subtly different because it shows exactly what we'd need to type
at the interpreter to get that value.


Hmmm... that was a little difficult to scan.  Maybe an example will help:

###
>>> s = 'this is a test'
>>> print str(s)
this is a test
>>> print repr(s)
'this is a test'
###

Here, we can see that str() and repr() do give back subtly different
things: repr() includes the quotes we'd have to type to get the string
back.  And for your amusement: doing a repeated repr() on a string adds
more and more quotes:

###
>>> print s
this is a test
>>> print repr(s)
'this is a test'
>>> print repr(repr(s))
"'this is a test'"
>>> print repr(repr(repr(s)))
'"\'this is a test\'"'
>>> print '"\'this is a test\'"'
"'this is a test'"
###
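
One way to check the claim that repr() shows what we'd need to type at the
interpreter is to feed its result back in as a literal.  This is only a
small sketch: it uses ast.literal_eval() from the standard library, which
was not part of the session above, and it is written with print() calls so
that it runs on Python 3.

```python
import ast

s = 'this is a test'

# repr() gives back source text for the string, quotes included.
quoted = repr(s)

# Evaluating that source text as a literal rebuilds the original value.
round_tripped = ast.literal_eval(quoted)

print(quoted)
print(round_tripped == s)
```

If the round trip holds, the second print() reports True, which is the
property that makes repr() handy for debugging.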


Anyway, accents on some systems don't print the way you might expect them
to, but repr() shows them in all their hexadecimal glory.  The
hexadecimal escapes print the same way, regardless of our current
encoding scheme, so that's why we're seeing things like:

> > [31, 'Janvier', 28, 'F\xe9vrier', 31, 'Mars', 30, 'Avril', 31, 'Mai', 30,
> > 'Juin', 31, 'Juillet', 31, 'Ao\xfbt', 30, 'Septembre', 31, 'Octobre', 30,
> > 'Novembre', 31, 'D\xe9cembre']
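
To see what an escape like \xe9 stands for, we can poke at one of those
strings directly.  This is a sketch for Python 3, where the closest match
to the strings in the quoted list is a byte string (the b prefix); it also
assumes the data is ISO-8859-1, which is what the byte values suggest.

```python
# A byte string, standing in for the Python 2 string in the quoted list.
fevrier = b'F\xe9vrier'

# \xe9 is a single byte, so the word is still 7 bytes long.
length = len(fevrier)

# The escape spells out the byte's numeric value: 0xe9 is 233.
byte_value = bytearray(fevrier)[1]

# Assumption: the bytes are ISO-8859-1, where 0xe9 is U+00E9
# (e with an acute accent).
decoded = fevrier.decode('iso-8859-1')

print(length)
print(byte_value)
print(decoded == u'F\u00e9vrier')
```

So nothing has gone wrong with the data itself; the escape is just an
unambiguous way of writing a byte that is not plain ASCII.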



Normally, when we say something like:

    print foo

we're asking Python to first call str() to convert 'foo' into a nice,
human-readable string, and then Python prints that.

It turns out, though, that str()ing a list will repr() every element in
that list as it constructs the string representation.  And that's where
your accents are being shown as hexadecimal escapes.  If we want to
change that behavior, we need to build the string from the list
ourselves, calling str() on each element:

###
>>> def list_to_string(mylist):
...     stringed_elements = []
...     for x in mylist:
...         stringed_elements.append(str(x))
...     return '[' + ', '.join(stringed_elements) + ']'
...
>>>
>>> entries = [31, 'Janvier', 28, 'F\xe9vrier', 31, 'Mars']
>>> print entries
[31, 'Janvier', 28, 'F\xe9vrier', 31, 'Mars']
>>> print list_to_string(entries)
[31, Janvier, 28, F?vrier, 31, Mars]
###
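
As an aside, the difference between printing a list and joining str()ed
elements is easy to see with an object whose str() and repr() are
deliberately different.  The Month class below is made up just for this
sketch, and the code is written to run on Python 3.

```python
class Month(object):
    """Hypothetical class, used only to show which method a list calls."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return 'str:' + self.name

    def __repr__(self):
        return 'repr:' + self.name


m = Month('Mars')

# str() on the object itself uses __str__.
direct = str(m)

# str() on a list uses __repr__ for each element inside it.
in_list = str([m])

# Joining by hand, as list_to_string() does, uses __str__ again.
joined = '[' + ', '.join([str(x) for x in [m]]) + ']'

print(direct)
print(in_list)
print(joined)
```

The middle line is the one that matters: the list never consults __str__
on its elements, which is the behavior described above.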


Note that list_to_string() prints question marks on my system, because my
terminal expects utf-8, and a lone '\xe9' byte is iso-8859-1 rather than
valid utf-8.  But on your system, list_to_string() should show the accents
that you expect.
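
If the terminal and the data disagree about the encoding, one possible
approach is to decode the bytes into text first and let the output side
re-encode them.  This is a sketch under assumptions: it is written for
Python 3, it assumes the month names are ISO-8859-1 bytes, and
list_to_text() is a hypothetical variation on list_to_string(), not
something from the original session.

```python
# Assumption: the month names are ISO-8859-1 (Latin-1) byte strings.
entries = [31, b'Janvier', 28, b'F\xe9vrier', 31, b'Mars']


def list_to_text(mylist, encoding='iso-8859-1'):
    """Hypothetical helper: like list_to_string(), but decodes bytes."""
    parts = []
    for x in mylist:
        if isinstance(x, bytes):
            # Turn raw bytes into text using the assumed encoding.
            parts.append(x.decode(encoding))
        else:
            # Numbers and other values go through ordinary formatting.
            parts.append(u'%s' % (x,))
    return u'[' + u', '.join(parts) + u']'


text = list_to_text(entries)

# Re-encoding as UTF-8 turns the single byte 0xe9 into two bytes.
utf8_bytes = text.encode('utf-8')

print(repr(utf8_bytes))
```

Printing text itself should then show the accent on a terminal whose
encoding can represent it, though that part depends on the local setup.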


Hope this helps!



