[Python-3000] PEP 3138- String representation in Python 3000

Thu May 15 19:50:22 CEST 2008

On Fri, May 16, 2008 at 1:49 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 15/05/2008, Atsuo Ishimoto <ishimoto at gembook.org> wrote:
>> I would like to call it "improve", not break :)
>
> Please can you help me understand the impact here. I am running
> Windows XP (UK English - console code page 850, which is some variety
> of Latin 1). Currently, printing non-latin1 characters gives me an
> exception: for example,
>
>>>> print("Hello\u03C8")
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "D:\Apps\Python30\lib\io.py", line 1103, in write
>    b = s.encode(self._encoding)
>  File "D:\Apps\Python30\lib\encodings\cp850.py", line 12, in encode
>    return codecs.charmap_encode(input,errors,encoding_map)
> UnicodeEncodeError: 'charmap' codec can't encode character '\u03c8' in
> position 5: character maps to <undefined>
>
> (This is 3.0a1 - I don't know if much has changed in more recent
> alphas, if it's significant I can upgrade and try again).
>
> Can you explain what I need to change to make sys.stdout behave as you
> propose? If you can do that, I can test what I will see in your
> proposal if I type print(repr("Hello\u03C8")). My suspicion is that I
> will see unreadable garbage, rather than what I currently get, which
> is backslash-escaped, but readable.

With my proposal, print("Hello\u03C8") prints "Hello\u03C8" instead of
raising an exception, and print(repr("Hello\u03C8")) prints
"'Hello\u03C8'", so no garbage is printed.

Now, let's say you are Greek and working on a Greek version of XP.
print("Hello\u03C8") prints "Hello" followed by the correct Greek
character (GREEK SMALL LETTER PSI), and print(repr("Hello\u03C8"))
prints "'Hello" + the correct Greek character + "'". If you have a Greek
font, you can try this by switching your command prompt to code page
1253 with "chcp 1253".
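
A minimal sketch of the two cases above, assuming the proposed behaviour
is equivalent to encoding the output with the "backslashreplace" error
handler. The helper name shown_on_console is hypothetical, and the
round trip through encode/decode only simulates a console; it does not
touch the real sys.stdout. It assumes a current Python 3, where repr()
keeps printable non-ASCII characters. Note that the handler emits
lower-case hex digits.

```python
def shown_on_console(text, encoding, errors="backslashreplace"):
    """Simulate what a console using `encoding` would display for `text`.

    Hypothetical helper: encodes with the given error handler, then
    decodes the bytes the way the console would interpret them.
    """
    return text.encode(encoding, errors).decode(encoding)


s = "Hello\u03c8"

# Code page 850 has no GREEK SMALL LETTER PSI, so it is escaped.
print(shown_on_console(s, "cp850"))
print(shown_on_console(repr(s), "cp850"))

# Code page 1253 (Greek) can encode it, so the character survives.
print(shown_on_console(s, "cp1253"))
print(shown_on_console(repr(s), "cp1253"))
```

With cp850 both lines come out as ASCII with a \u03c8 escape; with
cp1253 the psi itself is kept.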

>
> The key point here is that I don't think you're proposing to detect
> the user's display capabilities and adapt the output to match, so if
> my display can't cope with the full Unicode character set, I'll have
> to make manual adjustments or see broken output.
>
Python has detected the user's capabilities since Python 2.x (or 1.6? I
forget). On Windows, Python detects the user's encoding from the code
page. On Unix, the locale is used to detect the encoding.
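
To see what was detected on a given machine, something like the
following should work. The function name console_encoding is
hypothetical; sys.stdout.encoding and locale.getpreferredencoding() are
the standard-library entry points. The fallback is there because a
replaced or redirected stdout may report no encoding.

```python
import locale
import sys


def console_encoding():
    """Return the encoding Python picked for stdout.

    Falls back to the locale's preferred encoding when stdout has no
    usable `encoding` attribute (for example, when it has been replaced).
    """
    encoding = getattr(sys.stdout, "encoding", None)
    return encoding or locale.getpreferredencoding(False)


print("stdout encoding:", console_encoding())
print("locale preferred encoding:", locale.getpreferredencoding(False))
```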

> Like it or not, a large proportion of Python's users still work in
> environments where much of the Unicode character space is not
> displayed readably.
>

I agree, so rejecting my proposal as "not a common use case" might be
reasonable. But I should still argue my case to win some sympathy :).

> One point I forgot to clarify is that I'm fully aware that
> print(arbitrary_string) may display garbage, if the string contains
> Unicode that my display can't handle. The key point for me is that
> print(repr(arbitrary_string)) is *guaranteed* to display correctly,
> even on my limited-capability terminal, precisely because it only uses
> ASCII and no matter how dumb, all terminals I know of display ASCII.

I can understand your concern. Perhaps you don't want to see your
terminal flash from escape sequences, beep, print endless graphic
characters, etc. For legacy byte-string applications (whether written in
C or Python), printing an arbitrary string can cause such a mess. But
this is unlikely to happen when printing a Unicode string, since the
characters your terminal cannot understand will be escaped or converted
to a character such as '?'.
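
As a rough illustration of "escaped or converted", here is how the
standard codec error handlers treat a character that cp850 cannot
represent. The helper encode_for_terminal is hypothetical and only shows
the bytes that would be sent; which handler a real stream uses depends
on how that stream was configured.

```python
def encode_for_terminal(text, encoding, errors):
    """Return the bytes a terminal would receive, or None on failure."""
    try:
        return text.encode(encoding, errors)
    except UnicodeEncodeError:
        # "strict" refuses to write anything it cannot represent.
        return None


text = "Hello\u03c8"

for handler in ("strict", "replace", "backslashreplace"):
    # strict -> None, replace -> '?' substituted, backslashreplace -> escaped
    print(handler, encode_for_terminal(text, "cp850", handler))
```

In the two non-strict cases, the unencodable character reaches the
terminal only as plain ASCII.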

Hope this helps.