Proposal: require 7-bit source str's

"Martin v. Löwis" martin at v.loewis.de
Fri Aug 6 08:12:14 EDT 2004


Hallvard B Furuseth wrote:
>>"Written by Martin v. Löwis"
> 
> 
> So if the file has -*- coding: iso-8859-1 -*-, how does that doc string
> look to someone using an iso-8859-2 locale?

Let's start over. I'm referring to a time when there was no encoding
declaration, and PEP 263 had not been written yet. At that time, I thought
that a proper encoding declaration (i.e. a statement) would be the
best thing to do. So in my example, there is no
-*- coding: iso-8859-1 -*- comment in the file. Instead, there is a directive.

As for the unrelated question: how should a docstring be displayed
to a user working in a different locale? In theory, the docstring
should be converted from its source encoding to the encoding of the
terminal where it is displayed. In practice, this is difficult to
implement, because it requires access to the original source code,
which is where the source encoding is recorded. However, Francois Pinard
has suggested adding an __encoding__ attribute to each module,
which could be used to recode the docstring.
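A minimal sketch of that recoding step, written for modern Python 3 and assuming the docstring is available as raw source bytes (as a Python 2 str would be). The __encoding__ attribute is the hypothetical one from Pinard's suggestion; it does not exist on real modules, and recode_docstring is likewise a made-up helper name.

```python
import types

def recode_docstring(raw_doc: bytes, source_encoding: str, display_encoding: str) -> bytes:
    """Convert docstring bytes from the module's source encoding to the display encoding."""
    text = raw_doc.decode(source_encoding)
    # Characters the display encoding cannot represent are replaced with "?".
    return text.encode(display_encoding, errors="replace")

mod = types.ModuleType("example")
mod.__encoding__ = "iso-8859-1"  # hypothetical attribute, per Pinard's suggestion

raw = b"Written by Martin v. L\xf6wis"  # bytes as they appear in the source file
print(recode_docstring(raw, mod.__encoding__, "utf-8"))
print(recode_docstring(raw, mod.__encoding__, "ascii"))
```

The second call shows the lossy case: a pure ASCII terminal gets "L?wis" rather than mojibake.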

As for your literal question: in the current implementation, the string
looks just fine, because every byte in this docstring denotes the same
character in iso-8859-1 and iso-8859-2 (ö is 0xF6 in both).
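This can be checked with Python 3's built-in codecs. The sketch decodes the same bytes under both encodings, then shows a byte (0xE8) where the two encodings disagree, to illustrate that the docstring is only safe because it avoids such bytes.

```python
doc = b"Written by Martin v. L\xf6wis"

as_latin1 = doc.decode("iso-8859-1")
as_latin2 = doc.decode("iso-8859-2")
print(as_latin1 == as_latin2)  # True: 0xF6 is "ö" in both encodings

# A byte on which the two encodings disagree.
other = b"\xe8"
print(other.decode("iso-8859-1"), other.decode("iso-8859-2"))  # è č
```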

> Just like a str7bit directive, in whatever form, would not catch the
> missing u in front of the doc string.

Not necessarily. Once the directive has been seen, it would be possible
to go back and find all strings in the file that fail to meet the
requirement.
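A rough illustration of "going back" over a whole file, using Python 3's tokenize module as a stand-in for whatever the compiler would do internally. The function name is hypothetical; it collects every string literal without a u prefix that contains non-ASCII characters, regardless of where a directive might appear.

```python
import io
import tokenize

def find_unprefixed_non_ascii_strings(source: str) -> list[tuple[int, str]]:
    """Return (line, literal) for each string literal that lacks a u prefix
    but contains non-ASCII characters."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        # The prefix (u, b, r, ...) is everything before the first quote.
        first_quote = min(i for i in (tok.string.find("'"), tok.string.find('"')) if i >= 0)
        prefix = tok.string[:first_quote].lower()
        if "u" in prefix:
            continue
        if any(ord(ch) > 127 for ch in tok.string):
            hits.append((tok.start[0], tok.string))
    return hits

example = 'def f():\n    "Written by Martin v. Löwis"\n    return u"Löwis"\n'
print(find_unprefixed_non_ascii_strings(example))
# [(2, '"Written by Martin v. Löwis"')]
```

Because the scan covers the whole token stream, it does not matter whether the directive precedes or follows the offending docstring.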

Notice that your approach only works for languages whose character sets
mark non-ASCII text with bytes >= 128 anyway. Several multi-byte encodings
(the ISO-2022 family, for example) use only bytes < 128, yet text in
them should still trigger the warning you want to produce.
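To make the multi-byte point concrete, the sketch below encodes the same Japanese text with Python 3's standard codecs and tests whether every resulting byte is below 128. ISO-2022-JP switches character sets with 7-bit escape sequences, so a "no bytes >= 128" rule would let it through, while EUC-JP and UTF-8 would be caught. The is_seven_bit helper is illustrative only.

```python
def is_seven_bit(data: bytes) -> bool:
    """True if every byte is below 128, i.e. what a byte-level 7-bit check would accept."""
    return all(byte < 128 for byte in data)

text = "日本語"
for encoding in ("iso2022_jp", "euc_jp", "utf-8"):
    print(encoding, is_seven_bit(text.encode(encoding)))
# iso2022_jp True
# euc_jp False
# utf-8 False
```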

>>(of course, requiring that people use escape sequences for
>>them might be acceptable).
> 
> 
> Argh!  Please, no.

Think again. There is a real need to represent byte arrays in Python
source code, e.g. in libraries that manipulate binary data, such as
those that generate MPEG files. Such code has a legitimate need to
write arbitrary bytes in source, with no intention of having those
bytes interpreted as characters.
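For illustration, here is how such a library might spell binary constants with escape sequences, so that the source file itself stays pure 7-bit ASCII while the value contains arbitrary bytes. The constant is the 0x000001BA pack start code of MPEG-1/2 program streams; the function name is hypothetical, and the sketch uses Python 3 bytes literals rather than Python 2 str.

```python
# MPEG program stream pack-header start code. The \x escapes keep the source
# text ASCII-only even though the final byte (0xBA) is >= 128.
PACK_START_CODE = b"\x00\x00\x01\xba"

def starts_with_pack_header(data: bytes) -> bool:
    """Check whether a buffer begins with an MPEG pack header start code."""
    return data.startswith(PACK_START_CODE)

print(starts_with_pack_header(b"\x00\x00\x01\xba\x44\x00"))  # True
print(starts_with_pack_header(b"\x00\x00\x01\xb3"))          # False
```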

Regards,
Martin
