Python's 8-bit cleanness deprecated?

Fri Feb 7 03:47:21 EST 2003

Roman Suzi wrote:
> On Thu, 6 Feb 2003, M.-A. Lemburg wrote:
> 
> 
>>Roman Suzi wrote:
>>
>>>We discussed PEP-0263
>>>( http://python.org/peps/pep-0263.html )
>>>
>>>* Further discussion is probably not constructive, as Skip noticed.
>>
>>Indeed :-) Even less, since it is already implemented in Python 2.3.
>>
>>
>>>Encoding-cookie is bitter, but probably necessary. I have no other
>>>arguments. 
> 
> 
> Well, if encoding-cookie is here to stay, I have only one wish:
> 
> aaa.py:7: DeprecationWarning: Non-ASCII character '\xec', but no declared 
> encoding
>   """
> 
> - please add a hint about adding an encoding declaration to the source.
> The URL of the PEP will do.

Good idea.

> I still do not know what to do with users of Python programs. 
> Do we need to urge them to become Python programmers ;-)

No, but they'll need to pay some lucky Python programmer to
get rid of the warning :-) Seriously, the warning and the trouble
are intended, as I already mentioned in the bug report Kirill
filed on SourceForge: http://www.python.org/sf/681960/ :

Python source code was never meant to contain non-ASCII
characters. The PEP implementation now officially allows them,
provided that you use an encoding marker, e.g.

"""
# -*- coding: windows-1251 -*-
name = raw_input("Как тебя зовут ? ")
print "Привет %s" % name
"""
(The Russian strings mean "What is your name? " and "Hello %s";
emacs reads the -*- coding -*- line and displays them correctly.)
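Per PEP 263, the marker is a comment on the first or second line of the file that matches a regular expression like the one given in the PEP. The following is a simplified Python 3 sketch of that rule, not CPython's actual tokenizer: the name `find_coding_cookie` is hypothetical, and several corner cases the real implementation handles are ignored.

```python
import re

# Simplified form of the PEP 263 pattern: a comment containing
# "coding:" or "coding=" followed by an encoding name.
COOKIE_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def find_coding_cookie(source_bytes):
    """Return the declared encoding name, or None (hypothetical helper)."""
    # Only the first two lines of the file are examined.
    for line in source_bytes.splitlines()[:2]:
        # The declaration itself is ASCII; replace any other bytes.
        match = COOKIE_RE.match(line.decode("ascii", "replace"))
        if match:
            return match.group(1)
    return None


example = b'# -*- coding: windows-1251 -*-\nname = raw_input("...")\n'
print(find_coding_cookie(example))  # prints windows-1251
```

The same pattern also matches the free-form "Tell Python to read this file as-is: coding: latin-1" comment, since only the "coding:" part matters.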

Note that the implementation also recognizes files that start
with a UTF-8 signature (a byte order mark, or BOM). Python
detects these files automatically, so if you really don't
like the coding marker, simply save the file with an editor
that prepends the UTF-8 BOM.
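PEP 263 itself phrases the BOM rule in terms of the UTF-8 signature: a file that starts with the bytes EF BB BF is read as UTF-8 even without an encoding comment. Below is a minimal Python 3 sketch of that check using the standard `codecs.BOM_UTF8` constant; the helper `decode_source` and its ASCII fallback are hypothetical, and a real tokenizer would also look for a coding comment.

```python
import codecs


def decode_source(data, default="ascii"):
    """Decode source bytes, honouring a leading UTF-8 signature (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):
        # Drop the three signature bytes (EF BB BF) and decode as UTF-8.
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    # No signature: use the default; a real tokenizer would also
    # look for a coding comment at this point.
    return data.decode(default)


source = "name = 'Привет'\n"
print(decode_source(codecs.BOM_UTF8 + source.encode("utf-8")) == source)  # prints True
```

Python's built-in "utf-8-sig" codec performs the same BOM stripping, if a one-liner is preferred.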

BTW, if you absolutely want to use multiple encodings in a single
file and you know what you're doing, you can "disable" the
warning and any codec errors by telling Python to interpret
the file as Latin-1:

"""
# Tell Python to read this file as-is: coding: latin-1
name = raw_input("Как тебя зовут ? ")
print "Привет %s" % name
"""

Note that Unicode literals then *have* to be in Latin-1;
otherwise you'll lose big. By telling Python to read the
file using the Latin-1 codec, you basically tell it to
behave exactly as it does now (which is considered a bug).
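Both halves of that paragraph can be checked with plain `bytes.decode` calls. This is a Python 3 sketch that simulates only the decoding step, not the tokenizer. Latin-1 maps each of the 256 byte values to exactly one code point, so decoding never fails and byte strings pass through unchanged, which is why the file then behaves as it does today. A Unicode literal, however, ends up holding whatever characters Latin-1 assigns to its bytes. The sample assumes the Russian greeting from the example was saved as windows-1251.

```python
# 1. Latin-1 assigns one code point (U+0000..U+00FF) to every byte value,
#    so decoding never raises and re-encoding restores the exact bytes.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin-1")
assert as_text.encode("latin-1") == all_bytes

# 2. Assume an editor saved u"Привет" as windows-1251 bytes ...
saved_bytes = "Привет".encode("cp1251")

# ... and a latin-1 declaration makes Python decode them as Latin-1.
literal_value = saved_bytes.decode("latin-1")

# No error is raised, but the Unicode object holds the wrong characters.
assert literal_value == "\u00cf\u00f0\u00e8\u00e2\u00e5\u00f2"  # "Ïðèâåò"
assert literal_value != "Привет"

# Declaring the encoding the editor really used gives the intended text.
assert saved_bytes.decode("cp1251") == "Привет"
```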

This whole thing is one more step in the direction of
"explicit is better than implicit", and it opens Python up
to many more languages, for example those written in Asian
scripts.

> And one more point. The Style Guide needs to be updated accordingly,
> banning multiple encodings in one source file and recommending
> the "coding: " hint as the way to declare the encoding.

Good point. I'll add a comment there.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/