Python's 8-bit cleanness deprecated?
M.-A. Lemburg
mal at lemburg.com
Fri Feb 7 03:47:21 EST 2003
Roman Suzi wrote:
> On Thu, 6 Feb 2003, M.-A. Lemburg wrote:
>
>
>>Roman Suzi wrote:
>>
>>>We discussed PEP-0263
>>>( http://python.org/peps/pep-0263.html )
>>>
>>>* Further discussion is probably not constructive, as Skip noticed.
>>
>>Indeed :-) Even less, since it is already implemented in Python 2.3.
>>
>>
>>>Encoding-cookie is bitter, but probably necessary. I have no other
>>>arguments.
>
>
> Well, if encoding-cookie is here to stay, I have only one wish:
>
> aaa.py:7: DeprecationWarning: Non-ASCII character '\xec', but no declared
> encoding
> """
>
> - please, add some more hint about encoding addition to the source.
> URL of the PEP will do.
Good idea.
> I still do not know what to do with users of Python programs.
> Do we need to urge them to become Python programmers ;-)
No, but they'll need to pay some lucky Python programmer to
get rid of the warning :-) Seriously, the warning and the trouble
are intended, as I already mentioned in the bug report Kirill
filed on SF: http://www.python.org/sf/681960/ :
Python's source code was originally never meant to contain
non-ASCII characters. The PEP implementation now officially
allows this provided that you use an encoding marker, e.g.
"""
# -*- coding: windows-1251 -*-
name = raw_input("Êàê òåáÿ çîâóò ? ")
print "Ïðèâåò %s" % name
"""
(If you open this in emacs, you'll see Russian text)
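The relationship between the garbled-looking literals and the Russian
text can be checked directly. A small sketch, assuming a present-day
Python 3 interpreter rather than the 2.3 discussed here: it takes the
first literal as it appears when its bytes are read as Latin-1 and
reinterprets those same bytes as windows-1251. The variable names are
made up for the illustration.

```python
# The literal as displayed when the file's bytes are read as Latin-1.
shown = "Êàê òåáÿ çîâóò ? "

# Latin-1 maps each character back to exactly one byte, so this
# recovers the raw bytes that are stored in the source file.
raw = shown.encode("latin-1")

# Decoding the same bytes with the declared encoding gives the Russian.
russian = raw.decode("windows-1251")
print(russian)  # Как тебя зовут ?
```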
Note that this is also needed in order to support UTF-16
file formats which use two bytes per character. Python
will automatically detect these files, so if you really don't
like the coding marker, simply write the file using a UTF-16-aware
editor which prepends a UTF-16 BOM to the file.
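For readers who have not looked at a BOM at the byte level, here is a
rough sketch of what such an editor writes and how the mark can be
recognized. This is only an illustration in present-day Python 3, not
the interpreter's own detection code; has_utf16_bom is a hypothetical
helper name.

```python
import codecs


def has_utf16_bom(data):
    """Return True if the byte string starts with a UTF-16 BOM.

    Note: a UTF-32 little-endian BOM begins with the same two bytes,
    so this simple check does not tell the two apart.
    """
    return data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))


# The generic "utf-16" codec prepends a BOM in the platform's byte order.
with_bom = "x = 1\n".encode("utf-16")
without_bom = "x = 1\n".encode("ascii")

print(has_utf16_bom(with_bom))     # True
print(has_utf16_bom(without_bom))  # False
```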
BTW, if you absolutely want to use multiple encodings in a single
file and you're sure you know what you're doing, then you can
"disable" that warning and any codec errors by telling Python
to interpret the file as latin-1:
"""
# Tell Python to read this file as-is: coding: latin-1
name = raw_input("Êàê òåáÿ çîâóò ? ")
print "Ïðèâåò %s" % name
"""
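Both marker styles shown so far work because the marker is found by
pattern matching on a comment in the first two lines of the file. A
minimal sketch of that lookup, using the regular expression given in
PEP 263; declared_encoding is a hypothetical helper written for this
illustration, and the real tokenizer does more checking than this.

```python
import re

# Pattern quoted in PEP 263 for locating the encoding declaration.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(source_lines):
    """Return the encoding named in the first two lines, or None."""
    for line in source_lines[:2]:
        # Only comment lines can carry the declaration.
        if line.lstrip().startswith("#"):
            match = CODING_RE.search(line)
            if match:
                return match.group(1)
    return None


emacs_style = ["# -*- coding: windows-1251 -*-", "x = 1"]
free_form = ["# Tell Python to read this file as-is: coding: latin-1", "x = 1"]
no_marker = ["x = 1", "y = 2"]

print(declared_encoding(emacs_style))  # windows-1251
print(declared_encoding(free_form))    # latin-1
print(declared_encoding(no_marker))    # None
```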
Note that Unicode literals then *have* to be in Latin-1;
otherwise you'll lose big. By telling Python to read the
file using the Latin-1 codec you basically tell it to
work exactly like it does now (which is considered a bug).
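Why Latin-1 behaves as "as-is", and why Unicode literals suffer, can
be shown in a few lines. Again a sketch assuming present-day Python 3:
Latin-1 maps every byte value to the code point with the same number,
so decoding never fails, but bytes written in another encoding turn
into the wrong characters.

```python
# Every possible byte decodes under Latin-1, to the code point with the
# same numeric value, and encodes back to the identical byte.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")
assert [ord(ch) for ch in text] == list(range(256))
assert text.encode("latin-1") == all_bytes

# Bytes that were really windows-1251 still decode without error, but a
# Unicode literal built from them holds Latin-1 characters, not Russian.
raw = "Привет".encode("windows-1251")
as_latin1 = raw.decode("latin-1")
print(as_latin1)  # Ïðèâåò
```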
This whole thing is one more step in the direction of
"explicit is better than implicit" and opens up Python
to many more languages, for example those written in
Asian scripts.
> And one more point. The Style Guide needs to be upgraded accordingly,
> banning multiple encodings in the source and telling people to add
> the "coding: " hint in the recommended way.
Good point. I'll add a comment there.
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/