PEP: Defining Python Source Code Encodings

Roman Suzi rnd at onego.ru
Wed Jul 18 12:42:48 EDT 2001


On Wed, 18 Jul 2001, M.-A. Lemburg wrote:

>Roman Suzi wrote:
>> On Tue, 17 Jul 2001, M.-A. Lemburg wrote:
>>
>> Nope. There must be no decoding and encoding back, or it will slow down
>> the startup of Python _scripts_ unnecessarily.
>>
>> That is why I suggested an "unknown" encoding: a safe default
>> for those who do not want any back-and-forth recoding.
>
>If you want to avoid having to decode and then reencode data
>in the parser, we would have to live with two sets of parsers
>in Python -- one for Unicode and one for 8-bit data.

No. I think there could be a bypass that does not affect the parser
logic much.
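A minimal sketch of such a bypass, written in modern Python 3 and purely hypothetical (the names `declared_encoding` and `load_source` are illustrative, not CPython internals): source bytes are decoded only when the file carries an explicit encoding declaration in its first two lines, and are otherwise handed on unchanged. The declaration pattern follows the one documented in the Python language reference for what became PEP 263; the "raw by default" policy is an assumption modeled on the proposal above, not what Python actually does.

```python
import re

# Declaration pattern as documented in the Python language reference,
# applied to raw bytes (\w is ASCII-only in a bytes pattern).
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def declared_encoding(source: bytes):
    """Hypothetical helper: return the encoding named on line 1 or 2, else None."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return None


def load_source(source: bytes):
    """Hypothetical 'raw by default' policy: decode only when explicitly asked."""
    encoding = declared_encoding(source)
    if encoding is None:
        return source               # no declaration: bytes pass through untouched
    return source.decode(encoding)  # explicit declaration: decode to str
```

Because the pattern only looks for `coding:` or `coding=` inside a comment, the same line can double as an Emacs (`# -*- coding: koi8-r -*-`) or Vim (`# vim: set fileencoding=koi8-r :`) modeline.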

>I don't think that anyone would like to maintain those two
>sets, so it's basically either go all the way or not move
>at all.
>
>> There clearly must be a way to prevent encode-decode. And it would be
>> better if only an EXPLICITLY given encoding triggered the encode-decode
>> mechanism.
>
>That's not true: Python caches byte-code compiled versions of
>scripts in .pyc|o files. So the performance problem is really not
>all that important.

I've just written a simple program called myf.py, created its .pyc and .pyo,
then ran strace -f to see which files Python opens (or tries to open).
It never touches myf.pyc or myf.pyo:

2229  open("/etc/ld.so.preload", O_RDONLY)
2229  open("/etc/ld.so.cache", O_RDONLY)
2229  open("/lib/libdl.so.2", O_RDONLY)
2229  open("/lib/libpthread.so.0", O_RDONLY)
2229  open("/lib/libm.so.6", O_RDONLY)
2229  open("/lib/libc.so.6", O_RDONLY)
2229  open("myf.py", O_RDONLY)
2229  open("/usr/lib/python1.5/exceptions.so", O_RDONLY)
2229  open("/usr/lib/python1.5/exceptionsmodule.so", O_RDONLY)
2229  open("/usr/lib/python1.5/exceptions.py", O_RDONLY)
2229  open("/usr/lib/python1.5/exceptions.pyc", O_RDONLY)
2229  open("/usr/lib/python1.5/site.so", O_RDONLY)
2229  open("/usr/lib/python1.5/sitemodule.so", O_RDONLY)
2229  open("/usr/lib/python1.5/site.py", O_RDONLY)
2229  open("/usr/lib/python1.5/site.pyc", O_RDONLY)
2229  open("/usr/lib/python1.5/os.so", O_RDONLY)
2229  open("/usr/lib/python1.5/osmodule.so", O_RDONLY)
2229  open("/usr/lib/python1.5/os.py", O_RDONLY)
2229  open("/usr/lib/python1.5/os.pyc", O_RDONLY)
2229  open("/usr/lib/python1.5/posixpath.so", O_RDONLY)
2229  open("/usr/lib/python1.5/posixpathmodule.so", O_RDONLY)
2229  open("/usr/lib/python1.5/posixpath.py", O_RDONLY)
2229  open("/usr/lib/python1.5/posixpath.pyc", O_RDONLY)
2229  open("/usr/lib/python1.5/stat.so", O_RDONLY)
2229  open("/usr/lib/python1.5/statmodule.so", O_RDONLY)
2229  open("/usr/lib/python1.5/stat.py", O_RDONLY)
2229  open("/usr/lib/python1.5/stat.pyc", O_RDONLY)
2229  open("/usr/lib/python1.5/UserDict.so", O_RDONLY)
2229  open("/usr/lib/python1.5/UserDictmodule.so", O_RDONLY)
2229  open("/usr/lib/python1.5/UserDict.py", O_RDONLY)
2229  open("/usr/lib/python1.5/UserDict.pyc", O_RDONLY)
2229  open("/dev/null", O_RDONLY|O_NONBLOCK|O_DIRECTORY)
2229  open("/usr/lib/python1.5/site-packages", O_RDONLY|O_NONBLOCK|O_DIRECTORY)
2229  open("/usr/lib/python1.5/site-packages/NumPy.pth", O_RDONLY)
2229  open("/usr/lib/python1.5/site-packages/PIL.pth", O_RDONLY)
2229  open("/usr/lib/python1.5/site-packages/RNG.pth", O_RDONLY)
2229  open("/usr/lib/python1.5/sitecustomize.so", O_RDONLY)
2229  open("/usr/lib/python1.5/sitecustomizemodule.so", O_RDONLY)
2229  open("/usr/lib/python1.5/sitecustomize.py", O_RDONLY)
2229  open("/usr/lib/python1.5/sitecustomize.pyc", O_RDONLY)
2229  open("/usr/lib/python1.5/plat-linux-i386/sitecustomize.so", O_RDONLY)
2229  open("/usr/lib/python1.5/plat-linux-i386/sitecustomizemodule.so", O_RDONLY)
2229  open("/usr/lib/python1.5/plat-linux-i386/sitecustomize.py", O_RDONLY)
2229  open("/usr/lib/python1.5/plat-linux-i386/sitecustomize.pyc", O_RDONLY)
2229  open("/usr/lib/python1.5/lib-tk/sitecustomize.so", O_RDONLY)
2229  open("/usr/lib/python1.5/lib-tk/sitecustomizemodule.so", O_RDONLY)
2229  open("/usr/lib/python1.5/lib-tk/sitecustomize.py", O_RDONLY)
2229  open("/usr/lib/python1.5/lib-tk/sitecustomize.pyc", O_RDONLY)
2229  open("/usr/lib/python1.5/lib-dynload/sitecustomize.so", O_RDONLY)
2229  open("/usr/lib/python1.5/lib-dynload/sitecustomizemodule.so", O_RDONLY)
2229  open("/usr/lib/python1.5/lib-dynload/sitecustomize.py", O_RDONLY)
2229  open("/usr/lib/python1.5/lib-dynload/sitecustomize.pyc", O_RDONLY)
2229  open("/usr/lib/python1.5/site-packages/sitecustomize.so", O_RDONLY)
2229  open("/usr/lib/python1.5/site-packages/sitecustomizemodule.so", O_RDONLY)
2229  open("/usr/lib/python1.5/site-packages/sitecustomize.py", O_RDONLY)
2229  open("/usr/lib/python1.5/site-packages/sitecustomize.pyc", O_RDONLY)
2229  open("/usr/lib/python1.5/site-packages/NumPy/sitecustomize.so", O_RDONLY)
2229  open("/usr/lib/python1.5/site-packages/NumPy/sitecustomizemodule.so", O_RDONLY)
2229  open("/usr/lib/python1.5/site-packages/NumPy/sitecustomize.py", O_RDONLY)
2229  open("/usr/lib/python1.5/site-packages/NumPy/sitecustomize.pyc", O_RDONLY)
2229  open("/usr/lib/python1.5/site-packages/PIL/sitecustomize.so", O_RDONLY)
2229  open("/usr/lib/python1.5/site-packages/PIL/sitecustomizemodule.so", O_RDONLY)
2229  open("/usr/lib/python1.5/site-packages/PIL/sitecustomize.py", O_RDONLY)
2229  open("/usr/lib/python1.5/site-packages/PIL/sitecustomize.pyc", O_RDONLY)
2229  open("/usr/lib/python1.5/site-packages/RNG/sitecustomize.so", O_RDONLY)
2229  open("/usr/lib/python1.5/site-packages/RNG/sitecustomizemodule.so", O_RDONLY)
2229  open("/usr/lib/python1.5/site-packages/RNG/sitecustomize.py", O_RDONLY)
2229  open("/usr/lib/python1.5/site-packages/RNG/sitecustomize.pyc", O_RDONLY)

So I argue that a script like myf.py is not cached in compiled form
anywhere and is recompiled every time it runs!
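On a modern CPython (3.2 or later, which keeps bytecode in `__pycache__` per PEP 3147 rather than next to the source as 1.5 did), the same observation can be checked with a short script. The sketch below is an illustration, not part of the original experiment: it runs a throwaway `myf.py` once as the main script and once via `import`, then looks for a cached `myf.*.pyc`. The `-E` flag makes the child interpreters ignore `PYTHON*` environment variables such as `PYTHONDONTWRITEBYTECODE`; the helper names are hypothetical.

```python
import os
import subprocess
import sys
import tempfile


def bytecode_cached(directory):
    """True if a cached bytecode file for myf exists in directory/__pycache__."""
    cache_dir = os.path.join(directory, "__pycache__")
    if not os.path.isdir(cache_dir):
        return False
    return any(name.startswith("myf.") and name.endswith(".pyc")
               for name in os.listdir(cache_dir))


def check_caching():
    """Return (cached after running as script, cached after importing)."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "myf.py"), "w", encoding="ascii") as f:
            f.write("x = 1\n")
        # Run myf.py directly as the main script.
        subprocess.run([sys.executable, "-E", "myf.py"], cwd=tmp, check=True)
        after_run = bytecode_cached(tmp)
        # Import myf as an ordinary module from the same directory.
        subprocess.run([sys.executable, "-E", "-c", "import myf"], cwd=tmp, check=True)
        after_import = bytecode_cached(tmp)
        return after_run, after_import


print(check_caching())
```

On a standard CPython one would expect `(False, True)`: only imported modules are cached, which is why a common workaround is a tiny launcher script that imports the real code.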

>> ...and efficiency reasons too. re was slowed down significantly by adding
>> Unicode support.
>
>I seriously doubt that. Fredrik (who wrote the sre engine) is an
>optimization genius and in some cases even made the sre engine
>faster than the string module implementations of e.g. find().

Well, u2 (two-byte Unicode) is no different from ASCII+Latin-1 on modern
computers, as they have 32-bit words...
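Whether wider characters cost anything can be measured directly today. The sketch below assumes a modern CPython 3.3+, where PEP 393 stores a `str` in 1, 2 or 4 bytes per character depending on its widest character, and times the same `re` search over bytes, over an ASCII-only `str`, and over a `str` containing Cyrillic. The helper `time_search` is illustrative, and timings depend entirely on the machine and Python version, so no expected numbers are given.

```python
import re
import timeit


def time_search(pattern, text, number=200):
    """Seconds taken by `number` runs of a precompiled re search over text."""
    compiled = re.compile(pattern)
    return timeit.timeit(lambda: compiled.search(text), number=number)


ascii_text = "abc" * 10_000 + "needle"     # 1 byte per char internally
cyrillic_text = "абв" * 10_000 + "needle"  # 2 bytes per char internally
byte_text = ascii_text.encode("ascii")     # plain 8-bit data

for label, pattern, text in [
    ("bytes", rb"needle", byte_text),
    ("str, ASCII only", "needle", ascii_text),
    ("str, Cyrillic", "needle", cyrillic_text),
]:
    print(f"{label:16s} {time_search(pattern, text):.4f} s")
```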

>> > > >    To make this backwards compatible, the implementation would have to
>> > > >    assume Latin-1 as the original file encoding if not given (otherwise,
>> > > >    binary data currently stored in 8-bit strings wouldn't make the
>> > > >    roundtrip).
>> > > ...as I said, there must be no assumed charset. Things must
>> > > be left as they are now when no explicit encoding is given.
>> > This is what the Latin-1 encoding assures.
>> I still think something like "raw" is needed...
>Latin-1 gives you this "raw" feature.

I still can't get Netscape to print in Cyrillic because it assumes that
its buttons and fonts are Latin-1! Sometimes assumptions go too far
to be allowed in the first place!
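Both sides of this exchange show up in a few lines of modern Python 3 (the helper `latin1_roundtrip` is illustrative). Every byte value 0 to 255 maps to exactly one code point U+0000 to U+00FF, so arbitrary bytes survive a Latin-1 decode/encode round trip, which is the "raw" property. Yet text in another 8-bit charset such as KOI8-R becomes the wrong characters in between, which is the assumption being objected to.

```python
def latin1_roundtrip(data: bytes) -> bool:
    """True if decoding data as Latin-1 and encoding it back gives the same bytes."""
    return data.decode("latin-1").encode("latin-1") == data


all_bytes = bytes(range(256))
koi8 = "Привет".encode("koi8_r")   # Cyrillic "hello" encoded as KOI8-R
misread = koi8.decode("latin-1")   # what a Latin-1 assumption produces

print(latin1_roundtrip(all_bytes))  # True: all 256 byte values survive
print(latin1_roundtrip(koi8))       # True: the bytes come back unchanged
print(misread == "Привет")          # False: the str holds Latin-1 letters, not Cyrillic
```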

>> > We've been discussing these on python-dev, but Guido is not
>> > too keen on having them.
>>
>> And rightly so. I even think encoding information could be EXTERNAL.
>
>No -- how are editors supposed to know about these external
>files ?

OK. But how do they know the encoding of 8-bit documents?
Documents carry tags that declare their encoding. Then a Python program
must become a document with all those tags here and there.

How do other languages solve this "problem"?

>> > > > Comments are welcome !
>> > Thanks for your comments,
>> I just hope the realisation of your PEP will not make Python scripts
>> run slower ;-) while allowing truly useful i18n functionality.
>
>By the time the PEP is implemented, CPUs will run at least 50%
>faster than they do now -- this should answer your question ;-)

Nope. That is the "extensive" approach of bloated programs (like W2K):
demand ever more hardware, which mainly boosts Intel's CPU sales.

If computers had always been as fast as they are now, we would all be using bubble sort.
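The bubble-sort remark is about asymptotic cost: a faster CPU buys a constant factor, while a quadratic algorithm's work grows with the square of the input. A plain bubble sort (an illustrative sketch with no early-exit optimization; the helper name is hypothetical) makes exactly n(n-1)/2 comparisons, which the function below counts.

```python
def bubble_sort_comparisons(items):
    """Bubble-sort a copy of items; return (sorted list, number of comparisons)."""
    data = list(items)
    comparisons = 0
    n = len(data)
    for i in range(n - 1):
        # After pass i, the last i + 1 elements are already in final position.
        for j in range(n - 1 - i):
            comparisons += 1
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return data, comparisons
```

For 1,000 items that is 499,500 comparisons and for 2,000 items 1,999,000, roughly four times as many; a CPU that is 50% faster therefore handles only about sqrt(1.5) ≈ 1.22 times as many items in the same time.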

Sincerely yours, Roman Suzi
-- 
_/ Russia _/ Karelia _/ Petrozavodsk _/ rnd at onego.ru _/
_/ Wednesday, July 18, 2001 _/ Powered by Linux RedHat 6.2 _/
_/ "Hard work never killed anyone, but why chance it?" _/

More information about the Python-list mailing list