Python's 8-bit cleanness deprecated?

Wed Feb 5 18:12:15 EST 2003

Brian Quinlan <brian at sweetapp.com> wrote in 
news:mailman.1044479292.4322.python-list at python.org:

>> anyway if the PEP proposes to search for a regexp in comments, then it can
>> do it equally well over the rest of the source. meaning that
>> 
>> regexp1 = r"__encoding__[\t ]*=[\t ]*[\"']+(\w+)[\"']+"
>> regexp2 = r"from[\t ]+__encoding__[\t ]+import[\t ]+[\"']+(\w+)[\"']+"
>> 
>> can be searched before parsing the grammar, both are valid python code
>> that do not need any language extensions and it still works after 
>> removing all comments.
> 
> Here are the downsides/observations:
> 1. you can't actually use a regular expression like that because the file
>    might be using a multibyte encoding system or an encoding that is not
>    an ASCII superset i.e. searching for that pattern might be hard

but the PEP uses a regexp to describe the "# -*- coding" thing. it has the 
same limitations, however it is implemented.
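
a minimal sketch of that point, not the PEP's actual implementation: 
regexp1 from the quoted message and a simplified stand-in for the PEP's 
comment pattern (COMMENT_RE is my assumption, not the exact PEP text) are 
both run over the raw bytes of the same source. find_declared is a 
hypothetical helper name. both patterns find the name in an ASCII superset 
like latin-1, and both miss it in UTF-16.

```python
import re

# regexp1 from the quoted message, compiled as a bytes pattern
ASSIGN_RE = re.compile(rb"__encoding__[\t ]*=[\t ]*[\"']+(\w+)[\"']+")
# simplified stand-in for the PEP's comment pattern (assumption)
COMMENT_RE = re.compile(rb"coding[:=]\s*([-\w.]+)")


def find_declared(raw):
    """Return (name found by ASSIGN_RE, name found by COMMENT_RE), None if absent."""
    a = ASSIGN_RE.search(raw)
    c = COMMENT_RE.search(raw)
    return (
        a.group(1).decode("ascii") if a else None,
        c.group(1).decode("ascii") if c else None,
    )


source = '# -*- coding: latin-1 -*-\n__encoding__ = "latin1"\n'

# ASCII superset: the declaration bytes look exactly like ASCII
print(find_declared(source.encode("latin-1")))
# UTF-16: every character is two bytes, so neither pattern matches
print(find_declared(source.encode("utf-16")))
```

so whichever spelling is chosen, a byte-level search only works when the 
declaration itself is stored as plain ASCII bytes.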

> 2. no editors will understand the encoding meta-information that you are
>    trying to provide

yes. but the "# -*- coding.." line will not be understood by the gazillion 
editors out there either, only by two, vi and emacs. so that is not a 
strong argument. or are all python programmers supposed to use one of 
these editors?!?

>    (I don't get why people don't seem to understand the
>    meta-informational aspect of encodings; the encoding isn't a property
>    of your script, it is like the size or permissions of the source file
>    i.e. in an ideal world, Python shouldn't have to care because someone
>    else would worry about it).

right, would be nice if that was handled by the filesystem. but 
unfortunately it's harder to change that...

> 3. it has runtime effects which are not necessarily desirable

i'd call emitting a warning a runtime effect too ;-)
the warning is intended for the eyes of a developer (and maybe in some 
cases for a customer, but i doubt that it's of great significance). instead, 
a lot of plain users will get warnings from software that they have been 
using for a long time, web logs get filled up, etc.

>> actually i like the regexp1 way. you can even retrieve the encoding at
>> runtime and if encoding matters during parsing and executing it also
>> matters during runtime, otherwise we would not need the PEP, right?
> 
> The encoding only matters at load time. You should be able to save your
> source files using a different encoding, change the encoding declaration
> (unless Emacs/VIM does it for you automatically) and run your script
> without any change in behavior.

ok, if i put an "ä" (latin1) in my script it will be printed as another 
character in the DOS box, with or without the encoding line (as it is now).
so the encoding line did not improve anything, but caused a lot of work 
for me, changing all old files to get rid of the warning...
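
to illustrate the DOS box problem, here is a small sketch. the code pages 
cp437 and cp850 are my assumption for what such a console uses, and 
shown_in_console is a hypothetical helper. the byte an editor stores for 
"ä" in latin-1 is simply interpreted differently by the console, and a 
declaration in the source does not change which bytes a plain 8-bit string 
sends to the screen.

```python
# the single byte a latin-1 editor writes for "ä"
raw = "\u00e4".encode("latin-1")


def shown_in_console(data, console_encoding):
    """Return the text a console using console_encoding would display for data."""
    return data.decode(console_encoding)


# assumption: the DOS box uses one of these code pages
for console in ("cp437", "cp850"):
    wrong = shown_in_console(raw, console)
    # what would have to be written instead to really show "ä"
    needed = "\u00e4".encode(console)
    print(console, repr(wrong), raw, needed)
```

getting the right character on screen needs a re-encoding step at output 
time, which is a separate matter from declaring the source encoding.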

i'm +1 for a way to specify the encoding, i just don't like it when my 
(old) programs write out a warning on the client's PC.

the PEP says:
"""A warning will be issued if non-ASCII bytes are found in the
       input, once per improperly encoded input file."""

so i'll get warnings because of non-english comments, maybe many warnings 
for one run of a big program. no, i don't like that.
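
a rough sketch of how one could find the affected old files up front, 
based on my reading of the quoted PEP text: a file is flagged when it 
contains non-ASCII bytes and has no coding comment in its first two lines. 
would_warn and CODING_RE are hypothetical names, and the pattern is a 
simplified stand-in, so the real check in the interpreter may differ.

```python
import re

# simplified stand-in for the PEP's comment pattern (assumption)
CODING_RE = re.compile(rb"coding[:=]\s*([-\w.]+)")


def would_warn(raw):
    """True if raw has non-ASCII bytes but no coding comment in lines 1-2."""
    head = raw.splitlines()[:2]
    declared = any(
        line.lstrip().startswith(b"#") and CODING_RE.search(line)
        for line in head
    )
    has_non_ascii = any(byte > 127 for byte in raw)
    return has_non_ascii and not declared


# an old file with a latin-1 comment and no declaration
old_file = b"# gr\xfc\xdfe\nprint 'hi'\n"
# the same file after adding the declaration
fixed_file = b"# -*- coding: latin-1 -*-\n" + old_file

print(would_warn(old_file), would_warn(fixed_file))
```

note that a non-english comment alone is enough to flag a file, even when 
all the code in it is plain ASCII.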

>> a comment is a comment and should stay a comment...
> 
> Unless it is a shebang line?

which is a completely different thing, as it's NOT interpreted by python 
at all. it's read by your OS/shell that wants to execute the file.

chris

-- 
Chris <cliechti at gmx.net>