Python's 8-bit cleanness deprecated?

Brian Quinlan brian at sweetapp.com
Wed Feb 5 16:10:30 EST 2003


> anyway if the PEP proposes to search for a regexp in comments, then it can
> do it equally well over the rest of the source. meaning that
> 
> regexp1 = r"__encoding__[\t ]*=[\t ]*[\"']+(\w+)[\"']+"
> regexp2 = r"from[\t ]+__encoding__[\t ]+import[\t ]+[\"']+(\w+)[\"']+"
> 
> can be searched before parsing the grammar, both are valid python code
> that do not need any language extensions and it still works after 
> removing all comments.

Here are the downsides/observations:
1. you can't actually use a regular expression like that because the file
   might be using a multibyte encoding system or an encoding that is not
   an ASCII superset, i.e. searching for that pattern might be hard
2. no editors will understand the encoding meta-information that you are
   trying to provide (I don't get why people don't seem to understand the
   meta-informational aspect of encodings; the encoding isn't a property
   of your script, it is like the size or permissions of the source file,
   i.e. in an ideal world, Python shouldn't have to care because someone
   else would worry about it).
3. it has runtime effects which are not necessarily desirable
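
To illustrate point 1, here is a rough sketch that runs regexp1 over the raw
bytes of the same source text saved two ways. It is written for a current
Python 3 interpreter, and the helper name find_encoding and the sample source
are made up for the example. The pattern finds the declaration in the ASCII
bytes but not in the UTF-16 bytes, because every character there is followed
by a NUL byte.

```python
import re

# regexp1 from the quoted message, compiled as a bytes pattern so that it
# can be run over raw file contents before any decoding has happened.
REGEXP1 = re.compile(rb"__encoding__[\t ]*=[\t ]*[\"']+(\w+)[\"']+")

# Hypothetical two-line source file used only for this illustration.
SOURCE = '__encoding__ = "utf_16"\nprint("hello")\n'


def find_encoding(raw):
    """Return the declared encoding name found in raw bytes, or None."""
    match = REGEXP1.search(raw)
    return match.group(1).decode("ascii") if match else None


ascii_bytes = SOURCE.encode("ascii")
utf16_bytes = SOURCE.encode("utf-16")  # BOM plus two bytes per character

print(find_encoding(ascii_bytes))
print(find_encoding(utf16_bytes))
```

The first call finds the name; the second returns None, so the declaration
cannot be located without already knowing the encoding.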

> actually i like the regexp1 way. you can even retrieve the encoding at
> runtime and if encoding matters during parsing and executing it also
> matters during runtime, otherwise we would not need the PEP, right?

The encoding only matters at load time. You should be able to save your
source files using a different encoding, change the encoding declaration
(unless Emacs/VIM does it for you automatically) and run your script
without any change in behavior.
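
A small sketch of what I mean, assuming a current Python 3 interpreter (where
compile() accepts bytes and honors a coding comment on the first line) and a
comment-style declaration of the kind the PEP proposes. The names TEMPLATE and
load are invented for the example. The same program is encoded two different
ways, only the declaration changes, and the loaded result is identical.

```python
# Hypothetical source text: a coding comment plus one non-ASCII string literal.
TEMPLATE = '# -*- coding: {enc} -*-\ngreeting = "caf\u00e9"\n'


def load(enc):
    """Encode the source with enc, declare enc in the comment, and run it."""
    raw = TEMPLATE.format(enc=enc).encode(enc)
    namespace = {}
    # compile() reads the coding comment from the bytes and decodes accordingly.
    exec(compile(raw, "<example>", "exec"), namespace)
    return namespace["greeting"]


latin1_result = load("latin-1")
utf8_result = load("utf-8")

# The bytes on disk differ, but the loaded program behaves the same.
print(latin1_result == utf8_result)
```

Once the source has been decoded, nothing in the running program needs to
know which encoding the file was saved in.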

> a comment is a comment and should stay a comment...

Unless it is a shebang line?

Cheers,
Brian





