[issue9598] untabify.py fails on files that contain non-ascii characters
Alexander Belopolsky
report at bugs.python.org
Sat Sep 4 00:47:06 CEST 2010
Alexander Belopolsky <belopolsky at users.sourceforge.net> added the comment:
> If untabify fails because a file has an incorrect encoding, is it really
> a problem in untabify? This is a developer’s tool, so getting a
> traceback here seems okay to me.
I disagree. I think we should use this opportunity to clarify the preferred encoding for C language source files in Python and make untabify produce a meaningful diagnostic in case of encoding errors.
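As a minimal sketch of what such a diagnostic could look like: the helper name `read_source`, the default encoding, and the message format below are all assumptions for illustration, not part of untabify.py. The idea is to catch the decoding error and report the file, line, and offending byte instead of letting a traceback escape.

```python
def read_source(path, encoding="utf-8"):
    """Return (text, None) on success or (None, message) on a decoding error.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        data = f.read()
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that could not be decoded;
        # counting newlines before it gives a 1-based line number.
        line = data.count(b"\n", 0, exc.start) + 1
        message = "%s: line %d: cannot decode byte 0x%02x as %s" % (
            path, line, data[exc.start], encoding)
        return None, message
```

A caller could print the message to stderr and skip the file, leaving it unmodified.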
As a matter of policy, I see two possibilities:
1. Restrict C sources to 7-bit ASCII. (A pedantic reading of the ANSI C standard would probably suggest an even more restricted character set, but practically, I don't think 7-bit ASCII in C comments is likely to cause problems for any tools.)
2. Require UTF-8 encoding for non-ASCII characters. Given that this is the default for Python source code, it is likely that tools that are used for Python development can handle UTF-8.
My vote is for #1. Display of non-ASCII characters is still not universally supported, and they are likely to be clobbered when diffs are copied into e-mails, etc.
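If option 1 were adopted, a check along the following lines could enforce it. This is only a sketch: `find_non_ascii` is a hypothetical helper, not an existing tool, and columns are counted in bytes rather than characters.

```python
def find_non_ascii(data):
    """Return (line, column, byte) for every byte above 0x7f in data.

    data is the raw content of a file as bytes; line and column are 1-based.
    Hypothetical helper for illustration only.
    """
    hits = []
    for lineno, line in enumerate(data.split(b"\n"), 1):
        for col, byte in enumerate(line, 1):
            if byte > 0x7F:
                hits.append((lineno, col, byte))
    return hits
```

An empty result means the file is pure 7-bit ASCII; anything else points at the bytes that would need to be replaced.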
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9598>
_______________________________________