Python's 8-bit cleanness deprecated?

Fri Feb 7 14:36:41 EST 2003

Kirill Simonov wrote:

>* M.-A. Lemburg <mal at lemburg.com>:
>
>>No, but they'll need to pay some lucky Python programmer to
>>get rid of the warning :-) Seriously, the warning and the trouble
>>are intended as I already mentioned in the bug report Kirill
>>filed on SF: http://www.python.org/sf/681960/ :
>>
...

>And I can propose a perfect solution. If there is no defined encoding
>for a source file, assume that it uses a simple 8-bit encoding. Do not
>convert the file into UTF-8 in the tokenizer. And do not convert string
>literals in the compiler. Raise SyntaxError if a non-ASCII character is
>contained in a Unicode literal. We will even save a few CPU cycles
>for most Python source files using this approach.
>
>I will write a patch if you agree with this solution.
>
...

Of course, it means nothing for me to agree (I don't have a python-dev
vote), but this approach, assuming it's workable, does sound more
reasonable than breaking every old module that uses non-ASCII characters
(byte values above 127) in regular string literals.  Sure, I'd love to be
paid big bucks to update old, unmaintained Python modules, but I'm
guessing the headache and cost of doing that would sour users on Python
as being unstable, and so have a net-negative effect on total Python jobs
in the end.

As a devil's advocate, however, doesn't it make the conversion of the
file more complex?  I'm guessing the python-dev people are doing
something like "codec.convert(file)" on the whole file, whereas with the
new approach they would need to convert only the Unicode literals.
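
To make that devil's-advocate question concrete, here is a minimal sketch
of the two strategies. It is not the actual tokenizer code. The names
convert_whole_file and check_unicode_literals, the toy regular expression
(single-line u"..." literals only), and the cp1251 sample bytes are all
assumptions for illustration, and modern Python bytes objects stand in for
the 8-bit strings under discussion.

```python
import re

# Hypothetical toy pattern: matches single-line u"..." or u'...' literals only.
UNICODE_LITERAL = re.compile(rb"""\bu(["'])(.*?)\1""")


def convert_whole_file(source: bytes, encoding: str) -> bytes:
    """Recode the entire file to UTF-8 up front (the guessed whole-file approach)."""
    return source.decode(encoding).encode("utf-8")


def check_unicode_literals(source: bytes) -> None:
    """Leave the bytes untouched; reject non-ASCII bytes inside u'' literals only."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in UNICODE_LITERAL.finditer(line):
            if any(byte > 127 for byte in match.group(2)):
                raise SyntaxError(
                    f"non-ASCII byte in Unicode literal on line {lineno}"
                )


# Assumed sample data: three cp1251 bytes inside a plain and a Unicode literal.
legacy = b'name = "\xcd\xe8\xea"\n'
mixed = b'title = u"\xcd\xe8\xea"\n'

check_unicode_literals(legacy)  # passes: plain 8-bit string literal is left alone
try:
    check_unicode_literals(mixed)
except SyntaxError as exc:
    print("rejected:", exc)
```

The whole-file version is a one-liner but needs to know the encoding; the
per-literal version never decodes anything, at the price of having to find
the literals first, which a real tokenizer would do far more carefully than
this regex.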

BTW, am I the only one who has visions of eventually being passed a
module written in a Chinese or Japanese encoding and being unable to even
see which characters are used (no fonts for them in my text editor), so
just facing a field of placeholders, something like:

???? ????????.??  ?????? *
???? ???? ?????? ??????

????? ?????????:
    """???? ????? ??? ????????? (?????-???????) ??????? ??? ????????

    ??? ????????? ????? ???????? ?????? ???????? ??? ????????
    ? ?????????? ?? ????????, ???????????? ????? ???????? ??????????
    """
    ????????? = 1
    ??? __????__(????):
        """?????????? ??? ????????????????'? ???????? ??????????
        """
        ????.__???????? = []
    ??? ?????????????? ( ???? ):
        """??? ??? ???? ?? ???-???????? ??? ??? ????????? ???????

        ???? ???? ?? ??? ???? ?? ?????????? ???-????????
        ????????? ?? ??? ????????? ???????.
        """
        ?????? ????.__????????[:]

which might make for a fun game, at least, I suppose ;) but would be
seriously freaky to work with.  I have similar dreams about UTF-16-encoded
files showing up as lots and lots of NULs in older editors.
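
For what it's worth, both effects are easy to reproduce. The snippet below
is only a sketch: the sample source text is made up, and encoding to ASCII
with errors="replace" is an assumed stand-in for whatever a font-less
editor actually does with characters it cannot show.

```python
# Hypothetical non-ASCII source text, invented for this illustration.
source = "класс Пример:\n    pass\n"

# Stand-in for an editor that cannot display the characters:
# every non-ASCII character becomes '?'.
masked = source.encode("ascii", errors="replace").decode("ascii")
print(masked)

# UTF-16 stores each ASCII character as two bytes, one of which is NUL,
# so an editor unaware of the encoding sees a NUL per character.
utf16 = "def f():".encode("utf-16-le")
nul_count = utf16.count(b"\x00")
print(len(utf16), "bytes,", nul_count, "of them NUL")
```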

Just my $0.03 CDN,
Mike

_______________________________________
  Mike C. Fletcher
  Designer, VR Plumber, Coder
  http://members.rogers.com/mcfletch/