[I18n-sig] Strawman Proposal (2): Encoding attributes

Paul Prescod paulp@ActiveState.com
Sat, 10 Feb 2001 07:37:04 -0800


"Martin v. Loewis" wrote:
> 
> ...
> 
> My understanding is that only EUC-JP is an ASCII superset (*)
> (i.e. all bytes representing JIS characters are >127); in Shift-JIS,
> the encoding of a character is two bytes, of which only the first byte
> is always >128. Since Shift-JIS is quite common, it should be
> supported as a file encoding.

I don't think it is reasonable in the short term to support character
sets that cannot be lexed with the current Python lexer. I think we
should design with Shift-JIS in mind for the future but for now I think
we should limit our list of supported encodings to those that don't
require large Python parser changes.
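
To illustrate the lexing concern, here is a minimal sketch. It assumes a
modern Python 3 with the standard shift_jis and euc_jp codecs, and the
helper name is hypothetical. It reports any byte below 128 that appears
inside the encoding of a non-ASCII character. Such bytes are what a
byte-oriented lexer could mistake for ASCII syntax.

```python
def ascii_bytes_in_non_ascii_chars(text, encoding):
    """Return (char, byte) pairs where the encoded form of a non-ASCII
    character contains a byte in the ASCII range (< 128).

    Hypothetical helper for illustration only.
    """
    hits = []
    for ch in text:
        if ord(ch) < 128:
            continue  # plain ASCII characters are not of interest here
        for b in ch.encode(encoding):
            if b < 128:
                hits.append((ch, b))
    return hits


sample = "\u30bd"  # KATAKANA LETTER SO

# Shift-JIS: the second byte of this character is 0x5C, the ASCII backslash.
sjis_hits = ascii_bytes_in_non_ascii_chars(sample, "shift_jis")

# EUC-JP: both bytes of this character are above 127, so nothing is reported.
eucjp_hits = ascii_bytes_in_non_ascii_chars(sample, "euc_jp")

print(sjis_hits, eucjp_hits)
```

In Shift-JIS the trail byte here is the ASCII backslash, so a lexer that
scans bytes could read it as the start of an escape sequence inside a
string literal. The EUC-JP encoding of the same character has no byte
below 128.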

 Paul Prescod