Multibyte Character Support for Python

Chris Liechti cliechti at gmx.net
Wed May 8 19:45:33 EDT 2002


martin at v.loewis.de (Martin v. Loewis) wrote in 
news:m3vg9yjthd.fsf at mira.informatik.hu-berlin.de:

> PEP 263 will introduce the notion of source encodings - without this,
> it wouldn't even be possible to parse the source code, anymore. The
> PEP, over months, had a question in it asking whether non-ASCII
> identifiers should be allowed (the follow-up question would then be:
> which ones?), and nobody ever spoke up requesting such a feature.

i wouldn't allow non-ASCII chars. not because i don't like them - i write 
german, so i need äöü - but think of someone in a foreign country who just 
does not have those keys on his keyboard. how is he supposed to enter a 
variable name with such characters?
or better yet, take chinese characters - i don't know what they mean, let 
alone how to pronounce them. should i enter variable names as 
pictures, using my digicam because i can't draw that well by hand?

also note Alex's comment about the natural language. how many languages 
must a programmer learn to work on sources if english isn't sufficient?

of course that restriction on characters doesn't need to apply to strings 
and comments. (some comments aren't readable anyway, even if you know the 
language the words are taken from ;-)

(the PEP restricts identifiers to ASCII only - good)

and how many encodings will be allowed? do i need to have a zillion code 
pages on my machine to run modules i find on the net? ok, much of the 
unicode stuff can be reused, but what about smaller targets, startup time 
etc.?

regarding PEP 263:
- i don't think i like "coding"; it's not the obvious name for me.
  i'm more used to "encoding", as used in HTML and MIME.

- why use ASCII as the default encoding in the future and not UTF-8 (or 
Latin-1)? ASCII is a subset of UTF-8, and it would allow the rest of the 
world to keep the default when using a unicode-aware editor. i think it 
will become very nasty if you must write the correct encoding in each 
source file... or is it intentional that the smallest available encoding of 
all is chosen, to enforce more typing?
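
to make the two points above concrete, here is a rough sketch in Python. the pattern is adapted from the one given in PEP 263 and should be read as an illustration, not as the parser's exact rule; `declared_encoding` and its `default` argument are hypothetical names made up for this example. the second half only checks that pure-ASCII text encodes to the same bytes under ASCII and UTF-8, while äöü does not encode the same way under UTF-8 and Latin-1.

```python
import re

# Pattern adapted from PEP 263 (an illustration, not the parser's exact rule).
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(line, default="ascii"):
    """Return the encoding named in a comment line, or `default`.

    Hypothetical helper: `default` stands in for whatever encoding
    the interpreter assumes when no declaration is present.
    """
    match = CODING_RE.match(line)
    return match.group(1) if match else default


# A typical declaration as it would appear on line 1 or 2 of a source file.
emacs_style = declared_encoding("# -*- coding: utf-8 -*-")
# No declaration at all: fall back to the default.
no_declaration = declared_encoding("import sys")

# ASCII is a subset of UTF-8: pure-ASCII source is byte-identical in both.
ascii_source = "spam = 1"
same_bytes = ascii_source.encode("ascii") == ascii_source.encode("utf-8")

# The umlauts (written as escapes so this file itself stays ASCII)
# differ between UTF-8 and Latin-1, so a wrong guess garbles them.
umlauts = "\xe4\xf6\xfc"
utf8_bytes = umlauts.encode("utf-8")      # two bytes per character
latin1_bytes = umlauts.encode("latin-1")  # one byte per character
```

because the pattern only looks for "coding" followed by ":" or "=", a comment such as `# vim: set fileencoding=latin-1 :` is picked up as well.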

but basically i think the PEP is a good idea.

chris

-- 
Chris <cliechti at gmx.net>



