Why ascii-only symbols?

"Martin v. Löwis" martin at v.loewis.de
Tue Oct 18 14:43:47 EDT 2005


Bengt Richter wrote:
>>Others reject it because of semantic difficulties: how would such
>>strings behave under concatenation, if the encodings are different?
> 
> I mentioned that in parts you snipped (2nd half here):
> 
> This could also support s1+s2: generate a concatenated string with
> the same encoding attribute if s1.encoding == s2.encoding; otherwise,
> promote each to the platform's standard Unicode encoding, concatenate
> those, and record the chosen Unicode encoding in the result's
> encoding attribute.

It remains semantically difficult. There are other alternatives, e.g.
(s1+s2).encoding could become None, instead of using your procedure.

Also, this specification is incomplete: what if either s1.encoding
or s2.encoding is None?

Then, what if recoding to the platform encoding fails? With ASCII
being the default encoding at the moment, it is very likely that
concatenations will fail if there are funny characters in either
string.

If you propose that this should raise an exception, it means that
normal string concatenations will then give you exceptions just
as often as (or even more often than) you get UnicodeErrors
currently. I doubt users would like that.
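To make the difficulty concrete, here is a minimal sketch (in modern Python) of the proposal under discussion: byte strings carrying an encoding attribute, with + keeping the encoding when both sides agree and otherwise promoting to Unicode. The class name EncodedStr and the choice to raise on an unknown encoding are invented for illustration; the proposal itself leaves the None case unspecified.

```python
class EncodedStr:
    """Hypothetical byte string that records its encoding."""

    def __init__(self, data: bytes, encoding=None):
        self.data = data
        self.encoding = encoding

    def __add__(self, other):
        if self.encoding == other.encoding:
            # Same (possibly unknown) encoding: plain byte concatenation.
            return EncodedStr(self.data + other.data, self.encoding)
        if self.encoding is None or other.encoding is None:
            # Unspecified in the proposal; one plausible choice is to
            # raise, which makes ordinary concatenation raise often.
            raise ValueError("cannot concatenate: unknown encoding")
        # Different encodings: promote both sides to Unicode.
        # Decoding can itself fail if the bytes are not valid in
        # their declared encoding.
        return (self.data.decode(self.encoding)
                + other.data.decode(other.encoding))

s1 = EncodedStr("Löwis".encode("latin-1"), "latin-1")
s2 = EncodedStr("abc".encode("ascii"), "ascii")
print(s1 + s2)  # promoted to Unicode: Löwisabc
```

The sketch shows where the semantics get slippery: the result type of + now depends on the operands' encodings, and two failure modes (unknown encoding, invalid bytes) appear in an operation that previously never raised.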

>>>This is not a fully developed idea, and there has been discussion on the topic before
>>>(even between us ;-) but I thought another round might bring out your current thinking
>>>on it ;-)
>>
>>My thinking still is the same. It cannot really work, and it wouldn't do 
>>any good with what little it could do. Just use Unicode strings.
>>
> 
> To hear "It cannot really work" causes me agitation, even if I know it's not worth
> the effort to pursue it ;-)

It is certainly implementable, yes. But it will then break a lot of 
existing code.

> Though I grant you
> 
>     #-*- coding: latin1 -*-
>     name = u'Martin Löwis' 
>     print name
> 
> is not that hard to do.

This is indeed what you should do. In Python 3, you can omit the u,
as the string type will go away (and be replaced with the Unicode type).
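For comparison, the same snippet in Python 3, where source files default to UTF-8 and str is the Unicode type, needs neither the coding cookie nor the u prefix:

```python
# Python 3: UTF-8 is the default source encoding and str is Unicode,
# so no coding declaration or u-prefix is required.
name = 'Martin Löwis'
print(name)
```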

> (Please excuse the use of your name, which has a handy non-ascii letter ;-)

No problem with that :-)

Regards,
Martin
