[Python-Dev] Unicode: When Things Get Hairy

Moshe Zadka <mzadka@geocities.com>
Sat, 11 Mar 2000 13:05:48 +0200 (IST)


On Sat, 11 Mar 2000, M.-A. Lemburg wrote:

> Hmm, this must have been introduced by your contains code...
> it did work before.

Nope: the string "in" semantics have always been special-cased. Guido beat
me soundly for trying to change them...

> The normal action taken by the Unicode and the string
> code in these mixed type situations is to first
> convert everything to Unicode and then retry the operation.
> Strings are interpreted as UTF-8 during this conversion.

Hmmm....PySequence_Contains doesn't do any conversion of the arguments.
Should it? (Again, it didn't before). If it does, then the order of
testing for seq_contains and seq_getitem and conversions 

> Perhaps I should also add a tp_contains slot to the
> Unicode object which then uses the above API as well.

But that wouldn't help at all for 

u"a" in "abbbb"

PySequence_Contains only dispatches on the container argument :-(
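
A minimal sketch of what "dispatches on the container only" means, written
against present-day Python rather than the 2000-era C source. The class names
Haystack and Needle and the method name __rcontains__ are made up for
illustration; no such reflected hook exists, which is the point.

```python
class Haystack:
    """Hypothetical container; its __contains__ is the only hook "in" consults."""

    def __init__(self, items):
        self.items = list(items)
        self.calls = 0  # counts how often the membership hook runs

    def __contains__(self, item):
        self.calls += 1
        return any(item is x or item == x for x in self.items)


class Needle:
    """Hypothetical left operand; it gets no say in the membership test."""

    def __rcontains__(self, container):  # made-up name, never called by "in"
        raise AssertionError("the left operand is never asked")


hay = Haystack(["a", "b"])
print("a" in hay)       # True: Haystack.__contains__ runs
print(Needle() in hay)  # False: again only Haystack.__contains__ runs
print(hay.calls)        # 2
```

So any mixed-type handling for u"a" in "abbbb" would have to live in the
container's own membership code (here, the string side), since the left
operand is never consulted.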

(BTW: I discovered it while contemplating adding a seq_contains (not
tp_contains) to unicode objects to optimize the searching for a bit.)

PS:
MAL: thanks for a great birthday present! I'm enjoying the Unicode
patch a lot.
--
Moshe Zadka <mzadka@geocities.com>. 
http://www.oreilly.com/news/prescod_0300.html