[Python-Dev] Unicode: When Things Get Hairy
Moshe Zadka
Moshe Zadka <mzadka@geocities.com>
Sat, 11 Mar 2000 13:05:48 +0200 (IST)
On Sat, 11 Mar 2000, M.-A. Lemburg wrote:
> Hmm, this must have been introduced by your contains code...
> it did work before.
Nope: the string "in" semantics have always been special-cased. Guido beat me
soundly for trying to change the semantics...
> The normal action taken by the Unicode and the string
> code in these mixed type situations is to first
> convert everything to Unicode and then retry the operation.
> Strings are interpreted as UTF-8 during this conversion.
Hmmm... PySequence_Contains doesn't do any conversion of the arguments.
Should it? (Again, it didn't before.) If it does, then the order of
testing for seq_contains, seq_getitem, and conversions matters.
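To make the ordering question concrete, here is a rough Python 3 model. It assumes `bytes` stands in for the 8-bit string type and `str` for Unicode. The names `to_unicode`, `sequence_contains` and `mixed_contains` are hypothetical, not CPython APIs. The sketch tries the container's `__contains__` (the seq_contains analogue), falls back to indexing with `__getitem__` (the seq_getitem analogue), and only converts both arguments to Unicode and retries when the first attempt raises `TypeError`. Python 3's real `in` also tries `__iter__` in between; this sketch deliberately leaves that out.

```python
def to_unicode(obj):
    """Hypothetical helper: coerce a byte string to text, interpreting it as UTF-8."""
    if isinstance(obj, bytes):
        return obj.decode("utf-8")
    return obj


def sequence_contains(container, item):
    """Hypothetical model of PySequence_Contains: dispatch on the container only."""
    contains = getattr(type(container), "__contains__", None)
    if contains is not None:
        # seq_contains analogue: the container's own slot decides.
        return contains(container, item)
    # seq_getitem analogue: compare elements one index at a time.
    i = 0
    while True:
        try:
            element = container[i]
        except IndexError:
            return False
        if element == item:
            return True
        i += 1


def mixed_contains(container, item):
    """Try the operation as-is; on a type mismatch, convert to Unicode and retry."""
    try:
        return sequence_contains(container, item)
    except TypeError:
        return sequence_contains(to_unicode(container), to_unicode(item))
```

With this ordering, `mixed_contains("abbbb", b"a")` succeeds through the retry. A container that only defines `__getitem__`, however, compares elements with `==`, which does not raise for bytes versus str, so the conversion step is never reached on that path.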
> Perhaps I should also add a tp_contains slot to the
> Unicode object which then uses the above API as well.
But that wouldn't help at all for
u"a" in "abbbb"
PySequence_Contains only dispatches on the container argument :-(
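Python 3's own `in` operator behaves analogously on this point, which gives a quick way to see it: only the right-hand operand's `__contains__` is consulted, so giving the item's type a contains method cannot rescue a mixed-type test. `Needle`, `Haystack` and `in_operator_result` are hypothetical names used only for illustration.

```python
class Needle:
    """Hypothetical left operand that defines its own __contains__."""

    def __contains__(self, other):
        # Never called by `Needle() in container`: only the container's slot is used.
        raise AssertionError("left operand's __contains__ was consulted")


class Haystack:
    """Hypothetical container; its __contains__ is the one `in` calls."""

    def __contains__(self, item):
        return isinstance(item, Needle)


def in_operator_result(item, container):
    """Return the result of `item in container`, or the exception class name on TypeError."""
    try:
        return item in container
    except TypeError as exc:
        return type(exc).__name__
```

Here `in_operator_result(Needle(), "abbbb")` yields `"TypeError"` because `str.__contains__` rejects the operand, while `in_operator_result(Needle(), Haystack())` yields `True` because the container accepts it.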
(BTW: I discovered it while contemplating adding a seq_contains (not
tp_contains) to Unicode objects to speed up the searching a bit.)
PS:
MAL: thanks for the great birthday present! I'm enjoying the Unicode
patch a lot.
--
Moshe Zadka <mzadka@geocities.com>.
http://www.oreilly.com/news/prescod_0300.html