[Python-Dev] Unicode: When Things Get Hairy

Guido van Rossum guido@python.org
Sat, 11 Mar 2000 07:16:06 -0500


[Moshe discovers that u"a" in "bbba" raises TypeError]

[Marc-Andre]
> > Hmm, this must have been introduced by your contains code...
> > it did work before.
> 
> Nope: the string "in" semantics were forever special-cased. Guido beat me
> soundly for trying to change the semantics...

But I believe that Marc-Andre added a special case for Unicode in
PySequence_Contains.  I looked for evidence, but the last snapshot that
I actually saved and built before Moshe's code was checked in is from
2/18 and it isn't in there.  Yet I believe Marc-Andre.  The special
case needs to be added back to string_contains in stringobject.c.

> > The normal action taken by the Unicode and the string
> > code in these mixed type situations is to first
> > convert everything to Unicode and then retry the operation.
> > Strings are interpreted as UTF-8 during this conversion.
> 
> Hmmm....PySequence_Contains doesn't do any conversion of the arguments.
> Should it? (Again, it didn't before). If it does, then the order of
> testing for seq_contains and seq_getitem and conversions 

Or it could be done this way.

> > Perhaps I should also add a tp_contains slot to the
> > Unicode object which then uses the above API as well.

Yes.
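
A rough Python-level analogue of such a slot, as a sketch only. Here
`__contains__` plays the role of the C-level slot; the class name
`UnicodeLike` and the use of modern `str`/`bytes` to stand in for Unicode
objects and 8-bit strings are assumptions for illustration, not the actual
implementation.

```python
class UnicodeLike:
    """Hypothetical stand-in for a Unicode object with a contains slot."""

    def __init__(self, text):
        self.text = text  # a str, playing the role of a Unicode object

    def __contains__(self, item):
        # Mixed-type case: interpret an 8-bit string as UTF-8 first,
        # then retry the operation on two Unicode values.
        if isinstance(item, bytes):
            item = item.decode("utf-8")
        if not isinstance(item, str):
            raise TypeError("left operand of 'in' must be a string")
        return item in self.text


container = UnicodeLike("bbba")
print(b"a" in container)  # 8-bit string on the left, Unicode container
print("a" in container)   # both operands already Unicode
```

Note that the conversion only happens because the container is the
Unicode-like object; the slot is never consulted when the container is an
8-bit string.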

> But that wouldn't help at all for 
> 
> u"a" in "abbbb"

It could if PySequence_Contains first looked for a string and a
Unicode argument (in either order) and, in that case, converted the
string to Unicode.
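
A minimal sketch of that check, written in Python rather than C. The
function name `sequence_contains` is hypothetical, modern `bytes` and `str`
stand in for 8-bit strings and Unicode objects, and the UTF-8 interpretation
follows Marc-Andre's description above; none of this is the actual
PySequence_Contains code.

```python
def sequence_contains(container, item):
    """Hypothetical model of a contains check that coerces mixed operands."""
    pair = (type(container), type(item))
    if pair in ((bytes, str), (str, bytes)):
        # One operand is an 8-bit string and the other is Unicode:
        # decode the 8-bit one as UTF-8 so that both are Unicode.
        if isinstance(container, bytes):
            container = container.decode("utf-8")
        else:
            item = item.decode("utf-8")
    # Only now dispatch on the container, as usual.
    return item in container


print(sequence_contains(b"abbbb", "a"))  # Unicode item, 8-bit container
print(sequence_contains("abbbb", b"a"))  # 8-bit item, Unicode container
```

Because the mixed-type test runs before the dispatch, it does not matter
which of the two operands is the container.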

> PySequence_Contains only dispatches on the container argument :-(
> 
> (BTW: I discovered it while contemplating adding a seq_contains (not
> tp_contains) to unicode objects to optimize the searching for a bit.)

You may beat Marc-Andre to it, but I'll have to let him look at the
code anyway -- I'm not sufficiently familiar with the Unicode stuff
myself yet.

BTW, I added a tag "pre-unicode" to the CVS tree, marking the revisions
from before the Unicode changes were made.

--Guido van Rossum (home page: http://www.python.org/~guido/)