[Python-Dev] Unicode: When Things Get Hairy

M.-A. Lemburg mal@lemburg.com
Sat, 11 Mar 2000 11:24:26 +0100


Moshe Zadka wrote:
> 
> The following "problem" is easy to fix. However, what I wanted to know is
> if people (Skip and Guido most importantly) think it is a problem:
> 
> >>> "a" in u"bbba"
> 1
> >>> u"a" in "bbba"
> Traceback (innermost last):
>   File "<stdin>", line 1, in ?
> TypeError: string member test needs char left operand
> 
> Suggested fix: in stringobject.c, explicitly allow a unicode char left
> operand.

Hmm, this must have been introduced by your contains code...
it did work before.

The normal action taken by the Unicode and the string
code in these mixed-type situations is to first
convert everything to Unicode and then retry the operation.
8-bit strings are interpreted as UTF-8 during this conversion.
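
A rough sketch of that coerce-and-retry rule, written in present-day
Python where bytes stands in for the 8-bit string type. The helper
names to_unicode() and mixed_contains() are made up for illustration
and are not part of the actual implementation; the real logic lives
in C:

```python
def to_unicode(obj):
    """Coerce an 8-bit string (bytes here) to Unicode, assuming UTF-8."""
    if isinstance(obj, str):
        return obj
    if isinstance(obj, (bytes, bytearray)):
        return bytes(obj).decode("utf-8")
    raise TypeError("cannot coerce %s to Unicode" % type(obj).__name__)


def mixed_contains(container, element):
    """Hypothetical membership test for mixed 8-bit/Unicode operands."""
    if type(container) is type(element):
        # Same type on both sides: no coercion needed.
        return element in container
    # Mixed types: convert both operands to Unicode, then retry.
    return to_unicode(element) in to_unicode(container)
```

With this, mixed_contains(b"bbba", "a") and mixed_contains("bbba", b"a")
give the same answer, which is the symmetry the original report asks for.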

To simplify this task, I added method APIs to the
Unicode object which do the conversion for you (they
apply all the necessary coercion to all arguments).
I guess adding another PyUnicode_Contains() wouldn't hurt :-)

Perhaps I should also add a tp_contains slot to the
Unicode object which then uses the above API as well.
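
As a Python-level analogy only: the "in" operator dispatches to the
container's __contains__ method, which plays the role that a
tp_contains slot would play in C. The class CoercingText below is
hypothetical and only shows the hook coercing its left operand before
testing; it assumes bytes as the 8-bit string type and UTF-8 as the
encoding:

```python
class CoercingText:
    """Hypothetical text wrapper; not a real Python or C API type."""

    def __init__(self, value):
        self.value = self._coerce(value)

    @staticmethod
    def _coerce(obj):
        # 8-bit strings (bytes here) are decoded as UTF-8; Unicode passes through.
        return obj.decode("utf-8") if isinstance(obj, bytes) else obj

    def __contains__(self, element):
        # Invoked for "element in self"; coerce the left operand, then test.
        return self._coerce(element) in self.value
```

Because the hook belongs to the right-hand operand, either kind of
string can appear on the left: both "a" in CoercingText(b"bbba") and
b"a" in CoercingText("bbba") go through the same coercion path.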

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/