s.startswith(unicode) bug???

Thu Oct 3 01:29:18 EDT 2002

Erik Westra wrote:

> I'm running Python 2.2 (not 2.2.1) under Windows 2K, and have come
> across the following bug.  I was wondering if someone with 2.2.1 could
> please check if it still happens -- if so, I'll post a bug report to
> the Python SourceForge site.
> 
> Under Python 2.2, if I evaluate:
> 
>     "a".startswith("a")
> 
> it correctly returns 1.  But if I change the substring to unicode,
> like this:
> 
>     "a".startswith(u"a")
> 
> it returns 0!  If I then try:
> 
>     "ab".startswith(u"a")
> 
> it again returns 1.  Most weird!  This only happens when the substring
> is in unicode rather than a normal string...
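
The three cases from the report can be checked in a single run with a short script like the one below. It is a sketch: the `u""` literals need Python 2.x or 3.3+. On an interpreter without the bug, every line should report a true result.

```python
# Reproduce the three cases from the report in one run.
# On a correct interpreter every prefix test should be true.
cases = [
    ("a", "a"),    # plain prefix on a plain string
    ("a", u"a"),   # Unicode prefix, prefix equals the whole string
    ("ab", u"a"),  # Unicode prefix, string longer than the prefix
]

for s, prefix in cases:
    result = s.startswith(prefix)
    print("%r.startswith(%r) -> %r" % (s, prefix, result))
```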

I don't find it too surprising, since Unicode strings and plain (8-bit)
strings are different types.  I presume the .startswith method compares
the raw data, so you're going to get some (apparently) curious results
if you mix Unicode and plain strings arbitrarily.  To get the desired
effect, just convert the primary string to Unicode first:

>>> unicode("ab").startswith(u"a")
1
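
The same idea can be wrapped in a small helper that puts both operands into the same type before comparing. This is a sketch under stated assumptions: the name `startswith_text` and the default `"ascii"` encoding are hypothetical choices, not part of any library. The version check is only there so it runs under both Python 2 (where `str` holds bytes and `unicode` holds text) and Python 3 (where those roles belong to `bytes` and `str`).

```python
import sys

# Pick the byte-string type for the running interpreter.
if sys.version_info[0] >= 3:
    bytes_type = bytes
else:
    bytes_type = str  # Python 2: plain strings are byte strings

def startswith_text(s, prefix, encoding="ascii"):
    """Hypothetical helper: decode whichever operand is a byte string
    (using the assumed encoding) so both sides are text, then call
    startswith on two values of the same type."""
    if isinstance(s, bytes_type):
        s = s.decode(encoding)
    if isinstance(prefix, bytes_type):
        prefix = prefix.decode(encoding)
    return s.startswith(prefix)
```

Decoding the byte-string side, rather than encoding the Unicode side, keeps the comparison on text, which matches the unicode("ab") conversion above.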

-- 
 Erik Max Francis / max at alcyone.com / http://www.alcyone.com/max/
 __ San Jose, CA, USA / 37 20 N 121 53 W / &tSftDotIotE
/  \ The meaning of life is that it stops.
\__/ Franz Kafka
    Fauxident / http://www.alcyone.com/pyos/fauxident/
 A "faux" ident daemon in Python