possible unicode bug in implicit string concatenation?

Sat Sep 11 02:04:51 EDT 2004

Hi team! While troubleshooting a crash I hit in BitTorrent, with a
torrent whose target file names fell outside the ASCII range, I was
playing around in the interpreter and noticed this behaviour:

>>> u'\u12345' + 'foo'
u'\u12345foo'
>>> u'\u12345' u'foo'
u'\u12345foo'
>>> u'\u12345' + u'foo'.encode('ascii')
u'\u12345foo'
>>> u'\u12345' u'foo'.encode('ascii')
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u1234' in position 0: ordinal not in range(128)
>>> 

Is this a bug, or is my understanding of how Python works flawed? I
tried tracing it within the interpreter itself but got lost after a
little while. I'm familiar with the interpreter loop, but not the
parser, and I suspect this has something to do with implicit string
concatenation being parsed differently from the explicit version, i.e.
the explicit version uses the + operator slot, while the implicit
version does something else. Any ideas?
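One way to test the parser hypothesis is to ask the standard `ast` module how each expression is grouped. The sketch below assumes Python 3.8 or later (where `ast.Constant` exists and the `u''` prefix is accepted again); `parse_expr` is a hypothetical helper written only for this illustration. If adjacent literals are merged before the trailing method call is applied, the implicit form should show `.encode()` being called on the whole merged string, while the explicit form should show `.encode()` bound to `u'foo'` alone.

```python
import ast

def parse_expr(source):
    """Hypothetical helper: return the top-level AST node of one expression."""
    return ast.parse(source, mode="eval").body

# Implicit concatenation: if the adjacent literals are merged first,
# the receiver of .encode() is the single string u'\u12345foo'.
implicit = parse_expr("u'\\u12345' u'foo'.encode('ascii')")

# Explicit '+': .encode() should bind to u'foo' alone, and '+'
# then combines the two operands at run time.
explicit = parse_expr("u'\\u12345' + u'foo'.encode('ascii')")

print(type(implicit).__name__, implicit.func.value.value == "\u12345foo")
print(type(explicit).__name__, explicit.right.func.value.value == "foo")

# Evaluating the implicit form: .encode('ascii') sees U+1234 and fails.
try:
    eval("u'\\u12345' u'foo'.encode('ascii')")
    failure = None
except UnicodeEncodeError as exc:
    failure = exc.reason
print("implicit form fails:", failure)
```

On Python 3 the two `print` lines report `Call True` and `BinOp True`, which would be consistent with the literals being joined at compile time, ahead of the attribute access. Note that on Python 3 the explicit form itself raises `TypeError` when evaluated (str + bytes), unlike the Python 2 session above, so only its parse tree is compared here.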

Fahd Khan
ICON | Clinical Research