[Python-Dev] Re: [Bug #121013] Bug in <stringobject>.join(<unicodestring>)
Michael Hudson
mwh21@cam.ac.uk
28 Nov 2000 08:52:00 +0000
"M.-A. Lemburg" <mal@lemburg.com> writes:
> Michael Hudson wrote:
> >
> > "M.-A. Lemburg" <mal@lemburg.com> writes:
> >
> > > > Date: 2000-Nov-27 10:12
> > > > By: mwh
> > > >
> > > > Comment:
> > > > I hope you're all suitably embarrassed - please see patch #102548 for the trivial fix...
> > >
> > > Hehe, that was indeed a trivial patch. What was that about trees
> > > in a forest...
> >
> > The way I found it was perhaps instructive. I was looking at the
> > function, and thought "that's a bit complicated" so I rewrote it (My
> > rewrite also seems to be a bit quicker so I'll upload it as soon as make
> > test has finished[*]). In the course of rewriting it, I saw the line
> > my patch touched and went "duh!".
>
> Yeah. The bug must have sneaked in there when the function was
> updated to the PySequence_Fast_* implementation.
>
> BTW, could you also add a patch for the test_string.py and
> test_unicode.py tests ?
Here's an attempt, but this is a memory-scribbling bug, so the test
does not fail anywhere near the join call. If I change
test_unicode.py thus:
Index: test_unicode.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/test/test_unicode.py,v
retrieving revision 1.22
diff -c -r1.22 test_unicode.py
*** test_unicode.py 2000/10/23 17:22:08 1.22
--- test_unicode.py 2000/11/28 08:49:11
***************
*** 70,76 ****
  # join now works with any sequence type
  class Sequence:
!     def __init__(self): self.seq = 'wxyz'
      def __len__(self): return len(self.seq)
      def __getitem__(self, i): return self.seq[i]
--- 70,76 ----
  # join now works with any sequence type
  class Sequence:
!     def __init__(self): self.seq = [u'w',u'x',u'y',u'z']
      def __len__(self): return len(self.seq)
      def __getitem__(self, i): return self.seq[i]
***************
*** 78,83 ****
--- 78,87 ----
  test('join', u'', u'abcd', (u'a', u'b', u'c', u'd'))
  test('join', u' ', u'w x y z', Sequence())
  test('join', u' ', TypeError, 7)
+ test('join', ' ', u'a b c d', [u'a', u'b', u'c', u'd'])
+ test('join', '', u'abcd', (u'a', u'b', u'c', u'd'))
+ test('join', ' ', u'w x y z', Sequence())
+ test('join', ' ', TypeError, 7)
  class BadSeq(Sequence):
      def __init__(self): self.seq = [7, u'hello', 123L]
and back out the fix for the join bug, this happens:
...
...
Testing Unicode formatting strings... done.
Testing builtin codecs...
Traceback (most recent call last):
  File "test_unicode.py", line 378, in ?
    assert unicode('hello','utf8') == u'hello'
  File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 30, in ?
    import codecs,aliases
SystemError: compile.c:185: bad argument to internal function
Segmentation fault
i.e. it crashes miles away from the problem.
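
For readers who want to see the shape of the added checks outside the test suite, here is a minimal sketch. It makes several assumptions: check_join is a hypothetical stand-in, not the real test() helper from test_unicode.py; it is written for a current Python 3, where there is only one string type, so the mixed case that triggered the bug (a byte-string separator joining unicode items) cannot be reproduced; and 123L is written as 123 because the long suffix no longer exists.

```python
# Hypothetical stand-in for the suite's test() helper; not the real implementation.
def check_join(separator, expected, items):
    """Return True if separator.join(items) gives `expected`.

    If `expected` is an exception class, return True only when join raises it.
    """
    if isinstance(expected, type) and issubclass(expected, Exception):
        try:
            separator.join(items)
        except expected:
            return True
        return False
    return separator.join(items) == expected


class Sequence:
    # Minimal sequence type: only __len__ and __getitem__, as in the patched test.
    def __init__(self): self.seq = [u'w', u'x', u'y', u'z']
    def __len__(self): return len(self.seq)
    def __getitem__(self, i): return self.seq[i]


class BadSeq(Sequence):
    # Contains non-string items, so join is expected to raise TypeError.
    def __init__(self): self.seq = [7, u'hello', 123]


# The same four cases the patch adds, plus the BadSeq case from the context lines.
results = [
    check_join(' ', u'a b c d', [u'a', u'b', u'c', u'd']),
    check_join('', u'abcd', (u'a', u'b', u'c', u'd')),
    check_join(' ', u'w x y z', Sequence()),
    check_join(' ', TypeError, 7),
    check_join(' ', TypeError, BadSeq()),
]
print(results)
```

A check like this only catches a wrong result or a wrong exception. It would not catch the memory corruption described above, which is why the crash shows up far from the join call.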
I'll reply to the other stuff later - no time now.
Cheers,
M.
--
The only problem with Microsoft is they just have no taste.
-- Steve Jobs, (From _Triumph of the Nerds_ PBS special)
and quoted by Aahz Maruch on comp.lang.python