[Python-Dev] Re: [Bug #121013] Bug in <stringobject>.join(<unicodestring>)

M.-A. Lemburg mal@lemburg.com
Tue, 28 Nov 2000 10:17:29 +0100


Michael Hudson wrote:
> 
> "M.-A. Lemburg" <mal@lemburg.com> writes:
> 
> > Michael Hudson wrote:
> > >
> > > "M.-A. Lemburg" <mal@lemburg.com> writes:
> > >
> > > > > Date: 2000-Nov-27 10:12
> > > > > By: mwh
> > > > >
> > > > > Comment:
> > > > > I hope you're all suitably embarrassed - please see patch #102548 for the trivial fix...
> > > >
> > > > Hehe, that was indeed a trivial patch. What was that about trees
> > > > in a forest...
> > >
> > > The way I found it was perhaps instructive.  I was looking at the
> > > function, and thought "that's a bit complicated" so I rewrote it (My
> > > > rewrite also seems to be a bit quicker so I'll upload it as soon as make
> > > test has finished[*]).  In the course of rewriting it, I saw the line
> > > my patch touched and went "duh!".
> >
> > Yeah. The bug must have sneaked in there when the function was
> > updated to the PySequence_Fast_* implementation.
> >
> > BTW, could you also add a patch for the test_string.py and
> > test_unicode.py tests ?
> 
> Here's an effort, but it's a memory scribbling bug.  If I change
> test_unicode.py thus:
> 
> Index: test_unicode.py
> ===================================================================
> RCS file: /cvsroot/python/python/dist/src/Lib/test/test_unicode.py,v
> retrieving revision 1.22
> diff -c -r1.22 test_unicode.py
> *** test_unicode.py     2000/10/23 17:22:08     1.22
> --- test_unicode.py     2000/11/28 08:49:11
> ***************
> *** 70,76 ****
> 
>   # join now works with any sequence type
>   class Sequence:
> !     def __init__(self): self.seq = 'wxyz'
>       def __len__(self): return len(self.seq)
>       def __getitem__(self, i): return self.seq[i]
> 
> --- 70,76 ----
> 
>   # join now works with any sequence type
>   class Sequence:
> !     def __init__(self): self.seq = [u'w',u'x',u'y',u'z']
>       def __len__(self): return len(self.seq)
>       def __getitem__(self, i): return self.seq[i]
> 
> ***************
> *** 78,83 ****
> --- 78,87 ----
>   test('join', u'', u'abcd', (u'a', u'b', u'c', u'd'))
>   test('join', u' ', u'w x y z', Sequence())
>   test('join', u' ', TypeError, 7)
> + test('join', ' ', u'a b c d', [u'a', u'b', u'c', u'd'])
> + test('join', '', u'abcd', (u'a', u'b', u'c', u'd'))
> + test('join', ' ', u'w x y z', Sequence())
> + test('join', ' ', TypeError, 7)
> 
>   class BadSeq(Sequence):
>       def __init__(self): self.seq = [7, u'hello', 123L]
> 
> and back out the fix for the join bug, this happens:
> 
> ...
> ...
> Testing Unicode formatting strings... done.
> Testing builtin codecs...
> Traceback (most recent call last):
>   File "test_unicode.py", line 378, in ?
>     assert unicode('hello','utf8') == u'hello'
>   File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 30, in ?
>     import codecs,aliases
> SystemError: compile.c:185: bad argument to internal function
> Segmentation fault
> 
> i.e. it crashes miles away from the problem.

The test is only supposed to ensure that we don't trip over this
again. It's not intended to work in some way *before* applying your
patch.

I always try to integrate tests for bugs into the test suites
for my mx stuff and, AFAICT, this also seems to be the Python
dev style.
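
As a rough illustration of that kind of regression test, here is a
sketch of the join cases from the diff above. It is an assumption-laden
translation, not the actual test: it uses present-day Python 3 and
unittest instead of the test() helper in test_unicode.py, plain str
literals instead of u'' literals (Python 3 has no separate unicode
type), and the class name JoinRegressionTest is made up for the
example.

```python
import unittest


class Sequence:
    """Minimal sequence type: only __len__ and __getitem__, as in the diff."""

    def __init__(self):
        self.seq = ['w', 'x', 'y', 'z']

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, i):
        # Raises IndexError past the end, which stops iteration.
        return self.seq[i]


class JoinRegressionTest(unittest.TestCase):
    """Hypothetical regression tests guarding the join code path."""

    def test_join_list(self):
        self.assertEqual(' '.join(['a', 'b', 'c', 'd']), 'a b c d')

    def test_join_tuple_with_empty_separator(self):
        self.assertEqual(''.join(('a', 'b', 'c', 'd')), 'abcd')

    def test_join_user_defined_sequence(self):
        self.assertEqual(' '.join(Sequence()), 'w x y z')

    def test_join_rejects_non_sequence(self):
        with self.assertRaises(TypeError):
            ' '.join(7)
```

Saved in a file, this can be run with "python -m unittest <file>". The
point is only that the cases stay in the suite after the fix goes in,
so the same mistake gets caught right away if it ever comes back.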

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/