[Python-Dev] Re: [Bug #121013] Bug in <stringobject>.join(<unicodestring>)

Michael Hudson mwh21@cam.ac.uk
28 Nov 2000 08:52:00 +0000


"M.-A. Lemburg" <mal@lemburg.com> writes:

> Michael Hudson wrote:
> > 
> > "M.-A. Lemburg" <mal@lemburg.com> writes:
> > 
> > > > Date: 2000-Nov-27 10:12
> > > > By: mwh
> > > >
> > > > Comment:
> > > > I hope you're all suitably embarrassed - please see patch #102548 for the trivial fix...
> > >
> > > Hehe, that was indeed a trivial patch. What was that about trees
> > > in a forest...
> > 
> > The way I found it was perhaps instructive.  I was looking at the
> > function, and thought "that's a bit complicated" so I rewrote it (My
> > rewrite also seems to be a bit quicker so I'll upload it as soon as make
> > test has finished[*]).  In the course of rewriting it, I saw the line
> > my patch touched and went "duh!".
> 
> Yeah. The bug must have sneaked in there when the function was
> updated to the PySequence_Fast_* implementation.
> 
> BTW, could you also add a patch for the test_string.py and
> test_unicode.py tests ?

Here's an attempt, but it's a memory-scribbling bug, so the failure
doesn't show up where the tests are.  If I change test_unicode.py thus:

Index: test_unicode.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/test/test_unicode.py,v
retrieving revision 1.22
diff -c -r1.22 test_unicode.py
*** test_unicode.py     2000/10/23 17:22:08     1.22
--- test_unicode.py     2000/11/28 08:49:11
***************
*** 70,76 ****
  
  # join now works with any sequence type
  class Sequence:
!     def __init__(self): self.seq = 'wxyz'
      def __len__(self): return len(self.seq)
      def __getitem__(self, i): return self.seq[i]
  
--- 70,76 ----
  
  # join now works with any sequence type
  class Sequence:
!     def __init__(self): self.seq = [u'w',u'x',u'y',u'z']
      def __len__(self): return len(self.seq)
      def __getitem__(self, i): return self.seq[i]
  
***************
*** 78,83 ****
--- 78,87 ----
  test('join', u'', u'abcd', (u'a', u'b', u'c', u'd'))
  test('join', u' ', u'w x y z', Sequence())
  test('join', u' ', TypeError, 7)
+ test('join', ' ', u'a b c d', [u'a', u'b', u'c', u'd'])
+ test('join', '', u'abcd', (u'a', u'b', u'c', u'd'))
+ test('join', ' ', u'w x y z', Sequence())
+ test('join', ' ', TypeError, 7)
  
  class BadSeq(Sequence):
      def __init__(self): self.seq = [7, u'hello', 123L]

and back out the fix for the join bug, this happens:

...
...
Testing Unicode formatting strings... done.
Testing builtin codecs...
Traceback (most recent call last):
  File "test_unicode.py", line 378, in ?
    assert unicode('hello','utf8') == u'hello'
  File "/usr/local/src/python/dist/build/Lib/encodings/__init__.py", line 30, in ?
    import codecs,aliases
SystemError: compile.c:185: bad argument to internal function
Segmentation fault

i.e. it crashes miles away from the problem.
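
For reference, here is a rough sketch of the cases the added tests cover.
It is written for current Python 3, which is an assumption on my part: the
tests in the diff are Python 2, where a plain str separator joining unicode
items takes the code path that had the bug, and Python 3 has no such
str/unicode split. The check_join helper is hypothetical and only stands in
for the test() function used in test_unicode.py.

```python
class Sequence:
    """Minimal sequence exposing only __len__ and __getitem__."""

    def __init__(self):
        self.seq = ['w', 'x', 'y', 'z']

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, i):
        # Raises IndexError past the end, which ends iteration.
        return self.seq[i]


def check_join(sep, expected, items):
    """Hypothetical stand-in for test('join', sep, expected, items)."""
    if expected is TypeError:
        try:
            sep.join(items)
        except TypeError:
            return True
        return False
    return sep.join(items) == expected


# The four cases added by the diff: list, tuple, custom sequence, non-sequence.
results = [
    check_join(' ', 'a b c d', ['a', 'b', 'c', 'd']),
    check_join('', 'abcd', ('a', 'b', 'c', 'd')),
    check_join(' ', 'w x y z', Sequence()),
    check_join(' ', TypeError, 7),
]
```

None of this reproduces the crash; it only shows what the tests are meant
to assert once join behaves.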

I'll reply to the other stuff later - no time now.

Cheers,
M.

-- 
  The only problem with Microsoft is they just have no taste.
              -- Steve Jobs, (From _Triumph of the Nerds_ PBS special)
                         and quoted by Aahz Maruch on comp.lang.python