[Python-bugs-list] [ python-Bugs-680789 ] repr() of large array objects takes quadratic time

SourceForge.net noreply@sourceforge.net
Thu, 06 Feb 2003 17:40:41 -0800


Bugs item #680789, was opened at 2003-02-05 06:00
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=680789&group_id=5470

Category: Python Library
Group: Python 2.2
Status: Open
Resolution: None
Priority: 5
Submitted By: Jurjen N.E. Bos (jneb)
>Assigned to: Nobody/Anonymous (nobody)
Summary: repr() of large array objects takes quadratic time

Initial Comment:
This is a bug and a partial patch.

If I debug a program that contains a ridiculously large array (8M entries in my case), the debugger takes forever.
It happens in Mac OS X, Python 2.2, but I found the bug in is the repr module, so it is probably universal.
The thing is, that after the fix below, it still doesn't work!
Did I miss something trivial (like repr is builtin, or something like that?). Would someone with Mac OS X experience help out here, please (Jack?).

Here's the diff to make repr.repr work with large arrays:
13a14
>       self.maxarray = 5
50a52,62
>     def repr_array(self, x, level):
>         n = len(x)
>       header = "array('"+x.typecode+"', ["
>         if n == 0: return header+"])"
>         if level <= 0: return header+"...])"
>         s = ''
>         for i in range(min(n, self.maxarray)):
>             if s: s = s + ', '
>             s = s + self.repr1(x[i], level-1)
>         if n > self.maxarray: s = s + ', ...'
>         return header + s + "])"

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2003-02-06 20:40

Message:
Logged In: YES 
user_id=31435

I can't make time for this, so unassigned it.  It would make 
a good, brief project for someone -- the list and dict 
tp_reprs are linear-time, and tp_repr for array.array 
objects shouldn't be any harder than they were.

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2003-02-06 16:37

Message:
Logged In: YES 
user_id=45365

Okay, so the real bug is that tp_repr of array objects takes quadratic time.

I'm changing the summary of this report then, and assigning back to you (Tim), on the basis that you did more checkins on arraymodule than I did. Feel free to pass the potato on:-)

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-02-05 18:08

Message:
Logged In: YES 
user_id=31435

pdb does import repr.py, but probably doesn't use it in 
whatever way Jurjen is using to display his big array.

WRT that, note that Jurjen is using array.array objects, not 
lists.  The internal array.array tp_repr slot is quadratic-time in 
the size of the array, while list's tp_repr is linear time.

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2003-02-05 17:40

Message:
Logged In: YES 
user_id=45365

The fix is fine (it works for me the same way as for Tim), but I think we're shooting past the problem here.

First, pdb doesn't use repr.repr(), it uses the normal builtin repr().

Second, I don't see any sluggishness in pdb with large arrays. I tried
debugging
def foo():
    a = range(8000000)
and there was no problem. Allocating the object took a bit of time, yes, and if you actually try to print it you'll stare at about 800K lines filled with digits scrolling over your screen, but that is to be expected.

Could it be your sluggishness is coming from something else? For example, MacOSX starts behaving *very* badly if your root disk is full, because then it can't allocate swap space, and due to its optimistic behaviour it comes to a grinding halt.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-02-05 13:36

Message:
Logged In: YES 
user_id=31435

Nice to see you, Jurgen!  I checked this into current CVS, 
and it works fine for me in isolation:

>>> len(a)
11055060
>>> repr.repr(a)
"array('i', [0, 1, 2, 3, 4, ...])"
>>>

That goes in an eyeblink.  So more detail is needed about 
what "it still doesn't work!" means.  Assigned to Jack, and he 
can use current CVS to try it.

Lib/repr.py; new revision: 1.15
Lib/test/test_repr.py; new revision: 1.16
Misc/NEWS; new revision: 1.642

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=680789&group_id=5470