Issue 680789: repr() of large array objects takes quadratic time

Created on 2003-02-05 11:00 by jneb, last changed 2022-04-10 16:06 by admin. This issue is now closed.

Messages (10)
msg14447 - Author: Jurjen N.E. Bos (jneb) * Date: 2003-02-05 11:00
This is a bug report with a partial patch. When I debug a program that contains a ridiculously large array (8M entries in my case), the debugger takes forever. It happens on Mac OS X with Python 2.2, but I traced the problem to the repr module, so it is probably universal. The thing is that, even after the fix below, it still doesn't work! Did I miss something trivial (for example, that repr is a builtin)? Would someone with Mac OS X experience help out here, please (Jack?). Here's the diff to make repr.repr work with large arrays:
13a14
>         self.maxarray = 5
50a52,62
>     def repr_array(self, x, level):
>         n = len(x)
>         header = "array('"+x.typecode+"', ["
>         if n == 0: return header+"])"
>         if level <= 0: return header+"...])"
>         s = ''
>         for i in range(min(n, self.maxarray)):
>             if s: s = s + ', '
>             s = s + self.repr1(x[i], level-1)
>         if n > self.maxarray: s = s + ', ...'
>         return header + s + "])"
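For readers on current Python, the idea behind the patch can be tried with the standard library directly. This is a minimal sketch, assuming Python 3, where the old repr module is named reprlib and its Repr class carries a maxarray limit. The names big, short, and summary are illustrative only, and the array is smaller than the 8M entries in the report.

```python
import reprlib
from array import array

# A large array standing in for the one described in the report.
big = array('i', range(1_000_000))

# maxarray caps how many elements are rendered before "..." is appended,
# so the cost no longer depends on the length of the array.
short = reprlib.Repr()
short.maxarray = 5

summary = short.repr(big)
print(summary)  # array('i', [0, 1, 2, 3, 4, ...])
```

This only affects callers that go through reprlib; the builtin repr() still renders every element.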
msg14448 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2003-02-05 18:36
Nice to see you, Jurjen! I checked this into current CVS, and it works fine for me in isolation:
>>> len(a)
11055060
>>> repr.repr(a)
"array('i', [0, 1, 2, 3, 4, ...])"
>>>
That goes in an eyeblink. So more detail is needed about what "it still doesn't work!" means. Assigned to Jack, and he can use current CVS to try it.
Lib/repr.py; new revision: 1.15
Lib/test/test_repr.py; new revision: 1.16
Misc/NEWS; new revision: 1.642
msg14449 - Author: Jack Jansen (jackjansen) * (Python committer) Date: 2003-02-05 22:40
The fix is fine (it works for me the same way as for Tim), but I think we're shooting past the problem here. First, pdb doesn't use repr.repr(), it uses the normal builtin repr(). Second, I don't see any sluggishness in pdb with large arrays. I tried debugging
def foo():
    a = range(8000000)
and there was no problem. Allocating the object took a bit of time, yes, and if you actually try to print it you'll stare at about 800K lines filled with digits scrolling over your screen, but that is to be expected. Could it be your sluggishness is coming from something else? For example, MacOSX starts behaving *very* badly if your root disk is full, because then it can't allocate swap space, and due to its optimistic behaviour it comes to a grinding halt.
msg14450 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2003-02-05 23:08
pdb does import repr.py, but probably doesn't use it in whatever way Jurjen is using to display his big array. WRT that, note that Jurjen is using array.array objects, not lists. The internal array.array tp_repr slot is quadratic-time in the size of the array, while list's tp_repr is linear time.
msg14451 - Author: Jack Jansen (jackjansen) * (Python committer) Date: 2003-02-06 21:37
Okay, so the real bug is that tp_repr of array objects takes quadratic time. I'm changing the summary of this report then, and assigning back to you (Tim), on the basis that you did more checkins on arraymodule than I did. Feel free to pass the potato on :-)
msg14452 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2003-02-07 01:40
I can't make time for this, so unassigned it. It would make a good, brief project for someone -- the list and dict tp_reprs are linear-time, and tp_repr for array.array objects shouldn't be any harder than they were.
msg14453 - Author: Grant Olson (logistix) Date: 2003-02-12 01:43
arraymodule's repr used "string += ',' + el" for each element in the array. Lists and dicts did a string.join to build the repr. The attached patch builds a tuple of array elements and then joins them. (Actually, for some reason I can't attach now, so I'll post the patch in the patch manager.) This fixes the time issue, but I don't know enough about how you guys manage memory in each case to tell what impact that'll have on really, really big arrays (although I imagine it takes more memory).
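The difference between the two strategies can be sketched at the Python level. This is an illustration only: the real code is C inside arraymodule, the function names repr_by_concat and repr_by_join are hypothetical, and CPython may optimize `+=` on strings in place, so the quadratic cost of the first form is a worst case rather than something this snippet is guaranteed to reproduce.

```python
from array import array

def repr_by_concat(a):
    # Grow one string per element. Each += may copy everything built
    # so far, which is what makes the worst case quadratic.
    s = ""
    for el in a:
        if s:
            s += ", "
        s += repr(el)
    return "array(%r, [%s])" % (a.typecode, s)

def repr_by_join(a):
    # Collect the pieces first, then join once: linear in output size,
    # at the price of holding all the pieces in memory at the same time.
    pieces = [repr(el) for el in a]
    return "array(%r, [%s])" % (a.typecode, ", ".join(pieces))

sample = array('i', range(10))
print(repr_by_join(sample))
```

Both functions produce the same text for a non-empty integer array; the join version trades extra temporary memory for time, which matches the concern raised in the message.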
msg14454 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2003-04-23 17:33
Fixed this up by converting the array to a list and then using the list object's efficient repr(). See Modules/arraymodule.c 2.87. Since I categorize this as a performance issue and not a bug, I've applied the fix to Py2.3 but am not recommending it for backport.
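The approach described here can be mimicked in pure Python. This is a rough sketch, not the C code from Modules/arraymodule.c: the helper name array_repr_via_list is hypothetical, and it assumes a numeric typecode such as 'i' or 'd' and the Python 3 convention that an empty array prints without a list.

```python
from array import array

def array_repr_via_list(a):
    # Empty arrays are shown without an element list in Python 3.
    if len(a) == 0:
        return "array(%r)" % a.typecode
    # Delegate the element formatting to list's linear-time repr.
    return "array(%r, %r)" % (a.typecode, a.tolist())

ints = array('i', [1, 2, 3])
floats = array('d', [1.5, 2.0])
print(array_repr_via_list(ints))    # array('i', [1, 2, 3])
print(array_repr_via_list(floats))  # array('d', [1.5, 2.0])
```

The temporary list costs extra memory proportional to the array length, but the work is done in a single pass.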
msg14455 - Author: Jurjen N.E. Bos (jneb) * Date: 2003-04-25 09:32
The debugger I use is not pdb but the Mac-only IDE debugger. I thought this was only a front end to pdb, but apparently it is not. It seems that it is still slow in 2.3. (I can't check at the moment; I am running a multiple-hour computation...) Maybe it will automatically be fixed if Jack manages to get IDLE working on the Mac...
msg14456 - Author: Jack Jansen (jackjansen) * (Python committer) Date: 2003-04-29 21:51
Jurjen, could you submit a separate bug report for the MacPython IDE? It needs a different solution (and I'm not going to get around to it soon, so otherwise I may forget).
History
Date User Action Args
2022-04-10 16:06:32 admin set github: 37911
2003-02-05 11:00:07 jneb create