[Patches] [ python-Patches-1281707 ] Speed up gzip.readline (~40%)
SourceForge.net
noreply at sourceforge.net
Mon May 22 17:12:12 CEST 2006
Patches item #1281707, was opened at 2005-09-04 13:53
Message generated for change (Comment added) made by etrepum
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1281707&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Library (Lib)
Group: None
>Status: Pending
>Resolution: None
Priority: 5
Submitted By: April King (marumari)
>Assigned to: Bob Ippolito (etrepum)
Summary: Speed up gzip.readline (~40%)
Initial Comment:
See bug 849046 for history. This patch passes both the
regression test and the standard test. Hopefully the
extra information below won't be too difficult to read.
I can attach this info to the bug, if need be.
Fixed:
- Add self.min_readsize to __init__.
Follows the principal that lines are likely to be
the same length in size,
and doesn't start over at a minimum length string
every call to readline()
- Rewriting of assignment for readsize and size at
the beginning of function.
Eliminates almost all calls to min()
- Change bufs to a string, and not an array. No
point in using an array when
all you do with it is "".join(bufs). Uses string
addition instead.
- Remove extra assignments to bufs (in return())
- Changes readline() to be much more readable (loop
reordering, more comments)
Recommendations:
- Delete _unread() function. It is used _only_ by
readline(), and moving its
functionality into readline() itself saves the
function call overhead.
_unread() is only 3 lines long. Testing shows that
removing it speeds
readline() up by about 3%. Backwards compatibility
concerns?
Testing results:
test_append (__main__.TestGzip) ... ok
test_many_append (__main__.TestGzip) ... ok
test_mode (__main__.TestGzip) ... ok
test_read (__main__.TestGzip) ... ok
test_readline (__main__.TestGzip) ... ok
test_readlines (__main__.TestGzip) ... ok
test_seek_read (__main__.TestGzip) ... ok
test_seek_write (__main__.TestGzip) ... ok
test_write (__main__.TestGzip) ... ok
----------------------------------------------------------------------
Ran 9 tests in 0.331s
Regression tests:
python regrtest.py -g test_gzip.py
test_gzip
1 test OK.
---
Profiling Results (performed on a common compressed log
file - 200748 lines).
With patch...
1213961 function calls in 12.188 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall
filename:lineno(function)
1 0.000 0.000 0.000 0.000 :0(close)
1159 0.020 0.000 0.020 0.000 :0(crc32)
1158 0.100 0.000 0.100 0.000
:0(decompress)
1 0.000 0.000 0.000 0.000
:0(decompressobj)
200774 0.812 0.000 0.812 0.000 :0(find)
403865 0.902 0.000 0.902 0.000 :0(len)
1183 0.000 0.000 0.000 0.000 :0(min)
2 0.000 0.000 0.000 0.000 :0(ord)
1173 0.000 0.000 0.000 0.000 :0(read)
12 0.000 0.000 0.000 0.000 :0(seek)
1 0.000 0.000 0.000 0.000
:0(setprofile)
18 0.000 0.000 0.000 0.000 :0(tell)
2 0.000 0.000 0.000 0.000 :0(unpack)
1 0.000 0.000 12.188 12.188 <string>:1(?)
1 0.000 0.000 0.000 0.000
gzip_new.py:156(_init_read)
1 0.000 0.000 0.000 0.000
gzip_new.py:160(_read_gzip_header)
3 0.000 0.000 0.000 0.000
gzip_new.py:18(U32)
200774 2.453 0.000 2.593 0.000
gzip_new.py:207(read)
200749 2.894 0.000 3.796 0.000
gzip_new.py:239(_unread)
1166 0.010 0.000 0.140 0.000
gzip_new.py:244(_read)
1 0.000 0.000 0.000 0.000
gzip_new.py:27(LOWU32)
1158 0.010 0.000 0.030 0.000
gzip_new.py:294(_add_read_data)
1 0.000 0.000 0.000 0.000
gzip_new.py:300(_read_eof)
1 0.000 0.000 0.000 0.000
gzip_new.py:314(close)
1 0.000 0.000 0.000 0.000
gzip_new.py:327(__del__)
200749 3.916 0.000 11.117 0.000
gzip_new.py:384(readline)
2 0.000 0.000 0.000 0.000
gzip_new.py:39(read32)
1 0.000 0.000 0.000 0.000
gzip_new.py:42(open)
1 0.000 0.000 0.000 0.000
gzip_new.py:60(__init__)
1 0.000 0.000 12.188 12.188
profile:0(gunzip_gzip_new_open())
0 0.000 0.000
profile:0(profiler)
1 1.071 1.071 12.188 12.188
test_gzip_speed.py:14(gunzip_gzip_new_open)
Without patch...
2073328 function calls in 18.597 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall
filename:lineno(function)
243820 0.735 0.000 0.735 0.000 :0(append)
1 0.000 0.000 0.000 0.000 :0(close)
1159 0.040 0.000 0.040 0.000 :0(crc32)
1158 0.100 0.000 0.100 0.000
:0(decompress)
1 0.000 0.000 0.000 0.000
:0(decompressobj)
243820 0.960 0.000 0.960 0.000 :0(find)
200749 0.801 0.000 0.801 0.000 :0(join)
489958 1.330 0.000 1.330 0.000 :0(len)
243820 0.791 0.000 0.791 0.000 :0(min)
2 0.000 0.000 0.000 0.000 :0(ord)
1173 0.030 0.000 0.030 0.000 :0(read)
6 0.000 0.000 0.000 0.000 :0(seek)
1 0.000 0.000 0.000 0.000
:0(setprofile)
6 0.000 0.000 0.000 0.000 :0(tell)
2 0.000 0.000 0.000 0.000 :0(unpack)
1 0.000 0.000 18.597 18.597 <string>:1(?)
1 0.000 0.000 0.000 0.000
gzip.py:154(_init_read)
1 0.000 0.000 0.000 0.000
gzip.py:158(_read_gzip_header)
3 0.000 0.000 0.000 0.000
gzip.py:18(U32)
243820 2.711 0.000 2.921 0.000
gzip.py:205(read)
200749 3.083 0.000 4.143 0.000
gzip.py:237(_unread)
1160 0.010 0.000 0.210 0.000
gzip.py:242(_read)
1 0.000 0.000 0.000 0.000
gzip.py:27(LOWU32)
1158 0.030 0.000 0.070 0.000
gzip.py:292(_add_read_data)
1 0.000 0.000 0.000 0.000
gzip.py:298(_read_eof)
1 0.000 0.000 0.000 0.000
gzip.py:312(close)
1 0.000 0.000 0.000 0.000
gzip.py:325(__del__)
200749 6.934 0.000 17.555 0.000
gzip.py:379(readline)
2 0.000 0.000 0.000 0.000
gzip.py:39(read32)
1 0.000 0.000 0.000 0.000
gzip.py:42(open)
1 0.000 0.000 0.000 0.000
gzip.py:59(__init__)
1 0.000 0.000 18.597 18.597
profile:0(gunzip_gzip_open())
0 0.000 0.000
profile:0(profiler)
1 1.042 1.042 18.597 18.597
test_gzip_speed.py:7(gunzip_gzip_open)
Using popen + gunzip -c...
200754 function calls in 4.338 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall
filename:lineno(function)
1 0.000 0.000 0.000 0.000 :0(popen)
200749 3.578 0.000 3.578 0.000 :0(readline)
1 0.000 0.000 0.000 0.000
:0(setprofile)
1 0.240 0.240 4.338 4.338 <string>:1(?)
1 0.000 0.000 4.338 4.338
profile:0(gunzip_popen())
0 0.000 0.000
profile:0(profiler)
1 0.520 0.520 4.098 4.098
test_gzip_speed.py:21(gunzip_popen)
----------------------------------------------------------------------
>Comment By: Bob Ippolito (etrepum)
Date: 2006-05-22 11:12
Message:
Logged In: YES
user_id=139309
I'm reopening this patch -- it seems that these changes have made parsing
Apache style log files MUCH slower (4x on some samples).
----------------------------------------------------------------------
Comment By: Bob Ippolito (etrepum)
Date: 2006-05-22 10:32
Message:
Logged In: YES
user_id=139309
Applied in revision 46070
----------------------------------------------------------------------
Comment By: Bob Ippolito (etrepum)
Date: 2006-05-22 10:11
Message:
Logged In: YES
user_id=139309
This patch is about a 30% win on Mac OS X i386 using this benchmark:
http://svn.python.org/view/sandbox/trunk/gzipbench/gzipbench.py
I'm going to look to see if there's any other low hanging fruit in there before I
commit.
----------------------------------------------------------------------
Comment By: April King (marumari)
Date: 2005-09-04 13:57
Message:
Logged In: YES
user_id=747439
See attached text file for the detailed description (that's
much more readable).
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1281707&group_id=5470
More information about the Patches
mailing list