Problem of function calls from map()
Dasn
dasn at bluebottle.com
Mon Aug 21 21:39:07 EDT 2006
Thanks for your reply.
Please take a look at my current profile report:
#------------------------ test.py ---------------------
import os, sys, profile

print os.uname()
print sys.version

# size of 'dict.txt' is about 3.6M, 154563 lines
f = open('dict.txt', 'r')
print "Reading lines..."
lines = f.readlines()
print "Done."

def splitUsing(chars):
    def tmp(s):
        return s.split(chars)
    return tmp

def sp0(lines):
    """====> sp0() -- Normal 'for' loop"""
    l = []
    for line in lines:
        l.append(line.split('\t'))
    return l

def sp1(lines):
    """====> sp1() -- List-comprehension"""
    return [s.split('\t') for s in lines]

def sp2(lines):
    """====> sp2() -- Map with lambda function"""
    return map(lambda s: s.split('\t'), lines)

def sp3(lines):
    """====> sp3() -- Map with splitUsing() function"""
    return map(splitUsing('\t'), lines)

def sp4(lines):
    """====> sp4() -- Not correct, but very fast"""
    return map(str.split, lines)

for num in xrange(5):
    fname = 'sp%(num)s' % locals()
    print eval(fname).__doc__
    profile.run(fname+'(lines)')
#---------------------------End of test.py ----------------
$ python test.py
('OpenBSD', 'Compaq', '3.9', 'kernel#1', 'i386')
2.4.2 (#1, Mar 2 2006, 14:17:22)
[GCC 3.3.5 (propolice)]
Reading lines...
Done.
====> sp0() -- Normal 'for' loop
         309130 function calls in 20.510 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   154563    4.160    0.000    4.160    0.000 :0(append)
        1    0.010    0.010    0.010    0.010 :0(setprofile)
   154563    6.490    0.000    6.490    0.000 :0(split)
        1    0.380    0.380   20.500   20.500 <string>:1(?)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000   20.510   20.510 profile:0(sp0(lines))
        1    9.470    9.470   20.120   20.120 test.py:20(sp0)

====> sp1() -- List-comprehension
         154567 function calls in 12.240 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
   154563    6.740    0.000    6.740    0.000 :0(split)
        1    0.380    0.380   12.240   12.240 <string>:1(?)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000   12.240   12.240 profile:0(sp1(lines))
        1    5.120    5.120   11.860   11.860 test.py:27(sp1)

====> sp2() -- Map with lambda function
         309131 function calls in 20.480 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    4.600    4.600   20.100   20.100 :0(map)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
   154563    7.320    0.000    7.320    0.000 :0(split)
        1    0.370    0.370   20.470   20.470 <string>:1(?)
        0    0.000             0.000          profile:0(profiler)
        1    0.010    0.010   20.480   20.480 profile:0(sp2(lines))
        1    0.000    0.000   20.100   20.100 test.py:31(sp2)
   154563    8.180    0.000   15.500    0.000 test.py:33(<lambda>)

====> sp3() -- Map with splitUsing() function
         309132 function calls in 21.900 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    5.540    5.540   21.520   21.520 :0(map)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
   154563    7.100    0.000    7.100    0.000 :0(split)
        1    0.380    0.380   21.900   21.900 <string>:1(?)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000   21.900   21.900 profile:0(sp3(lines))
        1    0.000    0.000    0.000    0.000 test.py:14(splitUsing)
   154563    8.880    0.000   15.980    0.000 test.py:15(tmp)
        1    0.000    0.000   21.520   21.520 test.py:35(sp3)

====> sp4() -- Not correct, but very fast
         5 function calls in 3.090 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    2.660    2.660    2.660    2.660 :0(map)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.430    0.430    3.090    3.090 <string>:1(?)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    3.090    3.090 profile:0(sp4(lines))
        1    0.000    0.000    2.660    2.660 test.py:39(sp4)

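To make the "Not correct" label on sp4() concrete, here is a small sketch of how the two calls differ on a single line. The sample line is made up for illustration and is not taken from dict.txt; the snippet is written so that it should run on both Python 2 and Python 3.

```python
# Illustrative input line (made up, not taken from dict.txt).
line = 'hello world\tsecond field\n'

# Splitting on an explicit tab keeps the spaces inside each field
# and leaves the trailing newline on the last field.
by_tab = line.split('\t')

# str.split with no separator splits on any run of whitespace,
# so spaces, tabs and the newline all act as delimiters.
by_whitespace = str.split(line)

print(by_tab)
print(by_whitespace)
```

So map(str.split, lines) only gives the same result as the other variants when no field contains a space and the trailing newline does not matter.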
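The puzzle is that the default behavior of str.split (splitting on runs of
whitespace) should be more complex than str.split('\t'), yet sp4() is by far
the fastest, apparently because map() calls the built-in directly: its profile
records only 5 function calls instead of about 309,000. If we could use
str.split('\t') in map() the same way, the result would be great. What do you
guys think?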
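One possible way to get there, offered as a sketch rather than a measured answer: map() accepts several sequences and passes one item from each to the function, so the separator can be supplied as a second sequence. The name sp5 and the sample data below are hypothetical and not part of test.py. The separator sequence is given the same length as lines on purpose, because Python 2's map() pads shorter sequences with None. The list() wrapper is only there so the same code also returns a list on Python 3.

```python
from itertools import repeat

def sp5(lines):
    """====> sp5() -- Map with str.split and a repeated separator (hypothetical)"""
    # map() calls str.split(line, '\t') for every line; repeat() supplies
    # the separator once per line, so no Python-level wrapper is involved.
    return list(map(str.split, lines, repeat('\t', len(lines))))

# Made-up sample data standing in for the lines of dict.txt.
sample = ['a b\tc\n', 'd\te f\n']
result = sp5(sample)
print(result)
```

Whether this is actually as fast as sp4() on the real file would need to be measured. Since the profile module adds its own overhead to every recorded call, a check with timeit might give a fairer comparison.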