Problem of function calls from map()

Dasn dasn at bluebottle.com
Mon Aug 21 21:39:07 EDT 2006


Thanks for your reply.

Well, please take a look at my current profile report:

#------------------------ test.py ---------------------
import os, sys, profile

print os.uname()
print sys.version

# size of 'dict.txt' is about 3.6M, 154563 lines
f = open('dict.txt', 'r')
print "Reading lines..."
lines = f.readlines()
print "Done."

def splitUsing(chars):
	def tmp(s):
		return s.split(chars)
	return tmp


def sp0(lines):
	"""====> sp0() -- Normal 'for' loop"""
	l = []
	for line in lines:
		l.append(line.split('\t'))
	return l

def sp1(lines):
	"""====> sp1() -- List-comprehension"""
	return [s.split('\t') for s in lines]

def sp2(lines):
	"""====> sp2() -- Map with lambda function"""
	return map(lambda s: s.split('\t'), lines)

def sp3(lines):
	"""====> sp3() -- Map with splitUsing() function"""
	return map(splitUsing('\t'), lines)

def sp4(lines):
	"""====> sp4() -- Not correct, but very fast"""
	return map(str.split, lines)

for num in xrange(5):
	fname = 'sp%(num)s' % locals()
	print eval(fname).__doc__
	profile.run(fname+'(lines)')

#---------------------------End of test.py ----------------

$ python test.py

('OpenBSD', 'Compaq', '3.9', 'kernel#1', 'i386')
2.4.2 (#1, Mar  2 2006, 14:17:22) 
[GCC 3.3.5 (propolice)]
Reading lines...
Done.
====> sp0() -- Normal 'for' loop
         309130 function calls in 20.510 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   154563    4.160    0.000    4.160    0.000 :0(append)
        1    0.010    0.010    0.010    0.010 :0(setprofile)
   154563    6.490    0.000    6.490    0.000 :0(split)
        1    0.380    0.380   20.500   20.500 <string>:1(?)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000   20.510   20.510 profile:0(sp0(lines))
        1    9.470    9.470   20.120   20.120 test.py:20(sp0)


====> sp1() -- List-comprehension
         154567 function calls in 12.240 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
   154563    6.740    0.000    6.740    0.000 :0(split)
        1    0.380    0.380   12.240   12.240 <string>:1(?)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000   12.240   12.240 profile:0(sp1(lines))
        1    5.120    5.120   11.860   11.860 test.py:27(sp1)


====> sp2() -- Map with lambda function
         309131 function calls in 20.480 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    4.600    4.600   20.100   20.100 :0(map)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
   154563    7.320    0.000    7.320    0.000 :0(split)
        1    0.370    0.370   20.470   20.470 <string>:1(?)
        0    0.000             0.000          profile:0(profiler)
        1    0.010    0.010   20.480   20.480 profile:0(sp2(lines))
        1    0.000    0.000   20.100   20.100 test.py:31(sp2)
   154563    8.180    0.000   15.500    0.000 test.py:33(<lambda>)


====> sp3() -- Map with splitUsing() function
         309132 function calls in 21.900 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    5.540    5.540   21.520   21.520 :0(map)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
   154563    7.100    0.000    7.100    0.000 :0(split)
        1    0.380    0.380   21.900   21.900 <string>:1(?)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000   21.900   21.900 profile:0(sp3(lines))
        1    0.000    0.000    0.000    0.000 test.py:14(splitUsing)
   154563    8.880    0.000   15.980    0.000 test.py:15(tmp)
        1    0.000    0.000   21.520   21.520 test.py:35(sp3)


====> sp4() -- Not correct, but very fast
         5 function calls in 3.090 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    2.660    2.660    2.660    2.660 :0(map)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.430    0.430    3.090    3.090 <string>:1(?)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    3.090    3.090 profile:0(sp4(lines))
        1    0.000    0.000    2.660    2.660 test.py:39(sp4)


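The problem is that str.split with no arguments splits on any run of
whitespace, which should be more complex than str.split('\t'), yet sp4()
is by far the fastest, and the profile records only 5 function calls for
it. If we could use str.split('\t') in map() the same way, the result
would be great. What do you guys think?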
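
To make the question concrete, here is a minimal sketch of one way to hand
the '\t' argument to str.split from map() without a lambda or a closure.
The name sp5 and the sample data are made up for illustration and are not
part of test.py. I have not profiled it against the numbers above, so treat
it as an idea to try rather than a measured result. It relies on map()
accepting a second sequence, here supplied by itertools.repeat().

```python
import itertools

# Why sp4() is "not correct": the default split() splits on any
# whitespace and drops the trailing newline; split('\t') does not.
line = 'a b\tc\n'
assert line.split() == ['a', 'b', 'c']
assert line.split('\t') == ['a b', 'c\n']

def sp5(lines):
    """====> sp5() -- Map with str.split and a repeated separator (hypothetical)"""
    # map() with two sequences calls str.split(line, '\t') for each line,
    # so no Python-level wrapper function runs per item.
    # repeat() is bounded by len(lines) so the call also terminates on
    # Python 2, where map() pads the shorter sequence instead of stopping.
    # list() is a no-op copy on Python 2 and forces evaluation on Python 3.
    return list(map(str.split, lines, itertools.repeat('\t', len(lines))))

sample = ['a b\tc\n', 'd\te f\n']
assert sp5(sample) == [s.split('\t') for s in sample]
```

If this works on your interpreter, sp5 can be added to the loop at the
bottom of test.py (xrange(6) instead of xrange(5)) to compare it with the
others.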



