[Tutor] multiprocessing question
Albert-Jan Roskam
fomcl at yahoo.com
Sun Nov 23 23:30:13 CET 2014
Hi
I created some code to get records from a potentially giant .csv file. It implements a __getitem__ method that gets records from a memory-mapped csv file. For this to work, I need to build a lookup table that maps line numbers to line starts/ends. This works, BUT building the lookup table can be time-consuming (and it freezes up the app). The (somewhat pruned) code is here: http://pastebin.com/0x6JKbfh.

Now I would like to build the lookup table in a separate process, so I used multiprocessing. In the crude example below, it appears to be doing what I have in mind. Is this the way to do it? I have never used multiprocessing/threading before, apart from playing around.

One specific question: __getitem__ is supposed to raise an IndexError when needed. But how do I know when I should do this if I don't yet know the total number of records? Is there an uber-cheap way of getting this number?
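To make the lookup table concrete before the multiprocessing example, here is a minimal sketch in Python 3 (not the pastebin code). The names `build_offsets` and `count_lines` are hypothetical. It assumes one record per line, with no newlines embedded in quoted fields. Counting newline bytes is cheaper than building the full table, but it still has to read the whole file once.

```python
import mmap


def build_offsets(mm):
    """Return a list of (start, end) byte offsets, one tuple per line.

    `end` is exclusive and includes the trailing newline, if any.
    Assumption: one record per line (no embedded newlines in fields).
    """
    offsets = []
    start = 0
    size = len(mm)
    while start < size:
        nl = mm.find(b"\n", start)
        end = size if nl == -1 else nl + 1
        offsets.append((start, end))
        start = end
    return offsets


def count_lines(mm, chunk_size=1 << 20):
    """Count lines by counting newline bytes, one chunk at a time."""
    count = 0
    for pos in range(0, len(mm), chunk_size):
        count += mm[pos:pos + chunk_size].count(b"\n")
    # a last line without a trailing newline is still a record
    if len(mm) and mm[len(mm) - 1:len(mm)] != b"\n":
        count += 1
    return count
```

With `mm = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)` on a file opened in binary mode, `mm[start:end]` then returns the raw bytes of one record, and `count_lines(mm)` gives the upper bound that `__getitem__` needs for its IndexError check.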
import multiprocessing as mp
import time


class Test(object):
    """Get records from a potentially huge, therefore mmapped, file (.csv)"""

    def __init__(self):
        self.lookup = mp.Manager().dict()
        # keep the Process on self so that __getitem__ can poll and join it
        self.process = mp.Process(target=self.create_lookup,
                                  args=(self.lookup,))
        self.process.start()

    @property
    def lookup_done(self):
        # attributes set inside the child never reach the parent, so ask
        # the Process object whether the child has finished
        return not self.process.is_alive()

    def create_lookup(self, d):
        """time-consuming function that is only called once"""
        # runs in the child process; the parent does the join()
        for i in xrange(10 ** 7):
            d[i] = i

    def __getitem__(self, key):
        """In my app, this returns a record from a memory-mapped file
        The key is the line number, the value is a two-tuple of the start
        and the end byte of that record"""
        try:
            return self.lookup[key]
        except KeyError:
            # what's a cheap way to calculate the number of records in a .csv file?
            self.total_number_of_records = 10 ** 7
            if key >= self.total_number_of_records:
                if not self.lookup_done:
                    self.process.join()
                raise IndexError("index out of range")
            print "One moment please, lookup not yet ready enough"
            raise  # let the caller see the KeyError


if __name__ == "__main__":
    test = Test()
    # check if it works
    while True:
        k = int(raw_input("enter key: "))
        try:
            print "value is ", test[k]
            time.sleep(1)
        except KeyError:
            print "OOPS, value not yet in lookup"
        print "Max key is now", max(test.lookup.keys())
        if test.lookup and max(test.lookup.keys()) == (10 ** 7 - 1):
            print "Exiting"
            break
    print "Done"
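On the IndexError question, one possible approach is to treat a missing key as "out of range" only once the builder has finished, so the total never has to be known up front. The following is a minimal sketch under stated assumptions, not a definitive implementation: it is written for Python 3, it assumes the POSIX-only "fork" start method, and `LazyLookup`, `build_lookup` and `n_records` are hypothetical names. The worker stores placeholder tuples where the real code would store (start, end) byte offsets.

```python
import multiprocessing as mp

# Assumption: the "fork" start method, which is only available on POSIX.
ctx = mp.get_context("fork")


def build_lookup(lookup, done, n_records):
    """Worker: fill the shared dict, then signal that it is complete."""
    for i in range(n_records):
        lookup[i] = (i, i + 1)  # placeholder for (start, end) byte offsets
    done.set()


class LazyLookup(object):
    """Index that is built in a child process while lookups already work."""

    def __init__(self, n_records):
        self._manager = ctx.Manager()  # keep a reference to the manager
        self.lookup = self._manager.dict()
        self.done = ctx.Event()        # set by the worker when it finishes
        self.process = ctx.Process(
            target=build_lookup, args=(self.lookup, self.done, n_records))
        self.process.daemon = True     # do not keep the app alive on exit
        self.process.start()

    def __getitem__(self, key):
        try:
            return self.lookup[key]
        except KeyError:
            pass
        # Not indexed yet: block until the worker is done, then decide.
        self.done.wait()
        try:
            return self.lookup[key]
        except KeyError:
            raise IndexError("index out of range")
```

Here `LazyLookup(1000)[999]` blocks until the worker has indexed that key, and `LazyLookup(1000)[1000]` raises IndexError only after the whole table exists. Blocking in `__getitem__` is a design choice; an app that must stay responsive could instead check `self.done.is_set()` and report "not ready yet".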
Thank you in advance!
Regards,
Albert-Jan
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a
fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~