[Tutor] multiprocessing question

Albert-Jan Roskam fomcl at yahoo.com
Sun Nov 23 23:30:13 CET 2014


Hi


I created some code to get records from a potentially giant .csv file. It implements a __getitem__ method that gets records from a memory-mapped csv file. For this to work, I need to build a lookup table that maps line numbers to line starts/ends. This works, BUT building the lookup table can be time-consuming (and it freezes up the app). The (somewhat pruned) code is here: http://pastebin.com/0x6JKbfh.

Now I would like to build the lookup table in a separate process, so I used multiprocessing. The crude example below appears to do what I have in mind. Is this the way to do it? I have never used multiprocessing/threading before, apart from playing around.

One specific question: __getitem__ is supposed to raise an IndexError when needed. But how do I know when I should do this if I don't yet know the total number of records? Is there an uber-cheap way of getting this number?


import multiprocessing as mp
import time

class Test(object):

    """Get records from a potentially huge, therefore mmapped, file (.csv)"""

    def __init__(self):
        self.lookup = mp.Manager().dict()
        # an Event is shared between processes; a plain bool set in the
        # child process would never be seen by the parent
        self.lookup_done = mp.Event()
        self.process = mp.Process(target=self.create_lookup, args=(self.lookup,))
        self.process.start()

    def create_lookup(self, d):
        """time-consuming function that is only called once"""
        for i in xrange(10 ** 7):
            d[i] = i
        # this runs in the child process, so signal completion instead of joining
        self.lookup_done.set()

    def __getitem__(self, key):
        """In my app, this returns a record from a memory-mapped file
        The key is the line number, the value is a two-tuple of the start
        and the end byte of that record"""
        try:
            return self.lookup[key]
        except KeyError:
            # what's a cheap way to calculate the number of records in a .csv file?
            self.total_number_of_records = 10 ** 7

            # valid keys are 0 .. total_number_of_records - 1
            if key >= self.total_number_of_records:
                if not self.lookup_done.is_set():
                    self.process.join()
                raise IndexError("index out of range")
            print "One moment please, lookup not yet ready enough"
            raise  # re-raise the KeyError so the caller knows there is no value yet

if __name__ == "__main__":
    test = Test()

    # check if it works
    while True:
        k = int(raw_input("enter key: "))
        try:
            print "value is ", test[k]
            time.sleep(1)
        except KeyError:
            print "OOPS, value not yet in lookup"
        keys = test.lookup.keys()
        if keys:  # max() raises ValueError on an empty lookup
            max_key = max(keys)
            print "Max key is now", max_key
            if max_key == (10 ** 7 - 1):
                print "Exiting"
                break
    print "Done"
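
For reference, the lookup table described at the top (line number mapped to the start and end byte of each record) could look roughly like the sketch below. This is not the pastebin code: the names build_offsets, get_record, count_records and demo are made up for illustration, it is written in Python 3 syntax (unlike the Python 2 example above), and it assumes that records end in "\n" and that no quoted field contains an embedded newline, which real csv data may violate. Under those assumptions, counting newline bytes gives the number of records without storing any offsets.

```python
import mmap
import os
import tempfile


def build_offsets(mm):
    """Return a list of (start, end) byte offsets, one pair per line.

    Assumption: lines end in b"\n" and no quoted csv field contains a
    newline. With b"\r\n" endings each record keeps a trailing b"\r".
    """
    offsets = []
    start = 0
    size = len(mm)
    while start < size:
        end = mm.find(b"\n", start)
        if end == -1:  # last line has no trailing newline
            end = size
        offsets.append((start, end))
        start = end + 1
    return offsets


def get_record(mm, offsets, key):
    """Return line number `key` as bytes."""
    start, end = offsets[key]  # list indexing raises IndexError when out of range
    return mm[start:end]


def count_records(mm, chunk_size=1 << 20):
    """Count lines by counting newline bytes, chunk by chunk."""
    count = 0
    last = b""
    for pos in range(0, len(mm), chunk_size):
        chunk = mm[pos:pos + chunk_size]
        count += chunk.count(b"\n")
        last = chunk[-1:]
    if len(mm) and last != b"\n":
        count += 1  # final line without a trailing newline
    return count


def demo():
    """Build a tiny throwaway csv file and index it (sample data is made up)."""
    data = b"a,b\n1,2\n3,4"
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(data)
        path = f.name
    try:
        with open(path, "rb") as f:
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            try:
                offsets = build_offsets(mm)
                return offsets, get_record(mm, offsets, 1), count_records(mm)
            finally:
                mm.close()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Counting newlines still reads the whole file once, but it does no per-line bookkeeping, so it should be considerably cheaper than building the full table. Whether it is cheap enough for a truly giant file is something that would have to be measured.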
 

Thank you in advance!

 
Regards,

Albert-Jan




~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 

