problem with multiprocessing and defaultdict

wiso gtu2003 at alice.it
Tue Jan 12 05:48:49 EST 2010


Robert Kern wrote:

> On 2010-01-11 17:50 PM, wiso wrote:
> 
>> The problem now is this:
>> start reading file r1_200909.log
>> start reading file r1_200910.log
>> read 488832 lines from file r1_200910.log
>> read 517247 lines from file r1_200909.log
>>
>> With huge files (the real case) the program freezes. Is there a solution
>> that avoids pickling/serialization, ... for example something like this:
>>
>> if __name__ == "__main__":
>>      file_names = ["r1_200909.log", "r1_200910.log"]
>>      pool = multiprocessing.Pool(len(file_names))
>>      childrens = [Container(f) for f in file_names]
>>      pool.map(lambda c: c.read(), childrens)
>>
>> PicklingError: Can't pickle <type 'function'>: attribute lookup
>> __builtin__.function failed
> 
> You can't pickle lambda functions.
> 
> What information do you actually need back from the workers?
> 

They send back the object filled with data. The problem is very simple: I
have a container, and the container has a method read(file_name) that reads
a huge file and fills the container with data. I have more than one file to
read, so I want to parallelize this process. The reading method is quite
slow because it involves regexes.
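
Here is a minimal sketch of that setup, assuming a Container roughly along
these lines (the regex, the defaultdict payload and the file format are
placeholders; the real class is not shown in the thread). The key point is
that the callable passed to pool.map is a plain module-level function, which
pickle can locate by name, unlike the lambda above:

import multiprocessing
import re
from collections import defaultdict

class Container(object):
    def __init__(self, file_name):
        self.file_name = file_name
        self.data = defaultdict(int)   # hypothetical payload

    def read(self):
        # placeholder standing in for the real (slow) regex work
        pattern = re.compile(r"(\w+)=(\d+)")
        with open(self.file_name) as f:
            for line in f:
                for key, value in pattern.findall(line):
                    self.data[key] += int(value)
        return self

def read_container(file_name):
    # module-level function: picklable by name, so pool.map accepts it
    return Container(file_name).read()

if __name__ == "__main__":
    file_names = ["r1_200909.log", "r1_200910.log"]
    pool = multiprocessing.Pool(len(file_names))
    containers = pool.map(read_container, file_names)

Note that the filled containers are still pickled on their way back to the
parent process, so with very large results that return trip can itself be
the bottleneck, which is consistent with the freeze reported above.
Returning only the aggregated data (e.g. a plain dict) keeps the payload
smaller.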

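For completeness, the PicklingError quoted above can be reproduced with
pickle alone: a lambda has no importable name, while a module-level function
does. A small self-contained demonstration (none of this is from the
original thread):

import pickle

def worker(x):
    # found by pickle as <module>.worker, so this succeeds
    return x

pickle.dumps(worker)

try:
    pickle.dumps(lambda x: x)   # no importable name
except pickle.PicklingError as e:
    print(e)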

