multiprocessing module and matplotlib.pyplot/PdfPages

Rob Gaddi rgaddi at technologyhighland.invalid
Tue Apr 21 13:31:30 EDT 2015


On Tue, 21 Apr 2015 03:14:09 +0100, Paulo da Silva wrote:

> I have a program that generates about 100 relatively complex graphics
> and writes them to a pdf book.
> It takes a while!
> Is there any possibility of using multiprocessing to build the graphics
> and then use several calls to savefig(), i.e. some kind of graphics
> objects?
> 
> Thanks for any help/comments.

That sounds pretty reasonable.  Just be sure to explicitly close each 
figure once you're done with it.  Matplotlib figures take up a shocking 
amount of memory; nothing slows your system to a horrendous crawl like 
having to resort to swapping to disk.
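
A minimal single-process sketch of what I mean by closing each figure,
assuming the Agg backend and a throwaway line plot standing in for the
real graphics. The name render_book and the plotted data are made up for
illustration; PdfPages, pdf.savefig() and plt.close() are the real
matplotlib calls.

```python
def render_book(n_pages, out_path):
    """Write n_pages placeholder plots to one PDF, closing each figure."""
    # Local imports so the non-interactive backend is chosen before pyplot loads.
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt
    from matplotlib.backends.backend_pdf import PdfPages

    with PdfPages(out_path) as pdf:
        for i in range(n_pages):
            fig, ax = plt.subplots()
            ax.plot([0, 1, 2], [i, i + 1, i + 2])  # stand-in for the real plot
            ax.set_title(f"Page {i + 1}")
            pdf.savefig(fig)
            plt.close(fig)  # without this, every figure stays alive in pyplot

    # Number of figures pyplot is still holding on to; should be zero.
    return len(plt.get_fignums())
```

If the return value is ever non-zero, something is still holding figures
open.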

One thing that would be a bit worrisome is managing order.  I'm assuming 
you want the pages written in a particular order, and farming the work 
out to multiple processes creates a chokepoint: finished figures have to 
be held back until it's their turn to be dropped into the PdfPages.  
Maybe you'd get some boost from a ProcessPoolExecutor, but maybe not.
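
On the ordering question, Executor.map() hands results back in the order
the jobs were submitted, not the order they finish, so the parent can
consume pages in sequence. A rough sketch, where build_page,
pages_in_order and main are hypothetical names and the sleep is only
there to make pages finish out of order:

```python
import time
from concurrent.futures import ProcessPoolExecutor


def build_page(index):
    """Hypothetical stand-in for the expensive per-page work."""
    # Even-numbered pages deliberately finish later than odd ones.
    time.sleep(0.05 if index % 2 == 0 else 0.0)
    return index, f"data for page {index}"


def pages_in_order(executor, indices):
    # Executor.map yields results in the order of `indices`,
    # regardless of which worker finishes first.
    return list(executor.map(build_page, indices))


def main():
    with ProcessPoolExecutor() as pool:
        for index, payload in pages_in_order(pool, range(10)):
            print(index, payload)  # the pdf.savefig() call would go here
```

Call main() from under the usual if __name__ == "__main__": guard. Note
that the parent still writes the pages one at a time, so if most of the
cost is in savefig() rather than in preparing the page, this won't buy
much.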

That's where I like ChrisA's solution of having the various processes 
(whether they're spawned from the same parent or not) each be 
responsible for writing their own figures out to disk, one page per 
file, and then using something like pdftk to stitch them all together 
after the fact.
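
A sketch of that one-page-per-file approach, assuming pdftk is installed
and on the PATH. The function names, the page_NNNN.pdf naming scheme and
the placeholder plot are all my own inventions for illustration; the
"cat output" form is pdftk's standard way of concatenating its inputs.

```python
import subprocess
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def page_path(outdir, index):
    # Zero-padded names keep lexical order equal to page order.
    return Path(outdir) / f"page_{index:04d}.pdf"


def write_page(job):
    """Render one placeholder figure to its own single-page PDF."""
    outdir, index = job
    # Imported here so each worker process selects the Agg backend itself.
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [index, index + 1, index + 2])  # stand-in plot
    path = page_path(outdir, index)
    fig.savefig(str(path))
    plt.close(fig)  # free the figure before the worker takes its next job
    return path


def pdftk_command(paths, output):
    # pdftk concatenates the inputs in the order they appear on the command line.
    return ["pdftk", *map(str, paths), "cat", "output", str(output)]


def build_book(n_pages, outdir, output):
    Path(outdir).mkdir(parents=True, exist_ok=True)
    jobs = [(outdir, i) for i in range(1, n_pages + 1)]
    with ProcessPoolExecutor() as pool:
        paths = list(pool.map(write_page, jobs))
    subprocess.run(pdftk_command(paths, output), check=True)
```

Something like build_book(100, "pages", "book.pdf"), again called from
under an if __name__ == "__main__": guard, would then do the whole job.
Page order lives in the file names, so no process ever has to wait for
another one.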

-- 
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order.  See above to fix.


