multiprocessing module and matplotlib.pyplot/PdfPages

Oscar Benjamin oscar.j.benjamin at gmail.com
Thu Apr 23 17:39:53 EDT 2015


On 21 April 2015 at 16:53, Paulo da Silva
<p_s_d_a_s_i_l_v_a_ns at netcabo.pt> wrote:
> On 21-04-2015 11:26, Dave Angel wrote:
>> On 04/20/2015 10:14 PM, Paulo da Silva wrote:
>>> I have program that generates about 100 relatively complex graphics and
>>> writes then to a pdf book.
>>> It takes a while!
>>> Is there any possibility of using multiprocessing to build the graphics
>>> and then use several calls to savefig(), i.e. some kind of graphic's
>>> objects?
>>>
>>
> ...
>
>>
>> If the 100 plots are really quite independent, you could also consider
>> recruiting time from multiple machines.  As long as the data that needs
>> to go between them is not too large, it can pay off big time.
>>
> Sorry if I was not clear.
>
> Yes, I have 8 cores and the graphics' processes calculation are all
> independent. The problem I have is that if there is any way to generate
> independent figures in matplotlib. The logic seems to be build the
> graphic and save it. I was trying to know if there is any way to build
> graphic objects that can be built in parallel and, at the end, saved by
> the controller task.

Hi Paulo,

It sounds like you're using matplotlib's "stateful" API. This is a
convenience layer for interactive work that keeps track of a "current"
figure for you, so that you can do something like:

from pylab import *

plot([0, 1], [0, 1])
savefig('plot.pdf')

For normal code it is recommended to use the "object-oriented" API
which looks like:

from matplotlib.pyplot import figure

fig = figure(figsize=(4, 5))
ax = fig.add_axes([0.15, 0.15, 0.70, 0.70])
ax.plot([0, 1], [0, 1])
fig.savefig('plot.pdf')

When using this API it is entirely possible to create many figures in
parallel using e.g. multiprocessing.
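
As a rough sketch of what that could look like (not tested against
your real plots): each worker process builds its own Figure object
from scratch, renders it to PDF in memory and hands the bytes back to
the controller. The names render_pdf_bytes and render_all, the slope
argument and the pool size of 4 are all made up for illustration. The
canvas is attached explicitly so that pyplot and its global state are
never involved.

```python
import io
from multiprocessing import Pool

from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def render_pdf_bytes(slope):
    """Build one independent figure and return it rendered as PDF bytes."""
    fig = Figure(figsize=(4, 5))
    FigureCanvasAgg(fig)  # attach a canvas directly; no pyplot state is used
    ax = fig.add_axes([0.15, 0.15, 0.70, 0.70])
    ax.plot([0, 1], [0, slope])  # placeholder for the real plotting code
    buffer = io.BytesIO()
    fig.savefig(buffer, format="pdf")  # the expensive step runs in the worker
    return buffer.getvalue()


def render_all(slopes, processes=4):
    """Render one figure per slope in a pool of worker processes."""
    with Pool(processes) as pool:
        # map() returns the results in the same order as the inputs
        return pool.map(render_pdf_bytes, slopes)


if __name__ == "__main__":
    pages = render_all(range(1, 9))
    print(len(pages), "figures rendered")
```

Only the rendered bytes travel between processes here, never the
Figure objects themselves, so the controller does not have to hold
all the figures in memory at once.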

I can't find a good reference to explain this API but this page mentions it:
http://matplotlib.org/faq/usage_faq.html

However, each figure comes with a significant memory overhead, and the
call to savefig can be the most CPU-intensive part, so I wouldn't
recommend building a list of figures and calling savefig on them at
the end. Another approach is to also save them in parallel as 1.pdf,
2.pdf, etc. and then use something like pdftk to merge the PDF pages
at the end.
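
One detail to watch in the merge step is page order: sorted as plain
strings, "10.pdf" comes before "2.pdf". Here is a minimal sketch of
building the pdftk command with the pages in numeric order. The
helper names and the output name book.pdf are just assumptions for
the example, and it assumes every file is named <number>.pdf.

```python
import re
import subprocess


def page_number(filename):
    """Return the integer N from a file name of the form 'N.pdf'."""
    match = re.match(r"(\d+)\.pdf$", filename)
    if match is None:
        raise ValueError("unexpected page file name: %r" % filename)
    return int(match.group(1))


def pdftk_merge_command(page_files, output="book.pdf"):
    """Build the pdftk argument list with the pages in numeric order."""
    ordered = sorted(page_files, key=page_number)
    return ["pdftk"] + ordered + ["cat", "output", output]


def merge_pages(page_files, output="book.pdf"):
    """Run pdftk to merge the pages; needs pdftk installed and on PATH."""
    subprocess.check_call(pdftk_merge_command(page_files, output))


if __name__ == "__main__":
    # Only prints the command; call merge_pages() to actually run it.
    print(" ".join(pdftk_merge_command(["10.pdf", "2.pdf", "1.pdf"])))
```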


Oscar


