convert pdf to png

Diez B. Roggisch deets at nospam.web.de
Wed Dec 26 05:33:40 EST 2007


Carl K wrote:
> Diez B. Roggisch wrote:
>> Carl K wrote:
>>> Grant Edwards wrote:
>>>> On 2007-12-24, Carl K <carl at personnelware.com> wrote:
>>>>
>>>>>> If it is a multi-page PDF, ImageMagick will do:
>>>>>>
>>>>>> convert file.pdf page-%03d.png
>>>>> I need Python code to do this.  It is going to be run on
>>>>> someone else's shared-host web server, so security and
>>>>> performance are issues.  So I would rather not run stuff via
>>>>> popen.
>>>>
>>>> Use subprocess.
>>>>
>>>> Trying to eliminate popen because of the overhead when running
>>>> ghostscript to render PDF (I assume convert uses gs?) is about
>>>> like trimming an elephant's toenails to save weight.
>>>>
>>>
>>> Maybe, but I wouldn't be so sure.
>>>
>>> Currently the PDF is created in a Python StringIO buffer and returned
>>> to the browser, so it never becomes a file.  Using convert means I
>>> have to first save it as a file, convert from file to file, read the
>>> file, and delete the 2 files: 6 file operations where before there
>>> were none.  That may be more of a load than the Ghostscript part.
>>
>> So what? I'm not sure about current HD speeds, but a couple of years
>> ago these were about 30 MByte/s, and they should be faster today. That
>> equals 240 Mbit/s, much more than your users' internet connections. And
>> this is raw I/O speed, not counting disk caches.
> 
> The server is doing a ton of SQL queries (yes, moving to a 2nd box would
> be nice; it might happen mid-2008), so adding disk I/O is an issue.  Not
> sure how much, but enough to try to avoid it.

Keeping stuff in memory and provoking paging isn't?
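
For the subprocess route, the Python side does not have to write the PDF to disk at all: the bytes can go to the converter's stdin and the PNG can come back on its stdout. The sketch below uses modern Python 3's `subprocess.run`. It assumes ImageMagick's `convert` (with Ghostscript as its PDF delegate) is on PATH and that `pdf:-` / `png:-` select stdin/stdout for a single-page PDF; check that syntax against your ImageMagick version. The function name and the `command` parameter are hypothetical. ImageMagick may still spool the data to its own temporary file before calling Ghostscript, so this removes the script's file handling, not necessarily all disk I/O.

```python
import subprocess
import sys


def pdf_to_png_via_pipe(pdf_bytes, command=("convert", "pdf:-", "png:-")):
    """Send PDF bytes to an external converter on stdin and return the
    bytes it writes to stdout, so this script never creates a file.

    The default command is an assumption: ImageMagick's `convert` on PATH,
    reading a single-page PDF from stdin and writing PNG to stdout.
    """
    try:
        result = subprocess.run(
            list(command),
            input=pdf_bytes,         # PDF goes to the child's stdin
            stdout=subprocess.PIPE,  # PNG comes back on stdout
            stderr=subprocess.PIPE,
            check=True,              # non-zero exit raises CalledProcessError
        )
    except subprocess.CalledProcessError as exc:
        # Surface the converter's own error message before re-raising.
        sys.stderr.write(exc.stderr.decode(errors="replace"))
        raise
    return result.stdout
```

With the PDF built in an in-memory buffer (`io.BytesIO` in Python 3, where 2007-era code used `StringIO`), the call would be `png = pdf_to_png_via_pipe(buf.getvalue())`. Whether this actually beats the temp-file version on that server is something to measure, not assume.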

>>
>> In other words: given the overall latency of a network connection,
>> your file operations shouldn't add more than a split second.
> 
> Those split seconds can add up.  The server is already overloaded, so
> adding more is a big no-no.
> 
>> So if you
>> _can_ go the subprocess road, do it. It's the easiest way. And without
>> further knowledge of the GS library (which you lack, as do I), how do
>> you know that it works "in memory" and doesn't actually expect a
>> file name or pointer?
> 
> I am willing to take that chance.  Much better than the 6 hits I know
> would happen using
> 
> I have a feeling that if I have to create a file, we will go with plan B:
> send the client a PDF and let the user deal with it.  Not as nice and
> slick, but it won't bog down the server.

I have the feeling you're just going by your feelings, which is always a
bad idea when it comes to performance bottlenecks.

http://en.wikipedia.org/wiki/Optimization_(computer_science)

So instead of jumping through hoops to get something done the hard way
without knowing how the easy solution affects performance, implement the
feature the easy way, and SEE whether it causes trouble.
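
One way to SEE what the six file operations cost is to replay them on the actual server and time them. The sketch below is an assumption-laden stand-in: it does not run any converter, and `png_bytes` is a hypothetical payload standing in for convert's output. The temporary directory also adds a mkdir/rmdir on top of the six operations. `file_round_trip` and `best_of` are illustrative names.

```python
import os
import tempfile
import timeit


def file_round_trip(pdf_bytes, png_bytes):
    """Replay the six file operations from the thread without converting."""
    with tempfile.TemporaryDirectory() as tmp:
        pdf_path = os.path.join(tmp, "page.pdf")
        png_path = os.path.join(tmp, "page.png")
        with open(pdf_path, "wb") as f:   # 1. save the PDF
            f.write(pdf_bytes)
        with open(pdf_path, "rb") as f:   # 2. converter reads the PDF
            f.read()
        with open(png_path, "wb") as f:   # 3. converter writes the PNG
            f.write(png_bytes)
        with open(png_path, "rb") as f:   # 4. read the PNG back
            result = f.read()
        os.remove(pdf_path)               # 5. delete the PDF
        os.remove(png_path)               # 6. delete the PNG
    return result


def best_of(func, *args, repeat=20):
    """Fastest wall-clock time, in seconds, over `repeat` single calls."""
    return min(timeit.repeat(lambda: func(*args), number=1, repeat=repeat))
```

Calling `best_of(file_round_trip, pdf_bytes, png_bytes)` with realistic payloads, and timing the real conversion call the same way, shows how much of the total the extra disk traffic actually accounts for. Taking the minimum of several runs reduces noise from other load on the box.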

Diez
