Extracting images from a PDF file

Carl K carl at personnelware.com
Thu Dec 27 01:12:11 EST 2007


Doug Farrell wrote:
> Hi all,
> 
> Does anyone know how to extract images from a PDF file? What I'm looking
> to do is use pdflib_py to open large PDF files on our Linux servers,
> then use PIL to verify image data. I want to do this in order
> to find corrupt images in the PDF files. If anyone could help
> me out, or point me in the right direction, it would be most
> appreciated!
> 

If you are OK with shelling out to a binary:

pdfimages - Portable Document Format (PDF) image extractor (version 3.00)
http://packages.ubuntu.com/gutsy/text/xpdf-utils
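A rough sketch of how that could fit together with PIL, in case it helps. It assumes pdfimages is on the PATH, that the usual "pdfimages [options] PDF-file image-root" invocation and the -j option behave as described in the xpdf man page, and that output files are named image-root-NNN.ext. The function names and the "img" root are made up for illustration. Note that pdfimages converts non-JPEG images to PBM/PPM, so this mostly catches JPEG streams that are damaged and extractions that fail outright; it is not a full PDF integrity check.

```python
import glob
import os
import subprocess
import sys


def pdfimages_command(pdf_path, image_root):
    """Build the pdfimages command line (hypothetical helper)."""
    # -j asks pdfimages to save DCT (JPEG) images as .jpg files
    # instead of converting them to PPM.
    return ["pdfimages", "-j", pdf_path, image_root]


def extract_images(pdf_path, out_dir):
    """Run pdfimages and return the list of files it wrote."""
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    image_root = os.path.join(out_dir, "img")
    # check_call raises CalledProcessError on a non-zero exit status.
    subprocess.check_call(pdfimages_command(pdf_path, image_root))
    return sorted(glob.glob(image_root + "-*"))


def find_corrupt_images(image_paths):
    """Return (path, error) pairs for files PIL cannot verify."""
    # Imported here so the extraction step works without PIL installed.
    from PIL import Image

    bad = []
    for path in image_paths:
        try:
            with Image.open(path) as im:
                im.verify()  # raises if the file looks broken
        except Exception as exc:
            bad.append((path, exc))
    return bad


if __name__ == "__main__":
    # Usage: python check_pdf_images.py input.pdf output_dir
    files = extract_images(sys.argv[1], sys.argv[2])
    for path, exc in find_corrupt_images(files):
        print("corrupt: %s (%s)" % (path, exc))
```
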

I am trying to convert a PDF to a PNG myself, but without having to run
external commands, so I will understand if you aren't happy with pdfimages.

Carl K


