PDF: finding a blank image

Scott David Daniels Scott.Daniels at Acm.Org
Mon Jul 13 19:22:23 EDT 2009


DrLeif wrote:
> I have about 6000 PDF files which have been produced using a scanner
> with more being produced each day.  The PDF files contain old paper
> records which have been taking up space.   The scanner is set to
> detect when there is information on the backside of the page (duplex
> scan).  The problem, of course, is that it's not always reliable and we
> wind up with a number of PDF files containing blank pages.
> 
> What I would like to do is have Python detect "blank" pages in a PDF
> file and remove them.  Any suggestions?

I'd look into ReportLab's commercial product; it may well be capable
of that.  If that doesn't work out, you might contact PJ at Groklaw; she
has dealt with a _lot_ of PDFs (and knows people who deal with PDFs
in bulk).
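
If you end up rolling your own, the detection step can be fairly small.
What follows is only a rough sketch, not anything ReportLab provides: it
assumes you have already rendered each scanned page to a grayscale
bitmap with some other tool (that part is not shown), and the names
is_blank_page and non_blank_indices, along with both threshold values,
are made up for illustration.  Scanner noise means a "blank" page is
rarely pure white, so the test allows a small fraction of dark pixels.

```python
def is_blank_page(pixels, dark_threshold=200, max_dark_fraction=0.005):
    """Guess whether a grayscale page image is blank.

    pixels is a sequence of rows, each a sequence of ints from
    0 (black) to 255 (white).  Both thresholds are illustrative
    guesses and would need tuning against real scans.
    """
    total = 0
    dark = 0
    for row in pixels:
        for value in row:
            total += 1
            if value < dark_threshold:
                dark += 1
    if total == 0:
        # No pixel data at all: treat the page as blank.
        return True
    return dark / total <= max_dark_fraction


def non_blank_indices(page_images):
    """Return the indices of pages that do not look blank."""
    return [i for i, pixels in enumerate(page_images)
            if not is_blank_page(pixels)]


if __name__ == "__main__":
    # Synthetic 100x100 pages: one white with a single speck,
    # one with ten solid black rows standing in for text.
    speckled = [[255] * 100 for _ in range(100)]
    speckled[10][10] = 0
    written = [[0] * 100 if r < 10 else [255] * 100 for r in range(100)]
    print(non_blank_indices([speckled, written, speckled]))
```

The list of indices to keep can then be handed to whatever PDF library
you use to write out a new file containing only those pages.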

--Scott David Daniels
Scott.Daniels at Acm.Org


