searching pdf files for certain info

Kartic removethis.kartic.krishnamurthy at gmail.com
Tue Feb 22 18:30:09 EST 2005


rbt said the following on 2/22/2005 8:53 AM:
> Not really a Python question... but here goes: Is there a way to read 
> the content of a PDF file and decode it with Python? I'd like to read 
> PDFs, decode them, and then search the data for certain strings.
> 
> Thanks, rbt

Hi,

Try pdftotext, which is part of the Xpdf project. pdftotext extracts 
the text from a PDF file and writes it to an output text file of your 
choice. I have used it in the past (not with Python) to do what you are 
attempting. It is a small program; you can invoke it from Python and 
then search its output for the string or pattern you want.
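Here is a minimal sketch of that approach. It assumes Python 3.7 or later (for subprocess.run with capture_output) and a pdftotext binary on the PATH. The helper names extract_text, search_text and search_pdf are hypothetical; they are not part of any library. Passing "-" as the output file makes pdftotext write the text to stdout, and "-enc UTF-8" asks for UTF-8 output instead of the Xpdf default (Latin1).

```python
import re
import subprocess


def extract_text(pdf_path):
    """Run pdftotext on pdf_path and return the extracted text.

    "-" as the output file sends the text to stdout, so no temporary
    text file is needed. "-enc UTF-8" selects UTF-8 output.
    """
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout


def search_text(text, pattern):
    """Return (line_number, line) pairs for lines matching the regex pattern."""
    regex = re.compile(pattern)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if regex.search(line)
    ]


def search_pdf(pdf_path, pattern):
    """Extract the text of one PDF and search it for pattern."""
    return search_text(extract_text(pdf_path), pattern)
```

For example, search_pdf("report.pdf", r"invoice \d+") would return every matching line of a hypothetical report.pdf together with its line number. Keeping the extraction and the search in separate functions means the regex part can be tested without pdftotext installed.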

You can download it for your OS from:
http://www.foolabs.com/xpdf/download.html

Thanks,
-Kartic


