searching pdf files for certain info
Kartic
removethis.kartic.krishnamurthy at gmail.com
Tue Feb 22 18:30:09 EST 2005
rbt said the following on 2/22/2005 8:53 AM:
> Not really a Python question... but here goes: Is there a way to read
> the content of a PDF file and decode it with Python? I'd like to read
> PDF's, decode them, and then search the data for certain strings.
>
> Thanks, rbt
Hi,
Try pdftotext, which is part of the Xpdf project. It extracts the text
from a PDF file and writes it to an output text file of your choice. I
have used it in the past (not with Python) to do what you are
attempting. It is a small program, so you can invoke it from Python and
then search its output for the string or pattern you want.
You can download it for your OS from:
http://www.foolabs.com/xpdf/download.html
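Here is a rough sketch of how calling it from Python might look. I have
not used it this way myself, so treat it as a starting point only. It
assumes pdftotext is installed and on your PATH, and it passes "-" as
the output file so the text goes to stdout instead of a file. The
function names pdf_to_text and find_matches are made up for this
example.

```python
import re
import subprocess


def pdf_to_text(pdf_path):
    """Run pdftotext on pdf_path and return the extracted text as a str.

    Assumes the pdftotext executable is on PATH. Raises
    subprocess.CalledProcessError if pdftotext exits with an error.
    """
    # "-" as the output file tells pdftotext to write to stdout;
    # "-enc UTF-8" asks for UTF-8 output so it can be decoded below.
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def find_matches(text, pattern):
    """Return (line_number, line) pairs for lines matching a regex."""
    regex = re.compile(pattern)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if regex.search(line)
    ]
```

Something like find_matches(pdf_to_text("report.pdf"), "invoice") would
then list the matching lines, where "report.pdf" is just a placeholder
file name. Scanned PDFs that contain only images will come back empty,
since there is no text layer to extract.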
Thanks,
-Kartic