searching pdf files for certain info

Follower follower at gmail.com
Thu Feb 24 19:35:52 EST 2005


rbt <rbt at athop1.ath.vt.edu> wrote in message news:<cvfdgg$fr0$1 at solaris.cc.vt.edu>...
> Not really a Python question... but here goes: Is there a way to read 
> the content of a PDF file and decode it with Python? I'd like to read 
> PDF's, decode them, and then search the data for certain strings.

I've had success with both:

  <http://www.boddie.org.uk/david/Projects/Python/pdftools/>

  <http://www.adaptive-enterprises.com.au/~d/software/pdffile/pdffile.py>

although my preference is for the latter as it transparently handles
decryption. (I've previously posted an enhancement to the `pdftools`
utility that adds decryption handling to it, but now use the `pdffile`
library as it handles it better.)

The ease of text extraction depends heavily on how the PDFs were
created.
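To make that concrete, here is a minimal stdlib-only sketch of the naive approach: find each `stream ... endstream` section, try to inflate it with `zlib` (the `FlateDecode` filter), and search the result for a string. It is not how `pdftools` or `pdffile` work, and the names `iter_stream_data` and `pdf_contains` are hypothetical. It assumes unencrypted files with text stored as plain literal strings. It will miss text that is encrypted, split up by kerning in `TJ` arrays, stored under custom font encodings, or present only as scanned images, which is exactly why the libraries above are the better route.

```python
import re
import zlib

# Naive match for the body of each "stream ... endstream" section.
# The PDF format allows CRLF or a bare LF after the "stream" keyword.
_STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def iter_stream_data(pdf_bytes):
    """Yield each stream body, inflated if it is zlib/Flate-compressed."""
    for match in _STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes that precede
            # "endstream" (they are left in .unused_data instead of raising).
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not Flate-encoded (other filter, or encrypted): yield as-is.
            yield raw


def pdf_contains(pdf_bytes, needle):
    """Return True if `needle` appears in the raw file or any inflated stream.

    Assumes the text is stored as simple Latin-1 literal strings; custom
    font encodings or text split across TJ arrays will not be found.
    """
    target = needle.encode("latin-1")
    if target in pdf_bytes:
        return True
    return any(target in data for data in iter_stream_data(pdf_bytes))


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        print(pdf_contains(f.read(), sys.argv[2]))
```

Run as `python search_pdf.py some.pdf "search term"`. For anything beyond quick checks on simply-generated PDFs, a real parser that handles decryption and font encodings is the safer choice.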

--Phil.
