searching pdf files for certain info
Follower
follower at gmail.com
Thu Feb 24 19:35:52 EST 2005
rbt <rbt at athop1.ath.vt.edu> wrote in message news:<cvfdgg$fr0$1 at solaris.cc.vt.edu>...
> Not really a Python question... but here goes: Is there a way to read
> the content of a PDF file and decode it with Python? I'd like to read
> PDF's, decode them, and then search the data for certain strings.
I've had success with both:
<http://www.boddie.org.uk/david/Projects/Python/pdftools/>
<http://www.adaptive-enterprises.com.au/~d/software/pdffile/pdffile.py>
although my preference is for the latter as it transparently handles
decryption. (I've previously posted an enhancement to the `pdftools`
utility that adds decryption handling to it, but now use the `pdffile`
library as it handles it better.)
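The ease of text extraction depends heavily on how the PDFs were
created: pages whose text is stored as ordinary text strings come out
fairly cleanly, while scanned pages contain only images and have no
text to extract without OCR.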
--Phil.