Reading PDF files
Igor Stroh
igor.stroh at wohnheim.uni-ulm.de
Tue Nov 6 16:50:04 EST 2001
On Tue, 06 Nov 2001 20:05:56 +0100, "Martin von Loewis"
<loewis at informatik.hu-berlin.de> wrote:
> Amit Weisman <weismann at netvision.net.il> writes:
>
>> Is there a module for reading PDF files ?
>
> Please have a look at www.reportlab.com
The reportlab module doesn't support reading PDFs; it is a PDF
generator, and I don't think Amit is willing to pay $1000 for
PageCatcher :)
Amit, you might want to try extracting the text with pdftotext
(it's distributed with the Xpdf package):
>>> from os import popen
>>> pdfFileName = 'example.pdf'  # placeholder: path to your PDF file
>>> text = popen('pdftotext "%s" -' % pdfFileName).read()
'text' now contains only the raw text data from the specified PDF file,
with a whole bunch of control chars though... but since all you want is
to look up a word, this should be enough :)
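
For what it's worth, here is a rough sketch of the same idea wrapped into
small functions, including the word lookup. It assumes a current Python 3
and that pdftotext is on your PATH; the function names and 'example.pdf'
are made up for illustration, and the UTF-8 decoding is a guess that may
not match every pdftotext setup.

```python
import re
import subprocess


def pdf_to_text(pdf_path):
    """Run pdftotext and return the extracted text as a string.

    The trailing '-' tells pdftotext to write to stdout instead of a file.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    # Assumption: decode leniently as UTF-8; the real output encoding
    # depends on how pdftotext is configured.
    return result.stdout.decode("utf-8", errors="replace")


def strip_control_chars(text):
    """Replace control chars (e.g. form feeds) with spaces.

    Tabs and newlines are kept as they are.
    """
    return re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", " ", text)


def contains_word(text, word):
    """Return True if `word` occurs as a whole word, ignoring case."""
    pattern = r"\b" + re.escape(word) + r"\b"
    cleaned = strip_control_chars(text)
    return re.search(pattern, cleaned, re.IGNORECASE) is not None
```

Usage would be something like
contains_word(pdf_to_text('example.pdf'), 'python'). Passing the arguments
as a list avoids the shell, so file names with spaces don't need quoting.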
HTH,
Igor