Reading PDF files

Igor Stroh igor.stroh at wohnheim.uni-ulm.de
Tue Nov 6 16:50:04 EST 2001


On Tue, 06 Nov 2001 20:05:56 +0100, "Martin von Loewis"
<loewis at informatik.hu-berlin.de> wrote:

> Amit Weisman <weismann at netvision.net.il> writes:
> 
>> Is there a module for reading PDF files ?
> 
> Please have a look at www.reportlab.com

The reportlab module doesn't support reading PDFs; it's a PDF generator.
And I don't think Amit is willing to pay $1000 for PageCatcher :)

Amit, you might want to try extracting the text with pdftotext (it's
distributed with the Xpdf package):

>>> from os import popen
>>> # pdfFileName is a string holding the path to your PDF
>>> text = popen('pdftotext %s -' % pdfFileName).read()

'text' now contains only the raw text from the specified PDF file, with a
whole bunch of control chars though... but since all you want is to look
up a word, this should be enough :)
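
If the control chars get in the way of the lookup, something along these
lines might help. It's only a rough sketch: it assumes pdftotext is on
your PATH and that its output can be decoded as UTF-8 (that depends on
the build and its encoding settings). The helper names pdf_to_text,
strip_control_chars and contains_word are made up for this example and
are not part of any library.

```python
import re
import subprocess

# Control characters other than tab, newline and carriage return.
# pdftotext typically separates pages with form feeds (\x0c).
_CONTROL_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]')


def pdf_to_text(pdf_path):
    """Run pdftotext and return the extracted text.

    Assumes the pdftotext executable is installed and on PATH.
    """
    result = subprocess.run(
        ['pdftotext', pdf_path, '-'],  # '-' sends the text to stdout
        capture_output=True,
        check=True,
    )
    # Assumption: UTF-8 output; undecodable bytes are replaced.
    return result.stdout.decode('utf-8', errors='replace')


def strip_control_chars(text):
    """Replace control characters with spaces so words stay separated."""
    return _CONTROL_CHARS.sub(' ', text)


def contains_word(text, word):
    """Case-insensitive whole-word lookup in the cleaned text."""
    pattern = r'\b%s\b' % re.escape(word)
    cleaned = strip_control_chars(text)
    return re.search(pattern, cleaned, re.IGNORECASE) is not None
```

You would then call something like contains_word(pdf_to_text(pdfFileName),
'python'). Passing the command as a list also avoids trouble with spaces
or shell metacharacters in the file name, which the popen one-liner does
not handle.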

HTH,
Igor


