New newbie question.

John Hunter jdhunter at nitace.bsd.uchicago.edu
Tue Jul 9 15:55:24 EDT 2002


>>>>> "SA" == SA  <sarmstrong13 at mac.com> writes:

    SA> Can you read a pdf with Python?  I know you can read a text
    SA> file with:

    SA> Inp = open("textfile", "r")

    SA> Will the same thing work on pdf files:

    SA> Inp = open("pdffile", "rb")

You can do this, but you'll get the raw binary contents of the file,
not readable text.
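
To see what "binary" means in practice, here is a minimal sketch that
opens a file in "rb" mode and looks at its first few bytes.  PDF files
begin with the marker %PDF- followed by a version number, so this at
least tells you whether you are holding a PDF.  The helper names
(read_header, looks_like_pdf) and the stand-in file contents are my own
assumptions for illustration, not part of any library.

```python
import os
import tempfile


def read_header(path, nbytes=8):
    """Return the first few raw bytes of a file opened in binary mode."""
    with open(path, "rb") as inp:
        return inp.read(nbytes)


def looks_like_pdf(path):
    """Check for the %PDF- marker that starts a PDF file."""
    return read_header(path, 5) == b"%PDF-"


# Demo on a tiny stand-in file (hypothetical content, not a valid PDF).
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.4\n...binary data...\n")
    demo_path = tmp.name

print(read_header(demo_path))     # raw bytes, not decoded text
print(looks_like_pdf(demo_path))
os.remove(demo_path)
```

Everything after that header is a mix of objects and (usually
compressed) streams, which is why a plain read() is not much use for
getting at the text.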

If you are on a Linux system, you may have pdftotext already
installed, and can call that command from Python with

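# Example usage:
#    python ~/python/examples/pdf_demo.py HunterEtal2000.pdf

import os, sys

filename = sys.argv[1]
# The trailing '-' tells pdftotext to write the text to stdout.
# The filename is quoted so that paths containing spaces survive the shell.
command = os.popen('pdftotext "%s" -' % filename)
for line in command.readlines():
    # Each line already ends with a newline, so suppress print's own
    print(line, end='')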
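
If you would rather not go through the shell at all, the same idea can
be sketched with the subprocess module, which passes the filename as a
separate argument so no quoting is needed.  This is only a sketch under
a few assumptions: pdftotext is on your PATH, its output can be decoded
with your locale's default encoding, and the function names
(build_command, pdf_to_text) are ones I made up for the example.

```python
import shutil
import subprocess
import sys


def build_command(filename):
    """Argument list for pdftotext; the trailing '-' sends text to stdout."""
    return ["pdftotext", filename, "-"]


def pdf_to_text(filename):
    """Run pdftotext on filename and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    # check=True raises CalledProcessError if pdftotext exits with an error
    result = subprocess.run(
        build_command(filename),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.stdout.write(pdf_to_text(sys.argv[1]))
```

It is called the same way as the script above, with the PDF filename as
the only argument.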


You may want to have a look at these two Python apps for working
with PDFs:

 http://www.reportlab.com/index.html - emphasis on pdf generation

 http://pdfsearch.sourceforge.net - search pdfs

Cheers,
John Hunter
