PDF->Text converter/extractor

Igor Stroh igor.stroh at wohnheim.uni-ulm.de
Mon Nov 5 15:42:25 EST 2001


Hi there,

Has anyone ever tried to extract text from a PDF with Python?
So far I have found 2 alternatives, but neither of them satisfies my needs
(GPL license (or the like), speed and reliability):
1) Using pdftotext (Xpdf) with ordinary files on disk
2) Using the commercial PageCatcher from reportlab.com (1000 bucks per
license, lol) directly in a Python script (no files opened)

Though I didn't find anything else yet, perhaps there is someone who already
had the same problem and solved it by writing their own PDF parser? :) I'm
too lazy to start reading the PDF specs and try to write the thingy
myself :)
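
For reference, here is a rough sketch of alternative 1 driven from a script.
It assumes the pdftotext binary is on the PATH and that it accepts "-" as the
output file name to send the text to stdout (check the man page of your
version). The helper names pdftotext_command and extract_text are made up for
illustration. It still reads the PDF from a file, so it only avoids the
intermediate text file:

```python
import subprocess


def pdftotext_command(pdf_path, exe="pdftotext"):
    """Build the argument list; "-" asks pdftotext to write to stdout."""
    return [exe, pdf_path, "-"]


def extract_text(pdf_path, exe="pdftotext"):
    """Run pdftotext on pdf_path and return the extracted text as a string.

    Raises FileNotFoundError if the binary is missing and
    subprocess.CalledProcessError if pdftotext exits with an error.
    """
    result = subprocess.run(
        pdftotext_command(pdf_path, exe),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    # The output encoding depends on pdftotext options; UTF-8 is assumed here.
    return result.stdout.decode("utf-8", errors="replace")
```

Usage would be something like text = extract_text("some.pdf"). This does
nothing about the speed or reliability concerns, since a process is spawned
for every document.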

TIA,
Igor


