PDF->Text converter/extractor
Igor Stroh
igor.stroh at wohnheim.uni-ulm.de
Mon Nov 5 15:42:25 EST 2001
Hi there,
Has anyone ever tried to extract text from a PDF with Python?
So far I have found 2 alternatives, but neither of them satisfies my needs
(GPL license (or the like), speed and reliability):
1) Using pdftotext (Xpdf) with regular files on disk
2) Using the commercial PageCatcher from reportlab.com (1000 bucks per
license, lol) directly in a Python script (no files opened)
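For reference, option 1 can be driven from a script roughly like this. It is only a sketch, assuming pdftotext is on the PATH and that the installed build accepts "-enc UTF-8"; the function names and the "run" parameter are made up for illustration. Passing "-" as the output file makes pdftotext write to stdout, so no text file is created, but the PDF itself still has to be a file on disk.

```python
import subprocess


def pdftotext_command(pdf_path, executable="pdftotext"):
    """Build the pdftotext command line (hypothetical helper).

    "-" as the output file tells pdftotext to write to stdout.
    "-enc UTF-8" is assumed to be supported by the installed build.
    """
    return [executable, "-enc", "UTF-8", pdf_path, "-"]


def extract_text(pdf_path, run=subprocess.run):
    """Return the text of a PDF by shelling out to pdftotext.

    `run` is injectable only so the function can be exercised
    without pdftotext installed; by default it is subprocess.run.
    """
    result = run(
        pdftotext_command(pdf_path),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    # Replace undecodable bytes instead of failing on odd documents.
    return result.stdout.decode("utf-8", errors="replace")
```

Usage would be something like text = extract_text("report.pdf"), where "report.pdf" is a placeholder name. This starts a new process per document, so it does not solve the speed concern.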
Though I haven't found anything yet, perhaps someone has already
had the same problem and solved it by writing their own PDF parser? :) I'm
too lazy to start reading the PDF specs and write the thingy
myself :)
TIA,
Igor