PDF->Text converter/extractor

Igor Stroh igor.stroh at wohnheim.uni-ulm.de
Mon Nov 5 17:14:29 EST 2001


On Mon, 05 Nov 2001 22:52:57 +0100, "Bruno Liénard"
<lienard.bruno at free.fr> wrote:

> I wrote a script some time ago to extract text directly from PDF files;
> it's quite easy. As I had a very large volume of text to extract (several
> gigabytes), I now use pdftotext, which comes with Xpdf. I modified it
> slightly for my needs. If you are interested, I will look for the script
> in my archives.

I'd greatly appreciate it :)
See, I can't use pdftotext since I have several thousand PDFs to
process in a short amount of time, and I think invoking pdftotext for each
file would be pretty slow. By the way, the PDF files are _not_ in the
filesystem; everything is stored in a DB (ZopeDB), so I have some
kind of data objects rather than real files.
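
For reference, here is a minimal sketch of the per-object approach I'm worried about. It assumes pdftotext is on the PATH and that each PDF comes out of the DB as a byte string. The names pdf_bytes_to_text and convert_many and the `command` parameter are made up for illustration. Each object is spooled to a temporary file, pdftotext is run with "-" as the text file so the output goes to stdout, and a small thread pool runs several conversions at once. Whether this is fast enough for thousands of objects would have to be measured:

```python
import os
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor


def pdf_bytes_to_text(pdf_bytes, command=("pdftotext",)):
    """Convert one in-memory PDF to text by spooling it to a temp file.

    `command` is a hypothetical hook so another converter can be swapped in;
    it is called as: <command...> <pdf-file> -
    """
    # delete=False so the file is closed before the external tool opens it.
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(pdf_bytes)
        path = tmp.name
    try:
        # A text-file argument of "-" tells pdftotext to write to stdout.
        result = subprocess.run(
            [*command, path, "-"], stdout=subprocess.PIPE, check=True
        )
        return result.stdout
    finally:
        os.remove(path)


def convert_many(pdf_blobs, command=("pdftotext",), workers=4):
    """Convert several PDF byte strings concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda b: pdf_bytes_to_text(b, command), pdf_blobs))
```

Something like convert_many(blobs) would then return one byte string of extracted text per object, with the per-process startup cost spread over a few worker threads.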


