PDF->Text converter/extractor

Tue Nov 6 09:43:08 EST 2001

Please put that script on Parnassus or somewhere similar. It seems it would
be very helpful to a lot of people.

"Bruno Liénard" <lienard.bruno at free.fr> wrote in message
news:3be70a38$0$15115$626a54ce at news.free.fr...
> I wrote a script some time ago to extract text directly from PDF files;
> it's quite easy. As I had a very large volume of text to extract (several
> gigabytes), I now use pdftotext, which comes with Xpdf, slightly modified
> for my needs. If you are interested, I will look for the script in my
> archives.
>
> Bruno Lienard
>
> "Igor Stroh" <igor.stroh at wohnheim.uni-ulm.de> wrote in message
> news:3be6fa21$1 at sol.wohnheim.uni-ulm.de...
> > Hi there,
> >
> > Has anyone ever tried to extract text from a PDF with Python?
> > So far, I have found two alternatives, but neither of them satisfies my
> > needs (GPL license or the like, speed and reliability):
> > 1) Using pdftotext (Xpdf) with ordinary files
> > 2) Using the commercial PageCatcher from reportlab.com (1000 bucks per
> > license, lol) directly in a Python script (no files opened)
> >
> > Though I haven't found anything yet, perhaps someone has already
> > had the same problem and solved it by writing their own PDF parser? :) I'm
> > too lazy to start reading the PDF specs and try to write the thingy
> > myself :)
> >
> > TIA,
> > Igor
>
>
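
For anyone who wants to try the pdftotext route from Python in the meantime,
here is a minimal sketch. It is not Bruno's script, and it does not parse PDF
itself. It only assumes that a pdftotext binary (from Xpdf) is on the PATH and
relies on pdftotext's convention that "-" as the output file name means
stdout, so no intermediate text file is written. The PDF itself is still read
from disk. The function names and the injectable `runner` parameter are my own
invention for illustration.

```python
import subprocess


def build_pdftotext_command(pdf_path, layout=False):
    """Return the argv list for one pdftotext run that writes to stdout."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if layout:
        # -layout asks pdftotext to keep the physical page layout.
        cmd.append("-layout")
    # "-" in place of the output file name sends the text to stdout.
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, layout=False, runner=subprocess.run):
    """Extract text from a PDF by calling the external pdftotext tool.

    `runner` defaults to subprocess.run; it is a parameter only so the
    function can be exercised without pdftotext installed.
    """
    result = runner(
        build_pdftotext_command(pdf_path, layout),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Usage would be something like `text = extract_text("report.pdf")`, where
"report.pdf" is a placeholder name. Starting one process per file is slow for
gigabytes of input, so this is only a starting point for small jobs.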