PDF->Text converter/extractor

Alves, Carlos Alberto - Coelce calves at coelce.com.br
Tue Nov 6 09:31:28 EST 2001


I would appreciate a copy of that script. Also, is xpdf a module? If
so, where can I get it?

Carlos Alberto
COELCE/DPRON-Departamento de Projetos e Obras Norte
Phone: 677-2228
e-mail: calves at coelce.com.br
\|||/
(o o)
--ooo0-(_)-0ooo--



-----Original Message-----
From: Bruno Liénard [mailto:lienard.bruno at free.fr]
Sent: Monday, November 05, 2001 6:53 PM
To: python-list at python.org
Subject: Re: PDF->Text converter/extractor


Some time ago I wrote a script to extract text directly from PDF files; it's
quite easy. Since I had a very large volume of text to extract (several
gigabytes), I now use pdftotext, which comes with Xpdf, and I modified it
slightly for my needs. If you are interested, I will look for the script in
my archives.

Bruno Lienard
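
For readers following the thread: Xpdf is a standalone PDF viewer that ships
with command-line tools such as pdftotext; it is not a Python module. What
follows is not the script mentioned above, only a minimal sketch of driving
pdftotext from modern Python 3, assuming the pdftotext executable is installed
and on PATH. The function names and the file name in the usage note are
hypothetical, chosen for illustration.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, executable="pdftotext"):
    """Return the argument list for one pdftotext call.

    The trailing "-" asks pdftotext to write the extracted text to
    standard output instead of creating a text file on disk.
    """
    return [executable, str(pdf_path), "-"]


def extract_text(pdf_path, executable="pdftotext"):
    """Run pdftotext on pdf_path and return its raw output as bytes.

    Bytes are returned on purpose: the output encoding depends on how
    pdftotext is configured, so decoding is left to the caller.
    """
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable!r} was not found on PATH")
    result = subprocess.run(
        build_pdftotext_command(pdf_path, executable),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Calling extract_text("report.pdf") would then return the text of that
(hypothetical) file. Passing "-" as the output name keeps everything in
memory, which avoids the intermediate text files that the original question
objects to, although the PDF itself still has to exist as a file.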

"Igor Stroh" <igor.stroh at wohnheim.uni-ulm.de> wrote in message news:
3be6fa21$1 at sol.wohnheim.uni-ulm.de...
> Hi there,
>
> has anyone ever tried to extract text from a PDF with python?
> So far, there are 2 alternatives, but neither of them satisfies my needs
> (GPL license (or the like), speed and reliability):
> 1) Using pdftotext (Xpdf) with usual files
> 2) Using the commercial PageCatcher from reportlab.com (1000 bucks per
> license lol) directly in a python script (no files opened)
>
> though I didn't find anything yet, perhaps there is someone who already
> had the same problem and solved it by writing their own PDF parser? :) I'm
> too lazy to start reading the PDF specs and try to write the thingy
> myself :)
>
> TIA,
> Igor

