PDF->Text converter/extractor

Bengt Richter bokr at accessone.com
Tue Nov 6 17:43:41 EST 2001


On Tue, 6 Nov 2001 11:31:28 -0300, "Alves, Carlos Alberto - Coelce" <calves at coelce.com.br> wrote:

>This message is in MIME format. Since your mail reader does not understand
>this format, some or all of this message may not be legible.
>
Some was also in HTML.

Would you, and others with a similar setup, please switch to plain text and maybe
do a little editing? E.g., the following 7 lines would have been more than
sufficient to convey your message:

+-------------------------------------------------
On Mon, 5 Nov 2001 22:52:57 +0100, "Bruno Liénard" <lienard.bruno at free.fr> wrote:

>I had written a script some time ago to extract text directly from PDF files; it's
>quite easy. As I had a very large volume of text to extract (several gigabytes of
>text), I now use PDFTOTEXT, which comes with XPDF. I modified it slightly for my
>needs. If you are interested, I will look for the script in my archives.
Please do, and post a link. Thanks.
+-------------------------------------------------

And you could have left out:
>------_=_NextPart_001_01C166CF.B8C164F0
>Content-Type: text/plain;
[ ... about 62 lines ]
>------_=_NextPart_001_01C166CF.B8C164F0
>Content-Type: text/html;
[ ... about 95 lines ]
>------_=_NextPart_001_01C166CF.B8C164F0--
>
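For anyone stuck reading such multipart/alternative messages, the text/plain part can be pulled out with Python's standard `email` package, skipping the HTML copy. This is a minimal sketch, assuming the raw message is available as a string; `plain_text_body` and the `SAMPLE` message (built to mirror the boundary layout quoted above) are hypothetical illustrations, not anything from the original thread.

```python
from email import message_from_string
from typing import Optional

# Hypothetical sample mirroring the quoted layout: a plain-text part
# followed by an HTML alternative, separated by the same boundary.
SAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/alternative;
\tboundary="----_=_NextPart_001_01C166CF.B8C164F0"

This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.

------_=_NextPart_001_01C166CF.B8C164F0
Content-Type: text/plain;
\tcharset="iso-8859-1"

Please do, and post a link. Thanks.

------_=_NextPart_001_01C166CF.B8C164F0
Content-Type: text/html;
\tcharset="iso-8859-1"

<html><body><p>Please do, and post a link. Thanks.</p></body></html>

------_=_NextPart_001_01C166CF.B8C164F0--
"""


def plain_text_body(raw: str) -> Optional[str]:
    """Return the first text/plain part of a message, or None if there is none."""
    msg = message_from_string(raw)
    # walk() yields the message itself and then every sub-part, depth first,
    # so a simple non-multipart text/plain message is handled too.
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            # decode=True undoes base64/quoted-printable and returns bytes.
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "us-ascii"
            return payload.decode(charset, errors="replace")
    return None


if __name__ == "__main__":
    print(plain_text_body(SAMPLE))
```

Run against the sample, only the one plain-text line comes back; the HTML alternative and the MIME preamble are ignored.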

BTW, I find HTML particularly annoying in newsgroup posts, in case you can't tell ;-)