PDF library for reading PDF files

Josiah Carlson jcarlson at uci.edu
Tue Jan 20 03:13:52 EST 2004


> Thanks. I am studying the PDF spec, it just does not seem to be that easy
> having to implement all the decompressions, etc. The "information" I am
> trying to extract from the PDF file is the text, specifically in a way to
> keep the original paragraphs of the text. I have seen so far one shareware
> standalone tool that extracts the text (and a lot of other formatting
> garbage) into an RTF document keeping the paragraphs as well. I would need
> only the text.
> 
> Any suggestions?

Peter,

Suggestion: extract the document to RTF using that other tool, then use
any one of the few dozen RTF parsers to convert it into plaintext.
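
If none of the existing RTF parsers is handy, the second step can be
roughed out by hand. What follows is only a sketch, not any particular
library's API: rtf_to_text, TOKEN and SKIP_DESTINATIONS are names made up
for illustration, the list of skipped groups is a partial guess, and \'xx
bytes are assumed to be cp1252. It drops control words and metadata
groups and turns \par into a newline so the paragraphs survive.

```python
import re

# Groups that hold metadata rather than body text (a partial, assumed list).
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}

TOKEN = re.compile(
    r"\\([A-Za-z]+)(-?\d+)? ?"    # 1, 2: control word, optional number
    r"|\\'([0-9A-Fa-f]{2})"       # 3: hex-escaped byte, e.g. \'e9
    r"|\\([^A-Za-z])"             # 4: control symbol, e.g. \{ or \*
    r"|([{}])"                    # 5: group delimiter
    r"|[\r\n]+"                   # raw line breaks are not text in RTF
    r"|(.)"                       # 6: literal character
)


def rtf_to_text(rtf):
    """Return the body text of an RTF string, one line per paragraph."""
    out = []
    stack = []      # skip flags saved for the enclosing groups
    skip = False    # True while inside a group we do not want to emit
    for m in TOKEN.finditer(rtf):
        word, _num, hexbyte, symbol, brace, char = m.groups()
        if brace == "{":
            stack.append(skip)
        elif brace == "}":
            skip = stack.pop() if stack else False
        elif word:
            if word in SKIP_DESTINATIONS:
                skip = True
            elif not skip and word in ("par", "line"):
                out.append("\n")
            elif not skip and word == "tab":
                out.append("\t")
        elif symbol:
            if symbol == "*":
                skip = True             # \* marks an ignorable group
            elif not skip and symbol in "{}\\":
                out.append(symbol)      # escaped literal brace or backslash
        elif hexbyte:
            if not skip:
                out.append(bytes.fromhex(hexbyte).decode("cp1252", "replace"))
        elif char and not skip:
            out.append(char)
    return "".join(out)


sample = (r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}"
          r"\f0 First paragraph.\par Second paragraph.\par}")
print(rtf_to_text(sample))
```

Real RTF written by a converter will contain things this ignores (\uN
Unicode escapes, tables, headers and footers), so treat it as a starting
point and check the output against the original document.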

 - Josiah



More information about the Python-list mailing list