Fw: PDF library for reading PDF files
Cameron Laird
claird at lairds.com
Mon Jan 19 08:04:34 EST 2004
In article <oxEOb.96911$Vs3.36407 at twister.socal.rr.com>,
Robert Kern <rkern at ucsd.edu> wrote:
>Cameron Laird wrote:
>> In article <Xns9474CBDE9B2D7cpl19ghumspamgourmet at 62.153.159.134>,
>> Harald Massa <cpl.19.ghum at spamgourmet.com> wrote:
>>
>>>>I am looking for a library in Python that would read PDF files and I
>>>>could extract information from the PDF with it. I have searched with
>>>>google, but only found libraries that can be used to write PDF files.
>>>
>>>reportlab has a lib called pagecatcher; it is fully supported with python,
>>>it is not free.
>>>
>>>Harald
>>
>>
>> ReportLab's libraries are great things--but they do not "extract
>> information from the PDF" in the sense I believe the original
>> questioner intended.
>
>No, but ReportLab (the company) has a product separate from reportlab
>(the package) called PageCatcher that does exactly what the OP asked
>for. It is not open source, however, and costs a chunk of change.
Let's take this one step further. Two posts now have
quite clearly recommended ReportLab's PageCatcher <URL:
http://reportlab.com/docs/pagecatcher-ds.pdf >. I
completely understand and agree that ReportLab supports
a mix of open-source, no-fee, and for-fee products, and
that PageCatcher carries a significant license fee. I
entirely agree that PageCatcher "read[s] PDF files ...
and ... extract[s] information from the PDF with it."
HOWEVER, I suspect that what the original questioner
meant by his words was some sort of PDF-to-text
"extraction" (true?) and, unless PageCatcher has changed
a lot since I got my last copy, PDF-to-text is NOT one
of its functions.
--
Cameron Laird <claird at phaseit.net>
Business: http://www.Phaseit.net