Fw: PDF library for reading PDF files

Robert Kern rkern at ucsd.edu
Mon Jan 19 17:10:24 EST 2004


Cameron Laird wrote:
> In article <oxEOb.96911$Vs3.36407 at twister.socal.rr.com>,
> Robert Kern  <rkern at ucsd.edu> wrote:
> 
>>Cameron Laird wrote:
>>
>>>In article <Xns9474CBDE9B2D7cpl19ghumspamgourmet at 62.153.159.134>,
>>>Harald Massa  <cpl.19.ghum at spamgourmet.com> wrote:
>>>
>>>
>>>>>I am looking for a library in Python that would read PDF files and I
>>>>>could extract information from the PDF with it. I have searched with
>>>>>google, but only found libraries that can be used to write PDF files. 
>>>>
>>>>reportlab has a lib called pagecatcher; it is fully supported with python, 
>>>>but it is not free.
>>>>
>>>>Harald
>>>
>>>
>>>ReportLab's libraries are great things--but they do not "extract
>>>information from the PDF" in the sense I believe the original
>>>questioner intended.  
>>
>>No, but ReportLab (the company) has a product separate from reportlab 
>>(the package) called PageCatcher that does exactly what the OP asked 
>>for. It is not open source, however, and costs a chunk of change.
> 
> 
> Let's take this one step further.  Two posts now have
> quite clearly recommended ReportLab's PageCatcher <URL:
> http://reportlab.com/docs/pagecatcher-ds.pdf >.  I
> completely understand and agree that ReportLab supports
> a mix of open-source, no-fee, and for-fee products, and
> that PageCatcher carries a significant license fee.  I
> entirely agree that PageCatcher "read[s] PDF files ...
> and ... extract[s] information from the PDF with it."
> 
> HOWEVER, I suspect that what the original questioner
> meant by his words was some sort of PDF-to-text
> "extraction" (true?) and, unless PageCatcher has changed
> a lot since I got my last copy, PDF-to-text is NOT one of
> its functions.

Rereading http://www.reportlab.com/PageCatchIntro.html , you're right. 
My apologies. I thought you were talking about the open source reportlab 
package and not PageCatcher specifically.


