Fw: PDF library for reading PDF files

Robin Becker robin at jessikat.fsnet.co.uk
Mon Jan 19 08:17:12 EST 2004


In article <100nlf2b1qjdae2 at corp.supernews.com>, Cameron Laird
<claird at lairds.com> writes
.....
>>No, but ReportLab (the company) has a product separate from reportlab 
>>(the package) called PageCatcher that does exactly what the OP asked 
>>for. It is not open source, however, and costs a chunk of change.
>
>Let's take this one step farther.  Two posts now have
>quite clearly recommended ReportLab's PageCatcher <URL:
>http://reportlab.com/docs/pagecatcher-ds.pdf >.  I
>completely understand and agree that ReportLab supports
>a mix of open-source, no-fee, and for-fee products, and
>that PageCatcher carries a significant license fee.  I
>entirely agree that PageCatcher "read[s] PDF files ...
>and ... extract[s] information from the PDF with it."
>
>HOWEVER, I suspect that what the original questioner
>meant by his words was some sort of PDF-to-text
>"extraction" (true?) and, unless PageCatcher has changed a
>lot since I got my last copy, PDF-to-text is NOT one of its
>functions.
I suspect Cameron is right. ReportLab does have a product called
PageCatcher, but its main function is to grab individual pages for
reuse. I believe it could be extended to go deeper and mess about with
text streams, but it certainly doesn't do that now, and doing so
properly would take some effort, as text can be complicated in PDF
(or PostScript).
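
As a rough illustration of why text is awkward to pull out of a PDF,
here is a minimal Python sketch. It has nothing to do with PageCatcher
or any ReportLab API; the content stream fragment, the function name
naive_text and the gap threshold are all made up for the example. It
only handles an uncompressed stream that uses the TJ operator, where a
single word can be split into several strings with kerning numbers in
between, so any spacing has to be guessed.

```python
import re

# Hypothetical, hand-written fragment of an uncompressed content stream.
# TJ shows an array of strings interleaved with position adjustments
# (thousandths of a text-space unit; negative values widen the gap).
CONTENT = b"BT /F1 12 Tf 72 720 Td [(Hel) -20 (lo) -300 (W) 80 (orld)] TJ ET"

TJ_ARRAY = re.compile(rb"\[(.*?)\]\s*TJ", re.S)
ITEM = re.compile(rb"\((.*?)\)|(-?\d+(?:\.\d+)?)")


def naive_text(stream, gap=-200):
    """Join TJ string pieces, guessing a space at large negative adjustments.

    The gap threshold is an arbitrary assumption, not a value from any spec.
    """
    out = []
    for array in TJ_ARRAY.findall(stream):
        for literal, number in ITEM.findall(array):
            if number:
                # A big negative adjustment probably stands for a word break.
                if float(number) <= gap:
                    out.append(" ")
            else:
                out.append(literal.decode("latin-1"))
    return "".join(out)


print(naive_text(CONTENT))
```

Even this toy ignores compressed streams, escaped parentheses inside
strings, custom font encodings and text positioned by separate
operators, which is where the real effort would go.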
-- 
Robin Becker


