PDF library?

Wed Apr 21 04:28:51 EDT 2004

Aloha,

Paul Rubin wrote:
> Simon Burton <simonb at NOTTHISBIT.webone.com.au> writes:
> > http://www.reportlab.org/
> > handles pdf files.
> Reportlab generates reports in pdf format, but I want to do the
> opposite, namely read in pdf files that have already been generated by
> a different program, and crunch on them.  Any more ideas?  Thanks.

The commercial version (reportlab.com) mentions a tool named
PageCatcher, which seems to be able to extract pages and page
descriptions from .pdf documents. There is not much information about
it on the web page.

If you read comp.text.tex you will find various solutions for composing
.pdf documents and a few for extracting data/content from them. Afaik
there is at the moment (read as: I'm working on it) no free,
self-contained Python solution. But as Python is very interface-friendly,
you can easily use general tools like gs.

For your problem I would suggest using gs as a .pdf to .ps filter
in the first place, working on the .ps, and distilling back to .pdf
with gs.
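
A rough sketch of how that round trip could look from Python, in case
it helps. It only wraps the gs command line with subprocess. The helper
names (gs_command, pdf_to_ps, ps_to_pdf) and the file names are made up
for illustration, and I am assuming a gs that provides the ps2write and
pdfwrite devices (older releases call the PostScript device pswrite).

```python
import subprocess


def gs_command(device, src, dst, gs="gs"):
    """Build a Ghostscript command line converting src to dst.

    'device' is a gs output device name; ps2write and pdfwrite are
    assumed to be available in the installed gs.
    """
    return [
        gs,
        "-q",                    # no startup banner
        "-dNOPAUSE",             # do not prompt between pages
        "-dBATCH",               # exit when the input is processed
        "-dSAFER",               # restrict file access from the input
        "-sDEVICE=" + device,
        "-sOutputFile=" + dst,
        src,
    ]


def pdf_to_ps(pdf_path, ps_path):
    """Use gs as a .pdf to .ps filter."""
    subprocess.run(gs_command("ps2write", pdf_path, ps_path), check=True)


def ps_to_pdf(ps_path, pdf_path):
    """Distill a .ps file back to .pdf with gs."""
    subprocess.run(gs_command("pdfwrite", ps_path, pdf_path), check=True)
```

You would call pdf_to_ps("input.pdf", "work.ps"), do your crunching on
work.ps, and then call ps_to_pdf("work.ps", "output.pdf"). With
check=True a failing gs run raises CalledProcessError instead of
passing silently.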

Wishing a happy day
		LOBI