[Pythonmac-SIG] PDF reading

Jeremy Reichman jaharmi at jaharmi.com
Sun Jan 25 03:43:58 CET 2009


On Wed, 21 Jan 2009 11:35:38 +0000, "Paul Brown" <appworld at mac.com>
said:
> anyone have any  pointers on reading a pdf file.
> 
> i need to extract the text content , page number ,  text style ,  
> block , ... all in XML if poss

I'm not sure whether you need a specific page number or the count of
pages in the document. If you need the total page count (for whatever
reason) from a PDF document, you can use the Quartz 2D bindings like so:

<http://bit.ly/jV7P>
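
For the archive, here is a minimal sketch of a page count done through
Quartz. It is not necessarily what the link above shows. It assumes the
Quartz wrappers that ship with PyObjC (not the older CoreGraphics module),
and the function name pdf_page_count is just something I made up for
illustration.

```python
import os


def pdf_page_count(path):
    """Return the page count of the PDF at `path`, or None if it cannot be opened."""
    # Imported lazily so this file can still be imported where PyObjC is missing.
    # Assumption: the PyObjC Quartz and Foundation wrappers are installed.
    from Foundation import NSURL
    from Quartz import CGPDFDocumentCreateWithURL, CGPDFDocumentGetNumberOfPages

    url = NSURL.fileURLWithPath_(os.path.abspath(path))
    document = CGPDFDocumentCreateWithURL(url)
    if document is None:
        # Quartz could not open or parse the file.
        return None
    return CGPDFDocumentGetNumberOfPages(document)
```

Calling pdf_page_count("some.pdf") should hand back an integer for a
readable PDF and None otherwise.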

I don't know if Quartz 2D will give you the rest of what you want, but
perhaps PyObjC will. (I've got less experience with PyObjC than the
Quartz 2D bindings, and not much experience with the Quartz 2D bindings.
<grin>)
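
If PyObjC does turn out to be the way to go, one possible route is PDFKit,
which PyObjC exposes through its Quartz package. The following is only a
sketch under that assumption: it pulls the plain text of each page along
with a 1-based page number, and I have not checked whether it gets you the
text style or block information. The name pdf_page_texts is hypothetical.

```python
import os


def pdf_page_texts(path):
    """Return a list of (page_number, text) tuples for the PDF at `path`."""
    # Imported lazily so this file can still be imported where PyObjC is missing.
    # Assumption: PyObjC's Quartz package (which wraps PDFKit) is installed.
    from Foundation import NSURL
    from Quartz import PDFDocument

    url = NSURL.fileURLWithPath_(os.path.abspath(path))
    document = PDFDocument.alloc().initWithURL_(url)
    if document is None:
        # PDFKit could not open or parse the file.
        return []

    pages = []
    for index in range(document.pageCount()):
        page = document.pageAtIndex_(index)
        # string() may return None for pages with no extractable text.
        pages.append((index + 1, page.string() or ""))
    return pages
```

From there, wrapping each tuple in whatever XML you need would be ordinary
Python.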

Of course, neither of these approaches is pure Python, and they
require at least Mac OS X 10.3 (Quartz 2D) or 10.5 (for bundled PyObjC).

-- 
Jeremy
jaharmi at jaharmi.com


