reading PDF using Python [Q]
Dinu C. Gherman
gherman at my-dejanews.com
Wed May 12 08:26:46 EDT 1999
In article <FBKAs7.y4 at cix.compulink.co.uk>,
ncmoon at cix.compulink.co.uk ("Nick Moon") wrote:
> > > I have been playing with parsing pdf files in python. The format
> > > of .pdf is documented on Adobe's web site.
> >
> > Any useful URL?
I guess this is what you're looking for (including PS):
http://partners.adobe.com/supportservice/devrelations/technotes.html
> Try the Adobe site, www.adobe.com, but you knew that. The document you want
> is called 'Portable Document Format Reference Manual - Version 1.2'.
> I think Acrobat v4 means there is now a version 1.3, though. It's,
> surprisingly, in .pdf format, and it's big: about 400 pages when printed.
>
> It is pretty unreadable, but it does describe the file format in
> mind-numbingly boring detail. The PDF format itself looks like the work of
> several different people over several different years. Different bits of
> the format seem to use rather different styles of data structures.
In fact, I don't think it's unreadable at all! I've seen much more
boring standards specifications, like those of the W3C.
The PDF specification explains quite nicely the general architecture
of a PDF document, the file format, etc. Give it a try!
It even inspired me to start a yet-another-rainy-Sunday-or-boring-work-day
project to create what might become the world's slowest but most
portable PDF parser... ;-)
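For readers curious what the first steps of such a parser involve: the general architecture the specification describes is a `%PDF-x.y` header line, a body of numbered objects, a cross-reference (xref) table giving each object's byte offset, and a trailer ending in `startxref`, the xref offset, and `%%EOF`. The following is a minimal sketch (not Dinu's parser, which was not released), assuming an uncompressed file with a single classic xref table; it does not handle incremental updates (`/Prev`), cross-reference streams, or damaged files. The function names are hypothetical.

```python
import re


def read_header_version(data: bytes) -> str:
    """Return the version string from the '%PDF-x.y' header on the first line."""
    m = re.match(rb"%PDF-(\d+\.\d+)", data)
    if m is None:
        raise ValueError("not a PDF file: missing %PDF- header")
    return m.group(1).decode("ascii")


def find_startxref(data: bytes) -> int:
    """Return the byte offset that follows the last 'startxref' keyword."""
    pos = data.rfind(b"startxref")
    if pos < 0:
        raise ValueError("no 'startxref' keyword found")
    m = re.match(rb"startxref\s+(\d+)", data[pos:])
    if m is None:
        raise ValueError("malformed 'startxref' section")
    return int(m.group(1))


def parse_xref_table(data: bytes, offset: int) -> dict:
    """Parse a classic cross-reference table beginning at `offset`.

    Returns {object_number: (byte_offset, generation)} for in-use ('n')
    entries; free ('f') entries are skipped. Cross-reference streams and
    chained tables (/Prev) are NOT handled.
    """
    if not data.startswith(b"xref", offset):
        raise ValueError("expected 'xref' keyword at the startxref offset")
    # Each entry is "oooooooooo ggggg n" plus a 2-byte end-of-line;
    # splitlines() copes with \n, \r and \r\n alike. Blank lines are dropped.
    lines = [ln for ln in data[offset:].splitlines() if ln.strip()]
    entries = {}
    i = 1  # skip the 'xref' keyword line
    while i < len(lines) and not lines[i].startswith(b"trailer"):
        # Subsection header: first object number and number of entries.
        first, count = (int(x) for x in lines[i].split())
        for k in range(count):
            obj_offset, gen, kind = lines[i + 1 + k].split()
            if kind == b"n":
                entries[first + k] = (int(obj_offset), int(gen))
        i += 1 + count
    return entries
```

Given the bytes of a file (e.g. `open(path, "rb").read()`), `parse_xref_table(data, find_startxref(data))` yields the offsets from which individual objects could then be tokenized. Reading from the end of the file first is the design the xref table invites: it avoids scanning the whole body to locate objects.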
Nothing-to-be-released-yet,
Dinu
More information about the Python-list mailing list