reading PDF using Python [Q]

Dinu C. Gherman gherman at my-dejanews.com
Wed May 12 08:26:46 EDT 1999


In article <FBKAs7.y4 at cix.compulink.co.uk>,
  ncmoon at cix.compulink.co.uk ("Nick Moon") wrote:
> > > I have been playing with parsing pdf files in python. The format
> > > of .pdf is documented on Adobe's web site.
> >
> > Any useful URL?

I guess this is what you're looking for (including PS):

  http://partners.adobe.com/supportservice/devrelations/technotes.html


> Try the Adobe site, www.adobe.com, but you knew that. The document you
> want is called 'Portable Document Format Reference Manual - Version
> 1.2'. Though I think Acrobat v4 means there is now a version 1.3. It's
> in, surprisingly, .pdf format and it's big - about 400 pages when
> printed.
>
> It is pretty unreadable, but it does describe the file format in
> mind-numbingly boring detail. The pdf format itself looks like the work
> of several different people over several different years. Different
> bits of the format seem to use rather different styles of data
> structures.

In fact, I don't think it's unreadable at all! I've seen much
more boring standards specifications already, like those of the W3C.
The PDF specification explains quite nicely the general architecture
of a PDF document, the file format, etc. Give it a try!
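
To give a flavour of the file-format layer, here is a minimal sketch (not
the parser mentioned in this post) that reads two things the format puts
at fixed places: the '%PDF-x.y' version header at the start of the file
and the 'startxref' offset near the end, which points at the
cross-reference table. The function names and the sample bytes are made
up for illustration, and the sketch assumes the header sits at byte 0
and the whole file fits in memory.

```python
import re


def pdf_version(data: bytes):
    """Return the version from a leading '%PDF-x.y' header, or None."""
    m = re.match(rb"%PDF-(\d+\.\d+)", data)
    return m.group(1).decode("ascii") if m else None


def startxref_offset(data: bytes):
    """Return the integer following the last 'startxref' keyword, or None."""
    idx = data.rfind(b"startxref")
    if idx < 0:
        return None
    m = re.match(rb"startxref\s+(\d+)", data[idx:])
    return int(m.group(1)) if m else None


# Hand-made fragment for illustration only; it is NOT a complete, valid PDF.
sample = (
    b"%PDF-1.2\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"startxref\n9\n%%EOF\n"
)

print(pdf_version(sample))
print(startxref_offset(sample))
```

A real parser would go on from that offset to read the cross-reference
table and trailer, and then resolve the individual objects.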

It even inspired me to start yet-another-rainy-sunday-or-boring-
work-day project to create what might become the world's slowest
but most portable PDF parser... ;-)

Nothing-to-be-released-yet,

Dinu






