Determining when a file is an Open Office Document

Ross Ridge rridge at csclub.uwaterloo.ca
Fri Jan 19 15:48:14 EST 2007


tubby wrote:
> Now, if only I could do something like that on PDF files :)

PDF files should begin with "%PDF-" followed by a version number, e.g.
"%PDF-1.4".  The PDF Reference notes that Adobe Acrobat Reader is a bit
more flexible about what it will accept:

    13. Acrobat viewers require only that the header appear
          somewhere within the first 1024 bytes of the file.
    14. Acrobat viewers also accept a header of the form
          %!PS-Adobe-N.n PDF-M.m

So identifying PDF files is pretty easy.  If you want to examine the
contents of a PDF file, you're better off using a PostScript interpreter,
Ghostscript specifically, since PDF is essentially PostScript with a
special dictionary of commands.
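
For what it's worth, the header check might look something like the
following in Python.  This is only a rough sketch: the names
looks_like_pdf and is_pdf_file are made up for illustration, it assumes
single-digit version numbers such as "1.4", and it assumes the whole
header has to fit inside the first 1024 bytes, which the notes quoted
above don't actually spell out.

```python
import re

# Either the normal "%PDF-M.m" header or the alternate
# "%!PS-Adobe-N.n PDF-M.m" form that Acrobat viewers also accept.
# Assumption: major and minor version numbers are single digits.
_PDF_HEADER = re.compile(rb"%PDF-\d\.\d|%!PS-Adobe-\d\.\d PDF-\d\.\d")


def looks_like_pdf(data, strict=False):
    """Guess whether the bytes in `data` are the start of a PDF file.

    strict=True  -> the data must begin with "%PDF-" at byte 0.
    strict=False -> mimic the more lenient Acrobat behaviour and accept
                    either header form anywhere in the first 1024 bytes.
    """
    if strict:
        return data.startswith(b"%PDF-")
    # Assumption: the complete header must lie within the first 1024 bytes.
    return _PDF_HEADER.search(data[:1024]) is not None


def is_pdf_file(path, strict=False):
    """Read the first 1024 bytes of the file at `path` and check them."""
    with open(path, "rb") as f:
        return looks_like_pdf(f.read(1024), strict)
```

Calling is_pdf_file("some.pdf") does the lenient check, while passing
strict=True only accepts files that start with "%PDF-" right away.  Note
that this only tells you the file claims to be a PDF; it says nothing
about whether the rest of the file is well formed.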

                                        Ross Ridge



