check if file is MS Word or PDF file

Michael Crute mcrute at gmail.com
Sat Sep 27 20:00:59 EDT 2008


On Sat, Sep 27, 2008 at 7:01 PM, Chris Rebert <clp at rebertia.com> wrote:
> Looking at the docs for the mimetypes module, it just guesses based on
> the filename (and extension), not the actual contents of the file, so
> it doesn't really help the OP, who wants to make sure their program
> isn't misled by an inaccurate extension.

One other way to detect a PDF is to just read the first 5 bytes from
the file. Valid PDF files start with "%PDF-". Something similar can be
done with Word docs but I don't know what the magic bytes are. This
approach is pretty similar to what the file command does but is
probably a better approach if you have to support multiple platforms.
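
A minimal sketch of that check in Python, for illustration only. The
function names and return labels are made up for this example. The
non-PDF signatures are not from the post above: legacy .doc files use
the OLE2 compound-file signature and .docx files are ZIP archives, but
both signatures are shared with other formats (.xls, .ppt, any .zip),
so a match only means "possibly Word", not "definitely Word".

```python
# Leading bytes ("magic numbers") for the formats of interest.
PDF_MAGIC = b"%PDF-"
# OLE2 compound file: legacy .doc, but also .xls, .ppt and others.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"
# ZIP local file header: .docx, but also any other ZIP archive.
ZIP_MAGIC = b"PK\x03\x04"


def sniff_header(header):
    """Classify a file from its leading bytes (hypothetical helper).

    Returns "pdf", "ole2", "zip", or None if nothing matches.
    """
    if header.startswith(PDF_MAGIC):
        return "pdf"
    if header.startswith(OLE2_MAGIC):
        return "ole2"  # possibly a legacy Word document
    if header.startswith(ZIP_MAGIC):
        return "zip"   # possibly a .docx
    return None


def sniff_file(path):
    """Read the first 8 bytes of a file in binary mode and classify them."""
    with open(path, "rb") as f:
        return sniff_header(f.read(8))
```

Opening the file in binary mode matters here; otherwise the comparison
can be thrown off by newline translation or text decoding on some
platforms. Telling a .doc from an .xls, or a .docx from a plain ZIP,
would need a look inside the container, which this sketch does not do.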

-mike

-- 
________________________________
Michael E. Crute
http://mike.crute.org

God put me on this earth to accomplish a certain number of things.
Right now I am so far behind that I will never die. --Bill Watterson


