check if file is MS Word or PDF file
Michael Crute
mcrute at gmail.com
Sat Sep 27 20:00:59 EDT 2008
On Sat, Sep 27, 2008 at 7:01 PM, Chris Rebert <clp at rebertia.com> wrote:
> Looking at the docs for the mimetypes module, it just guesses based on
> the filename (and extension), not the actual contents of the file, so
> it doesn't really help the OP, who wants to make sure their program
> isn't misled by an inaccurate extension.
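To illustrate the point about mimetypes: guess_type() answers from the name alone and never opens the file. In this minimal sketch, a plain-text file is deliberately given a .pdf name (the file name is hypothetical). The module still reports it as a PDF.

```python
import mimetypes
import os
import tempfile

# Write plain text into a file whose name merely *claims* it is a PDF.
path = os.path.join(tempfile.mkdtemp(), "not_really.pdf")
with open(path, "w") as f:
    f.write("just some text, not a PDF\n")

# guess_type() only looks at the name/extension; the contents are ignored.
mime_type, encoding = mimetypes.guess_type(path)
print(mime_type)
```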
One other way to detect a PDF is to just read the first 5 bytes of
the file: valid PDF files start with "%PDF-". Something similar can be
done with Word docs, but I don't know what the magic bytes are. This
approach is pretty similar to what the Unix file command does, but doing
it yourself in Python is probably better if you have to support multiple
platforms.
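Below is a minimal sketch of the magic-byte check in Python. The PDF test uses the "%PDF-" prefix described above. The Word test goes beyond the post and rests on one assumption: legacy .doc files are OLE2 compound documents, which begin with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1. Other Office formats (.xls, .ppt) share that container, so a match only identifies OLE2, not Word specifically. Newer .docx files are ZIP archives and will not match. The function name sniff_file_type and the sample files are hypothetical.

```python
import os
import tempfile

# Leading bytes ("magic numbers") checked below.
PDF_MAGIC = b"%PDF-"  # 5 bytes, as described in the post
# Assumption: OLE2 Compound Document signature. Legacy Word .doc files use
# this container, but so do .xls, .ppt and others, so a match means "OLE2",
# not necessarily "Word". Newer .docx files are ZIP archives and won't match.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # 8 bytes


def sniff_file_type(path):
    """Guess a file's type from its first bytes, ignoring its extension.

    Returns "pdf", "ole2", or None if neither signature matches.
    """
    with open(path, "rb") as f:
        # Read just enough bytes for the longest signature we check.
        header = f.read(max(len(PDF_MAGIC), len(OLE2_MAGIC)))
    if header.startswith(PDF_MAGIC):
        return "pdf"
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    return None


# Demo: the extensions below are deliberately misleading.
demo_dir = tempfile.mkdtemp()
samples = {
    "report.doc": b"%PDF-1.4\n%fake body\n",  # really a PDF
    "letter.pdf": OLE2_MAGIC + b"\x00" * 8,   # really an OLE2 file
    "notes.pdf": b"plain text\n",             # neither
}
results = {}
for name, data in samples.items():
    path = os.path.join(demo_dir, name)
    with open(path, "wb") as f:
        f.write(data)
    results[name] = sniff_file_type(path)
    print(name, "->", results[name])
```

The check is strict at offset 0, like the approach in the post. Some PDF readers tolerate junk before the header, so this may reject files that those readers would still open.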
-mike
--
________________________________
Michael E. Crute
http://mike.crute.org
God put me on this earth to accomplish a certain number of things.
Right now I am so far behind that I will never die. --Bill Watterson