Code to recognize MS-Word document files?
Mark Hammond
mhammond at skippinet.com.au
Wed Mar 5 01:56:51 EST 2003
Grant Edwards wrote:
> I'm looking for a snippet of python that I can use to determine
> if a file is a MS-Word document. People around here seem to
> have gotten into the habit of attaching MS-Word files without a
> ".doc" on the name.
>
> Even with the .doc, the mimetypes module doesn't seem to get it
> right. :(
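This code, from ppw32 (Python Programming on Win32), extracts summary
information from MS Office documents. The StgIsStorageFile() check at the
top is the part that tells you whether the file is a structured storage
file at all. I'm not sure what other properties are available.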
Mark.
# DumpStorage.py - Dumps some user defined properties
# of a COM Structured Storage file.
import pythoncom
from win32com import storagecon # constants related to storage functions.
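# These come from ObjIdl.h
# Note: despite the name used here, this GUID is the one the Windows SDK
# documents as FMTID_SummaryInformation, the set the PIDSI_* IDs belong to.
FMTID_UserDefinedProperties = "{F29F85E0-4FF9-1068-AB91-08002B27B3D9}"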
PIDSI_TITLE = 0x00000002
PIDSI_SUBJECT = 0x00000003
PIDSI_AUTHOR = 0x00000004
PIDSI_CREATE_DTM = 0x0000000c
def PrintStats(filename):
    if not pythoncom.StgIsStorageFile(filename):
        print("The file is not a storage file!")
        return
    # Open the file.
    flags = storagecon.STGM_READ | storagecon.STGM_SHARE_EXCLUSIVE
    stg = pythoncom.StgOpenStorage(filename, None, flags)
    # Now see if the storage object supports Property Information.
    try:
        pss = stg.QueryInterface(pythoncom.IID_IPropertySetStorage)
    except pythoncom.com_error:
        print("No summary information is available")
        return
    # Open the property set.
    ps = pss.Open(FMTID_UserDefinedProperties)
    props = PIDSI_TITLE, PIDSI_SUBJECT, PIDSI_AUTHOR, PIDSI_CREATE_DTM
    data = ps.ReadMultiple(props)
    # Unpack the result into the items.
    title, subject, author, created = data
    print("Title: %s" % title)
    print("Subject: %s" % subject)
    print("Author: %s" % author)
    print("Created: %s" % created.Format())
if __name__=='__main__':
    import sys
    if len(sys.argv)<2:
        print("Please specify a file name")
    else:
        PrintStats(sys.argv[1])
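
If you only need the "is this a structured storage file?" test, and not the
properties, a rough sketch that avoids the win32 extensions is to compare the
first eight bytes of the file against the OLE2 compound file signature. The
names OLE2_MAGIC and looks_like_ole2 are made up for this example and are not
part of the script above. Note the limits: a match only says the file is a
compound file, which covers old-style Word, Excel and PowerPoint documents
alike, so it does not single out Word. Narrowing it down would mean opening
the storage and looking for a stream named "WordDocument", which this sketch
does not attempt.

```python
# Signature found at the start of every OLE2 compound (structured storage) file.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"

def looks_like_ole2(path):
    """Return True if the file at `path` starts with the OLE2 signature."""
    with open(path, "rb") as f:
        return f.read(len(OLE2_MAGIC)) == OLE2_MAGIC

if __name__ == "__main__":
    import sys
    # Report the result for each file name given on the command line.
    for name in sys.argv[1:]:
        print("%s: %s" % (name, looks_like_ole2(name)))
```

Because it only reads a few bytes, this works on attachments saved without a
".doc" extension and on any platform, at the cost of not saying which Office
application wrote the file.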