[python-win32] Re: NTFS/MS Word file metadata - no PIDSI for Category?

Earle_Williams at ak.blm.gov Earle_Williams at ak.blm.gov
Fri Jul 14 23:55:34 CEST 2006


Posting this for posterity: a snippet to read and return the file
properties summary information from a modern Windows file system.  It works
for me on Win XP Pro SP2 over NTFS and Active Directory.  Thanks to Roger
Upole and Mark Hammond for pointing the way, and my apologies for any
Python newb hacks.

<pre>
from win32com import storagecon
import pythoncom, os, sys


def get_stats(fname):
    author = title = subject = keywords = comments = category = None
    try:
        pssread = pythoncom.StgOpenStorageEx(
            fname,
            storagecon.STGM_READ | storagecon.STGM_SHARE_EXCLUSIVE,
            storagecon.STGFMT_FILE, 0, pythoncom.IID_IPropertySetStorage)
    except:
        stg = pythoncom.StgOpenStorage(
            fname, None,
            storagecon.STGM_READ | storagecon.STGM_SHARE_EXCLUSIVE)
        try:
            pssread = stg.QueryInterface(pythoncom.IID_IPropertySetStorage)
        except:
            print("No extended storage")
        else:
            try:
                ps = pssread.Open(
                    pythoncom.FMTID_SummaryInformation,
                    storagecon.STGM_READ | storagecon.STGM_SHARE_EXCLUSIVE)
            except:
                pass
            else:
                author, title, subject, keywords, comments = ps.ReadMultiple(
                    (storagecon.PIDSI_AUTHOR, storagecon.PIDSI_TITLE,
                     storagecon.PIDSI_SUBJECT, storagecon.PIDSI_KEYWORDS,
                     storagecon.PIDSI_COMMENTS))
            try:
                ps = pssread.Open(
                    pythoncom.FMTID_DocSummaryInformation,
                    storagecon.STGM_READ | storagecon.STGM_SHARE_EXCLUSIVE)
            except:
                pass
            else:
                category = ps.ReadMultiple((storagecon.PIDDSI_CATEGORY,))[0]
        return author,title,subject,keywords,comments,category
    else:
        try:
            ps = pssread.Open(
                pythoncom.FMTID_SummaryInformation,
                storagecon.STGM_READ | storagecon.STGM_SHARE_EXCLUSIVE)
        except:
            pass
        else:
            author, title, subject, keywords, comments = ps.ReadMultiple(
                (storagecon.PIDSI_AUTHOR, storagecon.PIDSI_TITLE,
                 storagecon.PIDSI_SUBJECT, storagecon.PIDSI_KEYWORDS,
                 storagecon.PIDSI_COMMENTS))
        try:
            ps = pssread.Open(
                pythoncom.FMTID_DocSummaryInformation,
                storagecon.STGM_READ | storagecon.STGM_SHARE_EXCLUSIVE)
        except:
            pass
        else:
            category = ps.ReadMultiple( (storagecon.PIDDSI_CATEGORY,) ) [0]
        # User-defined properties are opened here but not read yet.
        try:
            ps = pssread.Open(
                pythoncom.FMTID_UserDefinedProperties,
                storagecon.STGM_READ | storagecon.STGM_SHARE_EXCLUSIVE)
        except:
            pass
        else:
            pass
        return author,title,subject,keywords,comments,category



if __name__=='__main__':
    if len(sys.argv) < 2:
        print("Usage: getstats filename")
    else:
        filename = sys.argv[1]
        print(filename)
        author, title, subject, keywords, comments, category = get_stats(filename)
        # Parenthesised single-argument prints run on both Python 2 and 3.
        print("  Author: %s" % author)
        print("  Title: %s" % title)
        print("  Subject: %s" % subject)
        print("  Keywords: %s" % keywords)
        print("  Comments: %s" % comments)
        print("  Category: %s" % category)
</pre>
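
The snippet above repeats the same open-then-ReadMultiple step for each property set. A minimal sketch of how that step might be factored out follows. The names read_props and get_stats_compact are made up for illustration and are not part of pywin32. Unlike the original, this version catches only pythoncom.com_error on the first open and lets a failed QueryInterface raise instead of printing a message.

```python
def read_props(pss, fmtid, pids, mode):
    """Return the values for pids from one property set, or Nones.

    pss is any object with an IPropertySetStorage-style Open(fmtid, mode)
    method whose result offers ReadMultiple(pids).
    """
    try:
        ps = pss.Open(fmtid, mode)
    except Exception:  # the property set is absent or cannot be opened
        return (None,) * len(pids)
    return tuple(ps.ReadMultiple(pids))


def get_stats_compact(fname):
    # pywin32 is imported here so read_props can be tried without Windows.
    import pythoncom
    from win32com import storagecon

    mode = storagecon.STGM_READ | storagecon.STGM_SHARE_EXCLUSIVE
    try:
        pss = pythoncom.StgOpenStorageEx(
            fname, mode, storagecon.STGFMT_FILE, 0,
            pythoncom.IID_IPropertySetStorage)
    except pythoncom.com_error:
        # Fall back to plain structured storage, as the original does.
        stg = pythoncom.StgOpenStorage(fname, None, mode)
        pss = stg.QueryInterface(pythoncom.IID_IPropertySetStorage)

    summary = read_props(
        pss, pythoncom.FMTID_SummaryInformation,
        (storagecon.PIDSI_AUTHOR, storagecon.PIDSI_TITLE,
         storagecon.PIDSI_SUBJECT, storagecon.PIDSI_KEYWORDS,
         storagecon.PIDSI_COMMENTS),
        mode)
    (category,) = read_props(
        pss, pythoncom.FMTID_DocSummaryInformation,
        (storagecon.PIDDSI_CATEGORY,), mode)
    # Same order as get_stats: author, title, subject, keywords, comments, category
    return summary + (category,)
```

get_stats_compact(filename) should be usable wherever get_stats(filename) is, though I have only reasoned about it rather than run it against the full range of files.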



python-win32-bounces at python.org wrote on 07/13/2006 06:24:38 PM:

> Earle Williams wrote:
> > Hola,
> >
> > I'm trying to pull extended file properties from NTFS or MSWord files.
> > List archives point to snippets from Mark Hammond and Roger Upole, and I
> > can get to most of the metadata.  However I'm having trouble getting to the
> > 'Category' information.  It seems in the NTFS metadata that item is flagged
> > with a PIDSI_TITLE constant, at least that's what I get with my code
> > (hacked from testStorage.py).  If there is no 'Title' info and just
> > Category info, the category info gets read as title.
> >
> > And in MSWord metadata I can't pull that info at all using Mark Hammond's
> > DumpStorage snippet.  I get everything else but not the 'Category' data.
> >
> > Anyone have advice on a method to definitively retrieve the category info?
> >
>
> Category is part of DocSummaryInformation, so you'll need the PIDDSI*
> constants instead of PIDSI*.  (PIDDSI_CATEGORY just happens to be
> equal to PIDSI_TITLE)
>
> from win32com import storagecon
> import pythoncom
> fname='c:\\tmp.doc'
>
> pss=pythoncom.StgOpenStorageEx(fname,
>     storagecon.STGM_READ|storagecon.STGM_SHARE_EXCLUSIVE,
>     storagecon.STGFMT_DOCFILE, 0 , pythoncom.IID_IPropertySetStorage)
> ps=pss.Open(pythoncom.FMTID_DocSummaryInformation)
> print(ps.ReadMultiple((storagecon.PIDDSI_CATEGORY,))[0])
>
>      Roger
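
To spell out why the category showed up as the title: a property ID only means something inside its property set, so the key is really the pair (FMTID, PID). A small sketch, assuming a made-up in-memory stand-in for a file's property sets. The string labels stand in for the real FMTID GUIDs, and naive_title and lookup are hypothetical helpers, not pywin32 calls. The two constants carry the same value, 2, that storagecon defines.

```python
# Same numeric value in both property sets, as Roger points out.
PIDSI_TITLE = 2       # meaningful in SummaryInformation
PIDDSI_CATEGORY = 2   # meaningful in DocSummaryInformation

# Hypothetical stand-in for a file with a Category but no Title.
fake_file = {
    "SummaryInformation": {},
    "DocSummaryInformation": {PIDDSI_CATEGORY: "Reports"},
}


def naive_title(props):
    # Buggy on purpose: takes PID 2 from whichever set has it.
    for values in props.values():
        if PIDSI_TITLE in values:
            return values[PIDSI_TITLE]
    return None


def lookup(props, fmtid, pid):
    # Keyed on the (property set, property ID) pair.
    return props.get(fmtid, {}).get(pid)


print(naive_title(fake_file))                                         # Reports
print(lookup(fake_file, "SummaryInformation", PIDSI_TITLE))           # None
print(lookup(fake_file, "DocSummaryInformation", PIDDSI_CATEGORY))    # Reports
```

The first print reproduces the symptom from my original question. The other two show the fix: open the right property set first, then ask for the PID.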


