Parsing MS Word Document?

MrBill nospam at nospam.com
Wed Oct 22 12:17:04 EDT 2003


Thanks John,
This should get me started.
Bill

"John J. Lee" <jjl at pobox.com> wrote in message
news:87znfttoyn.fsf at pobox.com...
> "MrBill" <nospam at nospam.com> writes:
>
> > I would like to be able to open, read, and extract data from a report
> > that is produced in MS Word.  The doc seems to contain embedded
> > spreadsheets.  I would like to extract some of the data from the
> > spreadsheets and feed it into another application.  I've been reading
> > a little bit about OLE and MS Word and sure would like to find a
> > module that hides some of this so-called innovation from me.
>
> :-)  Yeah, isn't all that baroque complexity wonderful?
>
> 1. Alex Martelli's suggestion on this list: use RTF.  Word can import
>    and export to it.  You can automate that from VB or Python in the
>    usual COM ways (see 3.).  I don't know whether you'll get useful
>    RTF out of embedded Excel sheets, though.
>
> 2. Use OpenOffice via PyUNO.
>
> 3. As you already know, use the MS Office object models, with Python
>    for Windows extensions (or ctypes, if you're brave).  Perhaps ADO
>    is what you're looking for?  IIRC, ADO isn't too complicated and
>    can treat Excel sheets as data sources just as it does for
>    relational databases.
>
> For simpler Word docs (no embedded stuff), there are other tools out
> there, but they'd be no use in this case.
>
> A useful tip for 3. is to record a VB macro in Word, then edit it to
> something sane.  You can keep it in VB, or do the relatively trivial
> edits required to convert it to Python.  Here's an example on
> automating RTF generation:
>
> http://www.google.com/groups?q=author:jjl%40pobox.com+RTF+Word&hl=en&lr=&ie=UTF-8&selm=87isqnnxvy.fsf%40pobox.com&rnum=1
>
> John
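
For the archive, here is a rough sketch of what options 1 and 3 above
might look like when combined: drive Word over COM from Python and save
the report as RTF. It is not taken from John's linked post. It assumes
Windows, an installed copy of Word, and the Python for Windows
extensions (win32com). The helper names rtf_path_for and export_to_rtf
are made up for illustration, and 6 is the value of Word's wdFormatRTF
constant. As John notes, embedded Excel sheets may well not come
through as useful RTF.

```python
import os

WD_FORMAT_RTF = 6  # numeric value of Word's wdFormatRTF constant


def rtf_path_for(doc_path):
    """Return the absolute .rtf path that sits next to doc_path."""
    base, _ext = os.path.splitext(os.path.abspath(doc_path))
    return base + ".rtf"


def export_to_rtf(doc_path):
    """Open doc_path in Word through COM and save a copy as RTF.

    Hypothetical helper: requires Windows, Word, and pywin32.
    """
    # Imported here so the module still loads on machines without pywin32.
    import win32com.client

    out_path = rtf_path_for(doc_path)
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        # Word wants absolute paths; relative ones resolve against its own cwd.
        doc = word.Documents.Open(os.path.abspath(doc_path))
        try:
            doc.SaveAs(out_path, FileFormat=WD_FORMAT_RTF)
        finally:
            doc.Close(False)  # False: do not save changes to the original
    finally:
        word.Quit()
    return out_path
```

Calling export_to_rtf("report.doc") should leave a report.rtf next to
the original, which can then be parsed as text. The path handling is
kept in its own function so it can be checked without Word installed.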





