about Python doc reader

norseman norseman at hughes.net
Wed May 13 16:55:51 EDT 2009


Kushal Kumaran wrote:
> On Wed, May 13, 2009 at 4:28 PM, Shailja Gulati <shailja.gulati at tcs.com> wrote:
>> Hi ,
>>
>> I am currently working on "Information retrieval from semi structured
>> Documents" in which there is a need to read data from Resumes.
>>
>> Could anyone tell me is there any python API to read Word doc?
>>
> 
> If you're using Windows, you can use COM APIs to read Word documents.
> Or you can use OpenOffice.org using uno.  You can find examples of
> either by googling.
> 
============================
One problem that I keep hitting with OOo, UNO and Python: when asked
to output a .txt file, it comes out sort of PK-zipped. Same for the .csv
files it outputs.  If you can, I suggest you work with Microsoft's COM. I
have had better luck there.  Not much, but better.  I usually get a real
.txt file.
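
For anyone trying the COM route, here is a minimal sketch, not a tested
recipe. It assumes Windows, an installed copy of Word, and the pywin32
package (which provides win32com). The helper names read_word_text and
read_many are made up for illustration; Documents.Open, Content.Text,
Close and Quit are the Word object model calls it relies on.

```python
import os


def read_word_text(word_app, path):
    """Return the plain text of one document via a running Word COM object."""
    # Word resolves relative paths against its own working directory,
    # so hand it an absolute path.
    doc = word_app.Documents.Open(os.path.abspath(path), ReadOnly=True)
    try:
        return doc.Content.Text
    finally:
        doc.Close(False)  # False = do not save changes


def read_many(paths):
    """Start Word once, read every file, then quit Word."""
    # Imported here so the module still loads on machines without pywin32.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        return {p: read_word_text(word, p) for p in paths}
    finally:
        word.Quit()
```

Calling read_many(["file1.doc", "file2.doc"]) should give back a dict
mapping each path to its text. Starting Word once and reusing it is a
choice made here to avoid paying the startup cost per file.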

For what it is worth, in OOo I did make some progress by creating a
macro to write out the text, setting it to run on EVERY file OOo
opens, and then closing OOo after the write. Then I batched the
OOo file.doc process with a:

files2process.sh            #files2process.bat  in window$
================
swriter file1.doc
swriter file2.doc
.
.

Not very elegant, but it worked for me.
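
The same batch can be driven from Python instead of a shell or .bat
file. This is only a sketch: it assumes swriter is on the PATH and that
the macro described above is already set up to write the text and close
OOo, so that each call returns. The names build_commands and run_batch
are hypothetical.

```python
import subprocess


def build_commands(doc_paths, program="swriter"):
    """Return one argument list per document, e.g. ['swriter', 'file1.doc']."""
    return [[program, path] for path in doc_paths]


def run_batch(doc_paths, program="swriter"):
    """Open each document in turn, waiting for the program to exit."""
    for cmd in build_commands(doc_paths, program):
        # check=True raises CalledProcessError if the program exits non-zero.
        subprocess.run(cmd, check=True)
```

Something like run_batch(sorted(glob.glob("*.doc"))), after an import
glob, would then stand in for the hand-written list of swriter lines.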


To be honest - these days I just give those to a clerk and let them point
and click until done.  Less frustrating.  The documentation is bad for each.


Steve


