Python word to text

BJörn Lindqvist bjourne at gmail.com
Tue Sep 1 13:42:13 EDT 2009


2009/9/1 Tino Wildenhain <tino at wildenhain.de>:
> On 01.09.2009 13:42, Nitebirdz wrote:
>>
>> On Tue, Sep 01, 2009 at 11:38:30AM +0200, BJörn Lindqvist wrote:
>>>
>>> Hello everybody,
>>>
>>> I'm looking for a pure Python solution for converting word documents
>>> to text. App Engine doesn't allow external programs, which means that
>>> external programs like catdoc and antiword can't be used. Anyone know
>>> of any?
>>>
>>
>> A quick search returned this:
>>
>> http://code.activestate.com/recipes/279003/
>>
>>
>> Did you give it a try?
>
> That's funny advice. Did you read that recipe? ;-)
> "Requires the Python for Windows extensions, and MS Word."
> How does this match with "App Engine doesn't allow external programs"? :-)
>
> For Excel this would be easy, but Word... Björn, did you check whether
> the Google API would let you access Google Docs for this?

I did not, thanks for the tip! The system I managed to hack together
uploads the .doc to a Google Docs account and then retrieves it again
as plain text. It works, but it sure feels kind of silly. It's not very
reliable: if Google has any kind of problem with their Docs application,
it doesn't work at all. Plus, the method is dirt slow due to the latency
of all the HTTP calls. But it's better than nothing.
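
Since the round trip can fail whenever the remote service hiccups, one
way to soften that is to retry with a growing pause between attempts.
What follows is only a minimal sketch under assumptions, not the code
actually in use: `convert` is a hypothetical callable standing in for
"upload the .doc and fetch it back as plain text", and the names
`doc_to_text` and `ConversionError` are made up for illustration. No
real Google API calls are shown.

```python
import time


class ConversionError(Exception):
    """Raised when every attempt to convert the document has failed."""


def doc_to_text(doc_bytes, convert, attempts=3, delay=1.0, sleep=time.sleep):
    """Call convert(doc_bytes) up to `attempts` times and return its text.

    `convert` is a hypothetical callable that uploads the .doc to the
    remote service and returns the plain text; it is not a real API.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return convert(doc_bytes)
        except Exception as exc:  # the remote service may fail in many ways
            last_error = exc
            if attempt < attempts - 1:
                # Back off before the next try: delay, 2*delay, 4*delay, ...
                sleep(delay * (2 ** attempt))
    raise ConversionError("conversion failed after %d attempts: %s"
                          % (attempts, last_error))
```

Retrying does nothing for the latency, of course; each extra attempt
makes a slow request slower, so the attempt count probably wants to
stay small.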


-- 
mvh Björn


