Using python to convert PDF document to MSWord documents
Ksenia Marasanova
ksenia at ksenia.nl
Tue Sep 28 17:30:23 EDT 2004
>> From: JEET <hjeet_in at yahoo.com>
>> Can anyone please suggest whether there are any Python modules available to
>> convert PDF documents to MS Word documents? If not, can you please
>> suggest how I can achieve this?
>
> No python modules, but:
> - feeding the subject line to google brings some sponsored links that
> claim to solve your problem
> - http://www.quiss.org/swftools/ has a tool to convert PDF to Flash,
> so there must be some code to detect Text, Fonts etc.
>
Pdf2swf is based on xpdf (http://www.foolabs.com/xpdf).
Another tool that is also based on xpdf is pdftohtml
(http://pdftohtml.sourceforge.net/). It can convert PDF to HTML (using
absolute CSS positioning) or to XML. I don't know if there are any RTF
or Word writers in Python, but in my previous VB life I programmed a
simple Word macro that would open an HTML page and save it as a .doc
document. It was the easiest way to get all images embedded and the
formatting done correctly. I don't know, however, how it will handle
absolute positioning.
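
For what it's worth, the pdftohtml-then-Word route could be driven from
Python roughly like this. It is only a sketch, not something I have run
against real documents: it assumes pdftohtml is on the PATH, and the Word
step swaps the VB macro for COM automation through the pywin32 package
(my assumption; Windows with Word installed only). The function names are
made up for illustration.

```python
import subprocess
from pathlib import Path

WD_FORMAT_DOCUMENT = 0  # Word's wdFormatDocument constant (.doc)


def pdftohtml_command(pdf_path, out_base, as_xml=False):
    """Build the pdftohtml command line (hypothetical helper).

    -c   asks for "complex" HTML output (absolutely positioned)
    -xml asks for XML output instead
    """
    mode = "-xml" if as_xml else "-c"
    return ["pdftohtml", mode, str(pdf_path), str(out_base)]


def pdf_to_html(pdf_path, out_base, as_xml=False):
    """Run pdftohtml; raises CalledProcessError if the tool fails."""
    subprocess.run(pdftohtml_command(pdf_path, out_base, as_xml), check=True)


def html_to_doc(html_path, doc_path):
    """Open an HTML file in Word and save it as .doc.

    Assumes pywin32 and MS Word are installed; imported lazily so the
    rest of the module still loads on other systems.
    """
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    try:
        # Word wants absolute paths
        doc = word.Documents.Open(str(Path(html_path).resolve()))
        doc.SaveAs(str(Path(doc_path).resolve()), WD_FORMAT_DOCUMENT)
        doc.Close()
    finally:
        word.Quit()
```

The exact names of the HTML files that pdftohtml writes depend on the
options used, so check the output directory before passing a file to
html_to_doc().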
Another possible option is to convert the PDF to PS format, and then use
pstoedit (http://www.pstoedit.net/pstoedit) with the shareware RTF plugin
mentioned on that page. I don't have any experience with this option.
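
If someone wants to try the PS route, the two steps could be chained
from Python along these lines. Again just an untested sketch: it assumes
pdftops (part of xpdf) and pstoedit are both on the PATH, and the format
name "rtf" passed to pstoedit's -f option is my guess at what the plugin
registers, so check the list of installed formats in pstoedit's help
output first. The function names are hypothetical.

```python
import subprocess


def pdf_to_rtf_commands(pdf_path, ps_path, rtf_path, driver="rtf"):
    """Return the two command lines for the PDF -> PS -> RTF pipeline.

    driver is the pstoedit output format name; "rtf" is an assumption
    and depends on the plugin being installed.
    """
    return [
        ["pdftops", pdf_path, ps_path],                  # PDF -> PostScript
        ["pstoedit", "-f", driver, ps_path, rtf_path],   # PostScript -> RTF
    ]


def run_pipeline(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Usage would be something like
run_pipeline(pdf_to_rtf_commands("in.pdf", "tmp.ps", "out.rtf")).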
Ksenia.