Html or Pdf to Rtf (Linux) with Python

Mike Meyer mwm at mired.org
Thu Dec 16 13:01:37 EST 2004


Axel Straschil <axel at straschil.com> writes:

> Hello!
>
>> However, our company's product, PDFTextStream does do a phenomenal
>> job of extracting text and metadata out of PDF documents.  It's
>> crazy-fast, has a clean API, and in general gets the job done very
>> nicely.  It presents two points of compromise from your ideal
>> situation:
>> 1. It only produces text, so you would have to take the text it
>> provides and write it out as an RTF yourself (there are tons of
>> packages and tools that do this).  Since the RTF format has pretty
>> weak formatting capabilities compared
>
> I've got the input source in HTML; the problem is converting from any
> format to RTF. Please give me a hint where the tons of packages are.

That's easy. Load the HTML in MS Word, and save it as RTF. Script it
via COM using the Python pywin32 package (formerly distributed as
win32all).
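
A minimal sketch of that approach, assuming a Windows machine with MS
Word installed and the pywin32 package available. The function names
html_to_rtf and rtf_path_for are made up for illustration, and 6 is
the numeric value of Word's wdFormatRTF constant. It won't run on
Linux as written, since it needs Word itself.

```python
import os

# Numeric value of Word's wdFormatRTF constant (WdSaveFormat enumeration).
WD_FORMAT_RTF = 6


def rtf_path_for(html_path):
    """Return the absolute path of the .rtf file next to html_path."""
    base, _ext = os.path.splitext(os.path.abspath(html_path))
    return base + ".rtf"


def html_to_rtf(html_path):
    """Open html_path in Word via COM and save it as RTF.

    Assumes Windows, an installed copy of MS Word, and pywin32.
    """
    # Imported here so the module still loads where pywin32 is absent.
    import win32com.client

    out_path = rtf_path_for(html_path)
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        # Word wants absolute paths; relative ones resolve against
        # Word's own working directory, not the script's.
        doc = word.Documents.Open(os.path.abspath(html_path))
        try:
            doc.SaveAs(out_path, FileFormat=WD_FORMAT_RTF)
        finally:
            doc.Close(False)  # close without prompting to save changes
    finally:
        word.Quit()
    return out_path
```

Calling html_to_rtf("report.html") should leave report.rtf beside the
input file. Quitting Word in a finally block keeps a failed conversion
from leaving a hidden WINWORD process behind.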

        <mike
-- 
Mike Meyer <mwm at mired.org>			http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.


