Python File Handling: .xls .doc .pdf ???
Gerhard Häring
gerhard.haering at opus-gmbh.net
Fri Feb 7 03:35:43 EST 2003
Ken Favrow <KenFavrow at attbi.com> wrote:
> I'm trying to make a somewhat simple search engine, but would need to be
> able to read .xls .doc and possibly .pdf for it to be entirely useful. I
> just need to be able to see enough content to find keywords. I've already
> done it with txt and html. How might I accomplish this with the other
> formats??
There are various utilities to convert these formats into plain text:
antiword, catdoc, xlHtml, ...
Some of these converters produce HTML, but HTML is easily converted to
plain text with a command-line browser:
$commandline_browser -dump <html-file>
where commandline_browser is one of lynx, w3m or links.
http://www.spocom.com/users/gjohnson/mutt/#office might be of interest to
you, as it includes links to all of these utilities.
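
The convert-then-dump approach above could be driven from Python roughly
as follows. This is only a sketch under a few assumptions: the converters
are installed and on the PATH, each one writes its result to standard
output when given a file name, and the xlHtml binary is invoked as
"xlhtml". The names CONVERTERS, HTML_OUTPUT, converter_command,
dump_command and extract_text are made up for illustration and do not
come from any of the tools mentioned.

```python
import subprocess
import tempfile
from pathlib import Path

# Hypothetical mapping from file extension to a converter command.
# Assumption: each converter prints its result to standard output.
CONVERTERS = {
    ".doc": ["antiword"],  # assumed to emit plain text
    ".xls": ["xlhtml"],    # assumed to emit HTML (binary name is a guess)
}

# Extensions whose converter output is HTML and needs a second pass.
HTML_OUTPUT = {".xls"}


def converter_command(path):
    """Build the command line that converts `path`; nothing is executed."""
    ext = Path(path).suffix.lower()
    if ext not in CONVERTERS:
        raise ValueError(f"no converter configured for {ext!r}")
    return CONVERTERS[ext] + [str(path)]


def dump_command(html_path, browser="lynx"):
    """Build '<browser> -dump <html-file>' for lynx, w3m or links."""
    return [browser, "-dump", str(html_path)]


def extract_text(path, browser="lynx"):
    """Run the converter and, if it produced HTML, dump that to plain text."""
    raw = subprocess.run(
        converter_command(path), capture_output=True, check=True
    ).stdout
    if Path(path).suffix.lower() in HTML_OUTPUT:
        # Write the HTML to a temporary .html file so the browser
        # recognises it, then let the browser render it as text.
        with tempfile.TemporaryDirectory() as tmp:
            html_file = Path(tmp) / "converted.html"
            html_file.write_bytes(raw)
            raw = subprocess.run(
                dump_command(html_file, browser),
                capture_output=True,
                check=True,
            ).stdout
    # The output encoding depends on the tool and locale; UTF-8 is a guess.
    return raw.decode("utf-8", errors="replace")
```

Something like extract_text("report.doc").lower().split() would then give
a list of words to feed into the keyword index. Keeping the command
construction separate from the subprocess calls makes it easy to swap in
catdoc or another converter without touching the rest.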
-- Gerhard