email, unicode, HTML, and removal thereof
Gerhard Häring
gerhard.haering at opus-gmbh.net
Thu Oct 31 04:20:48 EST 2002
Andrew Dalke <adalke at mindspring.com> [2002-10-31 09:43 GMT]:
> Short version:
> What should I do to strip out markup from an email'ed HTML
> document so I can get just the text? (Yeah, it won't always
> get only the text.) I'm having problems in how to handle
> the charset.
>
> Solutions in pure Python or via calling a common (under unix)
> external program are fine.
For a Unix solution, run the HTML file through the dump mode of a
text-mode browser, if any of these is installed:
    lynx -dump
    w3m -dump
    links -dump
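If you want to drive this from Python, here is a minimal sketch, not a
definitive implementation. The names find_dumper and html_to_text are
hypothetical helpers made up for illustration. It assumes each browser
accepts "-dump <file>", that a ".html" suffix is enough for it to treat
the file as HTML, and that the output can be decoded as UTF-8. The last
point depends on the browser and your locale, so it does not by itself
solve the charset problem.

```python
import os
import shutil
import subprocess
import tempfile

# Text-mode browsers named above; each is assumed to accept "-dump <file>".
DUMPERS = ("lynx", "w3m", "links")


def find_dumper(candidates=DUMPERS):
    """Return the first candidate found on PATH, or None if none is installed."""
    for name in candidates:
        if shutil.which(name):
            return name
    return None


def html_to_text(html_bytes, candidates=DUMPERS):
    """Render HTML bytes to plain text with an installed text-mode browser."""
    browser = find_dumper(candidates)
    if browser is None:
        raise RuntimeError("none of %s is installed" % ", ".join(candidates))

    # Write the raw bytes unchanged; the .html suffix marks the file type.
    with tempfile.NamedTemporaryFile(suffix=".html", delete=False) as tmp:
        tmp.write(html_bytes)
        path = tmp.name
    try:
        result = subprocess.run(
            [browser, "-dump", path],
            stdout=subprocess.PIPE,
            check=True,
        )
    finally:
        os.unlink(path)

    # Assumption: output is UTF-8; undecodable bytes are replaced, not fatal.
    return result.stdout.decode("utf-8", errors="replace")
```

You would pass in the decoded body of the text/html part of the message,
for example html_to_text(part.get_payload(decode=True)) when using the
standard email package.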
-- Gerhard