email, unicode, HTML, and removal thereof

Gerhard Häring gerhard.haering at opus-gmbh.net
Thu Oct 31 04:20:48 EST 2002


Andrew Dalke <adalke at mindspring.com> [2002-10-31 09:43 GMT]:
> Short version:
>    What should I do to strip out markup from an email'ed HTML
>    document so I can get just the text?  (Yeah, it won't always
>    get only the text.)  I'm having problems in how to handle
>    the charset.
> 
> Solutions in pure Python or via calling a common (under unix)
> external program are fine.

For a Unix solution:

lynx -dump
w3m -dump
links -dump

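... whichever of these text-mode browsers is installed. With -dump, each one renders the given HTML file or URL and writes the formatted plain text to stdout.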
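A minimal Python 3 sketch of the same idea. It first pulls the text/html part out of a message with the standard `email` package and decodes it using the part's declared charset, which is the problem raised in the question. It then runs `lynx`, `w3m` or `links` with `-dump` on a temporary file. The helper names (`html_from_message`, `dump_with_browser`, `strip_tags`, `html_to_text`) are hypothetical. Two further details are assumptions, not part of the original post: the UTF-8 META tag used to tell the browser the file's encoding, and the premise that the browser writes UTF-8. If no browser is installed, the sketch falls back to a crude pure-Python tag stripper built on `html.parser`.

```python
import os
import shutil
import subprocess
import tempfile
from email.message import Message
from html.parser import HTMLParser
from typing import Optional

# Text-mode browsers to try, in order; each is run as "<prog> -dump <file>".
BROWSERS = ("lynx", "w3m", "links")
# Assumption: a META tag is enough to tell the browser the temp file is UTF-8.
UTF8_META = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'


def html_from_message(msg: Message) -> Optional[str]:
    """Return the first text/html part of an email as a str, or None."""
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            raw = part.get_payload(decode=True)  # bytes, transfer encoding undone
            charset = part.get_content_charset() or "us-ascii"
            try:
                return raw.decode(charset, errors="replace")
            except LookupError:  # charset name that Python does not recognise
                return raw.decode("latin-1")
    return None


def dump_with_browser(html: str) -> Optional[str]:
    """Render html with the first working browser's -dump; None if none work."""
    for prog in BROWSERS:
        path = shutil.which(prog)
        if path is None:
            continue
        with tempfile.NamedTemporaryFile(
            "w", suffix=".html", encoding="utf-8", delete=False
        ) as f:
            f.write(UTF8_META + html)
            name = f.name
        try:
            result = subprocess.run([path, "-dump", name], capture_output=True, timeout=30)
        except (OSError, subprocess.SubprocessError):
            continue  # could not run or timed out; try the next browser
        finally:
            os.unlink(name)
        if result.returncode == 0:
            # Assumption: output is UTF-8 (really depends on locale/config).
            return result.stdout.decode("utf-8", errors="replace")
    return None


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script>/<style> and spacing block tags."""

    BLOCK_TAGS = {"p", "br", "div", "li", "tr", "td", "th", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; etc. into text
        self.chunks = []
        self.skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
        elif tag in self.BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
        elif tag in self.BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data):
        if not self.skip:
            self.chunks.append(data)


def strip_tags(html: str) -> str:
    """Pure-Python fallback: text content with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


def html_to_text(html: str) -> str:
    """Prefer a text-mode browser's rendering; fall back to strip_tags."""
    text = dump_with_browser(html)
    return text if text is not None else strip_tags(html)
```

Typical use is `html = html_from_message(msg)` followed, if `html` is not None, by `html_to_text(html)`. The browser path gives a better layout for lists and tables. The fallback only keeps text nodes, so as the question says, it won't always get just the text.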

-- Gerhard


