text representation of HTML

garabik-news-2005-05 at kassiopeia.juls.savba.sk
Thu Jul 20 10:37:38 EDT 2006


Ksenia Marasanova <ksenia.marasanova at gmail.com> wrote:
> Hi,
> 
> I am looking for a library that will give me very simple text
> representation of HTML.
> For example
> <div><h1>Title</h1><p>This is a <br />test</p></div>
> 
> will be transformed to:
> 
> Title
> 
> This is a
> test
> 
> 
> I want to send a plain-text alternative to an HTML email, and would prefer
> to generate it automatically from the HTML source.

something like this:

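import re

text = '<div><h1>Title</h1><p>This is a <br />test</p></div>'
# collapse runs of whitespace (HTML ignores them) into a single space
text = re.sub(r'\s+', ' ', text)
# turn <p>, <br>, <br /> and <h1>..<h6> (with or without attributes) into newlines
text = re.sub(r'(?i)<(p|br|h[1-6])\b[^>]*>', '\n', text)
# drop every remaining tag
result = re.sub(r'<[^>]+>', '', text)
print(result)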
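The regex version is fine for quick jobs, but it cannot decode entities such as &amp; and it leaves the contents of <style> blocks, which HTML mail often carries, in the output. A sketch of an alternative (not the approach from the reply above) uses the standard library's html.parser.HTMLParser and email.message.EmailMessage (Python 3) to produce the text and then build the multipart/alternative message the question is after. TextExtractor, html_to_text and build_message are hypothetical names, and the set of block-level tags is an assumption you would tune to your mail templates.

```python
import re
from email.message import EmailMessage
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect the visible text of an HTML document."""

    # Assumed set of block-level tags that end a paragraph (blank line after).
    BLOCK_TAGS = {'p', 'div', 'li', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6'}
    # Tags whose content is never shown to the reader.
    SKIP_TAGS = {'script', 'style'}

    def __init__(self):
        # convert_charrefs=True makes the parser decode &amp;, &#233; etc.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skipping = 0

    def handle_starttag(self, tag, attrs):
        # <br /> also lands here: the default handle_startendtag
        # calls handle_starttag and then handle_endtag.
        if tag == 'br':
            self.parts.append('\n')
        elif tag in self.SKIP_TAGS:
            self.skipping += 1

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.parts.append('\n\n')
        elif tag in self.SKIP_TAGS and self.skipping:
            self.skipping -= 1

    def handle_data(self, data):
        if not self.skipping:
            # HTML treats any run of whitespace as a single space
            self.parts.append(re.sub(r'\s+', ' ', data))


def html_to_text(html):
    """Return a plain-text rendering of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    # trim spaces around each line, then allow at most one blank line in a row
    lines = [line.strip() for line in ''.join(parser.parts).split('\n')]
    text = re.sub(r'\n{3,}', '\n\n', '\n'.join(lines))
    return text.strip()


def build_message(html):
    """Hypothetical helper: a multipart/alternative mail with both parts."""
    msg = EmailMessage()
    msg.set_content(html_to_text(html))        # text/plain part
    msg.add_alternative(html, subtype='html')  # text/html alternative
    return msg


if __name__ == '__main__':
    print(html_to_text('<div><h1>Title</h1><p>This is a <br />test</p></div>'))
```

Run on the example from the question, it prints the lines asked for there: Title, a blank line, "This is a", "test". A real parser copes with attributes, comments and entities without extra patterns; links (href targets), tables and lists are still flattened to bare text, so richer layouts would need more handlers.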

-- 
 -----------------------------------------------------------
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__    garabik @ kassiopeia.juls.savba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!


