HTML tags optimization [interesting problem]

DENG polytechnique at gmail.com
Wed Aug 31 10:54:37 EDT 2005


Hi all,

I use SGMLParser to process HTML files in order to optimize the markup.

Something like this:

<i><b>TEXT1</b></i><b><i><u>TEXT2</u></i></b>

optimise to

<i><b>TEXT1<u>TEXT2</u></b></i>


At the very beginning, I was thinking of analysing each text block to
determine its color, its size, and whether it is bold or italic, but I
found that too complicated.
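
One way to make the per-text-block analysis above tractable is to keep a
stack of open tags while parsing and record which tags are active for each
piece of text. The following is only a minimal sketch under some
assumptions: it uses Python 3's html.parser.HTMLParser in place of
SGMLParser (the sgmllib module was removed in Python 3), the names
StyleRuns and style_runs are hypothetical, and attributes are ignored, so
<font> would need extra handling.

```python
from html.parser import HTMLParser


class StyleRuns(HTMLParser):
    """Hypothetical helper: collect (text, active_tags) pairs."""

    def __init__(self):
        super().__init__()
        self.stack = []   # tags currently open, outermost first
        self.runs = []    # list of (text, frozenset of active tag names)

    def handle_starttag(self, tag, attrs):
        # Attributes are deliberately ignored in this sketch.
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Drop the innermost open tag with this name, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i] == tag:
                del self.stack[i]
                break

    def handle_data(self, data):
        # A set makes nesting order and duplicate tags irrelevant.
        self.runs.append((data, frozenset(self.stack)))


def style_runs(html):
    parser = StyleRuns()
    parser.feed(html)
    parser.close()
    return parser.runs
```

For <i><b>TEXT1</b></i><b><i><u>TEXT2</u></i></b> this yields two runs:
TEXT1 with the tags {i, b} and TEXT2 with the tags {i, b, u}. Because the
active tags are stored as a set, <i><b> and <b><i> compare equal.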

For example:

<font color=red><font size=6>TEXT1</font></font>

optimise to

<font color=red size=6>TEXT1</font>


But if TEXT2 is also present:

<font color=red><font size=6>TEXT1</font>TEXT2</font>

we cannot do any optimization, because the outer font must still apply to
TEXT2 without the size attribute.
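
The <font> rule described above (merge only when the inner <font> is the
sole child of the outer one) could be sketched on a small tree structure as
follows. This is an illustration, not a complete solution: Node,
merge_fonts and render are hypothetical names, the tree is assumed to have
been built already, and the inner attribute is assumed to override the
outer one when both set the same name.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical element node; children are Node objects or plain strings."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def merge_fonts(node):
    """Collapse a <font> whose only child is another <font>."""
    # Process children first so that deeper nesting collapses bottom-up.
    node.children = [merge_fonts(c) if isinstance(c, Node) else c
                     for c in node.children]
    if (node.tag == "font"
            and len(node.children) == 1
            and isinstance(node.children[0], Node)
            and node.children[0].tag == "font"):
        inner = node.children[0]
        # Assumption: inner attributes win over outer ones with the same name.
        node.attrs = {**node.attrs, **inner.attrs}
        node.children = inner.children
    return node


def render(node):
    # Unquoted attribute values, as in the examples; only safe for simple values.
    attrs = "".join(f" {k}={v}" for k, v in node.attrs.items())
    body = "".join(render(c) if isinstance(c, Node) else c
                   for c in node.children)
    return f"<{node.tag}{attrs}>{body}</{node.tag}>"
```

With this sketch, the TEXT1-only tree renders as
<font color=red size=6>TEXT1</font>, while the tree that also contains
TEXT2 is left unchanged because the outer <font> has two children.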

My problem is that I cannot find a method that handles all of these
situations. I have thought about it so much that I am now confused.


Can anyone give me some advice?

Thanks




PS:

Other examples:

1
<font size=5><font size=7>TEXT</font></font>
=>
<font size=7>TEXT</font>

2
<i>TEXT </i><i>TEXT</i>
=>
<i>TEXT TEXT</i>

3
<i>TEXT<i>TEXT</i></i>
=>
<i>TEXTTEXT</i>

etc...
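
Examples 2 and 3 above, and the <i><b> case at the top of this message,
all amount to writing out text runs while keeping shared tags open between
neighbours. A possible sketch is below. It assumes the document has
already been reduced to a list of (text, set of active tag names) pairs,
the name emit is hypothetical, and the greedy strategy is not guaranteed to
give the shortest possible markup. Attributes are not handled.

```python
def emit(runs):
    """Write (text, tags) runs, keeping tags open across adjacent runs."""
    out = []
    open_tags = []  # tags currently open, outermost first
    for text, tags in runs:
        # Keep the longest prefix of open tags that this run still needs.
        keep = 0
        while keep < len(open_tags) and open_tags[keep] in tags:
            keep += 1
        # Close everything above that prefix, innermost first.
        while len(open_tags) > keep:
            out.append(f"</{open_tags.pop()}>")
        # Open the tags that are still missing, in a fixed (sorted) order.
        for tag in sorted(set(tags) - set(open_tags)):
            out.append(f"<{tag}>")
            open_tags.append(tag)
        out.append(text)
    # Close whatever is still open at the end.
    while open_tags:
        out.append(f"</{open_tags.pop()}>")
    return "".join(out)
```

Given the runs ("TEXT ", {i}) and ("TEXT", {i}), emit produces
<i>TEXT TEXT</i>. Given ("TEXT1", {i, b}) and ("TEXT2", {i, b, u}), it
produces <b><i>TEXT1<u>TEXT2</u></i></b>, which nests the tags in a
different order from the hand-written result but opens each tag only once.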



