Newbie regular expression and whitespace question

googleboy mynews44 at yahoo.com
Thu Sep 22 15:58:49 EDT 2005


Hi.

I am trying to collapse an HTML table into a single line. Basically, any
time I see ">" and "<" with nothing but whitespace between them, I'd
like to remove all of that whitespace, including newlines. I've read the
how-to and tried a bunch of things, but nothing seems to work
for me:

--

table = open(r'D:\path\to\tabletest.txt', 'rb')
strTable = table.read()

# Below are the different things I have tried, one at a time:

strTable = strTable.replace(">\s<", "><")  # I got this from the module docs

strTable = strTable.replace(">.<", "><")

strTable = ">\s+<".join(strTable)

strTable = ">\s<".join(strTable)

print strTable

--
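For reference, here is a short sketch of what the attempts above actually do. It runs under Python 3, and the sample string is made up for illustration. Neither `str.replace()` nor `str.join()` treats its argument as a regular expression:

```python
# Illustrative sample; any text with whitespace between ">" and "<" behaves the same.
sample = "<tr>\n  <td>"

# str.replace() matches its first argument literally. A raw string is used
# here, but ">\s<" without the r prefix is the same four characters, since
# "\s" is not a recognized escape sequence.
print(repr(sample.replace(r">\s<", "><")))  # unchanged: no literal ">\s<" in sample
print(repr(sample.replace(">.<", "><")))    # unchanged: no literal ">.<" in sample

# sep.join(iterable) puts sep between the items; iterating over a string
# yields single characters, so the separator lands between every letter.
print(">\\s+<".join("abc"))  # a>\s+<b>\s+<c
```

Pattern matching with `\s` or `.` needs the `re` module instead.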

The table in question looks like this:

<table width="80%"  border="0">
  <tr>
    <td> </td>
    <td colspan="2">Introduction</td>
    <td><div align="right">3</div></td>
  </tr>
  <tr>
    <td> </td>
  </tr>
  <tr>
    <td><i>ONE</i></td>
    <td colspan="2">Childraising for Parrots</td>
    <td><div align="right">11</div></td>
  </tr>
</table>
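A minimal sketch of the collapsing step using `re.sub()`. It works on the first row of the table above, written as a string literal; in the script, the text would come from `table.read()`. The sketch assumes Python 3 `str` data. Under Python 3, a file opened in `'rb'` mode returns bytes from `read()`, so the pattern and replacement would then need to be bytes literals (`rb'>\s+<'`, `b'><'`).

```python
import re

# First row of the sample table (in the script this would be table.read()).
html = """<table width="80%"  border="0">
  <tr>
    <td> </td>
    <td colspan="2">Introduction</td>
    <td><div align="right">3</div></td>
  </tr>
</table>
"""

# \s+ matches one or more whitespace characters, including "\n" and "\r",
# so every run of whitespace sitting directly between ">" and "<" is removed.
collapsed = re.sub(r">\s+<", "><", html)

# The trailing newline after </table> is not between ">" and "<", so strip it.
collapsed = collapsed.strip()
print(collapsed)
```

The single space in the empty cell `<td> </td>` also sits between ">" and "<", so it is removed too, giving `<td></td>`. Whitespace inside a tag, such as the double space in `<table width="80%"  border="0">`, is left alone.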



For extra kudos (I confess I have been so stuck on the above problem
that I haven't put much thought into this one): I'd like to measure the
number of characters between the <p> and </p> tags, and then insert a
newline at the end of the first word that comes after an arbitrary
number of characters. I am reading a bunch of paragraphs formatted for
a webpage into a script, but they're all on one long line, and I would
like to split them for readability.
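One possible approach, sketched under some assumptions. The paragraphs are assumed to use bare `<p>` tags without attributes and not to nest. `paragraph_lengths`, `wrap_after`, `wrap_paragraphs` and the default `limit` of 72 are hypothetical names and values chosen for illustration. The line is ended after the first word that brings it to at least `limit` characters, as described above, so lines can run slightly past the limit.

```python
import re

# Assumes plain <p>...</p> tags with no attributes. The non-greedy (.*?) stops
# at the first </p>, and re.DOTALL lets "." match newlines too.
PARA_RE = re.compile(r"<p>(.*?)</p>", re.DOTALL)

def paragraph_lengths(html):
    """Number of characters between each <p> and its </p>."""
    return [len(body) for body in PARA_RE.findall(html)]

def wrap_after(text, limit):
    """Hypothetical helper: end a line after the first word that brings it
    to at least `limit` characters, so lines may run a little past `limit`."""
    lines, current = [], ""
    for word in text.split():  # split() also discards existing runs of whitespace
        current = current + " " + word if current else word
        if len(current) >= limit:
            lines.append(current)
            current = ""
    if current:
        lines.append(current)
    return "\n".join(lines)

def wrap_paragraphs(html, limit=72):
    """Rewrap the text inside every <p>...</p> with wrap_after()."""
    return PARA_RE.sub(lambda m: "<p>" + wrap_after(m.group(1), limit) + "</p>", html)

sample = "<p>The quick brown fox jumps over the lazy dog and keeps running.</p>"
print(paragraph_lengths(sample))
print(wrap_paragraphs(sample, limit=20))
```

If lines that never exceed the limit are acceptable, the standard `textwrap.fill(text, width)` could replace `wrap_after()`. It breaks before the word that would overflow rather than after it.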

TIA

Googleboy



