Newbie regular expression and whitespace question

George Sakkis gsakkis at rutgers.edu
Thu Sep 22 20:13:47 EDT 2005


"googleboy" <mynews44 at yahoo.com> wrote:
> Hi.
>
> I am trying to collapse an html table into a single line.  Basically,
> anytime I see ">" & "<" with nothing but whitespace between them,  I'd
> like to remove all the whitespace, including newlines. I've read the
> how-to and I have tried a bunch of things,  but nothing seems to work
> for me:
>
> [snip]

As others have already shown you, you need the sub() method of a compiled regular expression from the re module:

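import re

# '>' then any run of whitespace (including newlines), then '<'
regex = re.compile(r'>\s*<')
print(regex.sub('><', data))  # data: the HTML string read in earlier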
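A self-contained run of the same substitution shows what it produces. The multi-line table below is a made-up sample, not your data:

```python
import re

# Hypothetical sample: a small HTML table spread over several lines
html = """<table>
  <tr>
    <td>a</td>
    <td>b</td>
  </tr>
</table>"""

# '>' followed by any run of whitespace (spaces, tabs, newlines), then '<'
between_tags = re.compile(r'>\s*<')
collapsed = between_tags.sub('><', html)
print(collapsed)  # <table><tr><td>a</td><td>b</td></tr></table>
```

Because \s* also matches the empty string, tags that are already adjacent are rewritten unchanged. Whitespace between inline elements (for example "<b>x</b> <i>y</i>") is removed as well, which can change how the text renders.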

> For extra kudos (and I confess I have been so stuck on the above
> problem I haven't put much thought into how to do this one) I'd like to
> be able to measure the number of characters between the <p> & </p>
> tags, and then insert a newline character at the end of the next word
> after an arbitrary number of characters.  I am reading into a
> script a bunch of paragraphs formatted for a webpage, but they're all
> on one big long line and I would like to split them for readability.
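Taken literally, the second request (measure the text between <p> and </p>, then put a newline at the end of the word that crosses a given length) can be sketched with a regex. This is a rough sketch that assumes plain <p> tags with no attributes or nesting; the break_after_word helper, the sample html and the length of 5 are all hypothetical:

```python
import re

def break_after_word(text, n):
    """Take n characters, run on to the end of the current word, and
    replace the whitespace that follows with a newline; repeat."""
    # (.{n}\S*) is the line to keep; \s+ is the gap turned into '\n'
    return re.sub(r'(.{%d}\S*)\s+' % n, r'\1\n', text)

# Hypothetical one-line input with two paragraphs
html = '<p>aaa bbb ccc ddd</p><p>ee ff</p>'

# Measure the text between each <p> and </p>, then re-break it at n=5
for body in re.findall(r'<p>(.*?)</p>', html, re.DOTALL):
    print(len(body), repr(break_after_word(body, 5)))
# 15 'aaa bbb\nccc ddd'
# 5 'ee ff'
```

This makes every line except the last at least n characters long, so a line can run past n by up to one word.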

What I guess you want to do is wrap some text. Don't reinvent the wheel; the standard library already has a module
for that:

import textwrap
# Wrap into lines of at most 60 characters, breaking at whitespace
print(textwrap.fill(oneBigLongLine, 60))
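To wrap the paragraphs inside the page without touching the tags, re.sub can take a function that re-wraps the text of each <p>...</p> pair. A sketch, assuming simple <p> tags without attributes; the sample html, the wrap_paragraph helper and the width of 30 are made up for illustration:

```python
import re
import textwrap

# Hypothetical one-line HTML whose paragraphs are not wrapped at all
html = ('<p>The quick brown fox jumps over the lazy dog and keeps running.</p>'
        '<p>Short one.</p>')

def wrap_paragraph(match, width=30):
    # match.group(1) is the text between <p> and </p>
    return '<p>' + textwrap.fill(match.group(1), width) + '</p>'

# Re-wrap the contents of every <p>...</p> pair, leaving the tags in place
wrapped = re.sub(r'<p>(.*?)</p>', wrap_paragraph, html, flags=re.DOTALL)
print(wrapped)
```

textwrap.fill breaks before any word that would push a line past the width, so lines are at most width characters (a single word longer than the width gets split, since break_long_words defaults to True). For real-world HTML, a proper parser such as html.parser is more robust than regexes.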

HTH,
George





More information about the Python-list mailing list