Excess whitespace in my soup
John Machin
sjmachin at lexicon.net
Sun Jan 20 05:47:53 EST 2008
Remco Gerlich wrote:
> Not sure if this is sufficient for what you need, but how about
>
> import re
> re.sub(u'[\s\xa0]+', ' ', s)
>
> That should replace all occurrences of 1 or more whitespace or \xa0
> characters with a single space.
>
It does indeed, and so does
re.sub(u'\s+', ' ', s)
because u'\xa0' *IS* whitespace in the Python unicode world, but it's
not whitespace in the HTML sense and it must be preserved.
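
To make the difference concrete, here is a minimal sketch of one way to collapse only HTML-style whitespace and leave u'\xa0' alone. It assumes Python 3, where str patterns are Unicode-aware by default (on Python 2, as far as I know, \s only matches u'\xa0' when re.UNICODE is passed). The names HTML_WS and collapse_html_whitespace are made up for illustration, and the character class is the HTML notion of whitespace (space, tab, LF, FF, CR), not anything taken from the original poster's code.

```python
import re

# Whitespace in the HTML sense: space, tab, line feed, form feed,
# carriage return. U+00A0 (no-break space) is deliberately left out.
HTML_WS = re.compile(r'[ \t\n\f\r]+')


def collapse_html_whitespace(s):
    """Hypothetical helper: squeeze runs of HTML whitespace to one space."""
    return HTML_WS.sub(' ', s)


s = 'a \t\n b\xa0\xa0c'

# Unicode-aware \s treats \xa0 as whitespace, so it gets swallowed.
print(repr(re.sub(r'\s+', ' ', s)))

# The explicit character class leaves the no-break spaces untouched.
print(repr(collapse_html_whitespace(s)))
```

Spelling out the character class avoids depending on what \s happens to mean for a given Python version or flag setting.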
Cheers,
John