Excess whitespace in my soup

John Machin sjmachin at lexicon.net
Sun Jan 20 05:47:53 EST 2008


Remco Gerlich wrote:
> Not sure if this is sufficient for what you need, but how about
>
> import re
> re.sub(u'[\s\xa0]+', ' ', s)
>
> That should replace all occurrences of 1 or more whitespace or \xa0 
> characters with a single space.
>
It does indeed, and so does
    re.sub(u'\s+', ' ', s)
because u'\xa0' *IS* whitespace in the Python unicode world, but it's 
not whitespace in the HTML sense and it must be preserved.
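
One way to get that effect, as a minimal sketch rather than a definitive fix: spell out the collapsible characters explicitly instead of relying on \s. This assumes Python 3 (where str patterns are Unicode by default) and assumes that "HTML whitespace" means space, tab, LF, FF and CR; the names HTML_WS and collapse_html_whitespace are made up for illustration.

```python
import re

# Assumption: only these ASCII characters count as collapsible
# whitespace in the HTML sense. U+00A0 (no-break space) is
# deliberately left out of the class, so it is never touched.
HTML_WS = re.compile(r'[ \t\n\r\f]+')

def collapse_html_whitespace(s):
    """Replace each run of HTML whitespace with a single space."""
    return HTML_WS.sub(' ', s)

sample = 'a \n\t b\xa0\xa0c  d'

# \s treats \xa0 as whitespace, so the no-break spaces are lost.
print(repr(re.sub(r'\s+', ' ', sample)))        # 'a b c d'

# The explicit class leaves both \xa0 characters in place.
print(repr(collapse_html_whitespace(sample)))   # 'a b\xa0\xa0c d'
```

If the pattern should still use \s, passing flags=re.ASCII to re.sub restricts \s to [ \t\n\r\f\v], which also leaves \xa0 alone.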

Cheers,
John


