Excess whitespace in my soup

John Machin sjmachin at lexicon.net
Sat Jan 19 07:20:08 EST 2008


On Jan 19, 11:00 pm, Fredrik Lundh <fred... at pythonware.com> wrote:
> John Machin wrote:
> > I'm happy enough with reassembling the second item. The problem is in
> > reliably and correctly collapsing the whitespace in each of the above
> > five elements. The standard Python idiom of u' '.join(text.split())
> > won't work because the text is Unicode and u'\xa0' is whitespace
> > and would be converted to a space.
>
> would this (or some variation of it) work?
>
>  >>> re.sub("[ \n\r\t]+", " ", u"foo\n  frab\xa0farn")
> u'foo frab\xa0farn'
>
> </F>

Yes, partially. Leading and trailing whitespace has to be removed
entirely, not replaced by one space.
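
One possible variation, as a minimal sketch only: collapse runs of the
four ASCII whitespace characters with the same regex, then strip a
leading or trailing space. The name collapse_ws is made up for
illustration, and the sketch assumes that only space, \n, \r and \t
count as collapsible whitespace.

```python
import re

# ASCII whitespace only; u'\xa0' (no-break space) is deliberately
# left out of the character class so it is never touched.
_ASCII_WS = re.compile(u"[ \n\r\t]+")

def collapse_ws(text):
    """Hypothetical helper: collapse ASCII whitespace runs to one space
    and remove leading/trailing whitespace entirely."""
    collapsed = _ASCII_WS.sub(u" ", text)
    # Pass u" " explicitly: a bare .strip() on a Unicode string would
    # also strip u'\xa0', which must be preserved.
    return collapsed.strip(u" ")

print(repr(collapse_ws(u"  foo\n  frab\xa0farn \n")))
```

After the substitution, any leading or trailing run has already been
reduced to a single space, so strip(u" ") is enough to remove it.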

Cheers,
John


