whitespace within a string

Jeff Epler jepler at unpythonic.net
Mon Feb 23 20:44:13 EST 2004


You can use the magic of no-arg split() to do this:
    def canonize_whitespace(s):
        return " ".join(s.split())

    >>> canonize_whitespace("a  b\t\tc\td\t e")
    'a b c d e'

A regular expression substitution can do the job too:
    import re

    def canonize_whitespace(s):
        return re.sub(r'\s+', ' ', s)

    >>> canonize_whitespace("a  b\t\tc\td\t e")
    'a b c d e'
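
The two versions above agree on that input, but they are not quite interchangeable. As a minimal sketch (the names canonize_split and canonize_regex and the sample string are only for illustration, not from the original post), this shows how they differ when the string starts or ends with whitespace:

```python
import re

def canonize_split(s):
    # No-arg split() discards leading and trailing whitespace entirely.
    return " ".join(s.split())

def canonize_regex(s):
    # re.sub collapses each whitespace run to one space, including the
    # runs at the very start and end of the string.
    return re.sub(r"\s+", " ", s)

sample = "  a  b\t\tc \n"
print(repr(canonize_split(sample)))          # 'a b c'
print(repr(canonize_regex(sample)))          # ' a b c '
print(repr(canonize_regex(sample).strip()))  # 'a b c'
```

If you want the regex version to match the split() version, calling .strip() on the result (or on the input) should be enough.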
 
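Of course, if 'x=y' is accepted just like 'x = y' and 'x   =    y', then
neither of these approaches is good enough, because they only collapse
whitespace that is already there and never add it around the '='.
Splitting on the '=' and stripping each side handles that case: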

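    def canonize_config_line(s):
        # Lines without '=' (such as comments) pass through unchanged.
        if '=' not in s:
            return s
        # Split on the first '=' only, so the value may itself contain '='.
        a, b = s.split("=", 1)
        return "%s = %s" % (a.strip(), b.strip())

    >>> [canonize_config_line(s) for s in
    ...        ['x=y', 'x\t=  y', '  x =    y ', "#z"]]
    ['x = y', 'x = y', 'x = y', '#z']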
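
A couple of edge cases may be worth checking before relying on this. The following is only a sketch under assumptions: the sample lines are made up, and whether a comment containing '=' should be left alone depends on your config format.

```python
def canonize_config_line(s):
    # Same logic as above: lines without '=' pass through unchanged.
    if "=" not in s:
        return s
    # maxsplit=1 means only the first '=' separates key from value.
    key, value = s.split("=", 1)
    return "%s = %s" % (key.strip(), value.strip())

# Hypothetical sample lines, chosen to exercise the edge cases.
lines = ["path = /usr/bin", "expr=a=b", "# comment", "#z=1", ""]
result = [canonize_config_line(line) for line in lines]
print(result)
# ['path = /usr/bin', 'expr = a=b', '# comment', '#z = 1', '']
```

Note that the value keeps any later '=' untouched, and that a comment line containing '=' is rewritten as well. If that is not wanted, a check such as s.lstrip().startswith("#") before the split would be one way to skip comments, assuming '#' is the comment character.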

Jeff



