Re: whitespace within a string

From: Jeff Epler (jepler_at_unpythonic.net)
Date: 02/24/04


Date: Mon, 23 Feb 2004 19:44:13 -0600
To: Bart Nessux <bart_nessux@hotmail.com>

You can use the magic of no-arg split() to do this:
    def canonize_whitespace(s):
        return " ".join(s.split())

>>> canonize_whitespace("a b\t\tc\td\t e")
    'a b c d e'

A regular expression substitution can do the job too:
    import re

    def canonize_whitespace(s):
        return re.sub(r'\s+', ' ', s)

>>> canonize_whitespace("a b\t\tc\td\t e")
    'a b c d e'
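
The two versions agree on the example above, but they may differ when the
string has leading or trailing whitespace. Here is a minimal sketch of that
difference; the names canonize_split and canonize_regex are made up here just
to keep the two variants apart:

```python
import re

def canonize_split(s):
    # No-argument split() drops leading and trailing whitespace entirely.
    return " ".join(s.split())

def canonize_regex(s):
    # re.sub only collapses runs, so a run at either end becomes one space.
    return re.sub(r"\s+", " ", s)

sample = "  a \t b\n"
print(repr(canonize_split(sample)))  # 'a b'
print(repr(canonize_regex(sample)))  # ' a b '
```

If the regex form is preferred, calling .strip() on its result should bring
the two back in line.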
 
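Of course, if 'x=y' should be accepted just like 'x = y' and 'x\t= y', then
neither of these approaches is good enough, because neither one adds
whitespace around an '=' that has none. Splitting on the first '=' and
stripping both halves handles that: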

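    def canonize_config_line(s):
        # Lines without '=' (comments, blank lines) pass through unchanged.
        if '=' not in s:
            return s
        # Split on the first '=' only, so the value may itself contain '='.
        a, b = s.split("=", 1)
        return "%s = %s" % (a.strip(), b.strip())

>>> [canonize_config_line(s) for s in
...  ['x=y', 'x\t= y', ' x = y ', "#z"]]
    ['x = y', 'x = y', 'x = y', '#z']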
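
One thing to watch: a comment line that happens to contain '=' would be
rewritten too. The following is only a sketch of one way to apply the function
to a whole block of text while leaving such lines alone. It assumes that a
line starting with '#' is a comment (the original question does not say so),
and canonize_config_text is a hypothetical helper name:

```python
def canonize_config_line(s):
    if "=" not in s:
        return s
    a, b = s.split("=", 1)
    return "%s = %s" % (a.strip(), b.strip())

def canonize_config_text(text):
    # Assumption: lines whose first non-blank character is '#' are comments
    # and should be kept exactly as written.
    out = []
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            out.append(line)
        else:
            out.append(canonize_config_line(line))
    # Note: splitlines() drops the final newline, so the result has none.
    return "\n".join(out)

text = "x=y\n# note: a=b\nname\t=  value  \n"
print(canonize_config_text(text))
```

With that input, the comment line should come through untouched while the
other two lines are normalized to 'x = y' and 'name = value'.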

Jeff