Simple regex with whitespaces

John Machin sjmachin at lexicon.net
Mon Sep 11 00:58:22 EDT 2006


mathieu.malaterre at gmail.com wrote:
> Hello,
>
>   I cannot figure out a way to find a regular expression that would
> match one and only one of these two strings:
>
> s1 = '                              how are you'
> s2 = '    hello world               how are you'
>
>   All I could come up with was:
> patt = re.compile('^[ ]*([A-Za-z]+)[  ]+([A-Za-z]+)$')
>
>   Which of course does not work. I cannot express the fact that a
> sentence has 0 or 1 whitespace between words, while sentences are
> separated by two or more whitespaces.
>
> Any suggestion ? Thanks a bunch !
> Mathieu

1. A "word" is one or more non-whitespace characters -- the subpattern is
\S+
2. A "sentence" is one or more words separated by a single whitespace
character, IOW a word followed by zero or more occurrences of
whitespace+word -- so a sentence will be matched by \S+(\s\S+)*
3. Leading and trailing runs of whitespace should be ignored -- use \s*
4. You will need to detect the case of 0 sentences (all whitespace)
separately -- I trust you don't need to be told how to do that :-)
5. Don't try to match two or more sentences; match one sentence, and
anything (non-blank) that fails must have 2+ sentences.
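Steps 1-5 can be sketched in Python 3 by building the pattern from named pieces and handling the all-whitespace case separately. This is a minimal illustration, not John's code: `WORD`, `SENTENCE`, `ONE_SENTENCE` and `classify` are hypothetical names, and returning 2 to mean "two or more sentences" is an assumption made for the example.

```python
import re

# Step 1: a "word" is one or more non-whitespace characters.
WORD = r"\S+"
# Step 2: a "sentence" is a word followed by zero or more
# (exactly one whitespace character + word) groups.
SENTENCE = WORD + r"(?:\s" + WORD + r")*"
# Step 3: allow leading/trailing whitespace and anchor both ends.
ONE_SENTENCE = re.compile(r"^\s*" + SENTENCE + r"\s*$")


def classify(s):
    """Hypothetical helper: return 0, 1, or 2 (meaning 'two or more')."""
    # Step 4: an empty or all-whitespace string holds no sentence at all.
    if not s.strip():
        return 0
    # Step 5: only try to match exactly one sentence...
    if ONE_SENTENCE.match(s):
        return 1
    # ...so any non-blank string that fails must contain 2+ sentences.
    return 2


print(classify("                              how are you"))
print(classify("    hello world               how are you"))
```

The `(?:...)` non-capturing group is a small deviation from the `(\s\S+)*` in the post; it matches the same strings but avoids keeping a capture that is never used.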

So :

|>>> s1 = '                              how are you'
|>>> s2 = '    hello world               how are you'
|>>> pat = r"^\s*\S+(\s\S+)*\s*$"
|>>> import re
|>>> re.match(pat, s1)
|<_sre.SRE_Match object at 0x00AED9E0>
|>>> re.match(pat, s2)
|>>> re.match(pat, '      ')
|>>> re.match(pat, '     a     b    ')
|>>> re.match(pat, '     a b    ')
|<_sre.SRE_Match object at 0x00AED8E0>
|>>> re.match(pat, '     ab    ')
|<_sre.SRE_Match object at 0x00AED920>
|>>> re.match(pat, '     a    ')
|<_sre.SRE_Match object at 0x00AED9E0>
|>>> re.match(pat, 'a')
|<_sre.SRE_Match object at 0x00AED8E0>
|>>>
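The session above shows Python 2 era output (`<_sre.SRE_Match ...>`); recent Python 3 versions print match objects differently. A sketch of the same checks for Python 3.4+, assuming `re.fullmatch` is acceptable in place of `^...$` (it only succeeds when the whole string matches). The `is_one_sentence` name is hypothetical:

```python
import re

# Same pattern as in the session, without ^ and $: fullmatch anchors both ends.
PAT = re.compile(r"\s*\S+(?:\s\S+)*\s*")


def is_one_sentence(text):
    """Hypothetical helper: True if text holds exactly one sentence."""
    return PAT.fullmatch(text) is not None


# The inputs from the session, paired with whether they matched there.
cases = [
    ("                              how are you", True),
    ("    hello world               how are you", False),
    ("      ", False),
    ("     a     b    ", False),
    ("     a b    ", True),
    ("     ab    ", True),
    ("     a    ", True),
    ("a", True),
]

for text, expected in cases:
    result = is_one_sentence(text)
    print(repr(text), result)
    assert result == expected
```

Returning a plain bool keeps the check independent of how a given Python version formats match objects.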

HTH,
John




More information about the Python-list mailing list