Simple regex with whitespaces
John Machin
sjmachin at lexicon.net
Mon Sep 11 00:58:22 EDT 2006
mathieu.malaterre at gmail.com wrote:
> Hello,
>
> I cannot figure out a way to find a regular expression that would
> match one and only one of these two strings:
>
> s1 = ' how are you'
> s2 = ' hello world   how are you'
>
> All I could come up with was:
> patt = re.compile('^[ ]*([A-Za-z]+)[ ]+([A-Za-z]+)$')
>
> Which of course does not work. I cannot express the fact that a sentence
> has 0 or 1 whitespace between words, while groups are separated by two or
> more whitespaces.
>
> Any suggestions? Thanks a bunch!
> Mathieu
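The pattern in the question allows exactly two runs of letters separated by one or more spaces. That means it can only ever match a two-word string, and its `[ ]+` cannot tell a single space from a wider gap. Here is a quick check. The test strings are illustrative and do not come from the post:

```python
import re

# The pattern from the question: optional leading spaces, then exactly
# two letter-only words separated by one or more spaces, then end of string.
patt = re.compile('^[ ]*([A-Za-z]+)[ ]+([A-Za-z]+)$')

print(patt.match(' how are you'))    # three words: no match (None)
print(patt.match(' hello   world'))  # two words: matches, despite the 3-space gap
```

Both failures come from the same place. The pattern counts words instead of distinguishing single from multiple whitespace.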
1. A "word" is one or more non-whitespace characters -- subpattern is
\S+
2. A "sentence" is one or more words separated by single whitespace characters,
IOW a word followed by zero or more occurrences of whitespace+word --
so a sentence will be matched by \S+(\s\S+)*
3. Leading and trailing runs of whitespace should be ignored -- use \s*
4. You will need to detect the case of 0 sentences (all whitespace)
separately -- I trust you don't need to be told how to do that :-)
5. Don't try to match two or more sentences; match one sentence, and
anything that fails must have 0 or 2+ sentences.
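Steps 1 to 3 can be assembled from named pieces. This is a sketch only. The names `WORD`, `SENTENCE`, `ONE_SENTENCE` and `is_one_sentence` are illustrative. It also swaps in a non-capturing group `(?:...)` and `re.fullmatch` (Python 3.4+) in place of the `^...$` anchors used below:

```python
import re

# Step 1: a word is one or more non-whitespace characters.
WORD = r"\S+"
# Step 2: a sentence is a word followed by zero or more (one whitespace char + word).
SENTENCE = WORD + r"(?:\s" + WORD + r")*"
# Step 3: allow leading and trailing runs of whitespace around the sentence.
ONE_SENTENCE = re.compile(r"\s*" + SENTENCE + r"\s*")

def is_one_sentence(s):
    """Hypothetical helper: True if s holds exactly one sentence."""
    # fullmatch requires the whole string to match, like ^...$ would.
    return ONE_SENTENCE.fullmatch(s) is not None

print(is_one_sentence('   how are you'))                # True
print(is_one_sentence('   hello world   how are you'))  # False: a 3-space gap
print(is_one_sentence('   '))                           # False: no word at all
```

`\S+` can never consume whitespace, so a run of two or more whitespace characters between words cannot be bridged by `(?:\s\S+)*`. That is what rejects the second string.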
So:
|>>> s1 = ' how are you'
|>>> s2 = ' hello world   how are you'
|>>> pat = r"^\s*\S+(\s\S+)*\s*$"
|>>> import re
|>>> re.match(pat, s1)
|<_sre.SRE_Match object at 0x00AED9E0>
|>>> re.match(pat, s2)
|>>> re.match(pat, ' ')
|>>> re.match(pat, ' a  b ')
|>>> re.match(pat, ' a b ')
|<_sre.SRE_Match object at 0x00AED8E0>
|>>> re.match(pat, ' ab ')
|<_sre.SRE_Match object at 0x00AED920>
|>>> re.match(pat, ' a ')
|<_sre.SRE_Match object at 0x00AED9E0>
|>>> re.match(pat, 'a')
|<_sre.SRE_Match object at 0x00AED8E0>
|>>>
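Points 4 and 5 can be combined into a small classifier using the same pattern as in the session above:

* An empty or all-whitespace string has no sentences.
* A string the pattern matches has exactly one.
* Anything else has two or more.

This is a sketch. `count_class` is a hypothetical helper name, and using `str.strip()` for the all-whitespace test is just one way to handle point 4:

```python
import re

# Same pattern as in the interactive session.
PAT = re.compile(r"^\s*\S+(\s\S+)*\s*$")

def count_class(s):
    """Hypothetical helper: return '0', '1' or '2+' sentences in s."""
    if not s.strip():      # point 4: empty or all whitespace -> no sentence
        return '0'
    if PAT.match(s):       # exactly one sentence
        return '1'
    return '2+'            # point 5: everything else has two or more

for s in ['   ', '   how are you', '   hello world   how are you']:
    print(repr(s), count_class(s))
# prints:
# '   ' 0
# '   how are you' 1
# '   hello world   how are you' 2+
```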
HTH,
John
More information about the Python-list mailing list