[Tutor] regular expressions query

Mats Wichmann mats at wichmann.us
Fri May 24 11:27:20 EDT 2019


On 5/23/19 6:15 PM, mhysnm1964 at gmail.com wrote:
> All,
> 
> Below I am just providing the example of what I want to achieve, not the
> original strings that I will be using the regular expression against.
> 
> The original strings could have:
> 
> "Hello world"
> 
> "hello  World everyone"
> 
> "hello everyone"
> 
> "hello   world and friends"
> 
> I have a string which is "hello world" which I want to identify by using
> regular expression how many times:
> 
> *	"hello" occurs on its own.
> *	"Hello world" occurs in the list of strings regardless of the number
> of white spaces.

I don't know if you've moved on from this problem, but here's one way
one might tackle finding the "hello world" instances in this relatively
simple scenario:

1. join all the strings into a single string, on the assumption that you
care about substrings that span a line break.
2. use the findall method to hit all instances
3. pass the ignore-case flag (re.IGNORECASE) to the re function
4. allow one or more whitespace characters (\s+) between the words of
the substring in your regular expression pattern.
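Step 1 is the assumption with visible consequences: joining with a space lets a match straddle two of the original strings. Below is a minimal sketch comparing the joined search with a per-string search. The two-element list is made-up data chosen only to show the difference.

```python
import re

# Same pattern as in the reply: "hello", one or more whitespace chars, "world"
PATTERN = re.compile(r'hello\s+world', re.IGNORECASE)

# Hypothetical data: "hello" ends one string and "world" starts the next
strings = ["say hello", "world peace"]

# Joined: the space inserted by join() lets a match span the two strings
joined_hits = PATTERN.findall(' '.join(strings))

# Per string: each string is searched on its own, so nothing can span a boundary
per_string_hits = [hit for s in strings for hit in PATTERN.findall(s)]

print(len(joined_hits))      # 1
print(len(per_string_hits))  # 0
```

If matches across string boundaries are not wanted, the per-string form is the safer choice.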

Most of that is assumption since, as Alan said, you didn't describe the
problem precisely enough for a programmer, even if it sounds precise
enough in English (e.g. does "hello occurs on its own" mean all
instances of hello, or only instances of hello not followed by world?).
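The two readings of "hello occurs on its own" give different answers, which is easy to see in code. A minimal sketch on the sample strings from the question; the `\b` word boundaries and the negative lookahead `(?!...)` are additions of this sketch, not part of the original reply.

```python
import re

strings = ["Hello world", "hello  World everyone", "hello everyone", "hello   world and friends"]
text = ' '.join(strings)

# Reading 1: every "hello" as a whole word (\b marks a word boundary)
all_hellos = re.findall(r'\bhello\b', text, flags=re.IGNORECASE)

# Reading 2: only a "hello" NOT followed by whitespace + "world".
# (?!...) is a negative lookahead: it checks what follows without consuming it,
# and IGNORECASE applies inside it too, so "World" is excluded as well.
lone_hellos = re.findall(r'\bhello\b(?!\s+world)', text, flags=re.IGNORECASE)

print(len(all_hellos))   # 4
print(len(lone_hellos))  # 1
```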

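import re
strings = ["Hello world", "hello  World everyone", "hello everyone", "hello   world and friends"]  # sample data from the question
hits = re.findall(r'hello\s+world', ' '.join(strings), flags=re.IGNORECASE)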

Running this on your sample data finds three hits (len(hits) gives you
that count).
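"How many times" could also mean how many of the strings contain a match, rather than how many matches there are in total. On the sample data both come to three, but they diverge once a single string holds more than one match. A sketch of both counts; the function names and the extra test string are hypothetical.

```python
import re

PATTERN = re.compile(r'hello\s+world', re.IGNORECASE)

# Hypothetical helper names, chosen for this sketch
def count_occurrences(strings):
    """Total number of matches across all strings (each searched separately)."""
    return sum(len(PATTERN.findall(s)) for s in strings)

def count_matching_strings(strings):
    """Number of strings that contain at least one match."""
    return sum(1 for s in strings if PATTERN.search(s))

sample = ["Hello world", "hello  World everyone", "hello everyone", "hello   world and friends"]
print(count_occurrences(sample), count_matching_strings(sample))  # 3 3

# Hypothetical extra string with two matches: the counts now differ
doubled = sample + ["hello world, hello WORLD"]
print(count_occurrences(doubled), count_matching_strings(doubled))  # 5 4
```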

===

That's the kind of thing regular expressions are good for, but keep in
mind that they're not always easy to wrestle with, which has led to the
infamous quote (credited to Jamie Zawinski, although he repurposed it
from an earlier quote on something different):

Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.

