[Tutor] finding common words using set

ThreeBlindQuarks threesomequarks at proton.me
Sun Jan 1 23:52:31 EST 2023


I also see the code as working, although to test it better you could change the strings so that they share two common words.
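
As a rough illustration of such a test, here is one possible set of strings in which two words appear in every string. The strings are my own variation on the ones quoted further down, not something from the original post, and the intersection is computed directly rather than with the poster's function:

```python
# Hypothetical test data: 'orange' and 'banana' appear in every string.
strings = ['apple orange banana', 'orange banana grape', 'banana mango orange']

# One set of unique words per string.
word_sets = [set(s.split()) for s in strings]

# Intersect the first set with all the remaining ones.
common = word_sets[0].intersection(*word_sets[1:])

print(sorted(common))  # ['banana', 'orange']
```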

The code did not really use sets as intended, in that it gave intersection() a list of words rather than a set, but the method is designed to handle that since it accepts any iterable. It does have a bit of a flaw in assuming there are at least TWO strings in the initial list. Here is a version to try that works fine if you give it an empty list, a list with just one item, or any number of items.

It is in a sense shorter and more general, and possibly slower. It is explained below and uses a new function name, find_common_words_all:

def find_common_words_all(strings):
    """ Given a list containing 0 or more strings, returns a single
        space-separated string of the words contained in every
        string, in no particular order. """

    # Convert the list of strings to a list of sets containing
    # the unique words in each.
    word_lists = [set(s.split()) for s in strings]

    # Get the union of all the words and intersect that with ALL
    # the sets in the list including the first. The result is converted
    # to a single string containing the space separated words
    # in some order. Might be useful to sort them.
    return ' '.join(set().union(*word_lists).intersection(*word_lists))

The first non-comment line is similar to what was shown, except that it makes every item a set.

The second line is a bit convoluted. It first constructs the union of all the sets by taking the union of an empty set with all the others, which works even when the list is empty.

It then invokes the intersection method on this maximal set with all the smaller sets as arguments. The result is passed to the join method of a string containing a single space, which joins the words into one string with that space between them, and that string is returned.
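
To make that one-liner easier to follow, here is a sketch that splits it into its intermediate values, using the three strings quoted from seraph's message further down. The names all_words, common and empty_case are mine, introduced only for illustration:

```python
strings = ['apple orange banana', 'orange banana grape', 'banana mango']

# One set of unique words per string.
word_sets = [set(s.split()) for s in strings]

# Union of an empty set with every word set: all words seen anywhere.
all_words = set().union(*word_sets)
print(sorted(all_words))  # ['apple', 'banana', 'grape', 'mango', 'orange']

# Keep only the words that are also present in every individual set.
common = all_words.intersection(*word_sets)
print(common)  # {'banana'}

# With an empty list, both calls receive no arguments and simply
# return a copy of the empty set, so nothing raises an exception.
empty_case = set().union(*[]).intersection(*[])
print(empty_case)  # set()
```

Starting from set() is what makes the empty list safe: calling set.intersection(*word_sets) directly with no sets at all raises a TypeError, because the unbound method has nothing to act on.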

I find it useful to return things like that sorted, or otherwise in a predictable order. So here is a second version that adds a step to turn the set into a sorted list before the parts are recombined:

def find_common_words_all_sorted(strings):
    """ Given a list containing 0 or more strings, returns a single
        space-separated string of the words contained in every
        string, in sorted order. """

    # Convert the list of strings to a list of sets containing
    # the unique words in each.
    word_lists = [set(s.split()) for s in strings]

    # Get the union of all the words and intersect that with ALL
    # the sets in the list including the first.
    common = set().union(*word_lists).intersection(*word_lists)

    # sorted() accepts the set directly and returns a sorted list,
    # which is then joined into a single space separated string.
    return ' '.join(sorted(common))

Here are the results for various inputs: an empty list, which returns an empty string; a list with one string, which returns its words sorted; a list of two strings containing the same words in different orders; and so on up to three or four strings, including some with no intersection. It seems robust. And, yes, it can be written in many smaller steps. As far as I know, this does not break any of the requirements, but I can be corrected.

find_common_words_all_sorted([])
''

find_common_words_all_sorted(['a n d'])
'a d n'

find_common_words_all_sorted(['a n d', 'd n a'])
'a d n'

find_common_words_all_sorted(['a n d', 'r n a'])
'a n'

find_common_words_all_sorted(['a n d', 'r n a', 'a b c'])
'a'

find_common_words_all_sorted(['a n d', 'r n a', 'a b c', ' x y z'])
''
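
On the remark that this can be written in many smaller steps, one possible stepwise version is sketched here. The name find_common_words_stepwise is hypothetical, and the approach (start from the first string and narrow down with each later one) is a different technique from the union-then-intersection trick above, although it should give the same results:

```python
def find_common_words_stepwise(strings):
    """Return the words found in every string, sorted and space separated."""
    # With no strings there is nothing to compare, so nothing is common.
    if not strings:
        return ''

    # Start with the unique words of the first string ...
    common = set(strings[0].split())

    # ... and keep only those that also appear in each later string.
    for s in strings[1:]:
        common &= set(s.split())

    # Sort for a predictable order, then join with single spaces.
    return ' '.join(sorted(common))


print(repr(find_common_words_stepwise(['a n d', 'r n a'])))  # 'a n'
```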

- QQQ

------- Original Message -------
On Sunday, January 1st, 2023 at 3:51 PM, seraph <seraph776 at gmail.com> wrote:


> When I ran your code, it returned *banana*. You mentioned it should
> return "banana and orange." However, based on the example that you
> provided, it should only return banana.
> 
> strings = ['apple orange banana', 'orange banana grape', 'banana mango']
> 
> 
> The common word in these 3 groups is *banana*, not banana & orange.
> 
> I hope this helps!
> 
> Seraph