Newbie here... getting a count of repeated instances in a list.

Amy G amy-g-art at cox.net
Fri Nov 21 19:38:43 EST 2003


I started trying to learn Python today.  The program I am trying to write
will open a text file containing email addresses and store them in a list.
Then it will go through them, keeping only the domain portion of each address.
After that it will count the number of times each domain occurs, and if the
count is above a certain threshold, it will add that domain to a list or text
file, or whatever.  For now I just have it printing to the screen.
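The domain-extraction step described above can be sketched on its own. This is a minimal sketch that assumes one address per line; the helper name `domain_of` is hypothetical. It uses `str.partition`, which always returns a 3-tuple, so a blank line or a line without an `@` gives `None` here instead of raising `IndexError` the way `split('@', 1)[1]` does.

```python
def domain_of(line):
    """Return the text after the first '@' in a line, or None if there is none.

    Hypothetical helper for illustration; assumes one address per line.
    """
    _local, at, domain = line.strip().partition('@')
    if not at or not domain:
        return None          # blank line or no '@': nothing to count
    return domain

lines = ["amy@example.com\n", "bob@example.org\n", "\n", "not-an-address\n"]
domains = [d for d in (domain_of(l) for l in lines) if d is not None]
print(domains)  # ['example.com', 'example.org']
```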

This is my code, and it works and does what I want.  But I want to do
something with a hash object (a dictionary) to make this go a whole lot
faster.  Any suggestions are greatly appreciated.
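A Python dictionary is the built-in hash table, and it is the usual answer to this question. The code in the post calls `domain_list.count()` once per element, and each call scans the whole list, so the work grows roughly with the square of the number of addresses. The sketch below counts in a single pass instead, assuming the domains have already been extracted into a list; the function name `domains_over_threshold` and its `threshold` parameter are illustrative, not taken from the original code.

```python
def domains_over_threshold(domain_list, threshold=10):
    """Return domains occurring more than `threshold` times, counted in one pass."""
    counts = {}                              # domain -> number of occurrences
    for domain in domain_list:
        counts[domain] = counts.get(domain, 0) + 1
    # Keep only the domains whose count exceeds the threshold,
    # sorted so the output order is predictable.
    return sorted(d for d, n in counts.items() if n > threshold)

sample = ["a.com"] * 12 + ["b.org"] * 3 + ["c.net"] * 11
print(domains_over_threshold(sample))  # ['a.com', 'c.net']
```

Looking up and updating a dictionary key takes roughly constant time on average, so counting becomes one pass over the list rather than one full scan per element.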

Thanks,
Amy

P.S.  Sorry about the long post.  I just really need some help here.


CODE
************************
import sys

mail_file = open(sys.argv[1], 'r')         # Opens the file containing emails
mail_list = mail_file.readlines()          # and reads its lines into a list
mail_file.close()

def get_domains(email_list):
    # Takes a list of emails and returns a new list holding only the domains
    domain_list = []
    for email in email_list:
        email = email.strip()
        if '@' in email:                   # skip blank or malformed lines
            domain_list.append(email.split('@', 1)[1])
    return domain_list

def count_domains(domain_list):
    # Takes a list of domains and returns the domains that occur
    # more than <threshold> times, each listed once
    threshold = 10
    counted_domains = []
    for domain in domain_list:
        # list.count() rescans the whole list on every call
        domain_count = domain_list.count(domain)
        if domain_count > threshold and domain not in counted_domains:
            counted_domains.append(domain)
    return counted_domains


domains = get_domains(mail_list)
counted = count_domains(domains)
print(counted)

********************************************
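For reference, the whole script can also be written around `collections.Counter`, a dict subclass for counting that is available in Python 2.7 and 3.1+ (so newer than this 2003 thread). This sketch iterates over the file one line at a time instead of loading it all with `readlines()`. The function name `frequent_domains`, the default threshold of 10 (taken from the original code), and the command-line usage are assumptions for illustration.

```python
import sys
from collections import Counter

def frequent_domains(lines, threshold=10):
    """Count the domain part of each address and return (domain, count)
    pairs occurring more than `threshold` times, most common first."""
    counts = Counter()
    for line in lines:
        _local, at, domain = line.strip().partition('@')
        if at and domain:                     # skip blank or malformed lines
            counts[domain] += 1
    return [(d, n) for d, n in counts.most_common() if n > threshold]

if __name__ == '__main__' and len(sys.argv) > 1:
    # Hypothetical usage: python count_domains.py addresses.txt
    with open(sys.argv[1]) as mail_file:      # a file object yields one line at a time
        for domain, n in frequent_domains(mail_file):
            print(domain, n)
```

Returning the counts along with the domains costs nothing extra here and makes it easy to pick a sensible threshold.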