comparing all values of a list to regex

Manuel Hendel manuel at hendel.net
Thu Sep 26 07:23:22 EDT 2002


On Wed, Sep 25, 2002 at 10:26:53AM +0000, Alex Martelli wrote:
> E.g.:
> 
> num_matches = [ there.match(item) is not None for item in alist ].count(1)
> 
> if num_matches == 0:
>     againanothernewlist.append(alist)
> elif num_matches == 1:
>     anothetnewlist.append(alist)
> elif num_matches == len(alist):
>     anewlist.append(alist)
> 
> 
> Note that there's some ambiguity on what to do if alist is empty --
> literally this is BOTH "matches all list items" AND "doesn't match
> at all".  Here I've chosen the second interpretation, but depending
> on your specs you can choose to test num_matches in different ways.

To be honest, I don't understand that completely. Maybe I have to
describe my problem in a little more detail and show what I've
already done.

This is the input: a "|"-separated text file with about 15000 lines
of pop3 accounts:

|Number|String|Number|String(Domain)|String(Account)|String(Login/Email)|String(Password)|String|String|

These are the fields I care about.

String(Domain): This is the domain of the Account

String(Account): This is the part in front of the @domain. This can
also be a * for a catchall account.

String(Login/Email): This is the local pop3-account or an email address,
or both, comma-separated.

String(Password): This is the password if String(Login/Email) is a
pop3-account.
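
As a minimal sketch of how one such line could be taken apart, assuming
the layout shown above (each line starts with "|", so the first split
result is empty and the domain lands at index 4). The function name
parse_line and the sample line are made up for illustration:

```python
def parse_line(line):
    """Split one "|"-separated line and return the four fields of interest."""
    fields = line.split("|")
    # The line starts with "|", so fields[0] is empty and the domain,
    # account, login/email and password sit at indexes 4 to 7.
    domain = fields[4].strip()
    account = fields[5].strip()
    # Login/Email may hold several comma-separated entries.
    logins = [entry.strip() for entry in fields[6].split(",") if entry.strip()]
    password = fields[7].strip()
    return domain, account, logins, password


# Made-up sample line in the layout described above.
sample = "|1|x|2|example.com|info|info1, someone@example.org|secret|a|b|\n"
print(parse_line(sample))
```

Keeping the logins as a list makes it easy to test every entry of the
field later on.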

This should be the output:

Three text files: one with only forwardings (only email addresses in
String(Login/Email)), one with only pop3-accounts in
String(Login/Email), and one mixed, where String(Login/Email) has
both pop3-accounts and email addresses.
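
A rough sketch of that three-way split, using the match-counting idea
quoted at the top. It assumes that any entry containing an "@" is an
email address and everything else is a pop3-account; the pattern, the
function name classify and the three labels are my own placeholders:

```python
import re

# Assumption: an entry of the form something@something is an email address.
is_email = re.compile(r"[^@\s]+@[^@\s]+")


def classify(logins):
    """Return "forward", "pop3" or "mixed" for a list of login entries."""
    # Count how many entries look like email addresses.
    num_emails = [is_email.match(item) is not None for item in logins].count(True)
    if num_emails == 0:
        return "pop3"      # also the result for an empty list
    elif num_emails == len(logins):
        return "forward"
    else:
        return "mixed"


print(classify(["info1"]))
print(classify(["someone@example.org"]))
print(classify(["info1", "someone@example.org"]))
```

An empty list is ambiguous here too; this sketch treats it as "pop3",
which may or may not fit the real data.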


This is what I've done already:

#!/usr/bin/env python
#
import sys, string, re

inputfile = open(sys.argv[-1], "r")

lines = inputfile.readlines()

# deleting the first three lines and the last line
lines = lines[3:-1]

# empty list for the domains
domains = []

# empty list for the new lines
neededlines = []

# compile the domain pattern once, outside the loop
p = re.compile("[0-9a-z\\.]+[a-z]{2,3}", re.IGNORECASE)

for line in lines:
    fields = line.split("|")

    # check if the domain field contains a regular domain
    m = p.match(fields[4].strip())
    if m:
        # extract the needed fields into a separate list and
        # split the gLogin field to get a list within the list
        neededfields = [fields[4].strip(),
                        fields[5].strip(),
                        fields[6].strip().split(","),
                        fields[7].strip()]

        # put all neededfields together in a neededlines list
        neededlines.append(neededfields)

        # this is to get a list of domains with each domain just once;
        # it has to stay inside the "if m:" block, otherwise neededfields
        # is undefined or left over from an earlier line
        if neededfields[0] not in domains:
            domains.append(neededfields[0])
#Actually, I wanted to do this with a dictionary and the lists in the
#dictionary, but I couldn't get this working.

# empty list for lines per all domains
linesperdomains = []

for domain in domains:

    # empty list for lines
    domainlines = []
    for neededline in neededlines:
        if domain == neededline[0]:
            domainlines.append(neededline)

    # pair the domain with its lines
    linesperdomain = [domain, domainlines]

    # put all linesperdomain together to a linesperdomains list
    linesperdomains.append(linesperdomain)
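
For the dictionary version mentioned in the comment above, something
along these lines might work. It is only a sketch, assuming each entry
of neededlines is a list with the domain at index 0, as built in the
script; group_by_domain and the sample data are invented for the
example:

```python
def group_by_domain(neededlines):
    """Map each domain to the list of lines belonging to it."""
    by_domain = {}
    for neededline in neededlines:
        # setdefault creates the empty list the first time a domain is seen
        by_domain.setdefault(neededline[0], []).append(neededline)
    return by_domain


# Made-up data in the same shape as the neededlines list.
sample = [
    ["example.com", "info", ["info1"], "secret"],
    ["example.org", "*", ["someone@example.com"], ""],
    ["example.com", "sales", ["sales1", "boss@example.org"], "pw"],
]
grouped = group_by_domain(sample)
print(sorted(grouped))
print(len(grouped["example.com"]))
```

This goes through the lines only once instead of once per domain, so it
should also help with the speed.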
                        

The program seems to be quite slow, but that's not that important.
What matters to me is getting this problem solved.
The next step is splitting the data up into the three different parts.


Can anyone help?

Thanks in advance,
    Manuel
                                                                                                                                                                                          
-- 
It takes one tree to make a thousand matches, it takes one match to burn a 
thousand trees. 



