[Tutor] Is there a better way?

Marco Casazza marco.vincenzo at gmail.com
Wed Jan 11 14:59:30 CET 2012


On 2012-01-11 07:57, Joel Goldstick wrote:
> On Wed, Jan 11, 2012 at 7:34 AM, Marco Casazza <marco.vincenzo at gmail.com> wrote:
>> Hello,
>>
>> I've been slowly teaching myself python, using it for small projects when it
>> seems appropriate. In this case, I was handed a list of email addresses for
>> a mailing but some of them had been truncated. There are only 21 possible
>> email "suffixes" so I planned to just identify which it should be and then
>> replace it. However, when I started writing the code I realized that I'd be
>> doing a lot of "repeating". Is there a better way to "fix" the suffixes
>> without doing each individually? Here's my working code (for 4 colleges):
>>
>> import re
>> with file(r'c:\python27\mvc\mailing_list.txt', 'r') as infile:
>>     outlist = []
>>     for line in infile.read().split('\n'):
>>         if line.rstrip().lower().endswith('edu'):
>>             newline = line + '\n'
>>             outlist.append(newline.lower())
>>         elif re.search("@bar", line):
>>             newline = re.sub("@bar.*", "@baruch.cuny.edu", line)+'\n'
>>             outlist.append(newline.lower())
>>         elif re.search("@bcc", line):
>>             newline = re.sub("@bcc.*", "@bcc.cuny.edu", line)+'\n'
>>             outlist.append(newline.lower())
>>         elif re.search("@bmc", line):
>>             newline = re.sub("@bmc.*", "@bmcc.cuny.edu", line)+'\n'
>>             outlist.append(newline.lower())
>>         elif re.search("@leh", line):
>>             newline = re.sub("@leh.*", "@lehman.cuny.edu", line)+'\n'
>>             outlist.append(newline.lower())
>>
>> with file(r'c:\python27\mvc\output.txt','w') as outfile:
>>     outfile.writelines(outlist)
>>
>> Thanks,
>> Marco
> First, look here about reading files:
> http://docs.python.org/tutorial/inputoutput.html#methods-of-file-objects
>
> I like this better:
>      f = open('filename', 'r')
>      for line in f:
>          print line     # gives you one line at a time; each line still
>                         # ends with its newline, so strip it if needed
>
> Second, make a dictionary with the key being what comes after the @
> in your truncated file.  The value will be the complete text you want:
>   d = {"bcc": "bcc.cuny.edu"}   # and so on, one entry per suffix
>
> Third, use line.split('@') to split the line into what comes before
> and after the @ sign.  It will return a list
>      address_parts = line.split('@')
>
> address_parts[0] is what you want to keep as is. I'm guessing that the
> 3 characters after the @ will be enough to identify what the full
> address should look like, so
> if address_parts[1][0:3] in d:
>    result = '@'.join([address_parts[0], d[address_parts[1][0:3]]])
>
> write the result to your out file.
>
> It's early in the morning for me, and this is untested, but it might
> give you some ideas.
>
Hi Joel,

Thanks. I like the dictionary idea... I hadn't thought of that because I 
was trying to fix one "problem" and then realized I had more, and then 
yet more, so it just kept growing--a case of not seeing the forest for 
the trees. And, if I split the address at the @ sign I wouldn't need to 
worry about where exactly it was truncated, so no regular expressions to 
gather up the remaining characters after the key.
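
For the record, here is a rough sketch of how the dictionary and the split
on the @ sign might fit together. The names SUFFIXES, fix_address and
fix_file are made up for illustration, only the four colleges from my
original code are listed, and the assumption that the first three
characters after the @ always survive the truncation is Joel's guess, not
something I have checked against the whole list. It should run under
Python 2.7 as well as Python 3.

```python
# Hypothetical lookup table: the first three characters after the "@"
# map to the full domain. Only the four colleges from the original code
# are listed; the remaining suffixes would be added the same way.
SUFFIXES = {
    "bar": "baruch.cuny.edu",
    "bcc": "bcc.cuny.edu",
    "bmc": "bmcc.cuny.edu",
    "leh": "lehman.cuny.edu",
}


def fix_address(line, suffixes=SUFFIXES):
    """Return the repaired, lowercased address, or None if it cannot be fixed."""
    address = line.strip().lower()
    if address.endswith("edu"):
        return address  # already complete, keep as is
    # partition() splits at the first "@"; sep is "" when there is none.
    local, sep, domain = address.partition("@")
    full_domain = suffixes.get(domain[:3])
    if not sep or full_domain is None:
        return None  # blank line or unknown suffix
    return local + "@" + full_domain


def fix_file(in_path, out_path):
    """Write one repaired address per line, skipping lines that cannot be fixed."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:
            fixed = fix_address(line)
            if fixed is not None:
                outfile.write(fixed + "\n")
```

As in my original loop, lines that match none of the known suffixes are
silently dropped; returning None makes it easy to log them instead if that
turns out to matter.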

Thanks again,
Marco

