[Tutor] Is there a better way?

Wed Jan 11 14:59:30 CET 2012

On 2012-01-11 07:57, Joel Goldstick wrote:
> On Wed, Jan 11, 2012 at 7:34 AM, Marco Casazza <marco.vincenzo at gmail.com> wrote:
>> Hello,
>>
>> I've been slowly teaching myself python, using it for small projects when it
>> seems appropriate. In this case, I was handed a list of email addresses for
>> a mailing but some of them had been truncated. There are only 21 possible
>> email "suffixes" so I planned to just identify which it should be and then
>> replace it. However, when I started writing the code I realized that I'd be
>> doing a lot of "repeating". Is there a better way to "fix" the suffixes
>> without doing each individually? Here's my working code (for 4 colleges):
>>
>> import re
>> with file('c:\python27\mvc\mailing_list.txt', 'r') as infile:
>>     outlist = []
>>     for line in infile.read().split('\n'):
>>         if line.rstrip().lower().endswith('edu'):
>>             newline = line + '\n'
>>             outlist.append(newline.lower())
>>         elif re.search("@bar", line):
>>             newline = re.sub("@bar.*", "@baruch.cuny.edu", line)+'\n'
>>             outlist.append(newline.lower())
>>         elif re.search("@bcc", line):
>>             newline = re.sub("@bcc.*", "@bcc.cuny.edu", line)+'\n'
>>             outlist.append(newline.lower())
>>         elif re.search("@bmc", line):
>>             newline = re.sub("@bmc.*", "@bmcc.cuny.edu", line)+'\n'
>>             outlist.append(newline.lower())
>>         elif re.search("@leh", line):
>>             newline = re.sub("@leh.*", "@lehman.cuny.edu", line)+'\n'
>>             outlist.append(newline.lower())
>>
>> with file('c:\python27\mvc\output.txt','w') as outfile:
>>     outfile.writelines(outlist)
>>
>> Thanks,
>> Marco
>> _______________________________________________
>> Tutor maillist  -  Tutor at python.org
>> To unsubscribe or change subscription options:
>> http://mail.python.org/mailman/listinfo/tutor
> First, look here about reading files:
> http://docs.python.org/tutorial/inputoutput.html#methods-of-file-objects
>
> I like this better:
>      with open('filename', 'r') as f:
>          for line in f:
>              print(line.rstrip('\n'))  # one line at a time; each line
>                                        # still ends in '\n' until stripped
>
> Second, make a dictionary with the key being what comes after the @
> in your truncated file.  The value will be the complete text you want:
>   d = {"bar": "baruch.cuny.edu", "bcc": "bcc.cuny.edu",
>        "bmc": "bmcc.cuny.edu", "leh": "lehman.cuny.edu"}  # etc.
>
> Third, use line.split('@') to split the line into what comes before
> and after the @ sign.  It will return a list
>      address_parts = line.split('@')
>
> address_parts[0] is what you want to keep as is. I'm guessing that the
> 3 characters after the @ will be enough to identify what the full
> address should look like, so
> if address_parts[1][0:3] in d:
>    result = '@'.join([address_parts[0], d[address_parts[1][0:3]]])
>
> Write the result to your output file.
>
> It's early in the morning for me, and this is untested, but it might
> give you some ideas.
>
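Joel's three steps (a dictionary keyed on the truncated suffix, a split at the @, and a lookup on the first three characters) can be combined into one small function. This is a minimal sketch, assuming every address has exactly one @ and that three characters after it are enough to identify the college. The names `fix_address` and `SUFFIXES` are hypothetical, and the table only holds the four colleges from Marco's code. It should run under both Python 2.7 and Python 3.

```python
# Hypothetical lookup table: first three characters after "@" -> full domain.
# Entries are the four colleges from the original code; extend for the rest.
SUFFIXES = {
    "bar": "baruch.cuny.edu",
    "bcc": "bcc.cuny.edu",
    "bmc": "bmcc.cuny.edu",
    "leh": "lehman.cuny.edu",
}


def fix_address(line, suffixes=SUFFIXES):
    """Return the repaired, lower-cased address, or None if it can't be fixed."""
    address = line.strip().lower()
    if address.endswith("edu"):
        # Already complete: keep it, same test as the original code.
        return address
    parts = address.split("@")
    if len(parts) != 2:
        # No "@" (or more than one): not something this sketch can repair.
        return None
    user, domain = parts
    key = domain[:3]
    if key not in suffixes:
        return None
    # Whatever was left after the key is discarded and replaced.
    return "@".join([user, suffixes[key]])


print(fix_address("Jane.Doe@bar"))             # jane.doe@baruch.cuny.edu
print(fix_address("jsmith@bmcc.cu"))           # jsmith@bmcc.cuny.edu
print(fix_address("someone@lehman.cuny.edu"))  # someone@lehman.cuny.edu
print(fix_address("nobody@example"))           # None
```

Returning None instead of skipping the line lets the caller collect the addresses that still need a human look; the original if/elif chain silently drops any line that matches none of its branches.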
Hi Joel,

Thanks. I like the dictionary idea... I hadn't thought of that because I 
was trying to fix one "problem" and then realized I had more, and then 
yet more, so it just kept growing--a case of not seeing the forest for 
the trees. And if I split the address at the @ sign, I wouldn't need to 
worry about exactly where it was truncated, so I wouldn't need regular 
expressions to gather up the remaining characters after the key.

Thanks again,
Marco
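Putting that conclusion into practice, the whole job can be done without regular expressions. The following is a minimal sketch, assuming a hypothetical three-character lookup table built from the four colleges in the original code. The helper names `fix_lines` and `fix_file` are illustrative, and the file paths are left to the caller rather than copied from the original post. It swaps in `str.partition("@")` for `split('@')`: partition always returns three parts, so a line without an @ just yields an empty separator instead of a one-element list. Lines that can't be repaired are returned rather than silently dropped.

```python
# Hypothetical lookup table: the first three characters after "@"
# identify the college (four entries from the original code; extend as needed).
SUFFIXES = {
    "bar": "baruch.cuny.edu",
    "bcc": "bcc.cuny.edu",
    "bmc": "bmcc.cuny.edu",
    "leh": "lehman.cuny.edu",
}


def fix_lines(lines, suffixes=SUFFIXES):
    """Repair truncated domains by splitting each address at the "@".

    Returns (fixed, unmatched): the repaired lower-case addresses, and the
    original lines (without their newline) that could not be repaired.
    """
    fixed, unmatched = [], []
    for line in lines:
        address = line.strip().lower()
        if not address:
            continue  # skip blank lines, e.g. an empty last line
        user, at, domain = address.partition("@")
        if address.endswith("edu"):
            fixed.append(address)
        elif at and domain[:3] in suffixes:
            # No regex needed: everything after the key is simply replaced.
            fixed.append(user + "@" + suffixes[domain[:3]])
        else:
            unmatched.append(line.rstrip("\n"))
    return fixed, unmatched


def fix_file(in_path, out_path):
    """Read in_path line by line, write repaired addresses to out_path.

    Returns the lines that could not be repaired.
    """
    with open(in_path) as infile:
        fixed, unmatched = fix_lines(infile)  # iterate the file directly
    with open(out_path, "w") as outfile:
        outfile.writelines(address + "\n" for address in fixed)
    return unmatched


# Example with an in-memory list instead of a file:
fixed, unmatched = fix_lines(["Jane.Doe@bar\n", "jsmith@BMCC.cu\n",
                              "someone@lehman.cuny.edu\n", "oops@xyz\n", "\n"])
print(fixed)
# ['jane.doe@baruch.cuny.edu', 'jsmith@bmcc.cuny.edu', 'someone@lehman.cuny.edu']
print(unmatched)
# ['oops@xyz']
```

Keeping the repair logic in `fix_lines`, separate from the file handling in `fix_file`, makes it easy to try on a few sample strings before running it on the real mailing list.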