Extracting patterns after matching a regex

Tue Sep 8 13:27:58 EDT 2009

Mart. wrote:

>> If, as Terry suggested, you do have a tuple of strings and the first element has FTPHOST, then s[0].split(":")[1].strip() will work.
> 
> It is an email which contains information before and after the main
> section I am interested in, namely...
> 
> FINISHED: 09/07/2009 08:42:31
> 
> MEDIATYPE: FtpPull
> MEDIAFORMAT: FILEFORMAT
> FTPHOST: e4ftl01u.ecs.nasa.gov
> FTPDIR: /PullDir/0301872638CySfQB
> Ftp Pull Download Links:
> ftp://e4ftl01u.ecs.nasa.gov/PullDir/0301872638CySfQB
> Down load ZIP file of packaged order:
> ftp://e4ftl01u.ecs.nasa.gov/PullDir/0301872638CySfQB.zip
> FTPEXPR: 09/12/2009 08:42:31
> MEDIA 1 of 1
> MEDIAID:
> 
> I have been doing this to turn the email into a string
> 
> email = sys.argv[1]
> f = open(email, 'r')
> s = str(f.readlines())

So don't do that. Or rather, scan the list of lines returned by 
.readlines() *before* dumping it all into one string.
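
A minimal sketch of scanning the lines, assuming each field sits on its own 
line as 'NAME: value'. The helper name find_field and the sample list are 
made up for illustration; in practice the list would come from f.readlines().

```python
def find_field(lines, name):
    """Return the text after 'NAME:' on the first line starting with it, else None."""
    prefix = name + ":"
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(prefix):
            # Slice off the prefix instead of splitting on ':' so that values
            # containing colons (dates, times, URLs) stay intact.
            return stripped[len(prefix):].strip()
    return None


# Stand-in for f.readlines(); the values are copied from the quoted email.
sample = [
    "MEDIATYPE: FtpPull\n",
    "FTPHOST: e4ftl01u.ecs.nasa.gov\n",
    "FTPDIR: /PullDir/0301872638CySfQB\n",
    "FTPEXPR: 09/12/2009 08:42:31\n",
]

print(find_field(sample, "FTPHOST"))
print(find_field(sample, "FTPEXPR"))
```

Slicing off the prefix rather than using split(":")[1] matters for fields 
like FTPEXPR, whose value itself contains colons.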

Or try the email module. When the email parser returns an 
email.message.Message instance, msg['FTPHOST'] will give you what you want, 
provided the FTPHOST line is parsed as a header field.
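
A rough sketch of the email-module route. The message text below is 
hypothetical (made-up From and Subject headers), and it assumes the order 
fields sit in the body rather than in the real headers. In that case the 
direct lookup returns None, and one option is to feed the body text back 
through the parser so its 'NAME: value' lines are read as header fields.

```python
import email

# Hypothetical message: made-up headers, with the order fields in the body.
raw = (
    "From: sender@example.invalid\n"
    "Subject: order notice\n"
    "\n"
    "MEDIATYPE: FtpPull\n"
    "FTPHOST: e4ftl01u.ecs.nasa.gov\n"
    "FTPDIR: /PullDir/0301872638CySfQB\n"
)

msg = email.message_from_string(raw)
header_lookup = msg["FTPHOST"]  # None here: FTPHOST is not a real header

# Re-parse the body text so its 'NAME: value' lines are read as header fields.
fields = email.message_from_string(msg.get_payload())
ftphost = fields["FTPHOST"]
ftpdir = fields["FTPDIR"]
print(header_lookup, ftphost, ftpdir)
```

Header parsing stops at the first blank line, so this only finds fields in 
the block at the very start of the text being parsed. If other text or a 
blank line comes first, scanning the body lines is the safer choice.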

tjr