Extracting patterns after matching a regex

Tue Sep 8 12:19:14 EDT 2009

On Sep 8, 12:16 pm, nn <prueba... at latinmail.com> wrote:
> On Sep 8, 11:19 am, Dave Angel <da... at ieee.org> wrote:
>
>
>
> > Mart. wrote:
> > > <snip>
> > > I have been doing this to turn the email into a string
>
> > > email = sys.argv[1]
> > > f = open(email, 'r')
> > > s = str(f.readlines())
>
> > > so FTPHOST isn't the first element, it is just part of a larger
> > > string. When I turn the email into a string it looks like...
>
> > > 'FINISHED: 09/07/2009 08:42:31\r\n', '\r\n', 'MEDIATYPE: FtpPull\r\n',
> > > 'MEDIAFORMAT: FILEFORMAT\r\n', 'FTPHOST: e4ftl01u.ecs.nasa.gov\r\n',
> > > 'FTPDIR: /PullDir/0301872638CySfQB\r\n', 'Ftp Pull Download Links: \r\n',
> > > 'ftp://e4ftl01u.ecs.nasa.gov/PullDir/0301872638CySfQB\r\n',
> > > 'Download ZIP file of packaged order:\r\n',
> > > <snip>
>
> > The mistake I see is trying to turn a list into a string, just so you
> > can try to parse it back again.  Just write a loop that iterates through
> > the list that readlines() returns.
>
> > DaveA
>
> No kidding.
>
> Instead of this:
> s = str(f.readlines())
>
> ftphost = re.search(r'FTPHOST: (.*?)\\r',s).group(1)
> ftpdir  = re.search(r'FTPDIR: (.*?)\\r',s).group(1)
> url = 'ftp://' + ftphost + ftpdir
>
> I would have possibly done something like this (not tested):
> lines = f.readlines()
> header={}
> for row in lines:
>     key,sep,value = row.partition(':')[2].rstrip()
>     header[key.lower()]=value
> url = 'ftp://' + header['ftphost'] + header['ftpdir']

Well, I did say "not tested". That should of course have been:
lines = f.readlines()
header={}
for row in lines:
    key,sep,value = row.partition(':')
    header[key.lower()]=value.strip()  # strip() also drops the space after the colon
url = 'ftp://' + header['ftphost'] + header['ftpdir']
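
For anyone who wants to try this without the original email at hand, here is a
minimal self-contained sketch of the same partition() approach. The name
parse_header and the sample list are my own additions (the sample lines are
copied from the output quoted above). Skipping lines that have no colon and
stripping whitespace on both sides are assumptions about what is wanted, not
something the original code did.

```python
def parse_header(lines):
    """Build a dict from 'KEY: value' lines (hypothetical helper)."""
    header = {}
    for row in lines:
        # partition() splits on the first ':' only, so values that
        # contain colons (times, URLs) stay in one piece.
        key, sep, value = row.partition(':')
        if not sep:
            continue  # assumption: lines without a colon are not header fields
        header[key.strip().lower()] = value.strip()
    return header


# Sample lines taken from the email output quoted earlier in the thread.
sample = [
    'FINISHED: 09/07/2009 08:42:31\r\n',
    '\r\n',
    'MEDIATYPE: FtpPull\r\n',
    'MEDIAFORMAT: FILEFORMAT\r\n',
    'FTPHOST: e4ftl01u.ecs.nasa.gov\r\n',
    'FTPDIR: /PullDir/0301872638CySfQB\r\n',
]

header = parse_header(sample)
url = 'ftp://' + header['ftphost'] + header['ftpdir']
print(url)
```

With a real file you would pass the open file object (or f.readlines()) to
parse_header instead of the sample list.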