Extracting patterns after matching a regex

Tue Sep 8 09:32:36 EDT 2009

On Sep 8, 2:21 pm, "Mark Tolonen" <metolone+gm... at gmail.com> wrote:
> "Martin" <mdeka... at gmail.com> wrote in message
>
> news:5941d8f1-27c0-47d9-8221-d21f07200008 at j39g2000yqh.googlegroups.com...
>
>
>
> > Hi,
>
> > I need to extract a string after matching a regular expression. For
> > example I have the string...
>
> > s = "FTPHOST: e4ftl01u.ecs.nasa.gov"
>
> > and once I match "FTPHOST" I would like to extract
> > "e4ftl01u.ecs.nasa.gov". I am not sure as to the best approach to the
> > problem; I had been trying to match the string using something like
> > this:
>
> > m = re.findall(r"FTPHOST", s)
>
> > But I couldn't then work out how to return the "e4ftl01u.ecs.nasa.gov"
> > part. Perhaps I need to find the string and then split it? I had some
> > help with a similar problem, but now I don't seem to be able to
> > transfer that to this problem!
>
> In regular expressions, you match the entire string you are interested in,
> and parenthesize the parts that you want to parse out of that string.  The
> group() method is used to get the whole string with group(0), and each of
> the parenthesized parts with group(n).  An example:
>
> >>> s = "FTPHOST: e4ftl01u.ecs.nasa.gov"
> >>> import re
> >>> re.search(r'FTPHOST: (.*)',s).group(0)
> 'FTPHOST: e4ftl01u.ecs.nasa.gov'
> >>> re.search(r'FTPHOST: (.*)',s).group(1)
> 'e4ftl01u.ecs.nasa.gov'
>
> -Mark

I see what you mean regarding the groups. However, because my string is
nested in amongst others, e.g.

'MEDIATYPE: FtpPull\r\n', 'MEDIAFORMAT: FILEFORMAT\r\n', 'FTPHOST:
e4ftl01u.ecs.nasa.gov\r\n', 'FTPDIR: /PullDir/0301872638CySfQB\r\n',
'Ftp Pull Download Links: \r\n',

I get the information that follows as well. So is the only option to
then parse the captured string? I am trying to construct something that
is fairly robust, so I am not sure that just taking everything before
the \r is the best solution.

Thanks
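
One possible way to avoid capturing the trailing text is to narrow the
group so that it only accepts characters that can appear in a host name,
rather than using (.*). The sketch below is an illustration only, not an
approach confirmed anywhere in this thread. It assumes the data is a list
of lines like the fragment above, and that the host consists of letters,
digits, dots and hyphens. The names lines, HOST_RE and find_ftp_host are
made up for the example.

```python
import re

# Assumed sample data, modelled on the fragment quoted in the message above.
lines = [
    "MEDIATYPE: FtpPull\r\n",
    "MEDIAFORMAT: FILEFORMAT\r\n",
    "FTPHOST: e4ftl01u.ecs.nasa.gov\r\n",
    "FTPDIR: /PullDir/0301872638CySfQB\r\n",
    "Ftp Pull Download Links: \r\n",
]

# Capture only word characters, dots and hyphens (an assumption about what
# a host looks like), so the match stops at "\r", a backslash, a quote,
# a comma, or any other delimiter.
HOST_RE = re.compile(r"FTPHOST:\s*([\w.-]+)")


def find_ftp_host(text):
    """Return the host that follows 'FTPHOST:', or None if it is absent."""
    m = HOST_RE.search(text)
    return m.group(1) if m else None


# The pattern works on the joined text, where "\r\n" are real control
# characters...
host_from_text = find_ftp_host("".join(lines))

# ...and on str() of the list, where "\r\n" shows up as literal backslashes.
host_from_repr = find_ftp_host(str(lines))

print(host_from_text)
print(host_from_repr)
```

Because the character class excludes whitespace, backslashes and quotes,
nothing needs to be split or stripped afterwards, and checking for None
covers input that has no FTPHOST entry at all.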