matching patterns after regex?

Wed Aug 12 08:12:22 EDT 2009

On Aug 12, 12:53 pm, Bernard <bernard.ch... at gmail.com> wrote:
> On 12 Aug, 06:15, Martin <mdeka... at gmail.com> wrote:
>
>
>
> > Hi,
>
> > I have a string (see below) and ideally I would like to pull out the
> > decimal number which follows the bounding coordinate information. For
> > example, from this string I would ideally return...
>
> > s = '\nGROUP                  = ARCHIVEDMETADATA\n
> > GROUPTYPE            = MASTERGROUP\n\n  GROUP                  =
> > BOUNDINGRECTANGLE\n\n    OBJECT                 =
> > NORTHBOUNDINGCOORDINATE\n      NUM_VAL              = 1\n
> > VALUE                = 19.9999999982039\n    END_OBJECT             =
> > NORTHBOUNDINGCOORDINATE\n\n    OBJECT                 =
> > SOUTHBOUNDINGCOORDINATE\n      NUM_VAL              = 1\n
> > VALUE                = 9.99999999910197\n    END_OBJECT             =
> > SOUTHBOUNDINGCOORDINATE\n\n    OBJECT                 =
> > EASTBOUNDINGCOORDINATE\n      NUM_VAL              = 1\n
> > VALUE                = 10.6506458717851\n    END_OBJECT             =
> > EASTBOUNDINGCOORDINATE\n\n    OBJECT                 =
> > WESTBOUNDINGCOORDINATE\n      NUM_VAL              = 1\n
> > VALUE                = 4.3188348375893e-15\n    END_OBJECT
> > = WESTBOUNDINGCOORDINATE\n\n  END_GROUP
>
> > NORTHBOUNDINGCOORDINATE = 19.9999999982039
> > SOUTHBOUNDINGCOORDINATE = 9.99999999910197
> > EASTBOUNDINGCOORDINATE = 10.6506458717851
> > WESTBOUNDINGCOORDINATE = 4.3188348375893e-15
>
> > So far I have only managed to extract the numbers by doing
> > re.findall("[\d.]*\d", s), which returns
>
> > ['1',
> >  '19.9999999982039',
> >  '1',
> >  '9.99999999910197',
> >  '1',
> >  '10.6506458717851',
> >  '1',
> >  '4.3188348375893',
> >  '15',
> > etc.
>
> > Now the first problem that I can see is that my string match chops off
> > the "e-15" part and I am not sure how to incorporate the potential for
> > that in my pattern match. Does anyone have any suggestions as to how I
> > could also match this? Ideally I would have a statement which printed
> > the number between the two bounding coordinate strings, for example:
>
> > NORTHBOUNDINGCOORDINATE\n      NUM_VAL              = 1\n
> > VALUE                = 19.9999999982039\n    END_OBJECT             =
> > NORTHBOUNDINGCOORDINATE\n\n
>
> > Something that matched "NORTHBOUNDINGCOORDINATE" and printed the
> > decimal number before it hit the next string
> > "NORTHBOUNDINGCOORDINATE". But I am not sure how to do this. Any
> > suggestions would be appreciated.
>
> > Many thanks
>
> > Martin
>
> Hey Martin,
>
> here's a regex I've just tested:
> (\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+)
>
> the first group captures the whateverBOUNDINGCOORDINATE name and the
> second group captures the value.
>
> please provide some more entries if you'd like me to test my regex
> some more :)
>
> cheers
>
> Bernard

Thanks Bernard, but it doesn't seem to be working for me...

I tried

re.findall((\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+),s)

Is that what you meant? Apologies if not. That results in a syntax
error:

In [557]: re.findall((\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+),s)
------------------------------------------------------------
   File "<ipython console>", line 1
     re.findall((\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+),s)
                                                              ^
SyntaxError: unexpected character after line continuation character
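
The SyntaxError above appears to come from passing the pattern to
re.findall without quotes: Python then tries to parse the regex as code
and treats the first backslash as a line continuation character. A
minimal sketch of the same call with the pattern quoted as a raw string
follows. The string s here is an abbreviated stand-in for the full
metadata string (an assumption for illustration, not the original data).

```python
import re

# Abbreviated stand-in for the metadata string `s` from the post (assumed layout).
s = (
    "\n    OBJECT                 = NORTHBOUNDINGCOORDINATE\n"
    "      NUM_VAL              = 1\n"
    "      VALUE                = 19.9999999982039\n"
    "    END_OBJECT             = NORTHBOUNDINGCOORDINATE\n\n"
    "    OBJECT                 = WESTBOUNDINGCOORDINATE\n"
    "      NUM_VAL              = 1\n"
    "      VALUE                = 4.3188348375893e-15\n"
    "    END_OBJECT             = WESTBOUNDINGCOORDINATE\n"
)

# The pattern must be a string; the r"" prefix keeps the backslashes literal
# so that the re module, not the Python parser, interprets \w, \s and \d.
pattern = r"(\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+)"

pairs = re.findall(pattern, s)
print(pairs)
```

Quoting removes the SyntaxError. On this stand-in the call still finds
nothing, though, because "." does not match a newline by default and the
NUM_VAL line sits between the coordinate name and VALUE, so the pattern
itself probably needs adjusting as well.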

Thanks
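
One possible way to get the name/value pairs asked for at the top of the
thread, sketched under assumptions: each block is laid out as the
coordinate name, then a NUM_VAL line, then a VALUE line, and the names
number, pattern and coords are illustrative and not from the thread. The
number sub-pattern allows an optional exponent, which is the part that
"[\d.]*\d" cut off.

```python
import re

# Abbreviated stand-in for the metadata string `s` from the post (assumed layout).
s = (
    "\n    OBJECT                 = NORTHBOUNDINGCOORDINATE\n"
    "      NUM_VAL              = 1\n"
    "      VALUE                = 19.9999999982039\n"
    "    END_OBJECT             = NORTHBOUNDINGCOORDINATE\n\n"
    "    OBJECT                 = SOUTHBOUNDINGCOORDINATE\n"
    "      NUM_VAL              = 1\n"
    "      VALUE                = 9.99999999910197\n"
    "    END_OBJECT             = SOUTHBOUNDINGCOORDINATE\n\n"
    "    OBJECT                 = EASTBOUNDINGCOORDINATE\n"
    "      NUM_VAL              = 1\n"
    "      VALUE                = 10.6506458717851\n"
    "    END_OBJECT             = EASTBOUNDINGCOORDINATE\n\n"
    "    OBJECT                 = WESTBOUNDINGCOORDINATE\n"
    "      NUM_VAL              = 1\n"
    "      VALUE                = 4.3188348375893e-15\n"
    "    END_OBJECT             = WESTBOUNDINGCOORDINATE\n"
)

# A decimal number with an optional sign and an optional exponent (e.g. 4.3e-15).
number = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"

# Coordinate name, then the NUM_VAL line, then the VALUE line.
# \s+ also spans the newlines and indentation between the lines.
pattern = (
    r"(\w+BOUNDINGCOORDINATE)\s+NUM_VAL\s+=\s+\d+\s+VALUE\s+=\s+(" + number + r")"
)

# Map each coordinate name to its value as a float.
coords = {name: float(value) for name, value in re.findall(pattern, s)}

for name, value in coords.items():
    print(name, "=", value)
```

Only the name that follows OBJECT is picked up, since the name after
END_OBJECT is not followed by a NUM_VAL line. If the real file has other
fields between the name and VALUE, the pattern would need loosening.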