brackets content regular expression

Sat Nov 1 07:27:10 EDT 2008

Yeah, I know it's quite simple to do manually. I was just interested
if it could be done by regular expressions. Thank you anyway.
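
For what it's worth, a regular expression can still do part of the work: it can locate the brackets while a plain counter tracks the nesting depth. A minimal sketch, assuming Python's standard `re` module and that unmatched '>' characters should be ignored (the name `top_level_groups` is hypothetical, not from this thread):

```python
import re

def top_level_groups(text):
    """Yield the text inside each outermost <...> pair."""
    depth = 0      # current nesting level
    start = None   # index just past the '<' that opened the current group
    for m in re.finditer(r"[<>]", text):
        if m.group() == "<":
            if depth == 0:
                start = m.end()
            depth += 1
        elif depth > 0:  # a '>' that closes an open '<'
            depth -= 1
            if depth == 0:
                yield text[start:m.start()]

text = "a < b < c > d > here starts a new group:  < e < f  > g >"
for seg in top_level_groups(text):
    print(repr(seg))
```

The regex only finds the bracket positions; the counter does the matching, so this is not a pure regular-expression solution.
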
On 1 Nov, 00:36, Matimus <mccre... at gmail.com> wrote:
> On Oct 31, 11:57 am, netimen <neti... at gmail.com> wrote:
>
> > Thanks, but what if I have several top-level groups and want to match
> > them one by one:
>
> > text = "a < b < c > d > here starts a new group:  < e < f  > g >"
>
> > I want to match first " b < c > d " and then " e < f  > g ", but not
> > " b < c > d > here starts a new group:  < e < f  > g ".
> > On 31 Oct, 20:53, Matimus <mccre... at gmail.com> wrote:
>
> > > On Oct 31, 10:25 am, netimen <neti... at gmail.com> wrote:
>
> > > > I have a text containing brackets (or what is the correct term for
> > > > '>'?). I'd like to match text in the uppermost level of brackets.
>
> > > > So, I have something like: 'aaaa 123 < 1 aaa < t bbb < a <tt  > ff > > 2 >
> > > > bbbbb'. How do I match the text between the uppermost brackets ( 1 aaa < t
> > > > bbb < a <tt  > ff > > 2 )?
>
> > > > P.S. Sorry for my English.
>
> > > I think most people call them "angle brackets". Anyway, it should be
> > > easy to just match the outermost brackets:
>
> > > >>> import re
> > > >>> text = "aaaa 123 < 1 aaa < t bbb < a <tt  > ff > > 2 >"
> > > >>> r = re.compile("<(.+)>")
> > > >>> m = r.search(text)
> > > >>> m.group(1)
> > > ' 1 aaa < t bbb < a <tt  > ff > > 2 '
>
> > > In this case the "+" quantifier is greedy by default, so the match runs
> > > from the first "<" to the last ">". Note, however, that it won't work if
> > > you have something like this: "<first> <second>". There the group would
> > > capture "first> <second" as a single match.
>
> > > Matt
>
> As far as I know, you can't do that with a regular expression (by
> definition, regular expressions aren't recursive). You can use a
> regular expression to aid you, but there is no magic expression that
> will give it to you for free.
>
> In this case it is actually pretty easy to do it without regular
> expressions at all:
>
> >>> text = "a < b < O > d > here starts a new group:  < e < f  > g >"
> >>> def get_nested_strings(text, depth=0):
> ...     stack = []  # indices of the '<' characters still open
> ...     for i, c in enumerate(text):
> ...         if c == '<':
> ...             stack.append(i)
> ...         elif c == '>':
> ...             start = stack.pop() + 1
> ...             # yield only groups closed at the requested depth
> ...             if len(stack) == depth:
> ...                 yield text[start:i]
> ...
> >>> for seg in get_nested_strings(text):
> ...     print(seg)
> ...
>  b < O > d
>  e < f  > g
>
> Matt
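
One caveat about the quoted generator: `stack.pop()` raises IndexError when a '>' appears with no open '<' before it. A minimal sketch of a more tolerant variant, assuming unmatched closing brackets should simply be skipped (the name `get_nested_strings_safe` is hypothetical, not from this thread):

```python
def get_nested_strings_safe(text, depth=0):
    """Like the quoted generator, but ignore any '>' that has no open '<'."""
    stack = []  # indices of the '<' characters still open
    for i, c in enumerate(text):
        if c == '<':
            stack.append(i)
        elif c == '>' and stack:  # skip a closer when nothing is open
            start = stack.pop() + 1
            if len(stack) == depth:
                yield text[start:i]

for seg in get_nested_strings_safe("a > <b <c> d>"):
    print(repr(seg))
```

Passing `depth=1` yields the groups one level further in, as in the original function.
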