Reg Expression - Get position of >

M_H heyer.mario at googlemail.com
Wed Nov 26 04:48:59 EST 2008


On Nov 25, 11:06 pm, r <rt8... at gmail.com> wrote:
> On Nov 25, 4:33 pm, Jorgen Grahn <grahn+n... at snipabacken.se> wrote:
>
>
>
> > On Tue, 25 Nov 2008 12:41:53 -0800 (PST), r <rt8... at gmail.com> wrote:
> > > On Nov 25, 10:36 am, M_H <heyer.ma... at googlemail.com> wrote:
> > >> Hey,
>
> > >> I need the position of the last char >
>
> > >> Let's say I have a string
> > >> mystr =  <mimetype="text/html"><content><![CDATA[
>
> > >> I need the position of the "> (second sign) - so I can cut away the
> > >> first part.
>
> > >> The problem is that it can be like "> but also like " > or "     >
>
> > >> But it is definitely the quote and the closing bracket.
>
> > >> How do I get the position of the >  ????
>
> > >> Hope you can help,
> > >> Bacco
>
> > > why not just split
>
> > > >>> mystr =  '<mimetype="text/html"><content><![CDATA['
> > > >>> mystr.split('>', 2)[-1]
> > > '<![CDATA['
>
> > > you don't want to use an re for something like this
>
> > Depends on if you have an irrational fear of REs or not ... I agree
> > that REs are overused for things which are better done with split, but
> > in this case I think an RE would be clearer.
>
> > >>> re.sub('.*>', '', 'dkjk>dj>>>>dd')
>
> > 'dd'
>
> > -- assuming he means what I think he means. The question was almost
> > impossible to comprehend.
>
> > /Jorgen
>
> > --
> >   // Jorgen Grahn <grahn@        Ph'nglui mglw'nafh Cthulhu
> > \X/     snipabacken.se>          R'lyeh wgah'nagl fhtagn!
>
> I think what M_H wanted was to find the second occurrence of the ">"
> char in mystr.
> Now if mystr will always look exactly as shown then Jorgen Grahn's re
> will work fine. But it looks to me that the poster only showed us a
> portion of the string, and as you can see the <mimetype tag is not
> closed in mystr, which would break your re if the string actually
> extends further. Split would be fool-proof in all situations. But then
> again I had to read the post 5 times before I understood it. It may be
> advisable for M_H to repost the question in a clearer manner so that
> we can be sure our answers are correct!


Thanks for all your answers.
R is correct with his assumptions - sorry for the confusion.

So let me post it again, more simply.

I have the beginning of a (longer) string, which looks like:
mystr =  '<mimetype="text/html"><content><![CDATA['
or like
mystr =  '<mimetype="text/html" ><content><![CDATA['
or like
mystr =  '<mimetype="text/html" >
          NewLine <content><![CDATA['

I want the end position of the mimetype tag (an index like the one
mystr.find('>') returns, so I can use the number in a loop).
However, I can't just search for '>' because the character > could also
appear inside the mimetype string (I know, not in a real mimetype, but
let's assume it).
That is why the filter has to be bulletproof and check for '">',
with possible spaces between the two characters.

I don't know yet how to solve this issue - any recommendations?
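
One possible approach, as a minimal sketch only: search for a double
quote, optional whitespace, and then '>' with the re module, and take
the index of the '>' from the match. The names TAG_END and
mimetype_tag_end are made up for illustration, and the pattern assumes
that the attribute value itself never contains a quote followed by
optional whitespace and '>'.

```python
import re

# Assumed pattern: a double quote, optional whitespace (spaces, tabs,
# newlines), then the closing bracket of the tag.
TAG_END = re.compile(r'"\s*>')


def mimetype_tag_end(text):
    """Return the index of the '>' closing the first tag, or -1 if absent."""
    match = TAG_END.search(text)
    if match is None:
        return -1
    # match.end() is one past the '>', so step back to point at it,
    # which mirrors what str.find('>') would return.
    return match.end() - 1


samples = [
    '<mimetype="text/html"><content><![CDATA[',
    '<mimetype="text/html" ><content><![CDATA[',
    '<mimetype="text/html" >\n          NewLine <content><![CDATA[',
]

for mystr in samples:
    pos = mimetype_tag_end(mystr)
    # Everything after the tag can then be sliced off with pos + 1.
    print(pos, repr(mystr[pos + 1:]))
```

The function returns -1 when nothing matches, the same convention as
str.find, so it should drop into a loop that already checks for that.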


