Reg Expression - Get position of >

Chris Rebert clp at rebertia.com
Wed Nov 26 05:21:39 EST 2008


On Wed, Nov 26, 2008 at 1:48 AM, M_H <heyer.mario at googlemail.com> wrote:
> On Nov 25, 11:06 pm, r <rt8... at gmail.com> wrote:
>> On Nov 25, 4:33 pm, Jorgen Grahn <grahn+n... at snipabacken.se> wrote:
>>
>>
>>
>> > On Tue, 25 Nov 2008 12:41:53 -0800 (PST), r <rt8... at gmail.com> wrote:
>> > > On Nov 25, 10:36 am, M_H <heyer.ma... at googlemail.com> wrote:
>> > >> Hey,
>>
>> > >> I need the position of the last char >
>>
>> > >> Let's say I have a string
>> > >> mystr =  <mimetype="text/html"><content><![CDATA[
>>
>> > >> I need the position of the "> (second sign) - so I can cut away the
>> > >> first part.
>>
>> > >> The problem is that it can be like "> but also like " > or "     >
>>
>> > >> But it is definitely the quote followed by the closing bracket.
>>
>> > >> How do I get the position of the >  ????
>>
>> > >> Hope you can help,
>> > >> Bacco
>>
>> > > why not just split
>>
>> > >>>> mystr =  '<mimetype="text/html"><content><![CDATA['
>> > >>>> mystr.split('>', 2)[-1]
>> > > '<![CDATA['
>>
>> > > you don't want to use an re for something like this
>>
>> > Depends on if you have an irrational fear of REs or not ... I agree
>> > that REs are overused for things which are better done with split, but
>> > in this case I think an RE would be clearer.
>>
>> > >>> re.sub('.*>', '', 'dkjk>dj>>>>dd')
>>
>> > 'dd'
>>
>> > -- assuming he means what I think he means. The question was almost
>> > impossible to comprehend.
>>
>> > /Jorgen
>>
>> > --
>> >   // Jorgen Grahn <grahn@        Ph'nglui mglw'nafh Cthulhu
>> > \X/     snipabacken.se>          R'lyeh wgah'nagl fhtagn!
>>
>> I think what M_H wanted was to find the second occurrence of the ">"
>> char in mystr.
>> Now if mystr will always look exactly as shown then Jorgen Grahn's re
>> will work fine. But it looks to me that the poster only showed us a
>> portion of the string, and as you can see the <mimetype tag is not
>> closed in mystr, which would break your re if the string actually
>> extends further. Split would be fool-proof in all situations. But then
>> again I had to read the post 5 times before I understood it. It may be
>> advisable for M_H to repost the question in a clearer manner so that
>> we can be sure our answers are correct!
>
>
> Thanks for all your answers.
> R is correct with his assumptions - sorry for the confusion.
>
> So let me post it again, more simply.
>
> I have the beginning of a (longer) string that looks like:
> mystr =  '<mimetype="text/html"><content><![CDATA['
> or like
> mystr =  '<mimetype="text/html" ><content><![CDATA['
> or like
> mystr =  '<mimetype="text/html" >
>          NewLine <content><![CDATA['
>
> I want to have the end-position of the mimetype tag (position as
> mystr.find('>') returns, so I can use the number for a loop)
> However, I can't use just the '>' because the character > could also
> be in the string of mimetype (I know, actually not in mimetype, but
> let's assume it).
> So that is why the filter shall be bulletproof and check for '">' -
> with possible spaces between both characters.
>
> I don't know yet how to solve this issue - any recommendations?

Any particular reason you're not using an HTML parser (e.g. BeautifulSoup)?
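
If a parser is overkill for this one tag, here is a minimal sketch using
the re module. It assumes the tag always ends with a double-quoted value,
then optional whitespace (spaces or newlines), then '>'. The names TAG_END
and mimetype_tag_end are made up for illustration, and the last sample
string is a hypothetical case with a '>' inside the quoted value.

```python
import re

# Hypothetical pattern: a complete double-quoted value, optional
# whitespace (spaces, tabs, newlines), then the closing '>'.
# Matching the whole quoted value means a '>' inside the quotes is skipped.
TAG_END = re.compile(r'"[^"]*"\s*>')


def mimetype_tag_end(s):
    """Return the index of the '>' closing the first tag, or -1 if absent."""
    m = TAG_END.search(s)
    # m.end() is one past the matched '>', so subtract 1 to get its index.
    return m.end() - 1 if m else -1


samples = [
    '<mimetype="text/html"><content><![CDATA[',
    '<mimetype="text/html" ><content><![CDATA[',
    '<mimetype="text/html" >\n         <content><![CDATA[',
    '<mimetype="text/>html"   ><content><![CDATA[',  # '>' inside the quotes
]

for s in samples:
    pos = mimetype_tag_end(s)
    # Slice after the '>' to cut away the first tag.
    print(pos, repr(s[pos + 1:].lstrip()))
```

For the first sample this returns 21, the same number mystr.find('>')
gives, so it can be used in the same way. It does not handle a value in
single quotes or an escaped double quote inside the value.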

Cheers,
Chris
-- 
Follow the path of the Iguana...
http://rebertia.com
