[Tutor] lstrip() question

Mon Feb 2 20:38:00 EST 2004

On 2 Feb 2004, Tim Johnson <tim at johnsons-web.com> wrote:

>  Actually what I am *really* looking for is a way to
>  strip either specific tags (like '<br>') from the
>  left *or* any number of tags. I'm a regex dunce

I think I forgot the parentheses in my first example.  But as you now
describe your problem, a simple regexp won't suffice.

Stripping off all leading tags is simple.  As a function (now also
case-insensitive and matching across line breaks):

import re

def strip_tags(string):
    return re.sub(r'(?is)^(<[^<>]*>)*', '', string)  # flags must come first
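A quick check of the one-liner, as a sketch assuming Python 3 (3.11 and later reject inline flags like (?is) anywhere but the start of the pattern).  The sample inputs are made up for illustration:

```python
import re

def strip_tags(string):
    # Remove the run of tags at the very start of the string, if any.
    return re.sub(r'(?is)^(<[^<>]*>)*', '', string)

# Mixed case and a line break inside a tag are both handled.
print(repr(strip_tags('<BR><p\nclass="x"><Foo>real estate<br>broker')))
# 'real estate<br>broker'

# Any text between two tags, even a single space, ends the run.
print(repr(strip_tags('<br> <br>x')))
# ' <br>x'
```

Note that [^<>] already matches newlines, so the tag body spans lines regardless of the s flag.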

But to strip only specific tags you need something like:

def strip_tags(string, tag=''):
    L = re.split('(<[^<>]*>)', string)
    for token in L[:]:                  # iterate over a copy
        if re.match('<.*>', token, re.S) or token == '':
            if re.search(tag, token, re.I | re.S):
                L.remove(token)
        else:
            return ''.join(L)           # first real text: done
    return ''.join(L)                   # string held nothing but tags

>>> tmp = '<br/ ><br><foo><bar>real estate<br>broker<br>'
>>> strip_tags(tmp, 'foo')
'<br/ ><br><bar>real estate<br>broker<br>'
>>> strip_tags(tmp, 'foo|bar')
'<br/ ><br>real estate<br>broker<br>'
>>> strip_tags(tmp)
'real estate<br>broker<br>'

The above splits your string using the tags as separators; since the
pattern has a group, the tags themselves stay in the list, and
re.split() yields an empty string between two adjacent tags.  The code
then iterates over the list and checks whether the next token is a tag
or an empty string.  If it is neither, all tags on the left have been
processed and the joined list is returned.  Otherwise it tries to match
the token against the tag names to be removed and, if it finds a match,
removes exactly that token from the list.  The loop iterates over a copy
of the list, since it's not good to change a list while iterating over
it.  If you don't give a tag name (actually a regexp), the empty string
is used, which matches every tag.
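To make the loop concrete, here is a sketch (assuming Python 3) that prints the token list re.split() produces for the sample string, plus an alternative that builds a new list instead of calling list.remove().  The name strip_leading_tags is hypothetical.  It relies on re.split() with one capturing group putting the captured tags at the odd indices, so no separate '<.*>' test is needed:

```python
import re

TAG = re.compile(r'(<[^<>]*>)')

def strip_leading_tags(text, tag=''):
    """Hypothetical helper: drop leading tags matching regexp `tag`."""
    tokens = TAG.split(text)
    kept = []
    for i, token in enumerate(tokens):
        is_tag = i % 2 == 1  # captured separators sit at odd indices
        if not is_tag and token:
            kept.extend(tokens[i:])  # first real text: keep the rest as is
            break
        if is_tag and not re.search(tag, token, re.I | re.S):
            kept.append(token)  # a leading tag we were not asked to drop
    return ''.join(kept)

tmp = '<br/ ><br><foo><bar>real estate<br>broker<br>'
print(TAG.split(tmp))
print(strip_leading_tags(tmp, 'foo|bar'))  # <br/ ><br>real estate<br>broker<br>
```

Empty text tokens are simply skipped, since they add nothing to the joined result.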

   Karl
-- 
Please do *not* send copies of replies to me.
I read the list