How can I exclude a word by using re?

could ildg could.net at gmail.com
Mon Aug 15 21:09:11 EDT 2005


I want to use re because I want to extract something from an html page. It
will be very complicated without using re. But while using re, I
found that I must exclude a whole word, "</td>", and there are
many occurrences of "</td>" in this html.

My re is as below:
_____________________________________________
r=re.compile(ur'valign=top>(?P<number>\d{1,2})</td><td[^>]*>\s{0,2}'
ur'<a href="(?P<url>[^<>]+\.mp3)"( )target=_blank>'
ur'(?P<name>.+)</td>',re.UNICODE|re.IGNORECASE)
_____________________________________________
There should be over 30 matches in the html, but I find nothing with
r.finditer(html) because the last line of my re is wrong. I can't use
"(?P<name>.+)</td>" because there are many "</td>" in the html
and I just want the ".+" to match what comes before the first "</td>".
So I think that if there were some way to exclude a word, this would be
done. Assuming there were a "NOT(WORD)" that could do it, I would just
need to write the last line of the re as "(?P<name>(NOT(</td>))+)</td>".
But I still have no idea how, after thinking and trying for a very long time.
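
One way to spell the hypothetical "NOT(WORD)" with the re module is a
negative lookahead, (?!...), tested before every character that is
consumed. The following is only a minimal sketch, written for Python 3;
the sample string and the names sample, not_td and names are made up for
illustration and are not taken from the real html.

```python
import re

# Made-up sample; the real page is only assumed to look roughly like this.
sample = '<td>first cell</td><td>second cell</td>'

# (?:(?!</td>).)+ consumes one character at a time, but only while the
# text at the current position does not start with "</td>".
not_td = re.compile(r'<td>(?P<name>(?:(?!</td>).)+)</td>', re.IGNORECASE)

names = [m.group('name') for m in not_td.finditer(sample)]
print(names)  # ['first cell', 'second cell']
```

Each match stops at the nearest "</td>" because the lookahead refuses to
step over one.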

In other words, I want the "</td>" of "(?P<name>.+)</td>" to be
exactly the first "</td>" in this match. And there is more than one
match in this html, so this must be done by using re.
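
Stopping at the first "</td>" is also what a non-greedy quantifier does:
".+?" takes as few characters as possible. Below is a hedged sketch of the
pattern above with only that change, written for Python 3 (so the ur''
prefix is dropped, and re.UNICODE is omitted because it is the default for
str patterns there). The two table rows in html are invented test data,
assumed to resemble the real page.

```python
import re

# Invented rows imitating the structure the pattern expects.
html = (
    '<td valign=top>1</td><td class=x> '
    '<a href="http://example.com/a.mp3" target=_blank>Song A</a></td>'
    '<td valign=top>2</td><td class=x> '
    '<a href="http://example.com/b.mp3" target=_blank>Song B</a></td>'
)

r = re.compile(
    r'valign=top>(?P<number>\d{1,2})</td><td[^>]*>\s{0,2}'
    r'<a href="(?P<url>[^<>]+\.mp3)"( )target=_blank>'
    r'(?P<name>.+?)</td>',  # ".+?" instead of ".+": stop at the first </td>
    re.IGNORECASE,
)

# One tuple per match; name still contains "</a>" because the pattern
# only anchors on "</td>".
rows = [(m.group('number'), m.group('url'), m.group('name'))
        for m in r.finditer(html)]
print(rows)
```

With the invented data this yields one tuple per table row rather than a
single match running to the last "</td>".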

And I can't use your idea because what I am dealing with is a
very complicated html page, not just a single line of text.

I could paste part of the html here, but it is rather lengthy.

On 8/15/05, John Machin <sjmachin at lexicon.net> wrote:
> could ildg wrote:
> > In re, the character "^" can exclude a single character, but I want
> > to exclude a whole word now. For example, I have a string "hi, how are
> > you. hello" and I want to extract the part before the word "hello".
> > I can't use ".*[^hello]" because "^" only excludes the single
> > characters "h", "e", "l" or "o". Will somebody tell me how to do it? Thanks.
> 
> (1) Why must you use re? It's often a good idea to use string methods
> where they can do the job you want.
> (2) What do you want to have happen if "hello" is not in the string?
> 
> Example:
> 
> C:\junk>type upto.py
> def upto(strg, what):
>      k = strg.find(what)
>      if k > -1:
>          return strg[:k]
>      return None # or raise an exception
> 
> helo = ("hi, how are you? HELLO I'm fine, thank you hello hello hello. "
>         "that's it")
> 
> print repr(upto(helo, "HELLO"))
> print repr(upto(helo, "hello"))
> print repr(upto(helo, "hi"))
> print repr(upto(helo, "goodbye"))
> print repr(upto("", "goodbye"))
> print repr(upto("", ""))
> 
> C:\junk>upto.py
> 'hi, how are you? '
> "hi, how are you? HELLO I'm fine, thank you "
> ''
> None
> None
> ''
> 
> HTH,
> John