python re - a not needed
Peter Otten
__peter__ at web.de
Thu Dec 16 04:21:22 EST 2004
kepes.krisztian wrote:
> Hi !
>
> I want to get infos from a html, but I need all chars except <.
> All chars is: over chr(31), and over (128) - hungarian accents.
> The .* is very hungry, it is eat < chars too.
>
> If I can use not, I simply define an regexp.
> [not<]*</a>
>
> It is get all in the href.
>
> I wrote this programme, but it is too complex - I think:
>
> import re
>
> l=[]
> for i in range(33,65):
>     if i<>ord('<') and i<>ord('>'):
>         l.append('\\'+chr(i))
> s='|'.join(l)
> all='\w|\s|\%s-\%s|%s'%(chr(128),chr(255),s)
> sre='<Subj>([%s]{1,1024})</d>'%all
> #sre='<Subj>([?!\\<]{1,1024})</d>'
> s='<Subj>xmvccv ÁÁÁ sdfkdsfj eirfie</d><A></d>'
>
>
> print sre
> print s
> cp=re.compile(sre)
> m=cp.search(s)
> print m.groups()
>
> Have the python an regexp exception, or not function ? How to I use it ?
>
> Thanx for help:
> kk
You could try these regexps or variants thereof:
"<Subj>([^<]*)"
A '^' as the first character inside [...] negates the character set: it
matches any single character except those listed after the '^'.
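As a rough illustration of the negated set, here is a minimal sketch applied to the sample string from the question. It uses current Python 3 syntax rather than the Python 2 of the quoted code, and the variable names (negated, subject) are only placeholders chosen for this example.

```python
import re

# Sample string taken from the original question.
s = '<Subj>xmvccv ÁÁÁ sdfkdsfj eirfie</d><A></d>'

# [^<]* matches the longest run of characters that are not '<',
# so the group stops right before the first '<' after <Subj>.
negated = re.compile(r'<Subj>([^<]*)')

m = negated.search(s)
subject = m.group(1) if m else None
print(subject)
```

Because the set excludes only '<', accented characters and whitespace (including newlines) are accepted without having to be listed one by one.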
"<Subj>(.*?)<"
The '?' makes the preceding '*' non-greedy, i.e. it matches as few
characters as possible, so the following '<' matches the first '<'
character encountered in the string being searched.
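To make the difference between greedy and non-greedy matching concrete, the following sketch runs both forms against the sample string from the question. It is written in Python 3 syntax, and the names greedy and lazy are just illustrative.

```python
import re

# Sample string taken from the original question.
s = '<Subj>xmvccv ÁÁÁ sdfkdsfj eirfie</d><A></d>'

# Greedy: .* grabs as much as it can, so the trailing '<' in the
# pattern ends up matching the LAST '<' in the string.
greedy = re.search(r'<Subj>(.*)<', s).group(1)

# Non-greedy: .*? grabs as little as it can, so the trailing '<'
# matches the FIRST '<' after <Subj>.
lazy = re.search(r'<Subj>(.*?)<', s).group(1)

print(greedy)
print(lazy)
```

One caveat when choosing between the two regexps: '.' does not match a newline unless re.DOTALL is passed, so the non-greedy form will not find a subject that spans several lines, while the [^<]* form will.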
Peter