[Tutor] again... regular expression

lmac lopoff at gmx.net
Mon Nov 21 18:33:42 CET 2005


OK, I made an error. The links in the HTML page start with good.php, so my
pattern could never find a link.

import re

# file is the open HTML file, read line by line
re_site = re.compile(r"good\.php.+'")
for a in file:
	z = re_site.search(a)
	if z is not None:
		print(z.group(0))


This gives me every match starting with "good.php", but the match does not
end at the first ' after it. There are more tags and text later in the line
that also end with ', and the match keeps going past them. So how can I tell
a regex to stop at the first ' it finds after good.php?
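A minimal sketch of the usual answer: `.+` is greedy and grabs as much as it can, so it runs to the last ' on the line. Either the non-greedy `.+?` or a negated character class `[^']+` stops at the first '. The sample line below is made up for illustration and assumes the links are quoted with single quotes, as described above.

```python
import re

# Hypothetical sample line: a good.php link followed by more text
# that also contains single quotes.
line = "<a href='good.php?id=1&page=2'>first</a> <a href='other.php'>x</a>"

# Greedy: .+ takes as much as possible, so the match ends at the LAST '.
greedy = re.compile(r"good\.php.+'")

# Non-greedy: .+? takes as little as possible, so it ends at the FIRST '.
lazy = re.compile(r"good\.php.+?'")

# Negated class: [^']+ can never cross a ', so it also ends at the first one.
negated = re.compile(r"good\.php[^']+'")

# A capture group returns the link without the closing quote.
url_only = re.compile(r"(good\.php[^']*)'")

print(greedy.search(line).group(0))
print(lazy.search(line).group(0))
print(negated.search(line).group(0))
print(url_only.search(line).group(1))
```

The negated class is often preferred because it states exactly which character ends the link and cannot run across a quote by accident.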

Thank you.


> Hello.
> I want to parse a website for links of this type:
> 
> http://www.example.com/good.php?test=anything&egal=total&nochmal=nummer&so=site&seite=22">
> 
> ---------------------------------------------------------------------
> re_site = re.compile(r'http://\w+.\w+.\w+./good.php?.+">')
> for a in file:
> 	z = re_site.search(a)
> 	if z != None:
> 		print z.group(0)
> 
> ---------------------------------------------------------------------
> 
> I still don't understand regular expressions. I tried some other
> expressions but couldn't get them to work.
> 
> The end of the link is ">, so it should not be a problem to extract the
> link, but it is.
> 
> Thank you for the help.
> 
> mac
> 
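For absolute links like the one in the quoted message, a similar sketch could look like the following. It assumes the href is enclosed in double quotes and that the host name contains only word characters and dots (`[\w.]+` is a simplification, not a general URL pattern); the HTML snippet is hypothetical. Note that `.` and `?` are regex metacharacters, so they must be escaped to match literally.

```python
import re

# Hypothetical HTML snippet modelled on the link from the quoted message.
html = ('<a href="http://www.example.com/good.php?test=anything&egal=total'
        '&nochmal=nummer&so=site&seite=22">next</a> '
        '<a href="http://www.example.com/bad.php?x=1">skip</a>')

# \. and \? match a literal dot and question mark; [^"]* cannot cross the
# closing double quote, so the group stops right before ">.
link_re = re.compile(r'(http://[\w.]+/good\.php\?[^"]*)">')

# findall returns the captured group for every match, so several links
# on one line are all found.
links = link_re.findall(html)
for link in links:
    print(link)
```

Only the good.php link is returned; the bad.php link fails at `/good`.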


