regexp problem in Python

Sönmez Kartal rainwatching at gmail.com
Sat Aug 4 13:51:36 EDT 2007


On 4 August, 17:10, Ehsan <ehsan.khod... at gmail.com> wrote:
> On Aug 4, 1:22 pm, Sönmez Kartal <rainwatch... at gmail.com> wrote:
>
> > On 4 August, 00:41, Ehsan <ehsan.khod... at gmail.com> wrote:
>
> > > I want to find "http://www.2shared.com/download/1716611/e2000f22/
> > > Jadeed_Mlak14.wmv?tsid=20070803-164051-9d637d11"  or 3gp instead of
> > > wmv in the text file like this :
> > > <html>
> > > ""some code""
> > > function reportAbuse() {
> > >     var windowname="abuse";
> > >     var url="/abuse.jsp?link=" + "http://www.2shared.com/file/1716611/
> > > e2000f22/Jadeed_Mlak14.html";
> > >     OpenWindow =
> > > window.open(url,windowname,'toolbar=no,scrollbars=no,resizable=no,width=500,height=500,left=50,top=50');
> > >     OpenWindow.focus();
> > >   }
> > >   function startDownload(){
> > >     window.location = "http://www.2shared.com/download/1716611/
> > > e2000f22/Jadeed_Mlak14.wmv?tsid=20070803-164051-9d637d11";
> > >     //document.downloadForm.submit();
> > >   }
> > >   </script>
> > > </head>
> > > </html>http://www.2shared.com/download/1716611/e2000f22/
> > > Jadeed_Mlak14.3gp?tsid=20070803-164051-9d637d11"sfgsfgsfgv
>
> > > I use this pattern :
> > > "http.*?\.(wmv|3gp).*""
>
> > > but it returns only 'wmv' and '3gp' instead of "http://www.2shared.com/
> > > download/1716611/e2000f22/Jadeed_Mlak14.wmv?
> > > tsid=20070803-164051-9d637d11"
>
> > > What can I do? What's wrong with this pattern? Thanks for your comments.
>
> > You could use r'window.location = "(.*?\.(wmv|3gp)";' as your regex
> > string, I guess.
>
> I don't understand what you mean. I think I just need to change the
> pattern, but I don't know how to find the best-fitting pattern.

If you add the surrounding 'window.location = "' and '";' to your
pattern, the URL becomes much easier to pick out:

r'window.location = "(.*?)";'

I tried this pattern and it gave me:
>>> import re
>>> data = """ <html>
... ""some code""
... function reportAbuse() {
...     var windowname="abuse";
...     var url="/abuse.jsp?link=" + "http://www.2shared.com/file/1716611/e2000f22/Jadeed_Mlak14.html";
...     OpenWindow =
... window.open(url,windowname,'toolbar=no,scrollbars=no,resizable=no,width=500,height=500,left=50,top=50');
...     OpenWindow.focus();
...   }
...   function startDownload(){
...     window.location = "http://www.2shared.com/download/1716611/e2000f22/Jadeed_Mlak14.wmv?tsid=20070803-164051-9d637d11";
...     //document.downloadForm.submit();
...   }
...   </script>
... </head>
... </html>"""
>>> re.findall(r'window.location = "(.*?)";', data)
['http://www.2shared.com/download/1716611/e2000f22/Jadeed_Mlak14.wmv?tsid=20070803-164051-9d637d11']
>>> print('It works! :-)')
It works! :-)
>>>
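
A side note on why your original pattern returned only 'wmv' and
'3gp': when a pattern contains one capturing group, re.findall returns
the text of that group rather than the whole match. Making the group
non-capturing with (?:...) should give back the full URL. Below is a
minimal sketch. The sample text is shortened from the page above, and
I have swapped .* for [^"]* (my own assumption, not part of your
pattern) so that a match stops at the closing quote.

```python
import re

# Sample text modelled on the page quoted above (shortened for the example).
text = (
    'window.location = "http://www.2shared.com/download/1716611/e2000f22/'
    'Jadeed_Mlak14.wmv?tsid=20070803-164051-9d637d11";\n'
    '</html>http://www.2shared.com/download/1716611/e2000f22/'
    'Jadeed_Mlak14.3gp?tsid=20070803-164051-9d637d11"sfgsfgsfgv\n'
)

# With a capturing group, findall() returns only what the group matched.
capturing = re.findall(r'http[^"]*?\.(wmv|3gp)[^"]*', text)

# With a non-capturing group (?:...), findall() returns the whole match.
whole = re.findall(r'http[^"]*?\.(?:wmv|3gp)[^"]*', text)

print(capturing)
print(whole)
```

Unlike the window.location pattern, this one keeps the wmv/3gp filter,
so it also picks up a URL that appears outside the script block.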

Happy coding




More information about the Python-list mailing list