[Tutor] extract uri from beautiful soup string

Norman Khine norman at khine.net
Mon Oct 15 19:17:54 CEST 2012


On Mon, Oct 15, 2012 at 2:02 AM, Sander Sweers <sander.sweers at gmail.com> wrote:
> Sander Sweers wrote on Mon 15-10-2012 at 02:35 [+0200]:
>> > On Mon, Oct 15, 2012 at 12:12 AM, Sander Sweers <sander.sweers at gmail.com> wrote:
>> > > Norman Khine wrote on Sun 14-10-2012 at 23:10 [+0100]:
>> > Norman Khine wrote on Mon 15-10-2012 at 00:17 [+0100]:
>> > I tried this: http://pastie.org/5059153
>
> Btw, if I understand what you are trying to do, then you can do this much
> more simply. I noticed that all the a tags with onclick have an href
> attribute of '#'. To get all of these, do something like:
>
> soup.findAll('a', {'href':'#'})

Thanks, I used that.

>
> Then use the attrMap, e.g. attrMap['onclick'].split('\'')[1].
>
> Put together, that may look like the code below.
>
> for i in soup.findAll('a', {'href':'#'}):
>     if 'toolbar=0' in i.attrMap['onclick']:
>         print i.attrMap['onclick'].split('\'')[1]
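
For anyone who wants to try the idea without BeautifulSoup installed, here is a
minimal sketch of the same filter-and-split approach in Python 3, using the
standard library's html.parser instead of soup.findAll. The class name
OnclickURIParser, the helper extract_uris, and the sample HTML (including the
window.open call and the example.com address) are made up for illustration and
are assumptions, not taken from the page discussed in this thread.

```python
from html.parser import HTMLParser


class OnclickURIParser(HTMLParser):
    """Collect the first single-quoted string from matching onclick handlers."""

    def __init__(self):
        super().__init__()
        self.uris = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        onclick = attr_map.get("onclick") or ""
        # Same filter as in the thread: href of '#' and 'toolbar=0' in onclick.
        if attr_map.get("href") == "#" and "toolbar=0" in onclick:
            parts = onclick.split("'")
            # parts[1] is the text between the first pair of single quotes.
            if len(parts) > 1:
                self.uris.append(parts[1])


def extract_uris(html):
    """Return the URIs found in the onclick handlers of matching <a> tags."""
    parser = OnclickURIParser()
    parser.feed(html)
    parser.close()
    return parser.uris


# Hypothetical markup, assumed to resemble the page being scraped.
SAMPLE_HTML = """
<a href="#" onclick="window.open('http://example.com/a', 'w', 'toolbar=0,width=400'); return false;">A</a>
<a href="#" onclick="doSomething('other')">B</a>
<a href="http://example.com/plain">C</a>
"""

if __name__ == "__main__":
    print(extract_uris(SAMPLE_HTML))
```

The split on the single quote only works while the URI is the first quoted
argument in the handler; if the markup varies, a regular expression may be a
safer choice.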

I made an update: https://gist.github.com/3891927, which works and is based on
some of the recommendations posted here.

Any suggestions for improvement?
>
> Greets
> Sander
>



-- 
%>>> "".join( [ {'*':'@','^':'.'}.get(c,None) or
chr(97+(ord(c)-83)%26) for c in ",adym,*)&uzq^zqf" ] )

