[Tutor] Problem getting data using beautifulsoup4 + python 3.4.1

Peter Otten __peter__ at web.de
Thu Sep 18 23:10:52 CEST 2014


Juan Christian wrote:

> My code:
> 
> import requests
> import bs4
> 
> 
> FORUM_ID = "440"
> 
> response = requests.get(
>     'http://steamcommunity.com/app/{id}/tradingforum'.format(id=FORUM_ID))
> soup = bs4.BeautifulSoup(response.text)
> topics = [a.attrs.get('href') for a in
>           soup.select('a.forum_topic_overlay')]
> 
> for topic in topics:
>     r = requests.get(topic)
>     s = bs4.BeautifulSoup(r.text)
> 
>     username = [a.get_text() for a in s.select('div.authorline')]
>     profile = [a.attrs.get('href') for a in s.select('div.authorline')]
> 
>     print(s.select('div.authorline'))
>     print("\nProfile value: " + str(profile))
>     print("\n==================================\n")
> 
> 
> Now, let's talk about the problem. The print(s.select('div.authorline'))
> prints what I want, that is the part of the page that I need:
> 
> [<div class="authorline">
> <a class="hoverunderline forum_op_author
> commentthread_author_globalmoderator" data-miniprofile="40662867" href="
> http://steamcommunity.com/id/FrazerJC" onclick="return Forum_AuthorMenu(
> this, event, false, '810938082603415962', '-1', 40662867, 'FrazerJC' );">
>                         FrazerJC<span
> class="forum_author_action_pulldown"></span></a><img height="12" src="
> http://steamcommunity-a.akamaihd.net/public/images/skin_1/comment_modindicator_moderator.png"
> title="Moderator" width="12"> <span class="date">14 Oct, 2013 @
> 3:31pm</span></img></div>]
> 
> 
> But print("\nProfile value: " + str(profile)) isn't printing what I want.
> It gives me "Profile value: [None]". It should give me the link to the
> person's profile, in this example "http://steamcommunity.com/id/FrazerJC".
> I was following the bs4 docs and used
> [a.attrs.get('href') for a in s.select('div.authorline')] to get the value
> of the href, but it isn't working. I must have done something wrong, but
> where?

The <div ...> doesn't have an href attribute; its child <a ...> does. Try

[a.attrs["href"] for a in s.select("div.authorline a")]
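
To see the difference without hitting the network, here is a minimal sketch
that runs both selectors against a hand-trimmed copy of the markup quoted
above. The names div_hrefs, profiles, and usernames are only illustrative,
and passing "html.parser" explicitly is an assumption on my part, not
something the original code does.

```python
import bs4

# Hand-trimmed copy of the quoted markup (attributes shortened for brevity).
HTML = """
<div class="authorline">
<a class="hoverunderline forum_op_author" href="http://steamcommunity.com/id/FrazerJC">
    FrazerJC<span class="forum_author_action_pulldown"></span></a>
<span class="date">14 Oct, 2013 @ 3:31pm</span>
</div>
"""

# Assumption: use the parser from the standard library so nothing extra is needed.
s = bs4.BeautifulSoup(HTML, "html.parser")

# Selecting the <div> itself: it has no href, so .get() returns None.
div_hrefs = [div.attrs.get("href") for div in s.select("div.authorline")]

# Descendant selector: every <a> nested inside div.authorline.
profiles = [a.attrs["href"] for a in s.select("div.authorline a")]
usernames = [a.get_text(strip=True) for a in s.select("div.authorline a")]

print(div_hrefs)
print(profiles)
print(usernames)
```

The same "div.authorline a" selector also gives a cleaner username, because
get_text() is then called on the link instead of the whole <div>, which
would include the date as well.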



