beautiful soup get class info

Peter Otten __peter__ at web.de
Wed Mar 12 03:36:40 EDT 2014


Christopher Welborn wrote:

> On 03/06/2014 02:22 PM, teddybubu at gmail.com wrote:
>> I am using beautifulsoup to get the title and date of the website.
>> title is working fine but I am not able to pull the date. Here is the
>> code in the url:
>>
>>   <span class="date">October 22, 2011</span>
>>
>> In Python, I am using the following code:
>> date1 = soup.span.text
>> data=soup.find_all(date="value")
>>
>> Results in:
>>
>> []
>> March 5, 2014
>>
>> What is the proper way to get this info?
>> Thanks.
>>
> 
> I believe it's the 'attrs' argument.
> http://www.crummy.com/software/BeautifulSoup/bs4/doc/
> 
> # Workaround the 'class' problem:
> data = soup.find_all(attrs={'class': 'date'})
> 
> I haven't tested it, but it's worth looking into.
 
Yes, there are two ways to filter by class:

>>> import bs4
>>> soup = bs4.BeautifulSoup("""
... <span class="one">alpha</span>
... <span class="two">beta</span>""", "html.parser")

Use attrs:

>>> soup.find_all(attrs={"class": "one"})
[<span class="one">alpha</span>]

Or append an underscore to the keyword (class is a reserved word in Python):

>>> soup.find_all(class_="two")
[<span class="two">beta</span>]
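
Applied to the original question, a minimal sketch might look like the following. The html string is a made-up stand-in for the real page, which I haven't seen. As far as I can tell, find_all(date="value") came back empty because keyword arguments filter on attributes, so it looked for tags with an attribute named date, and soup.span.text only ever returns the first span in the document.

```python
import bs4

# Hypothetical stand-in for the real page; only the date span comes from the post.
html = """
<h1>Some title</h1>
<span class="author">Someone</span>
<span class="date">October 22, 2011</span>
"""
soup = bs4.BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None if nothing matches.
date_tag = soup.find("span", class_="date")
date_text = date_tag.get_text(strip=True) if date_tag is not None else None
print(date_text)

# The same lookup written as a CSS selector.
css_tag = soup.select_one("span.date")
```

If the page has several spans with class "date", find_all("span", class_="date") gives you all of them as a list.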




