[Tutor] (regular expression)

isaac tetteh itetteh34 at hotmail.com
Sat Dec 10 20:52:02 EST 2016


This is the real code:


import urllib.request
from bs4 import BeautifulSoup

with urllib.request.urlopen("https://www.sdstate.edu/electrical-engineering-and-computer-science") as cs:
    cs_page = cs.read()
    soup = BeautifulSoup(cs_page, "html.parser")
    # Count the text nodes matched by the two spellings
    print(len(soup.body.find_all(string=["Engineering", "engineering"])))

I opened the link from the code in a browser and searched for the word with Ctrl+F. Ctrl+F reports 11 matches, but the code prints 3.
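
One likely reason for the gap, sketched here under assumptions: when find_all(string=...) is given plain strings, bs4 keeps only text nodes that are exactly equal to one of them, while Ctrl+F counts every occurrence of the word inside longer text. Passing a compiled regular expression instead makes bs4 test each text node with search(), and counting matches in get_text() is closer to what Ctrl+F does. SAMPLE_HTML below is a made-up page, not the real sdstate.edu markup, so its counts will not be 11 and 3.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample page (an assumption for illustration only).
SAMPLE_HTML = """
<html><body>
  <h1>Electrical Engineering and Computer Science</h1>
  <a href="#">Engineering</a>
  <p>Our engineering programs prepare engineering students.</p>
  <span>engineering</span>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Plain strings: only text nodes exactly equal to one of them are returned.
exact_nodes = soup.body.find_all(string=["Engineering", "engineering"])

# Compiled pattern: any text node that contains the word is returned.
pattern = re.compile("engineering", re.IGNORECASE)
containing_nodes = soup.body.find_all(string=pattern)

# Every occurrence in the extracted text, several per node if present.
occurrences = len(pattern.findall(soup.body.get_text()))

print(len(exact_nodes), len(containing_nodes), occurrences)  # 2 4 5
```

Even the last count may differ from the browser on a real page, since get_text() also includes text that is not displayed (for example inside script or style tags), and the page may change after it loads.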

Thanks




________________________________
From: Tutor <tutor-bounces+itetteh34=hotmail.com at python.org> on behalf of Bob Gailer <bgailer at gmail.com>
Sent: Saturday, December 10, 2016 7:54 PM
To: Tetteh, Isaac - SDSU Student
Cc: Python Tutor
Subject: Re: [Tutor] (no subject)

On Dec 10, 2016 12:15 PM, "Tetteh, Isaac - SDSU Student" <
isaac.tetteh at jacks.sdstate.edu> wrote:
>
> Hello,
>
> I am trying to find the number of times a word occurs on a webpage, so I
> used the bs4 code below.
>
> Let's assume html contains the HTML code:
> soup = BeautifulSoup(html, "html.parser")
> print(len(soup.find_all(string=["Engineering","engineering"])))
> But the result is different from what I get when I use Control + F on my
> keyboard to find the word.
>
> Please help me understand why the results are different. Thanks
> I am using Python 3.5
>
What is the URL of the web page?
To what are you applying control-f?
What are the two different counts you're getting?
Is it possible that the page is being dynamically altered after it's loaded?
_______________________________________________
Tutor maillist  -  Tutor at python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor