[Tutor] Coding question

dn PyTutor at DancesWithMice.info
Sat Nov 28 13:21:37 EST 2020


On 29/11/2020 05:31, Erin Anderson wrote:
> Hello, I am trying to figure out how to code for reading in text from a URL in Python, but in two chunks.
> 
> I am looking at a transcript from a website and I want to read in the text, but I want the reading in of the text to stop when the transcript says “Part 2”. I then want to have this chunk of information as one entity and then create another entity filled with the text that occurs after the words “Part 2”. I'm thinking one way to do this is using a while loop, but I am not quite sure how to implement it.
> 
> Def text_chunk(url)
> 	webpage=web.urlopen(url)
> 	while text != “Part 2”:

The applicable Python idiom is the str.find() method, which returns the 
index at which "Part 2" begins, or -1 if it does not appear in text:

	text.find("Part 2")
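A minimal sketch of how find() can then drive the split, assuming the whole transcript is already held in a str. The function name split_with_find is hypothetical. Slicing at the returned index produces both chunks without any loop:

```python
def split_with_find(text, marker="Part 2"):
    """Split text into (before, after) around the first occurrence of marker."""
    position = text.find(marker)  # index where marker starts, or -1 if absent
    if position == -1:
        return text, ""           # no marker: everything is the first chunk
    # Slice up to the marker, then from just past its last character
    return text[:position], text[position + len(marker):]
```

Returning an empty second chunk when the marker is missing is a design choice; raising an exception instead may suit better if "Part 2" must always be present.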

> 		rawbytes=webpage.read()
> 		webpage.close()

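This closes the web-page on the very first pass through the loop. 
Accordingly, if "Part 2" is not in text, the while-loop will repeat, but 
webpage will no longer be open! (Note too that read() with no argument 
already returns the entire page, so there is nothing left to loop over.)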


> 		text = rawBytes.decode('utf-8’)
> 	return text


If you wish to persist with this idea, then consider that web-pages, 
indeed whole books, easily 'fit' into the average computer's memory. 
So, rather than thinking in terms of "two chunks", read the whole 
web-page first, and only later figure out which part of the page you 
want to keep or discard.
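A minimal sketch of that read-everything-then-split approach, using only the standard library. It assumes the page is UTF-8 encoded and that "Part 2" appears literally in the page source; the name text_chunks is hypothetical, adapted from the original text_chunk. The with-block keeps the response open until read() has finished, and str.partition() does the splitting:

```python
import urllib.request


def text_chunks(url, marker="Part 2", encoding="utf-8"):
    """Fetch the whole page at url, then split its text at marker.

    Returns (before, after). If marker is absent, after is "".
    """
    # Read everything in one go; the with-block closes the
    # connection only after read() has returned the full body.
    with urllib.request.urlopen(url) as response:
        text = response.read().decode(encoding)  # assumes UTF-8 content
    # partition() splits at the first occurrence of marker and
    # returns (text, "", "") when marker does not appear.
    before, _, after = text.partition(marker)
    return before, after
```

Usage, with a hypothetical URL: `part_one, part_two = text_chunks("https://example.com/transcript")`. If the page is HTML, both chunks will still contain the markup.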


What you are describing is known as "web scraping". There are a number 
of Python tools which will accomplish the mechanics for you. 
Traditionally the go-to library has been "BeautifulSoup" (a web page's 
source being a 'soup' of HTML tags, i.e. text wrapped in "<" and ">" 
symbols). Such a tool would make it quicker and easier to meet the 
stated objective!
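BeautifulSoup itself is a third-party package (bs4). As a stand-in that needs nothing beyond the standard library, here is a minimal sketch using html.parser.HTMLParser to pull out the page's text before searching it. This matters because in the raw HTML the words may be broken up by markup (e.g. Part <b>2</b>), so a plain find() on the page source could miss them. The names TextExtractor and html_to_text are hypothetical; BeautifulSoup's get_text() does a similar job far more robustly:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, ignoring tags."""

    SKIP = {"script", "style"}  # their contents are not visible text

    def __init__(self):
        super().__init__()  # convert_charrefs=True: "&amp;" arrives as "&"
        self.pieces = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.pieces.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    # Join with "" so "Part <b>2</b>" becomes "Part 2" again
    return "".join(parser.pieces)
```

One trade-off: block boundaries are not turned into spaces, so "<p>Intro</p><h2>..." yields "Intro" run straight into the next heading's text. The result can then be split with find() or partition() as before.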
-- 
Regards =dn

