[Tutor] How to write a loop in Python to find HTML tags in a text file

Wed Mar 17 01:15:22 EDT 2021

Dear all,

The text file at
<https://drive.google.com/file/d/1gVwRH-TlRks-ZJtb6P8vd3WkqcrIIAa0/view?usp=sharing>
contains data from three news articles scraped from a news website. I would
like to write a loop that separates the metadata from the article body for
each of these three articles. The linked code <https://pastebin.com/FU2Axiuc>
works for a single news article only (i.e., when the text file contains only
one article). People have previously suggested using Beautiful Soup and
regular expressions, but I just want to modify the existing code to add a
loop, without using any other methods or functions.
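Since the goal is a loop that uses only built-in string methods (no Beautiful Soup, no regular expressions), one possible starting point is a `while` loop over `str.find`. This is a minimal sketch, not a modification of the pastebin code (which is not reproduced here). It assumes every article begins with the literal text `<h1 class=`, and the function name `split_articles` is hypothetical.

```python
def split_articles(text, start_marker="<h1 class="):
    """Split the scraped text into one chunk per article (hypothetical helper).

    Assumes each article begins with start_marker; anything before the
    first marker is discarded.
    """
    articles = []
    start = text.find(start_marker)
    while start != -1:
        # Search for the next article's start, beginning just past the current marker.
        next_start = text.find(start_marker, start + len(start_marker))
        if next_start == -1:
            # Last article: take everything up to the end of the text.
            articles.append(text[start:])
        else:
            articles.append(text[start:next_start])
        start = next_start
    return articles


sample = '<h1 class="a">One</h1>x<h1 class="b">Two</h1>y'
for number, article in enumerate(split_articles(sample), start=1):
    print(number, article)
```

Each chunk returned this way can then be handled by whatever the existing single-article code does, which is what turns it into a loop over all three articles.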

Note about the text file:

   1. An h1 tag with a class attribute (<h1 class=...>) denotes the start of
   each article.
   2. The tag <div class="story-element story-element-text"> denotes the body
   text of the article.
   3. The text is in a non-English script, but the tags are all in English.
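Given the text of one article, the metadata/body split described in this note can also be done with `str.find`. A sketch, assuming the body marker appears exactly as `<div class="story-element story-element-text">` and that everything before its first occurrence counts as metadata; `split_article` is a hypothetical name. If an article contains several such `div` elements, they all end up in the body part.

```python
BODY_MARKER = '<div class="story-element story-element-text">'


def split_article(article, body_marker=BODY_MARKER):
    """Return (metadata, body) for a single article (hypothetical helper).

    Everything before the first body_marker is treated as metadata;
    the marker and everything after it is treated as body text.
    If the marker is missing, the whole article is returned as metadata.
    """
    pos = article.find(body_marker)
    if pos == -1:
        return article.strip(), ""
    return article[:pos].strip(), article[pos:].strip()


example = ('<h1 class="headline">Title</h1><div class="meta">Author</div>'
           '<div class="story-element story-element-text">Body text</div>')
metadata, body = split_article(example)
print(metadata)
print(body)
```

Because the tags themselves are plain ASCII, `str.find` locates them the same way regardless of the script used in the article text.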

Looking forward to your help!

Monzur

For the three articles, the output should look roughly like this:

> <h1 class=………</div>
> *******
> <div class= ……</div>
> *******
> <h1 class=………</div>
> *******
> <div class= ……</div>
> *******
> <h1 class=………</div>
> *******
> <div class= ……</div>
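To produce output shaped like the example above, the metadata and body of each article can be written out in order with a line of asterisks between consecutive parts. This sketch assumes the `(metadata, body)` pairs have already been extracted by a loop over the articles; `format_output` is a hypothetical helper name.

```python
SEPARATOR = "*******"


def format_output(pairs, separator=SEPARATOR):
    """Join (metadata, body) pairs into one string (hypothetical helper).

    Produces metadata, separator, body, separator, metadata, ... with no
    separator after the final body, matching the layout shown in the post.
    """
    parts = []
    for metadata, body in pairs:
        parts.append(metadata)
        parts.append(body)
    return ("\n" + separator + "\n").join(parts)


pairs = [("<h1>A</h1>", "<div>a</div>"), ("<h1>B</h1>", "<div>b</div>")]
print(format_output(pairs))
```

When reading the real file, opening it with an explicit encoding, e.g. `open("articles.txt", encoding="utf-8")` (the file name and the UTF-8 encoding are assumptions), lets Python 3 read the non-English text as ordinary `str` data.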