[Tutor] How to write a loop in python to find HTML tags in a text file
S Monzur
sb.monzur at gmail.com
Wed Mar 17 01:15:22 EDT 2021
Dear all,
The attached text file
<https://drive.google.com/file/d/1gVwRH-TlRks-ZJtb6P8vd3WkqcrIIAa0/view?usp=sharing>
contains data from 3 news articles scraped from a news website. I would like to
write a loop that separates the metadata from the article body for each of
these three articles. The linked code <https://pastebin.com/FU2Axiuc> works
for a single news article only (i.e., if I keep only one article in the
text file). People have previously suggested using Beautiful Soup and
regular expressions, but please note that I just want to modify the
existing code to add a loop, and not use any other methods/functions.
Note about the text file:
1. The H1 tag (<h1 class=…>) marks the start of each article. The tag
<div class="story-element story-element-text"> marks the body text
of the article. The text is in a non-English script, but the tags are all
in English.
Looking forward to your help!
Monzur
The output should probably look like this for 3 articles:
<h1 class=………</div>
*******
<div class= ……</div>
*******
<h1 class=………</div>
*******
<div class= ……</div>
*******
<h1 class=………</div>
*******
<div class= ……</div>
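
A minimal sketch of such a loop follows. It is not the linked code, and it uses only str.find and slicing, with no Beautiful Soup or regular expressions. It assumes that every article starts at the text "<h1 class=" and that the body starts at the first '<div class="story-element story-element-text">' after it. The function name split_articles, the constant names, and the sample string are hypothetical and only there for illustration.

```python
# Hypothetical marker strings, assumed from the note about the text file.
HEAD_MARKER = "<h1 class="
BODY_MARKER = '<div class="story-element story-element-text">'
SEPARATOR = "*******"


def split_articles(text):
    """Return a list of (metadata, body) pairs, one pair per article."""
    articles = []
    start = text.find(HEAD_MARKER)
    while start != -1:
        # The next "<h1 class=" (if any) marks where this article ends.
        next_start = text.find(HEAD_MARKER, start + len(HEAD_MARKER))
        end = next_start if next_start != -1 else len(text)
        article = text[start:end]

        # Everything before the first body div is treated as metadata.
        body_pos = article.find(BODY_MARKER)
        if body_pos == -1:
            metadata, body = article, ""
        else:
            metadata, body = article[:body_pos], article[body_pos:]

        articles.append((metadata.strip(), body.strip()))
        start = next_start  # becomes -1 after the last article, ending the loop
    return articles


# Made-up sample input with two articles; a real file would be read with
# open(path, encoding="utf-8").read() instead.
sample = (
    '<h1 class="headline">Title 1</h1><div class="meta">Author A</div>'
    '<div class="story-element story-element-text"><p>Body 1</p></div>'
    '<h1 class="headline">Title 2</h1><div class="meta">Author B</div>'
    '<div class="story-element story-element-text"><p>Body 2</p></div>'
)

for metadata, body in split_articles(sample):
    print(metadata)
    print(SEPARATOR)
    print(body)
    print(SEPARATOR)
```

If an article contains several story-element-text divs, this sketch keeps everything from the first one onward as the body. Whether that matches the real file is an assumption that would need checking against the data.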