How to loop over a text file (to remove tags and normalize) using Python

Peter Otten __peter__ at web.de
Wed Mar 10 04:46:20 EST 2021


On 10/03/2021 04:35, S Monzur wrote:
> Thanks! I ended up using Beautiful Soup to remove the HTML tags and create
> three lists (article titles, publication dates, main body) but am still
> facing a problem where the list is not properly storing the main body.
> There is something wrong with my code for that section, and any comment
> would be really helpful!
>
>   ListFile Text
> <https://drive.google.com/file/d/1V3s8w8a3NQvex91EdOhdC9rQtCAOElpm/view?usp=sharing>

How did you create that file?

> BeautifulSoup code for removing tags <https://pastebin.com/qvbVMUGD>

> print(bodytext[0]) # so here, I'm only getting the first paragraph of the body of the first article, not all of the first article
>
> print(bodytext[1]) # here, I'm getting the second paragraph of the first article, and not the second article

It may help to process each article with Beautiful Soup individually
rather than the whole list at once. That way all paragraphs of one
article can go into a single list entry.
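
A minimal sketch of that idea, assuming each article is available as a
separate HTML string and its body sits in <p> elements. The sample data
and the extract_body() helper are made up for illustration; your real
markup may need a different selector.

```python
from bs4 import BeautifulSoup

# Hypothetical sample data; the real articles and their markup may differ.
articles = [
    "<h1>First title</h1><p>First paragraph.</p><p>Second paragraph.</p>",
    "<h1>Second title</h1><p>Only paragraph.</p>",
]


def extract_body(html):
    """Return the text of all <p> elements of one article as one string."""
    soup = BeautifulSoup(html, "html.parser")
    return "\n".join(p.get_text(strip=True) for p in soup.find_all("p"))


# One soup per article, so one list entry per article
# instead of one entry per paragraph.
bodytext = [extract_body(article) for article in articles]

print(bodytext[0])  # all paragraphs of the first article
print(bodytext[1])  # the second article
```

With this layout bodytext[0] holds the whole body of the first article
and bodytext[1] the body of the second.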

