Mutating an HTML file with BeautifulSoup

Jon Ribbens jon+usenet at unequivocal.eu
Sat Aug 20 14:16:02 EDT 2022


On 2022-08-20, Chris Angelico <rosuav at gmail.com> wrote:
> On Sun, 21 Aug 2022 at 03:27, Stefan Ram <ram at zedat.fu-berlin.de> wrote:
>> 2QdxY4RzWzUUiLuE at potatochowder.com writes:
>> >textual representations.  That way, the following two elements are the
>> >same (and similar with a collection of sub-elements in a different order
>> >in another document):
>>
>>   The /elements/ differ. They have the /same/ infoset.
>
> That's the bit that's hard to prove.
>
>>   The OP could edit the files with regexps to create a new version.
>
> To you and Jon, who also suggested this: how would that be beneficial?
> With Beautiful Soup, I have the line number and position within the
> line where the tag starts; what does a regex give me that I don't have
> that way?

You mean you could use BeautifulSoup to read the file and identify the
bits you want to change by line number and offset, and then use that
data to try to update the file, hoping like hell that your definitions
of "line" and "offset" are identical to BeautifulSoup's, and that you
don't mess up later changes when you make earlier ones (you could apply
them in reverse order of line and offset, I suppose), and probably
resorting to regexps anyway in order to find the part of the tag you
want to change ...

... or you could avoid all that faff and just do re.sub()?
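For comparison, here is a rough sketch of both routes in Python. It does not call BeautifulSoup itself: it assumes you already have (line, column) pairs of the kind bs4 can record as `Tag.sourceline` and `Tag.sourcepos` (worth checking against your bs4 version and parser), taken as 1-based lines and 0-based columns. The function names and the http-to-https link rewrite in the second route are hypothetical, chosen only for illustration; they are not the OP's actual change.

```python
import re

# Route 1: apply edits at (line, column) positions, such as a parser reports.
# Assumptions: lines are 1-based, columns are 0-based, and only "\n" ends a
# line. str.splitlines() would also break on "\r", "\x0c", "\u2028" and
# others, which is exactly the kind of definitional mismatch that bites.


def line_col_to_offset(text, line, col):
    # Walk forward one "\n" at a time to find where `line` starts.
    offset = 0
    for _ in range(line - 1):
        newline = text.find("\n", offset)
        if newline == -1:
            raise ValueError(f"text has fewer than {line} lines")
        offset = newline + 1
    return offset + col


def apply_positional_edits(text, edits):
    # edits: iterable of (line, col, old, new). Each `old` is checked against
    # the original text first, then the edits are applied from the end of the
    # text backwards, so no edit shifts the offsets of the ones still pending.
    # Overlapping edits are not detected.
    resolved = []
    for line, col, old, new in edits:
        start = line_col_to_offset(text, line, col)
        if text[start:start + len(old)] != old:
            raise ValueError(f"expected {old!r} at line {line}, column {col}")
        resolved.append((start, old, new))
    for start, old, new in sorted(resolved, reverse=True):
        text = text[:start] + new + text[start + len(old):]
    return text


# Route 2: one re.sub() over the raw text.
# Hypothetical task: switch http:// to https:// in the href of <a> tags only.
# Not a general HTML parser: single-quoted or unquoted values, ">" inside an
# attribute value, and matches inside comments or <script> are not handled.
A_HREF_HTTP = re.compile(r'(<a\b[^>]*?\bhref=")http://', re.IGNORECASE)


def upgrade_links(html):
    return A_HREF_HTTP.sub(r"\1https://", html)
```

The positional route needs the `old` check and the reverse ordering just to be safe; the regex route is a single call, but it is only as robust as its pattern, and every byte it does not match is left untouched.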

