Need direction on mass find/replacement in HTML files

Tim Chase python.list at tim.thechases.com
Fri Apr 30 16:55:49 EDT 2010


On 04/30/2010 02:54 PM, KevinUT wrote:
> I want to globally change the following:
> <a href="http://www.mysite.org/?page=contacts"><font color="#269BD5">
>
> into:
> <a href="pages/contacts.htm"><font color="#269BD5">

Normally I'd just do this with sed on a *nix-like OS:

    find . -iname '*.html' -exec sed -i.BAK \
      's@href="http://www.mysite.org/?page=\([^"]*\)@href="pages/\1.htm@g' \
      {} \;

This finds all the HTML files (*.html) under the current 
directory ('.') calling sed on each one.  Sed then does the 
substitution you describe, changing

   href="http://www.mysite.org/?page=<whatever>

into

   href="pages/<whatever>.htm

moving the original file to a .BAK file and then overwriting the 
original with the modified results.  If you don't want this backup 
behavior, with GNU sed you can use a bare "-i" in place of 
"-i.BAK".  Alternatively, assuming you don't have any pre-existing 
.BAK files, after you've vetted the results you can delete the 
backups with

    find . -name '*.BAK' -exec rm {} \;

Yes, one could hack up something in Python, perhaps adding some 
real HTML-parsing brains to it, but for the most part that 
one-liner should do what you need, unless you're stuck on Win32 
with no Cygwin-like toolkit.
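
If you do end up going the Python route, something along these 
lines is a rough sketch of the same substitution, with no 
HTML-parsing brains at all.  It assumes Python 3; the function 
names rewrite_links and rewrite_tree are made up for illustration, 
and unlike sed it only writes a .BAK copy for files that actually 
change.

```python
import os
import re
import shutil

# Same idea as the sed expression: capture everything after "?page="
# up to (but not including) the closing double quote.
PATTERN = re.compile(r'href="http://www\.mysite\.org/\?page=([^"]*)')


def rewrite_links(text):
    """Return text with ?page=<whatever> links turned into pages/<whatever>.htm."""
    return PATTERN.sub(r'href="pages/\1.htm', text)


def rewrite_tree(root="."):
    """Rewrite every *.html file under root (case-insensitive match).

    Assumption: files are read as UTF-8 with surrogateescape so that
    stray non-UTF-8 bytes survive the round trip unchanged.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".html"):
                continue
            path = os.path.join(dirpath, name)
            # newline="" keeps the file's existing line endings as they are.
            with open(path, encoding="utf-8", errors="surrogateescape",
                      newline="") as f:
                original = f.read()
            updated = rewrite_links(original)
            if updated != original:
                # Keep a backup next to the file before overwriting it.
                shutil.copy2(path, path + ".BAK")
                with open(path, "w", encoding="utf-8",
                          errors="surrogateescape", newline="") as f:
                    f.write(updated)
```

Calling rewrite_tree(".") from the top of the site would then do 
roughly what the find/sed one-liner does.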

-tkc