Using Regular Expressions to change .htm to .php in files

Ryan Ginstrom software at ginstrom.com
Sun Aug 26 21:00:42 EDT 2007


> On Behalf Of Mark
> This line should be:
> 
> sed "s/\.htm$/.php/g" < $each > /tmp/$$

I think a more robust way to go about this would be:

(1) Use os.walk to walk through the directory
   http://docs.python.org/lib/os-file-dir.html

(2) Use Beautiful Soup to extract the internal links from each file
   http://crummy.com/software/BeautifulSoup/documentation.html

from BeautifulSoup import BeautifulSoup

# doc is the HTML text of one file
soup = BeautifulSoup(doc)
links = soup('a')
# keep only hrefs that do not point to an absolute http(s) URL
internal_links = [link["href"]
                  for link in links
                  if link.has_key("href")
                  and not link["href"].startswith("http")]

(3) Do straight string replacements on those links (no regex needed)

(4) Back up each html file to *.html.bak before changing it
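
Putting the steps together might look roughly like the sketch below. To keep it runnable without third-party packages, it swaps Beautiful Soup for the standard library's html.parser. That is my substitution for illustration, not a recommendation over Beautiful Soup. The names HrefCollector, internal_htm_links, convert_links and convert_tree are made up for this example. It also assumes the files are UTF-8 and that links are written as href="..." or href='...' with no spaces around the equals sign.

```python
import os
import shutil
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_htm_links(html):
    """Return hrefs that are not absolute http(s) URLs and end in .htm."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return [h for h in parser.hrefs
            if not h.startswith("http") and h.endswith(".htm")]


def convert_links(html):
    """Rewrite internal .htm links to .php with plain string replacement."""
    for href in set(internal_htm_links(html)):
        new_href = href[:-len(".htm")] + ".php"
        # Replace only the quoted attribute value, so ordinary text that
        # happens to mention "something.htm" is left alone.
        for quote in ('"', "'"):
            html = html.replace("href=" + quote + href + quote,
                                "href=" + quote + new_href + quote)
    return html


def convert_tree(root):
    """Walk root; back up and rewrite each .htm/.html file that changes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith((".htm", ".html")):
                continue
            path = os.path.join(dirpath, name)
            # Assumption: files are UTF-8. newline="" keeps line endings as-is.
            with open(path, encoding="utf-8", newline="") as f:
                original = f.read()
            updated = convert_links(original)
            if updated != original:
                shutil.copy2(path, path + ".bak")  # backup before changing
                with open(path, "w", encoding="utf-8", newline="") as f:
                    f.write(updated)
```

Calling convert_tree("site") on a hypothetical directory named site would process everything under it. Links with a fragment or query string (page.htm#top) are skipped by this sketch, as are hrefs containing character entities, since the parser decodes those and the decoded value no longer matches the raw text.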


Regards,
Ryan Ginstrom



