Using Regular Expressions to change .htm to .php in files
Ryan Ginstrom
software at ginstrom.com
Sun Aug 26 21:00:42 EDT 2007
> On Behalf Of Mark
> This line should be:
>
> sed "s/\.htm$/.php/g" < $each > /tmp/$$
I think a more robust way to go about this would be:
(1) Use os.walk to walk through the directory
http://docs.python.org/lib/os-file-dir.html
(2) Use Beautiful Soup to extract the internal links from each file
http://crummy.com/software/BeautifulSoup/documentation.html
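    from BeautifulSoup import BeautifulSoup

    # doc holds the HTML text of one file
    soup = BeautifulSoup(doc)
    links = soup('a')
    # keep only hrefs that do not point at an external http(s) URL
    internal_links = [link["href"]
                      for link in links
                      if link.has_key("href")
                      and not link["href"].startswith("http")]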
(3) Do straight string replacements on those links (no regex needed)
(4) Save each html file to *.html.bak before changing
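
The steps above could be sketched roughly as follows. This is only an illustration, not tested against a real site. To keep it runnable without third-party packages it swaps Beautiful Soup for the standard library's html.parser, and it is written for Python 3. The names LinkCollector, internal_links, rewrite_links and convert_tree are made up for this example. It assumes the files are UTF-8, that href values are double-quoted, and that links end exactly in ".htm" (no "#fragment" or "?query" suffix).

```python
import os
import shutil
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(html_text):
    """Return hrefs that do not start with "http"."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return [h for h in parser.hrefs if not h.startswith("http")]


def rewrite_links(html_text, old_ext=".htm", new_ext=".php"):
    """Plain string replacement of old_ext by new_ext in internal links."""
    for href in set(internal_links(html_text)):
        if href.endswith(old_ext):
            new_href = href[:-len(old_ext)] + new_ext
            # Assumption: attribute values are wrapped in double quotes.
            html_text = html_text.replace('"%s"' % href, '"%s"' % new_href)
    return html_text


def convert_tree(root):
    """Walk root, back up each changed .htm/.html file, rewrite its links."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith((".htm", ".html")):
                continue
            path = os.path.join(dirpath, name)
            # Assumption: files are UTF-8 encoded.
            with open(path, encoding="utf-8") as f:
                original = f.read()
            updated = rewrite_links(original)
            if updated != original:
                shutil.copy2(path, path + ".bak")  # backup before changing
                with open(path, "w", encoding="utf-8") as f:
                    f.write(updated)
```

Calling convert_tree("site") would process every .htm and .html file under a directory named "site" (a placeholder path). Only the links inside the files are changed; the files themselves are not renamed.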
Regards,
Ryan Ginstrom