HTML "sanitizer" in Python

Stephan Houben stephan at pcrm.win.tue.nl
Thu Apr 29 09:49:55 EDT 1999


"Scott Stirling" <SSTirlin at holnam.com> writes:


> 1) What is the Python syntax for opening a file in MS Windows?  I was following Guido's tutorial yesterday, but I could not figure out how to open a file in Windows.

??? I don't think it's any different on Windows than on Linux.
Just do:

f = open("my_file.html", "rt")

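(OK, there *is* a difference, I guess: on Windows, text mode and binary
 mode are not the same thing. Text mode is the default, and the "t" in "rt"
 just makes that explicit. If you open the file in binary mode ("rb")
 instead, the carriage returns show up in your data.)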
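
To see what the mode does, here is a small sketch. It assumes a current Python 3 interpreter, where text mode translates line endings on every platform and not only on Windows. The helper name read_both_ways and the scratch file are made up for the illustration.

```python
import os
import tempfile

def read_both_ways(raw_bytes):
    """Write raw_bytes to a scratch file, then read it back in text and in binary mode."""
    fd, path = tempfile.mkstemp(suffix=".html")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw_bytes)
        with open(path, "rt") as f:   # text mode: "\r\n" is translated to "\n"
            as_text = f.read()
        with open(path, "rb") as f:   # binary mode: bytes come back untouched
            as_bytes = f.read()
    finally:
        os.remove(path)
    return as_text, as_bytes

# A line with a Windows-style line ending.
text, data = read_both_ways(b"<p>hi</p>\r\n")
```

Here text ends in a plain newline, while data still carries the carriage return.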

> 2) How do I find a string of text in the open file and delete it iteratively?

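Check out the "string" module, in particular its replace function:
replacing the unwanted text with an empty string deletes every
occurrence of it in one go.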
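
A minimal sketch of that idea, assuming a current Python where replace is a method on string objects rather than a function in the "string" module. The name strip_all and the sample markup are only for illustration.

```python
def strip_all(text, unwanted):
    """Return text with every occurrence of unwanted removed."""
    # Replacing with the empty string deletes all matches in a single pass.
    return text.replace(unwanted, "")

page = "one <blink>two</blink> three"
# Remove the opening tag, then the closing tag.
cleaned = strip_all(strip_all(page, "<blink>"), "</blink>")
```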

> 3) How do I save the file in Windows after I have edited it with the Python program?  How do I close it?

Well, you open a second file, for writing this time:
  f2 = open("output.html", "wt")

Then you write to it to your heart's content:
  f2.write("blahblahblah")

Then you close it:
  f2.close()

But all this is in the Python docs, so perhaps you should try to read them.
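
Put together, the open, write and close steps might look like the sketch below. It assumes a current Python 3, where a "with" block closes the file for you, and it assumes the files are in the platform's default encoding. The function name sanitize_file is hypothetical.

```python
def sanitize_file(src_path, dst_path, unwanted):
    """Copy src_path to dst_path, leaving out every occurrence of unwanted."""
    # Read the whole input file in text mode.
    with open(src_path, "rt") as src:
        text = src.read()
    # Write the cleaned text to a second file; leaving the block closes it.
    with open(dst_path, "wt") as dst:
        dst.write(text.replace(unwanted, ""))
```

You would call it as sanitize_file("my_file.html", "output.html", "string_to_be_eliminated").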

> 4) If someone helps me out, I think I should be able to use this info. and the tutorial and the Lutz book to loop the process and make the program run until all *.htm files in a folder have been handled once.

Well, if I understand correctly, the *only* thing you're trying to do
is to remove some specific strings from a bunch of files. Now if I
were you, I wouldn't even bother to use Python on something that
simple; I would just use sed. With sed, you could do:

  sed 's/string_to_be_eliminated//g' my_file.html > output.html

Presto, that's it.  I think that there is a version of GNU sed for
Windows somewhere out there; do yourself a favour and get it.
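
If you would rather stay in Python for the loop over all *.htm files, a rough sketch could look like this. It assumes a current Python 3 and files in the platform's default encoding; the function name sanitize_folder is hypothetical, and note that it rewrites each file in place, so try it on a copy of the folder first.

```python
import glob
import os

def sanitize_folder(folder, unwanted):
    """Remove unwanted from every *.htm file in folder; return the paths that changed."""
    changed = []
    for path in sorted(glob.glob(os.path.join(folder, "*.htm"))):
        with open(path, "rt") as f:
            text = f.read()
        cleaned = text.replace(unwanted, "")
        # Only rewrite files that actually contained the unwanted string.
        if cleaned != text:
            with open(path, "wt") as f:
                f.write(cleaned)
            changed.append(path)
    return changed
```

Each file is visited exactly once, and files that do not match the *.htm pattern are left alone.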

Greetings,

Stephan



