newbie - HTML character codes
Fredrik Lundh
fredrik at pythonware.com
Wed Dec 13 09:21:01 EST 2006
"ardief" wrote:
> sorry if I'm asking something very obvious but I'm stumped. I have a
> text that looks like this:
>
> Sentence 401
> 4.00pm &#8212; We set off again; this time via Tony's home to collect
> a variety of possessions, finally arriving at hospital no.3.
> Sentence 402
> 4.55pm &#8212; Tony is ushered into a side ward with three doctors and
> I stay outside with Mum.
>
> And I want the HTML char codes to turn into their equivalent plain
> text. I've looked at the newsgroup archives, the cookbook, the web in
> general and can't manage to sort it out.
> file = open('filename', 'r')
> ofile = open('otherfile', 'w')
>
> done = 0
>
> while not done:
>     line = file.readline()
>     if 'THE END' in line:
>         done = 1
>     elif '&#8212;' in line:
>         line.replace('&#8212;', '--')

this returns a new string; it doesn't update the line in place. you need
to assign the result back (line = line.replace(...)) before writing it.

>         ofile.write(line)
>     else:
>         ofile.write(line)
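
a minimal sketch of the loop with the result of replace() kept. the function name and its parameters are made up for illustration, and the entity string '&#8212;' is an assumption about what the input text contains. iterating over the file also stops at end of file, so the loop cannot spin forever if 'THE END' never appears:

```python
import io

def copy_until_marker(infile, outfile, old="&#8212;", new="--", marker="THE END"):
    """Copy lines from infile to outfile, replacing old with new.

    Stops at the first line containing marker (that line is not written),
    or at end of file, whichever comes first.
    """
    for line in infile:
        if marker in line:
            break
        # str.replace returns a new string, so the result must be used
        outfile.write(line.replace(old, new))

# usage with in-memory files; real code would pass objects returned by open()
src = io.StringIO("4.00pm &#8212; We set off\nplain line\nTHE END\nnot copied\n")
dst = io.StringIO()
copy_until_marker(src, dst)
print(dst.getvalue(), end="")
```

the "in" test before the replace is not needed: replace() leaves a line unchanged when the substring is absent.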
for a more general solution to the actual replace problem, see:
http://effbot.org/zone/re-sub.htm#unescape-html
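
on current Python versions (3.4 and later) the standard library can do the general conversion as well. the sketch below uses html.unescape, which is not necessarily the approach taken in the linked recipe; the helper name to_plain_text and the choice to turn an em dash into "--" are assumptions made for illustration:

```python
import html

def to_plain_text(text):
    """Decode HTML character references, then map the em dash to '--'."""
    # html.unescape handles numeric (&#8212;) and named (&amp;) references
    plain = html.unescape(text)
    # optional: keep the output ASCII-friendly, as the original code intended
    return plain.replace("\u2014", "--")

print(to_plain_text("4.00pm &#8212; We set off &amp; arrive"))
# prints: 4.00pm -- We set off & arrive
```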
you may also want to look up the "fileinput" module in the library reference
manual.
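
for example, fileinput can rewrite files in place, which avoids managing a separate output file. a rough sketch, assuming Python 3.2 or later; the function name replace_in_files and the entity string '&#8212;' are hypothetical, chosen to match the question:

```python
import fileinput

def replace_in_files(paths, old="&#8212;", new="--"):
    """Rewrite each file in paths in place, replacing old with new."""
    # with inplace=True, fileinput redirects standard output into the
    # file currently being read, so print() writes the new contents
    with fileinput.input(files=paths, inplace=True) as lines:
        for line in lines:
            # each line keeps its trailing newline, so suppress print's own
            print(line.replace(old, new), end="")
```

this overwrites the originals, so try it on copies first (or pass a backup extension to fileinput.input via its backup argument).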
</F>