newbie - HTML character codes

Fredrik Lundh fredrik at pythonware.com
Wed Dec 13 09:21:01 EST 2006


"ardief" wrote:

> sorry if I'm asking something very obvious but I'm stumped. I have a
> text that looks like this:
>
> Sentence 401
> 4.00pm  — We set off again; this time via Tony's home to collect
> a variety of possessions, finally arriving at hospital no.3.
> Sentence 402
> 4.55pm  — Tony is ushered into a side ward with three doctors and
> I stay outside with Mum.
>
> And I want the HTML char codes to turn into their equivalent plain
> text. I've looked at the newsgroup archives, the cookbook, the web in
> general and can't manage to sort it out.

> file = open('filename', 'r')
> ofile = open('otherfile', 'w')
>
> done = 0
>
> while not done:
>    line = file.readline()
>    if 'THE END' in line:
>        done = 1
>    elif '—' in line:
>        line.replace('—', '--')

replace() returns a new string; it doesn't update the line in place, so
you need to assign the result back to "line" before writing it.

>        ofile.write(line)
>    else:
>        ofile.write(line)
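
a minimal sketch of the same loop with the result of replace() bound back
to the line. the function name "convert" and its parameters are made up for
illustration; the search string is passed in as an argument, since the exact
character code depends on what is in your file. iterating over the file also
ends the loop at end of file if 'THE END' never shows up.

```python
def convert(infile, outfile, old, new, marker="THE END"):
    # copy infile to outfile, replacing old with new on every line;
    # stop at the first line containing marker (that line is not written,
    # as in the original loop) or at end of file
    with open(infile, "r") as src, open(outfile, "w") as dst:
        for line in src:
            if marker in line:
                break
            # replace() returns a new string, so bind the result to line
            line = line.replace(old, new)
            dst.write(line)
```

assuming the code in the file is written as '&#8212;', a call could look
like convert('filename', 'otherfile', '&#8212;', '--').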

for a more general solution to the actual replace problem, see:

    http://effbot.org/zone/re-sub.htm#unescape-html
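
on a current Python 3 (an assumption; this is not the approach on the page
above), the standard library's html.unescape handles both numeric and named
character references. a small sketch, where "to_plain" is just an
illustrative name:

```python
import html

def to_plain(text):
    # turn numeric (&#8212;) and named (&mdash;, &amp;) references
    # into the characters they stand for
    return html.unescape(text)

sample = "4.00pm &#8212; We set off again"
print(to_plain(sample))
```

if you want plain ASCII afterwards, you can still follow this with an
ordinary replace() of the resulting dash character.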

you may also want to look up the "fileinput" module in the library reference
manual.
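
as a rough sketch of what fileinput gives you: with inplace=True, whatever
is printed inside the loop is written back to the file being read, so no
second file is needed. the helper name "replace_in_place" is hypothetical,
and the with-statement form assumes Python 3.

```python
import fileinput

def replace_in_place(path, old, new):
    # with inplace=True, standard output is redirected to the file
    # while the loop runs, so print() rewrites each line
    with fileinput.input(files=[path], inplace=True) as f:
        for line in f:
            # line keeps its newline, so suppress print's own
            print(line.replace(old, new), end="")
```

note that this overwrites the input file; fileinput.input() also takes a
backup argument if you want to keep a copy of the original.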

</F> 





