newbie - HTML character codes

ardief rachele.defelice at gmail.com
Wed Dec 13 10:06:32 EST 2006


thank you both - in the end I used recode, which I wasn't aware of.
Fredrik, I had come across your script while googling for solutions,
but failed to make it work....

On Dec 13, 2:21 pm, "Fredrik Lundh" <fred... at pythonware.com> wrote:
> "ardief" wrote:
> > sorry if I'm asking something very obvious but I'm stumped. I have a
> > text that looks like this:
>
> > Sentence 401
> > 4.00pm  &mdash; We set off again; this time via Tony's home to collect
> > a variety of possessions, finally arriving at hospital no.3.
> > Sentence 402
> > 4.55pm  &mdash; Tony is ushered into a side ward with three doctors and
> > I stay outside with Mum.
>
> > And I want the HTML char codes to turn into their equivalent plain
> > text. I've looked at the newsgroup archives, the cookbook, the web in
> > general and can't manage to sort it out.
> > file = open('filename', 'r')
> > ofile = open('otherfile', 'w')
>
> > done = 0
>
> > while not done:
> >    line = file.readline()
> >    if 'THE END' in line:
> >        done = 1
> >    elif '&mdash;' in line:
> >        line.replace('&mdash;', '--')
>
> this returns a new line; it doesn't update the line in place.
>
> >        ofile.write(line)
> >    else:
> >        ofile.write(line)
>
> for a more general solution to the actual replace problem, see:
>
>    http://effbot.org/zone/re-sub.htm#unescape-html
>
> you may also want to look up the "fileinput" module in the library reference
> manual.
> 
> </F>
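
A minimal sketch of the fix Fredrik describes, written for present-day Python: str.replace returns a new string, so the result has to be kept or written out rather than discarded. The helper names replace_mdash and convert_file are hypothetical, and the sketch assumes the text contains the literal entity &mdash; and that processing stops at the 'THE END' line, as in the quoted loop.

```python
def replace_mdash(lines, end_marker="THE END"):
    """Yield each line with '&mdash;' replaced by '--', stopping at end_marker."""
    for line in lines:
        if end_marker in line:
            break  # like the quoted loop, the marker line itself is not written
        # str.replace does not modify 'line'; it returns a new string,
        # so the returned value is what gets passed on.
        yield line.replace("&mdash;", "--")


def convert_file(src_path, dst_path):
    """Copy src_path to dst_path with the replacement applied (hypothetical helper)."""
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        dst.writelines(replace_mdash(src))


sample = [
    "Sentence 402\n",
    "4.55pm &mdash; Tony is ushered into a side ward\n",
    "THE END\n",
]
print("".join(replace_mdash(sample)), end="")
```

Iterating over the file with a for loop also ends cleanly at end of file, whereas the readline loop would never finish if 'THE END' were missing. With the file names from the quoted code, the call would be convert_file('filename', 'otherfile').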

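For the more general problem of turning any HTML character code into plain text, a hedged alternative to the linked script is the standard library's html.unescape (available since Python 3.4, so not an option when this thread was written). This is a swapped-in technique, not Fredrik's regex-based function. The helper name to_plain_text and the ASCII fallbacks for the em dash and non-breaking space are assumptions made for illustration.

```python
import html


def to_plain_text(line):
    """Decode HTML character references, then apply two ASCII fallbacks."""
    # html.unescape handles named (&mdash;), decimal (&#8212;) and
    # hexadecimal (&#x2014;) references.
    text = html.unescape(line)
    # Assumed fallbacks: em dash (U+2014) becomes '--' and
    # non-breaking space (U+00A0) becomes an ordinary space.
    return text.replace("\u2014", "--").replace("\u00a0", " ")


print(to_plain_text("4.55pm &mdash; Tony is ushered into a side ward"))
print(to_plain_text("Mum &amp; Tony"))
```

Dropping the two replace calls keeps the decoded Unicode characters as they are, which may be preferable when the output file can be written in UTF-8.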


