Re: string u'hyv\xe4' to file as 'hyvä'

Alex Willmer alex at moreati.org.uk
Mon Dec 27 04:55:47 EST 2010


On Dec 27, 6:47 am, "Mark Tolonen" <metolone+gm... at gmail.com> wrote:
> "gintare" <g.statk... at gmail.com> wrote in message
> > In file i find 'hyv\xe4' instead of hyvä.
>
> When you open a file with codecs.open(), it expects Unicode strings to be
> written to the file.  Don't encode them again.  Also, .writelines() expects
> a list of strings.  Use .write():
>
>     import codecs
>     item=u'hyv\xe4'
>     F=codecs.open('/opt/finnish.txt', 'w+', 'utf8')
>     F.write(item)
>     F.close()

Gintare, Mark's code is correct. When you are reading the file back
make sure you understand what you are seeing:

>>> F2 = codecs.open('finnish.txt', 'r', 'utf8')
>>> item2 = F2.read()
>>> item2
u'hyv\xe4'

That might look as though item2 is 7 characters long and contains
a backslash followed by x, e, 4. However, item2 is identical to item:
both contain 4 characters, the final one being a-umlaut. Python
has shown the repr of the string, which uses a backslash escape because
printing a non-ASCII character might fail. You can see the character
directly if your Python session is running in a terminal (or GUI) that
can handle non-ASCII characters:

>>> print item2
hyvä
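
If your terminal cannot show the character, you can still check the
same thing without printing it. What follows is only a rough sketch, not
a replacement for Mark's code: it assumes Python 2.6 or later (it also
runs on Python 3.3+), uses io.open rather than codecs.open, and writes to
a temporary path of my own choosing instead of /opt/finnish.txt. It
compares the lengths of the string, the bytes on disk, and the escaped
form that the interactive prompt displays.

```python
import io
import os
import tempfile

item = u'hyv\xe4'

# Hypothetical scratch location, used here so the sketch runs anywhere.
path = os.path.join(tempfile.mkdtemp(), 'finnish.txt')

# Write the Unicode string; the file object does the UTF-8 encoding.
with io.open(path, 'w', encoding='utf8') as f:
    f.write(item)

# Read it back as text (decoded) and as raw bytes (undecoded).
with io.open(path, 'r', encoding='utf8') as f:
    item2 = f.read()
with io.open(path, 'rb') as f:
    raw = f.read()

# The backslash-escaped form, similar to what the prompt displays.
escaped = item2.encode('unicode_escape')

print(len(item2))    # characters in the string
print(len(raw))      # bytes in the file (a-umlaut takes two in UTF-8)
print(len(escaped))  # characters in the escaped display form
```

On my understanding this prints 4, 5 and 7: the string has 4 characters,
the file holds 5 bytes, and only the escaped display form is 7 long.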


