unicode string literals and "u" prefix

nico nicolas.riesch at genevoise.ch
Mon Nov 8 09:00:28 EST 2004


In my Python scripts I use a lot of accented characters, as I work in
French.
In order to do this, I put the line
# -*- coding: UTF-8 -*-
at the beginning of the script file.
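That line is read by the interpreter itself: PEP 263 expects the
declaration on the first or second line of the file. As a small Python 3
sketch, the standard tokenize.detect_encoding function, which looks for
the same kind of declaration, shows how the line above is recognised
(the two-line source here is made up for illustration):

```python
import io
import tokenize

# A tiny two-line source file, as bytes, starting with the same coding line.
source = "# -*- coding: UTF-8 -*-\ns = 'hélène'\n".encode("utf-8")

# detect_encoding reads at most two lines and returns the declared encoding
# (normalised to lower case) plus the raw lines it consumed.
encoding, consumed = tokenize.detect_encoding(io.BytesIO(source).readline)
print(encoding)  # utf-8
```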
Then, whenever I needed to store accented characters in a string, I
prefixed the string literal with 'u', like this:
mystring = u"prénom"

But if I understand correctly, the 'u' prefix on Unicode string literals
will eventually become obsolete (in Python 3.0?), as all strings will be
Unicode in the more or less distant future.
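For reference, Python 3.0 did make every plain string literal Unicode.
A minimal Python 3 sketch (the 'u' prefix was dropped in 3.0 and accepted
again, with no effect, from 3.3 on, so this assumes 3.3 or later):

```python
# Python 3.3+: a plain literal is already a Unicode string (type str),
# and the u prefix is accepted but changes nothing.
plain = "prénom"
prefixed = u"prénom"

print(type(plain).__name__)  # str
print(plain == prefixed)     # True
```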

So, to write "clean" script code, is it a good idea to write a script
like this?

---- myscript ----

#! /usr/local/bin/python -U
# -*- coding: UTF-8 -*-

s = 'hélène'
print len(s)
print s

-------------------

The second line declares that the source file, and therefore every
string literal in it, is encoded in UTF-8, as my editor saves all my
files as UTF-8.

Normally I would write  s = u'hélène', but the -U option makes Python
treat every string literal as a Unicode string.
(I know the -U option may disappear in a future Python version, but
once Python treats all strings as Unicode, won't it be easier to delete
the "-U" option at the top of each script than to remove every "u"
prefix?)
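What -U changes is visible in the script's  print len(s). Without it, a
plain literal in a UTF-8 file holds the encoded bytes; with it (or with
u''), it holds characters. A Python 3 sketch of the difference, where
bytes stands in for a Python 2 plain string:

```python
# str plays the role of a u'' literal (or any literal under -U);
# bytes plays the role of a Python 2 plain literal in a UTF-8 file.
text = "hélène"
raw = text.encode("utf-8")

print(len(text))                    # 6: one item per character
print(len(raw))                     # 8: "é" and "è" take two bytes each
print(raw.decode("utf-8") == text)  # True: decoding recovers the text
```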

Finally, I write
  print s
instead of
  print s.encode('utf-8')
as I used to, because I want this script to work on computers that use
other encodings.
It seems that "print" encodes Unicode strings with the current encoding
of the shell (terminal) by default.
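A sketch of what that implicit encoding step amounts to, assuming
Python 3, where print writes to sys.stdout, a text stream that encodes
with sys.stdout.encoding. The helper encode_for_output is hypothetical,
written only to make the step explicit; it shows that the same text
becomes different bytes, or loses characters, depending on the target
encoding:

```python
import sys

def encode_for_output(text, encoding=None):
    """Hypothetical helper: encode text as a stream in `encoding` would.

    Falls back to the current stdout encoding, then to UTF-8 if stdout
    reports none. errors="replace" turns characters the target encoding
    cannot represent into "?" instead of raising UnicodeEncodeError.
    """
    if encoding is None:
        encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(encoding, errors="replace")

s = "hélène"
print(encode_for_output(s, "utf-8"))    # b'h\xc3\xa9l\xc3\xa8ne'
print(encode_for_output(s, "latin-1"))  # b'h\xe9l\xe8ne'
print(encode_for_output(s, "ascii"))    # b'h?l?ne'
```

One caveat for Python 2: when output is redirected to a file or a pipe,
sys.stdout.encoding is typically None, and printing non-ASCII Unicode
then raises UnicodeEncodeError instead of using the terminal's encoding.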

Is this the best way to deal with accented characters?
Do you think that a script written like this will still work with
Python 3.0?
Any comments?


