[Python-Dev] Re: PEP: Defining Unicode Literal Encodings (revision 1.1)
M.-A. Lemburg
mal@lemburg.com
Sat, 14 Jul 2001 13:32:10 +0200
Skip Montanaro wrote:
>
> mal> Here's an updated version which clarifies some issues...
> ...
> mal> I propose to make the Unicode literal encodings (both standard
> mal> and raw) a per-source file option which can be set using the
> mal> "directive" statement proposed in PEP 244 in a slightly
> mal> extended form (by adding the '=' between the directive name and
> mal> its value).
>
> I think you need to motivate the need for a different syntax than is defined
> in PEP 244. I didn't see any obvious reason why the '=' is required.
I'm not picky about the '='; if people don't want it, I'll
happily drop it from the PEP. The only reason I think it may be
worthwhile adding it is because it simply looks right:
    directive unicodeencoding = 'latin-1'
rather than
    directive unicodeencoding 'latin-1'
(Note that internally this will set a flag to a value, so the
assigning character of '=' seems to fit in nicely.)
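To illustrate the "sets a flag to a value" remark, here is a minimal sketch in present-day Python. It is not the PEP 244 grammar or any real parser: the names DIRECTIVE_RE, parse_directive and flags are hypothetical, and the pattern simply accepts both spellings of the directive, with and without the '='.

```python
import re

# Hypothetical pattern (an assumption, not taken from PEP 244): accepts
# "directive NAME = 'VALUE'" as well as "directive NAME 'VALUE'".
DIRECTIVE_RE = re.compile(
    r"""^\s*directive\s+(?P<name>[A-Za-z_]\w*)\s*(?:=\s*)?(?P<quote>['"])(?P<value>[^'"]*)(?P=quote)\s*$"""
)


def parse_directive(line):
    """Return (name, value) for a directive line, or None if it is not one."""
    match = DIRECTIVE_RE.match(line)
    if match is None:
        return None
    return match.group("name"), match.group("value")


# Either spelling ends up setting the same flag to the same value.
flags = {}
for text in ("directive unicodeencoding = 'latin-1'",
             "directive unicodeencoding 'latin-1'"):
    name, value = parse_directive(text)
    flags[name] = value
```

Both forms carry the same information; the '=' only changes how the line reads, which is the whole of the argument for it.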
> Also, how do you propose to address /F's objections, particularly that the
> directive can't syntactically appear before the module's docstring (where it
> makes sense that the module author would logically want to use a non-default
> encoding)?
Guido hinted at the problem of breaking code; Tim objected
to requiring this.
I don't see the need to use Unicode literals
as module doc-strings, so I think the problem is not a real one
(8-bit strings can be written using any encoding, just as you can
now).
Still, if people would like to use Unicode literals for module
doc-strings, then they should place the directive *before* the
doc-string, accepting that this could break some tools (the PEP currently
does not restrict the placement of the directive). Alternatively,
we could allow placing the directive into a comment, e.g.
#!/usr/local/python
#directive unicodeencoding = 'utf-8'
u"""
This is a Unicode doc-string
"""
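As a rough sketch of how a tool might pick up the comment form, the following present-day Python scans the leading comment lines of a source text. Everything here is an assumption for illustration: the names COMMENT_DIRECTIVE_RE and find_comment_directive, the two-line limit, and the exact comment syntax are hypothetical and not specified by the PEP.

```python
import re

# Hypothetical comment syntax, modeled on the example in this message.
COMMENT_DIRECTIVE_RE = re.compile(
    r"^#\s*directive\s+unicodeencoding\s*=\s*['\"]([^'\"]+)['\"]\s*$"
)


def find_comment_directive(source, max_lines=2, default=None):
    """Return the encoding named in a leading comment directive, else default."""
    for line in source.splitlines()[:max_lines]:
        if not line.startswith("#"):
            break  # the first non-comment line ends the search
        match = COMMENT_DIRECTIVE_RE.match(line)
        if match:
            return match.group(1)
    return default


example = (
    "#!/usr/local/python\n"
    "#directive unicodeencoding = 'utf-8'\n"
    'u"""\n'
    "This is a Unicode doc-string\n"
    '"""\n'
)
encoding = find_comment_directive(example)
```

Because the directive lives in a comment, it can precede the doc-string without changing what existing tools see as the first statement of the module.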
About Fredrik's idea that the source code should only use one
encoding:
Well, that's possible with the proposed directive, since
only Unicode literals carry data for which Python is encoding-aware,
and all other parts are under the programmer's control, e.g.
#!/usr/local/python
""" Module Docs...
"""
directive unicodeencoding = 'latin-1'
...
u = u"Héllô Wörld !"
...
will give you pretty much what Fredrik asked for.
Note that since Python does not assign encoding information to
8-bit strings, comments etc. the only parts in a Python program
for which the programmer must explicitly tell Python which
encoding to assume are the Unicode literals.
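To make the point concrete, here is a small sketch in present-day Python (where byte strings and text are separate types) showing why the encoding has to be stated. The bytes are the Latin-1 form of the example literal above; the variable names are only illustrative.

```python
# The literal from the example above as it would sit in a Latin-1 source file.
raw = b"H\xe9ll\xf4 W\xf6rld !"

# Decoding with the declared encoding recovers the intended text.
as_latin1 = raw.decode("latin-1")

# A different single-byte encoding accepts the same bytes without complaint
# but maps them to different characters.
as_cyrillic = raw.decode("iso8859_5")

# UTF-8 rejects these bytes outright.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

The bytes alone do not say which reading is meant, which is why only the Unicode literals need a declared encoding while 8-bit strings and comments pass through untouched.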
-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Consulting & Company: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/