[Python-Dev] Divorcing str and unicode (no more implicit conversions).

Tue Oct 25 13:31:50 CEST 2005

Fredrik Lundh wrote:
> M.-A. Lemburg wrote:
> 
> 
>>I don't follow you here. The source code encoding
>>is only applied to Unicode literals (you are using string
>>literals in your example). String literals are passed
>>through as-is.
> 
> 
> however, for Python 3000, it would be nice if the source-code encoding applied
> to the *entire* file (XML-style), rather than just unicode string literals and
> (hopefully) comments and docstrings.

Actually, the encoding is applied to the complete source file:
the file is transcoded into UTF-8 and then parsed by the
Python parser.

Unicode literals are then decoded from UTF-8 into Unicode.
String literals are transcoded back into the source code encoding,
which results in a round-trip (rather long, due to technical constraints):
source code encoding -> Unicode -> UTF-8 -> Unicode -> source code encoding.
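
To make the round-trip concrete, here is a minimal sketch that simulates
the steps with explicit encode/decode calls. It is not the parser's actual
code: it uses Python 3 bytes/str syntax for illustration, the helper names
are hypothetical, and ISO-8859-1 is only an assumed source code encoding.

```python
# Simulation of the transcoding steps; helper names are hypothetical.
SOURCE_ENCODING = "iso-8859-1"  # assumed declared source code encoding


def to_parser_input(source_bytes, source_encoding=SOURCE_ENCODING):
    """source code encoding -> Unicode -> UTF-8 (what the parser reads)."""
    return source_bytes.decode(source_encoding).encode("utf-8")


def unicode_literal_value(utf8_body):
    """Body of a Unicode literal: decoded from UTF-8 into Unicode."""
    return utf8_body.decode("utf-8")


def string_literal_value(utf8_body, source_encoding=SOURCE_ENCODING):
    """Body of a string literal: UTF-8 -> Unicode -> source code encoding."""
    return utf8_body.decode("utf-8").encode(source_encoding)


# The character U+00E9 as it is stored in an ISO-8859-1 source file.
original = "\u00e9".encode(SOURCE_ENCODING)

parser_bytes = to_parser_input(original)
as_unicode = unicode_literal_value(parser_bytes)
as_string = string_literal_value(parser_bytes)

# The string literal ends up with the same bytes it had in the file.
print(as_string == original)
```

The string literal comes back byte-for-byte identical, but only after
being decoded and encoded twice, which is the overhead in question.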

Python 3k should have a fully Unicode-based parser to avoid this
additional transcoding overhead.

Since Py3k will only have Unicode literals, the problems with
string literals will go away all by themselves :-)

-- 
Marc-Andre Lemburg
eGenix.com