Any way to designate the encoding of the "source" for compile?

Mon May 16 19:27:19 EDT 2005

John Machin wrote:
> On 16 May 2005 10:15:22 -0700, janeaustine50 at hotmail.com wrote:
>
> >janeaustine50 at hotmail.com wrote:
> >> Python's InteractiveInterpreter uses the built-in compile function.
> >>
> >> According to the ref. manual, it doesn't seem to take the
> >> encoding of the source string into account.
> >>
> >> When I hand in a unicode object, it is encoded in utf-8 automatically.
> >> It can be a problem when I'm building an interactive environment using
> >> "compile", with a different encoding from utf-8.
>
> I don't understand this. Suppose your "different encoding" is cp125x
> (where x is a digit). Would you not do something like this?
>
> compile_input = user_input.decode('cp125x')
> code_object = compile(compile_input, ......
>
>
> >> IDLE itself has the
> >> same problem. ( '<a string with non-ascii-encoding>' is treated okay
> >> but u'<a string with non-ascii-encoding>' is treated wrong.)
> >>
> >> Any suggestions or any plans in future python versions?
> >
> >I've read a posting from Martin Von Loewis mentioning an attempt to build
> >in that feature (optionally marking the encoding when calling "compile").
> >Does anyone know how that is going?
>
> Firstly, it would help those who might be trying to help you if you
> could post a simple example: input, output, what error message, what
> you mean by 'is treated wrong' ... and when it comes to Unicode
> objects (indeed any text), show us repr(text) -- "what you see is
> often not what others see and often not what you've actually got".
>
> Secondly, are any of the contents of PEP 263 of any use to you?
> http://www.python.org/peps/pep-0263.html
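
For reference, a minimal sketch of what a PEP 263 declaration does when the source handed to compile is a byte string. It is written in Python 3 syntax, which is an assumption on my part (the sessions in this thread are Python 2.4, where behaviour may differ), and the variable names are only illustrative.

```python
# Python 3 sketch: byte-string source that carries a PEP 263 coding line.
# The four escaped bytes are the EUC-KR encoding of the two Hangul
# syllables used in the examples of this thread.
source_bytes = b"# -*- coding: euc-kr -*-\ninside = '\xc7\xd1\xb1\xdb'\n"

namespace = {}
# compile() reads the coding line and decodes the rest of the bytes with it.
exec(compile(source_bytes, "<input>", "exec"), namespace)

print(ascii(namespace["inside"]))
```

The declaration only applies to byte-string source; when the source is already text, there is nothing left to decode.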

Okay, I'll use one of the CJK codecs as the example. EUC-KR is the
default encoding.

>>> import sys;sys.getdefaultencoding()
'euc-kr'
>>> '한글'
'\xc7\xd1\xb1\xdb'
>>> u'한글'
u'\ud55c\uae00'
>>> s=compile("inside=u'한글'",'','single')
>>> exec s
>>> inside #wrong
u'\xc7\xd1\xb1\xdb'
>>> s=compile(u"inside=u'한글'",'','single')
>>> exec s
>>> inside #correct
u'\ud55c\uae00'
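
The "wrong" value above is consistent with each source byte being mapped to the code point of the same number, which is what a Latin-1 decode does. A small sketch of that difference, in Python 3 syntax (an assumption; the session above is Python 2.4) and with illustrative variable names:

```python
# The EUC-KR bytes shown in the session above.
raw = b"\xc7\xd1\xb1\xdb"

# Byte-for-byte mapping to code points (Latin-1) keeps four characters.
as_code_points = raw.decode("latin-1")

# Decoding with the real encoding yields the two Hangul syllables.
as_euc_kr = raw.decode("euc_kr")

print(ascii(as_code_points))  # '\xc7\xd1\xb1\xdb'
print(ascii(as_euc_kr))       # '\ud55c\uae00'
```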

So I reckon that the "compile" should get a unicode object. However...

C:\Python24\Lib>python code.py
> <string>(1)?()
(Pdb) c
Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> '한글'
'\xc7\xd1\xb1\xdb'
>>> u'한글' #wrong.. should be u'\ud55c\uae00' instead
u'\xc7\xd1\xb1\xdb'
>>> import sys;sys.getdefaultencoding()
'euc-kr'
>>> ^Z

Am I right to assume that the problem lies in code.py (and therefore
in codeop.py)? To correct the problem, it seems I would have to parse
each string and rewrite the unicode literals... Hmm... That sounds like
a bad approach.
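
One alternative to rewriting literals might be to decode the whole input line before it reaches the compiler, so that the compiler only ever sees text. The following is a rough sketch of that idea, not a patch for code.py: it uses Python 3 syntax (an assumption, since the thread is about Python 2.4), and `compile_line` and its `encoding` parameter are hypothetical names made up for illustration. `codeop.compile_command` is the standard-library function that code.py builds on.

```python
import codeop


def compile_line(raw_line, encoding="euc_kr"):
    """Decode one line of console input, then compile it as text.

    Hypothetical helper: raw_line may be bytes in the console encoding
    or already-decoded text.
    """
    if isinstance(raw_line, bytes):
        text = raw_line.decode(encoding)
    else:
        text = raw_line
    # Returns a code object, or None if the statement is incomplete.
    return codeop.compile_command(text, "<input>", "single")


# Simulate a line arriving from a EUC-KR console.
console_bytes = "inside = '한글'".encode("euc_kr")

namespace = {}
code_object = compile_line(console_bytes)
exec(code_object, namespace)

print(ascii(namespace["inside"]))
```

Because the decode happens once on the whole line, string literals inside the line need no special handling.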