Any way to designate the encoding of the "source" for compile?

janeaustine50 at hotmail.com janeaustine50 at hotmail.com
Mon May 16 19:44:30 EDT 2005


janeaustin... at hotmail.com wrote:
> John Machin wrote:
> > On 16 May 2005 10:15:22 -0700, janeaustine50 at hotmail.com wrote:
> >
> > >janeaustine50 at hotmail.com wrote:
> > >> Python's InteractiveInterpreter uses the built-in compile function.
> > >>
> > >> According to the ref. manual, it doesn't seem to concern itself with
> > >> the encoding of the source string.
> > >>
> > >> When I hand in a unicode object, it is encoded in utf-8 automatically.
> > >> This can be a problem when I'm building an interactive environment
> > >> using "compile" with an encoding other than utf-8.
> >
> > I don't understand this. Suppose your "different encoding" is cp125x
> > (where x is a digit). Would you not do something like this?
> >
> > compile_input = user_input.decode('cp125x')
> > code_object = compile(compile_input, ......
> >
> >
> > >> IDLE itself has the same problem: '<a string with non-ascii-encoding>'
> > >> is treated okay, but u'<a string with non-ascii-encoding>' is treated
> > >> wrongly.
> > >>
> > >> Any suggestions, or any plans for future Python versions?
> > >
> > >I've read a posting from Martin Von Loewis mentioning an attempt to
> > >build in that feature (optionally specifying the encoding when calling
> > >"compile"). Does anyone know how it is going?
> >
> > Firstly, it would help those who might be trying to help you if you
> > could post a simple example: input, output, what error message, what
> > you mean by 'is treated wrong' ... and when it comes to Unicode
> > objects (indeed any text), show us repr(text) -- "what you see is
> > often not what others see and often not what you've actually got".
> >
> > Secondly, are any of the contents of PEP 263 of any use to you?
> > http://www.python.org/peps/pep-0263.html
>
>
> Okay, I'll use one of the CJK codecs as the example. EUC-KR is the
> default encoding.
>
> >>> import sys;sys.getdefaultencoding()
> 'euc-kr'
> >>> '한글'
> '\xc7\xd1\xb1\xdb'
> >>> u'한글'
> u'\ud55c\uae00'
> >>> s=compile("inside=u'한글'",'','single')
> >>> exec s
> >>> inside #wrong
> u'\xc7\xd1\xb1\xdb'
> >>> s=compile(u"inside=u'한글'",'','single')
> >>> exec s
> >>> inside #correct
> u'\ud55c\uae00'
>
> So I reckon that "compile" should be given a unicode object. However...
>
> C:\Python24\Lib>python code.py
> > <string>(1)?()
> (Pdb) c
> Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> (InteractiveConsole)
> >>> '한글'
> '\xc7\xd1\xb1\xdb'
> >>> u'한글' #wrong.. should be u'\ud55c\uae00' instead
> u'\xc7\xd1\xb1\xdb'
> >>> import sys;sys.getdefaultencoding()
> 'euc-kr'
> >>> ^Z
>
> Am I right to assume that the problem lies in code.py (and therefore
> in codeop.py)? To correct the problem, it seems I would have to parse each
> string and change the literal unicode objects... Hmm... That sounds like a
> bad approach.
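
For comparison, here is a minimal sketch of John Machin's "decode first, then compile" suggestion. It is written for Python 3, not the Python 2.4 used in this thread, so it only illustrates the idea: decode the raw bytes with the codec you know the console uses, then hand compile() text. The EUC-KR input and the name raw_input_bytes are hypothetical stand-ins for whatever an interactive console actually reads.

```python
# Sketch (Python 3, not 2.4): decode the user's bytes with the known codec,
# then compile the resulting text.
# Assumption: the console receives raw EUC-KR bytes; raw_input_bytes is hypothetical.
raw_input_bytes = "inside = u'한글'".encode("euc-kr")

source = raw_input_bytes.decode("euc-kr")  # bytes -> str using the known encoding
code_object = compile(source, "<input>", "single")

namespace = {}
exec(code_object, namespace)
print(ascii(namespace["inside"]))  # '\ud55c\uae00'
```

Because the source is already text when it reaches compile(), no guess about its encoding is needed, which is the point of the suggestion.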

Oh, I forgot one more thing.

C:\Python24\Lib>python
Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s=compile(u"'한글'",'','single')
>>> exec s #wrong. the result is encoded in utf-8 instead of euc-kr
'\xed\x95\x9c\xea\xb8\x80'
>>> s=compile(u"u'한글'",'','single')
>>> exec s #correct
u'\ud55c\uae00'
>>>
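
The PEP 263 route John Machin pointed to can be sketched in the same way, again in Python 3 rather than 2.4. There, compile() also accepts a bytes source and decodes it according to a coding declaration on the first or second line. Without such a declaration it assumes UTF-8, and EUC-KR bytes like these would most likely be rejected. The input line and the name user_line are hypothetical.

```python
# Sketch (Python 3, not 2.4): let a PEP 263 coding declaration tell compile()
# how to decode a bytes source.
# Assumption: the user's line arrives as EUC-KR bytes; user_line is hypothetical.
user_line = "inside = '한글'\n".encode("euc-kr")

# Prepend a coding cookie so the compiler decodes the bytes as EUC-KR, not UTF-8.
source_bytes = b"# -*- coding: euc-kr -*-\n" + user_line
code_object = compile(source_bytes, "<input>", "exec")

namespace = {}
exec(code_object, namespace)
print(ascii(namespace["inside"]))  # '\ud55c\uae00'
```

Prepending the cookie keeps the user's bytes untouched and leaves the decoding to the compiler, instead of rewriting individual literals by hand.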



