[ python-Bugs-1115379 ] Built-in compile function with PEP 0263 encoding bug

Wed Sep 28 06:29:28 CEST 2005

Bugs item #1115379, was opened at 2005-02-03 14:11
Message generated for change (Comment added) made by wigy
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1115379&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Parser/Compiler
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Christoph Zwerschke (cito)
Assigned to: Nobody/Anonymous (nobody)
Summary: Built-in compile function with PEP 0263 encoding bug

Initial Comment:
a = 'print "Hello, World"'
u = '# -*- coding: utf-8 -*-\n' + a

print compile(a, '<string>', 'exec') # ok
print compile(u, '<string>', 'exec') # ok
print compile(unicode(a), '<string>', 'exec') # ok
print compile(unicode(u), '<string>', 'exec') # error

# The last line raises a SystemError.
# I think this is a bug.

----------------------------------------------------------------------

Comment By: Vágvölgyi Attila (wigy)
Date: 2005-09-28 06:29

Message:
Logged In: YES 
user_id=156682

If this special case is a feature, not a bug, then it breaks
some symmetry for sure.

If I run a script having utf-8 encoding from a file with

  python script.py

then it has to have an encoding declaration. But if I want
to load the same file manually and decode it to a unicode
object, I also have to remove the encoding declaration at
the beginning of the file before I can pass it to the
compile() function.
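
The workaround described above can be sketched roughly as follows. This
is only an illustration, not code from the tracker: the helper names
strip_coding_declaration and compile_decoded are made up here, and the
regular expression is the one suggested in PEP 263. The sketch avoids
version-specific syntax; on Python 2 the text argument would be a
unicode object, on Python 3 a str.

```python
import re

# Encoding-declaration pattern along the lines of the one given in PEP 263.
CODING_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)')

def strip_coding_declaration(text):
    """Blank out an encoding declaration found on line 1 or 2.

    Hypothetical helper: the declaration line is replaced by a bare
    comment rather than deleted, so line numbers stay unchanged.
    """
    lines = text.splitlines(True)  # keep the line endings
    for i, line in enumerate(lines[:2]):
        if CODING_RE.match(line):
            # Preserve the original line ending of the replaced line.
            ending = line[len(line.rstrip('\r\n')):]
            lines[i] = '#' + ending
            break
    return ''.join(lines)

def compile_decoded(text, filename='<string>'):
    """Compile already-decoded source text after removing the declaration."""
    return compile(strip_coding_declaration(text), filename, 'exec')
```

Replacing the line instead of dropping it is a deliberate choice, so
that tracebacks from the compiled code still point at the right lines
of the original file.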

What special advantage comes from the fact that the compiler
does not simply ignore encoding declaration nodes from
unicode objects? Does this error message catch some possible
errors or does it make the compiler code simpler?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2005-02-10 01:37

Message:
Logged In: YES 
user_id=21627

There is a bug somewhere, certainly. However, I believe it
is in PEP 263, which should point out that unicode strings
in compile are only legal if they do *not* contain an
encoding declaration, as such strings are implicitly encoded
as UTF-8.
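
Given that, one alternative to removing the declaration is to hand
compile() an encoded byte string rather than a unicode object, so that
the declaration describes the bytes the compiler actually sees. A
minimal sketch, assuming the declaration in the source says utf-8 (the
helper name compile_as_utf8 is made up for illustration). It is written
for Python 3, where compile() accepts bytes; in Python 2 the result of
encode() is an ordinary str, which is the case the initial report
marks as "ok".

```python
def compile_as_utf8(text, filename='<string>'):
    """Compile decoded source text by re-encoding it as UTF-8 first.

    Assumption: any encoding declaration inside `text` names utf-8.
    For a different declared encoding, the text would have to be
    encoded with that encoding instead.
    """
    return compile(text.encode('utf-8'), filename, 'exec')
```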

----------------------------------------------------------------------
