
classification
Title: Syntax error on large file with MBCS encoding
Type:
Stage:
Components: Interpreter Core
Versions: Python 2.4
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: mhammond
Nosy List: doerwalter, jkew, mhammond, nikis, sdahlbac, tilinna, tim.peters, tnleeuw, tzot
Priority: high
Keywords:

Created on 2005-03-14 20:20 by tnleeuw, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
00020905-0000-0000-C000-000000000046x0x8x1.zip tnleeuw, 2005-03-14 20:20 zipped Python source that gives compilation error with 2.4.1rc1, but not with 2.3.5
foo2.py nobody, 2005-03-21 12:09 Simple repro that does no imports
Messages (14)
msg24589 - Author: Tim N. van der Leeuw (tnleeuw) Date: 2005-03-14 20:20
Large files generated by makepy.py from the win32all
extensions cannot be compiled by Python 2.4.1rc1; they
give a syntax error.

This is a regression from 2.3.5.

(With Python2.4, the interpreter crashes. That is fixed
now.)

After removing the mbcs encoding line from the top of the
file, compilation succeeds.

The file should be attached as a zip file. It probably requires
the win32all extensions to be installed in order to be compiled
or imported (it was generated using build 203 of the win32all
extensions).
msg24590 - Author: Χρήστος Γεωργίου (Christos Georgiou) (tzot) * Date: 2005-03-20 10:28
Useful pointers: in Python-dev, this has been characterised
as related to pywin32 bug 1085454.  Also related to
www.python.org/sf/1101726 and www.python.org/sf/1089395.
msg24591 - Author: Mark Hammond (mhammond) * (Python committer) Date: 2005-03-21 12:11
I believe this is a different bug than the recent
"long-lines" errors (see below).  I can reproduce this with
a file that uses neither long lines, nor any pywin32
extensions (2.4 branch, trunk)

A Python source file containing:
-- start snippet --
# -*- coding: mbcs -*-
<1532 characters of code or comments>
<cr/lf newline>
x = {}
-- end snippet --

Will yield a SyntaxError when attempting to import the
module.  Running the module as a script does not provoke the
error.
    
To reproduce, there must be exactly 1532 characters where
specified (see the attached file for a demo).  Adding or
removing even a single character will prevent the error.  It
is possible to replace characters with any others, including
valid code, and still see the error - however, the number of
characters must remain the same.  cr/lf pairs can also be
replaced with any other 2 characters.  There are other
"block sizes" that will provoke the error, but this is the
only one I have nailed.
    
Apart from the "block" of 1532 characters, the coding line
and the blank line before the dict assignment also appear
critical.  Unlike the other characters in the block, this
last cr/lf pair can not be replaced with comments.  I can't
provoke the error with other encodings (note there are no
encoded characters in the sample - it is trivial).

To reproduce, save the attached file on Windows and execute:
> python -c "import foo2"
Traceback (most recent call last):
  File "<string>", line 1, in ?
  File "foo2.py", line 24
x = {}
    ^
SyntaxError: invalid syntax

Note that Python 2.3 and earlier all work.  Also note that
"python foo2.py" also works.  The code is clearly valid.
    
Haven't tried to repro on Linux (mbcs isn't available there,
and I can't get a test case that doesn't use it)

Other pointers/notes: pywin32 bug 1085454 is related to
long lines; by all accounts that underlying error has been
fixed, but I can't verify this as pywin32 no longer generates
insanely long lines.  I can confirm Python bugs
1101726/1089395 still crash Python 2.3+.  I believe all
(including this) are discrete bugs.

[foo2.py is my attachment - ya gotta love sourceforge :)]
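
The layout described in this message (coding line, a block of exactly 1532 characters, a cr/lf pair, then the dict assignment) can be sketched as a small generator. This is an illustrative sketch only, written for a current Python 3; it is not the attached foo2.py, and the helper names `filler_block` and `build_repro_source` are hypothetical. It assumes the 1532 characters count the cr/lf pairs inside the block and that the final assignment line also ends in cr/lf. Whether the resulting file triggers the error on 2.4.1 has not been verified here.

```python
def filler_block(n_chars, line_len=64):
    """Return exactly n_chars bytes of comment lines, each ending in CR/LF."""
    if n_chars < 3 or line_len < 3:
        raise ValueError("each line needs '#' plus a CR/LF pair")
    sizes = [line_len] * (n_chars // line_len)
    rest = n_chars % line_len
    if rest >= 3:
        sizes.append(rest)   # short final comment line
    elif rest:
        sizes[-1] += rest    # too short for its own line: widen the last one
    return b"".join(b"#" + b"." * (size - 3) + b"\r\n" for size in sizes)


def build_repro_source(block_chars=1532, encoding="mbcs"):
    """Assemble: coding line, filler block, blank CR/LF line, dict assignment."""
    # The encoding name is only written into the comment; no codec is used here.
    coding_line = ("# -*- coding: %s -*-\r\n" % encoding).encode("ascii")
    return coding_line + filler_block(block_chars) + b"\r\n" + b"x = {}\r\n"
```

Writing the returned bytes with a file opened in binary mode ("wb") keeps the cr/lf pairs intact, so the character count stays exact on any platform.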
msg24592 - Author: Χρήστος Γεωργίου (Christos Georgiou) (tzot) * Date: 2005-03-21 13:34
Could be irrelevant but... are the other block sizes close
to n*512 (eg 1536 is 3*512) marks?
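
The arithmetic behind this question can be checked with a short helper. This is only a sketch; the function name `nearest_block_boundary` is hypothetical, and the 512 block size is the guess raised above, not a confirmed buffer size.

```python
def nearest_block_boundary(size, block=512):
    """Return (multiple, offset): the closest n*block and size minus it."""
    n = (size + block // 2) // block   # round to the nearest whole block
    multiple = n * block
    return multiple, size - multiple


# The 1532-character block from the repro sits 4 characters below 3*512.
print(nearest_block_boundary(1532))   # (1536, -4)
```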
msg24593 - Author: Timo Linna (tilinna) Date: 2005-04-09 08:09
Seems that the connection to n*512 blocks is very likely,
and it's not just MBCS-related. I managed to reproduce this
with a file that contains an ascii-coding declaration,
close-to-1024 bytes section, extra crlf and a comment which
raises a SyntaxError in Py2.4.1.

Could this be linked to the new codec buffering code? See:
www.python.org/sf/1178484
msg24594 - Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2005-04-14 23:40
Importing foo2.py on Linux (with the current CVS HEAD
version of Python) gives me a segmentation fault with the
following stacktrace:
0x080606cc in instance_repr (inst=0xb7c158bc) at Objects/classobject.c:880
880                     classname = inst->in_class->cl_name;
(gdb) bt
#0  0x080606cc in instance_repr (inst=0xb7c158bc) at Objects/classobject.c:880
#1  0x08082235 in PyObject_Repr (v=0xb7c158bc) at Objects/object.c:308
#2  0x080f3ccd in err_input (err=0xbfffe000) at Python/pythonrun.c:1478
#3  0x080f3956 in PyParser_SimpleParseFileFlags (fp=0x818d6e0, filename=0xbfffe530 "foo2.py", start=257, flags=0) at Python/pythonrun.c:1348
#4  0x080f3982 in PyParser_SimpleParseFile (fp=0x818d6e0, filename=0xbfffe530 "foo2.py", start=257) at Python/pythonrun.c:1355
#5  0x080e6fef in parse_source_module (pathname=0xbfffe530 "foo2.py", fp=0x818d6e0) at Python/import.c:761
#6  0x080e72db in load_source_module (name=0xbfffe9d0 "foo2", pathname=0xbfffe530 "foo2.py", fp=0x818d6e0) at Python/import.c:885
#7  0x080e86b4 in load_module (name=0xbfffe9d0 "foo2", fp=0x818d6e0, buf=0xbfffe530 "foo2.py", type=1, loader=0x0) at Python/import.c:1656
#8  0x080e9d52 in import_submodule (mod=0x8145768, subname=0xbfffe9d0 "foo2", fullname=0xbfffe9d0 "foo2") at Python/import.c:2250
#9  0x080e9511 in load_next (mod=0x8145768, altmod=0x8145768, p_name=0xbfffedf0, buf=0xbfffe9d0 "foo2", p_buflen=0xbfffe9cc) at Python/import.c:2070
#10 0x080e8e5e in import_module_ex (name=0x0, globals=0xb7d62e94, locals=0xb7d62e94, fromlist=0x8145768) at Python/import.c:1905
#11 0x080e914b in PyImport_ImportModuleEx (name=0xb7cd8824 "foo2", globals=0xb7d62e94, locals=0xb7d62e94, fromlist=0x8145768) at Python/import.c:1946
#12 0x080b5c87 in builtin___import__ (self=0x0, args=0xb7d1e634) at Python/bltinmodule.c:45
#13 0x0811d32e in PyCFunction_Call (func=0xb7d523ec, arg=0xb7d1e634, kw=0x0) at Objects/methodobject.c:73
#14 0x0805d188 in PyObject_Call (func=0xb7d523ec, arg=0xb7d1e634, kw=0x0) at Objects/abstract.c:1757
#15 0x080ca79d in PyEval_CallObjectWithKeywords (func=0xb7d523ec, arg=0xb7d1e634, kw=0x0) at Python/ceval.c:3425
#16 0x080c6719 in PyEval_EvalFrame (f=0x816dd7c) at Python/ceval.c:2026
#17 0x080c8fdd in PyEval_EvalCodeEx (co=0xb7cf1ef0, globals=0xb7d62e94, locals=0xb7d62e94, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2736
#18 0x080bffb0 in PyEval_EvalCode (co=0xb7cf1ef0, globals=0xb7d62e94, locals=0xb7d62e94) at Python/ceval.c:490
#19 0x080f361d in run_node (n=0xb7d122d0, filename=0x8123ba3 "<stdin>", globals=0xb7d62e94, locals=0xb7d62e94, flags=0xbffff584) at Python/pythonrun.c:1265
#20 0x080f1f58 in PyRun_InteractiveOneFlags (fp=0xb7e94720, filename=0x8123ba3 "<stdin>", flags=0xbffff584) at Python/pythonrun.c:762
#21 0x080f1c93 in PyRun_InteractiveLoopFlags (fp=0xb7e94720, filename=0x8123ba3 "<stdin>", flags=0xbffff584) at Python/pythonrun.c:695
#22 0x080f1af6 in PyRun_AnyFileExFlags (fp=0xb7e94720, filename=0x8123ba3 "<stdin>", closeit=0, flags=0xbffff584) at Python/pythonrun.c:658
#23 0x08055e45 in Py_Main (argc=1, argv=0xbffff634) at Modules/main.c:484
#24 0x08055366 in main (argc=1, argv=0xbffff634) at Modules/python.c:23

The value object in err_input() (in the E_DECODE case) seems
to be bogus (it gives me a refcount of -606348325).
msg24595 - Author: Niki Spahiev (nikis) Date: 2005-06-02 16:11
I have a reproducible test case with encoding cp1251.
The file is 1594 bytes long. How do I attach it?
msg24596 - Author: Simon Dahlbacka (sdahlbac) Date: 2005-07-21 10:38
For what it's worth:

I have two files (which I unfortunately cannot attach) that
work fine on 2.3 but now, on 2.4.1, produce spurious syntax
errors when they have

# -*- coding: ascii -*-

If I change that to something that does not match the coding
regex, I do not get any syntax error.

(winxp)
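
For reference, the "coding regex" mentioned here can be illustrated with the pattern given in PEP 263. The sketch below uses Python's `re` module purely for illustration; the helper name `declared_encoding` is hypothetical, and the interpreter's own detection is implemented in C and may differ in detail (for example, it only examines the first two lines of a file).

```python
import re

# Pattern as given in PEP 263 for recognising an encoding declaration.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(line):
    """Return the encoding name declared on a comment line, or None."""
    match = CODING_RE.search(line)
    return match.group(1) if match else None


print(declared_encoding("# -*- coding: ascii -*-"))   # ascii
print(declared_encoding("# -*- codec: ascii -*-"))    # None
```

Any edit that breaks the match, such as misspelling "coding", means the line is treated as an ordinary comment.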
msg24597 - Author: James Kew (jkew) Date: 2005-08-04 06:56
Is pywin32 bug 1166627 relevant/related?
msg24598 - Author: Χρήστος Γεωργίου (Christos Georgiou) (tzot) * Date: 2005-08-04 07:50
Are you sure about the bug number? pywin32 seems not to have
such a bug.
msg24599 - Author: James Kew (jkew) Date: 2005-08-04 17:10
http://sourceforge.net/tracker/?func=detail&aid=1166627&group_id=78018&atid=551954
msg24600 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2005-11-11 03:31
Is this still an issue in 2.4.2?  I downloaded the zip file, and 
didn't have any problem importing the .py file on Windows 
using 2.4.2.  A number of problems with encoding directives 
were fixed in 2.4.2, so I doubt that's an accident ;-)
msg24601 - Author: Mark Hammond (mhammond) * (Python committer) Date: 2005-11-11 05:20
Thanks Tim!  I can confirm that I can no longer reproduce it
with the svn release24-maint branch - so I'm going out on a
limb and closing it.  I haven't tested Linux, so it would be
great if some others could also confirm it is fixed (or reopen
it if not).
msg24602 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2005-11-11 14:56
[Mark]
> I can confirm that I can no longer reproduce it
> with the svn release24-maint branch

Did you know 2.4.2 final was released?  That happened
September 28.  So if someone has this problem, ask them to
try the released 2.4.2 (no need to muck with
release24-maint).

Leaving this closed, but assigned to Mark just so he'll get 
this note.
History
Date User Action Args
2022-04-11 14:56:10 admin set github: 41697
2005-03-14 20:20:03 tnleeuw create