[Python-bugs-list] [Bug #132850] unix line terminator on windows

noreply@sourceforge.net noreply@sourceforge.net
Sun, 18 Feb 2001 12:24:06 -0800


Bug #132850, was updated on 2001-Feb-17 10:45
Here is a current snapshot of the bug.

Project: Python
Category: Python Interpreter Core
Status: Closed
Resolution: Fixed
Bug Group: Platform-specific
Priority: 6
Submitted by: mpmak
Assigned to : tim_one
Summary: unix line terminator on windows

Details: 
A SyntaxError or NameError occurs when the first script line is terminated
only by \x0a, not \x0d\x0a.

This script does nothing at all (every line is terminated with \x0a):
#
print '1 line'
print '2 line'

This one raises NameError (name p is not defined):
print '1 line'
print '2 line'

When the script has only a single line:
print '1 line'

it raises SyntaxError, but the traceback is odd:
pprint '1 line'
              ^
SyntaxError: invalid syntax


Follow-Ups:

Date: 2001-Feb-18 12:24
By: tim_one

Comment:
Python scripts usually start on Unix with a line like

#! /usr/bin/env python

That way Unixoids can just say

$ myscript

at the command line instead of

$ python myscript

The "#!" is a Unixism that the OS understands.  Since it starts with #,
Python treats it as a comment.

Several other platforms support *similar* tricks, but they don't start with
#.  In that case -x is intended to be used, like starting your script
with

*&$%^ python -x %*

where "*&$%^" is whatever string of gibberish characters the platform uses
that mean the same thing as Unix "#!".

And that's the only thing -x is good for.  So if OpenVMS doesn't have
something like that, don't worry about -x.

WRT .pyc and .pyo files, yes,

$ python test_x.pyc

is *a* proper way to test that.  "-x" makes no sense at all when running
compiled bytecode, so I don't even care if that combination blows up.

The test above is too easy, though, because Python notices that the
filename ends with ".pyc", and skips all the hard work of *guessing*
whether the file passed to it is compiled bytecode.  So a better test is to
rename the bytecode file so Python can't recognize that it is bytecode just
from its name.  That's much trickier to get right across platforms, so that
would still be a valuable test to run.  Like (of course I don't know how to
spell this on your box):

$ copy test_x.pyc mystery
$ python mystery
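
To make the "guessing" concrete, here is a minimal sketch in modern Python 3
(not the C code in pythonrun.c). It assumes the check amounts to comparing
the first two bytes of the file with the first two bytes of the interpreter's
magic number, as the halfmagic comparison quoted further down this thread
suggests. The names HALF_MAGIC, looks_like_bytecode and demo are
hypothetical; importlib.util.MAGIC_NUMBER is the standard-library constant.

```python
import importlib.util
import os
import tempfile

# First two bytes of the magic number. The last two are b"\r\n",
# which a text-mode read may not return as they are on disk.
HALF_MAGIC = importlib.util.MAGIC_NUMBER[:2]


def looks_like_bytecode(path):
    """Guess from the content, not the name, whether path holds bytecode."""
    with open(path, "rb") as f:  # binary mode: no line-end translation
        return f.read(2) == HALF_MAGIC


def demo():
    """Write two extensionless files named 'mystery' and classify each."""
    results = []
    samples = [
        importlib.util.MAGIC_NUMBER + b"\0" * 12,  # starts like bytecode
        b"print('1 line')\n",                      # plain source text
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mystery")
        for data in samples:
            with open(path, "wb") as f:
                f.write(data)
            results.append(looks_like_bytecode(path))
    return results
```

Because the file is called "mystery" in both cases, the extension shortcut
cannot help and only the content check decides.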

Thanks!  In any case, the original bug appears fixed so I'm closing this
now.  If you still have a problem with something above, let's open a new
report so it says "OpenVMS" in the Summary line (we ran out of Windows
problems here).
-------------------------------------------------------

Date: 2001-Feb-18 03:40
By: zessin_5

Comment:
> zessin_5, then you must have rebroken -x under  OpenVMS. Yes?

Correct. However, I have never used '-x' on OpenVMS before.

> zessin_5, how does rev 2.123 work for you under OpenVMS?

It works properly, now. Thanks, Tim.

Test scripts:
TEST_X_OPTION_01LF.PY
print 'line 1, file in Stream_LF format'
print 'line 2'

TEST_X_OPTION_01RMS.PY
print 'line 1, file in record format'
print 'line 2'

TEST_X_OPTION_02LF.PY
line_1_to_be_ignored by -x option
print 'line 2, file in Stream_LF format'
print 'line 3'

TEST_X_OPTION_02RMS.PY
line_1_to_be_ignored by -x option
print 'line 2, file in record format'
print 'line 3'

Sample run:
VMS> ! <- this is the OS prompt I have set
VMS> set VERIFY
VMS> @ TEST_X_OPTION-RUN.COM;
$ set noON
$
$ python      test_x_option_01lf.py
line 1, file in Stream_LF format
line 2
$ python  -x  test_x_option_02lf.py
line 2, file in Stream_LF format
line 3
$ python      test_x_option_01rms.py
line 1, file in record format
line 2
$ python  -x  test_x_option_02rms.py
line 2, file in record format
line 3
$
$ exit 1
VMS>


I don't understand your comments from 2001-Feb-17 12:26
about execution of '.pyc/o' files.

Am I supposed to compile the scripts and then invoke them like:
  $ python test_x.pyc
or
  $ python -x test_x.pyc
??

-------------------------------------------------------

Date: 2001-Feb-17 16:17
By: tim_one

Comment:
Thought about that, but it won't fly:  the C std (whether C89 or C99)
guarantees only one character of pushback via ungetc; the typical case here
would have 2, and whether or not that works is again a platform-dependent
accident.
-------------------------------------------------------

Date: 2001-Feb-17 14:39
By: mpmak

Comment:

It works on NT.
Thanks.

PS same effect without fseek/ftell:

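int ispyc = 0;
/* Read two bytes and compare them with the low half of the magic number. */
int bytesread = fread(buf, 1, 2, fp);
ispyc = bytesread == 2 &&
  ((unsigned int)buf[1]<<8 | buf[0]) == halfmagic;
/* Not a .pyc: push the bytes back in reverse order so they are read again. */
while (!ispyc && bytesread > 0) {
  ungetc(buf[--bytesread], fp);
}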

-------------------------------------------------------

Date: 2001-Feb-17 14:09
By: tim_one

Comment:
Please try pythonrun.c rev 2.123.

Since fseek is a platform-dependent accident in text mode, I don't want to
use that at all anymore.

zessin_5, how does rev 2.123 work for you under OpenVMS?  I won't close
this bug for a few days pending your answer.
-------------------------------------------------------

Date: 2001-Feb-17 13:45
By: mpmak

Comment:

I have tested it on Windows NT 4.0, MSVC 6.0 SP4, with Python from CVS:

-x - works as expected inside cmd files
*.py - are compiled/executed properly
*.pyc - python can execute pyc files too

Line numbers in buggy cmd/py files are shown correctly. For example, this
script fails when executing line 3:

#@python -x "%~f0" %* & goto :EOF
print 'ok'
makeanerror

-------------------------------------------------------

Date: 2001-Feb-17 13:31
By: tim_one

Comment:
zessin_5, then you must have rebroken -x under OpenVMS.  Yes?

-------------------------------------------------------

Date: 2001-Feb-17 13:27
By: tim_one

Comment:
Sorry, can't figure out what you think that code accomplishes.  Did you try
an example using -x?  Windows is the primary reason -x exists, so no hack
that leaves -x broken on Windows is acceptable.  The getc/ungetc business
is needed so that in *case* -x was specified (and without a change to the C
API, we have no way to know whether it was at this point), the \n that -x
ungetc'ed gets pushed back.  Else line numbers in tracebacks are off by one
under -x, and that's not acceptable either.
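
The line-number requirement can be illustrated with a minimal sketch in
modern Python 3. It is not how pythonrun.c does it: it assumes the whole
source is available as a string and simply drops the text of the first line
while keeping its newline, which has the same effect on numbering as pushing
the \n back. The names blank_first_line, error_line and SCRIPT are
hypothetical.

```python
import traceback


def blank_first_line(source):
    """Drop the first line's text but keep its newline, as -x must."""
    _first, newline, rest = source.partition("\n")
    return newline + rest


def error_line(source, filename="<script>"):
    """Run source with its first line blanked; return the failing line."""
    code = compile(blank_first_line(source), filename, "exec")
    try:
        exec(code, {})
    except Exception as exc:
        # The last traceback entry is the frame inside the script itself.
        return traceback.extract_tb(exc.__traceback__)[-1].lineno
    return None


# Line 1 is not valid Python; the undefined name is on line 3.
SCRIPT = '@python -x "%~f0" %* & goto :EOF\nprint("ok")\nmakeanerror\n'
```

Calling error_line(SCRIPT) reports line 3, matching the position in the
file on disk. If the newline were dropped along with the first line, the
same error would be reported one line too early.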
-------------------------------------------------------

Date: 2001-Feb-17 13:11
By: mpmak

Comment:

A brute-force hack helps, at least for MS:

#ifdef _MSC_VER
		const long currentpos = ftell(fp);
		int ispyc = fread(buf, 1, 2, fp) == 2 &&
		        ((unsigned int)buf[1]<<8 | buf[0]) == halfmagic;
		fseek(fp, currentpos, SEEK_SET);
#else
... current CVS code goes here
#endif

-------------------------------------------------------

Date: 2001-Feb-17 12:56
By: zessin_5

Comment:
Additional info:

A change (in pythonrun.c) also broke (partly)
processing on OpenVMS.

The same effect (doubling the first character)
happens when the file is not in stream linefeed
format - otherwise it works as before.
Text editors, however, create files in record
format.

The problem is that the C library on OpenVMS
can only seek on record boundaries - I don't
want to go into too much detail, here.

I've restored the old code from version 2.1a2
in routine 'maybe_pyc_file()' in my copy and
it started working again.
-------------------------------------------------------

Date: 2001-Feb-17 12:47
By: tim_one

Comment:
Indeed, from stepping thru MS's ftell(), they do a great deal of expensive
fiddling for text-mode streams *assuming* that every \n in the stdio buffer
was originally an \r\n on disk.  When that isn't true, the adjustments they
make yield bizarre results.  We can't do anything about that.
-------------------------------------------------------

Date: 2001-Feb-17 12:26
By: tim_one

Comment:
I'm not sure this can be fixed with reasonable effort.

The patch that allowed .pyc files to get executed from the command-line,
and whether or not they have a .pyc/.pyo extension, broke the -x option
(skip first line), by rewinding the file to see whether it begins with the
right magic number.  That undid what -x does (i.e., skip over the first
line).

So I slopped in another hack to restore the file position in case (as is in
fact almost always the case) the .pyc magic-number hack didn't find what it
was looking for.

And there's the rub.  It *turns out* that, using MS's libraries, and
assuming FILE* fp is at the start of the text-mode stream, after

int ch = getc(fp);
long pos = ftell(fp);

then ch is reliably set to the first character in the file.  However, pos
is set to 1 if and only if the first line *ends* with \r\n.  If it ends
with plain \n, pos is left at 0.  This is bizarre and darned hard to
explain, but that is the way it works.

The .pyc hackery later fseek's back to pos and does ungetc(ch).  Since in
your case pos was set to 0, that ends up "stuttering" the first character
(ungetc('p') effectively puts an extra 'p' at the start of the file,
because pos was left at 0).
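
The arithmetic of the stutter can be shown with a toy model in Python. This
is not real stdio: it only replays the two ftell results described above for
the single-line script, and the names restore_and_unget, SCRIPT, crlf_case
and lf_case are hypothetical.

```python
def restore_and_unget(text, pos, ch):
    """Model fseek(fp, pos, SEEK_SET) followed by ungetc(ch, fp).

    Returns what a reader would see next: the pushed-back character,
    then the (already translated) file contents from offset pos onward.
    """
    return ch + text[pos:]


SCRIPT = "print '1 line'\n"  # the single-line script from the report
ch = SCRIPT[0]               # what getc(fp) returned

crlf_case = restore_and_unget(SCRIPT, 1, ch)  # ftell reported 1
lf_case = restore_and_unget(SCRIPT, 0, ch)    # ftell reported 0
```

With pos equal to 1 the reader sees the original line; with pos equal to 0
it sees pprint '1 line', which is the text shown in the reported
SyntaxError.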

Since files with Unix line-ends are not proper text-mode files under
Windows, I doubt Microsoft would consider their behavior buggy here, and
neither would the C std (C guarantees very little about how text mode
works).

And I don't see any hope of fixing it short of either:

1. Opening .py files in binary mode and doing line-end translations
ourselves (a good idea, actually, but A Project).

or

2. Redoing the .pyc hack from scratch:  it should be done *before* -x
processing.  But that may require changing Python's C API.

In short, it's a mess.
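
Option 1 can be sketched in a few lines of modern Python 3. This is only an
illustration of the idea, not a proposal for the C code: it assumes the
whole file fits in memory, and treating a lone \r as a line end is an extra
assumption that this report does not require. The names normalize_newlines
and load_source are hypothetical.

```python
def normalize_newlines(data):
    """Convert CRLF and lone CR line ends to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def load_source(path):
    """Read a script in binary mode and translate line ends ourselves."""
    with open(path, "rb") as f:  # binary mode: the C library adjusts nothing
        return normalize_newlines(f.read())
```

Since the file is never opened in text mode, the result is the same whether
the script was saved with \n or \r\n line ends.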
-------------------------------------------------------

Date: 2001-Feb-17 11:18
By: tim_one

Comment:
Bizarre!  Assigned to me.  Worked OK in 2.0, and no idea how it could have
got broken.  Well, in truth Python never did anything to make this work, it
was simply that the MS stdio library delivered \n regardless of whether \n
or \r\n terminated a line.  That is, it was *nice* that it worked, but in
truth it was an accident.  Later.
-------------------------------------------------------

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=132850&group_id=5470