[Python-checkins] python/dist/src/Lib codecs.py,1.35.2.6,1.35.2.7

Thu Apr 21 23:53:45 CEST 2005

Update of /cvsroot/python/python/dist/src/Lib
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv32723/Lib

Modified Files:
      Tag: release24-maint
	codecs.py 
Log Message:
Backport checkin (and the appropriate fix to the test):
If the data read from the bytestream in readline() ends in a '\r', read one more
byte, even if the user has passed a size parameter. This extra byte shouldn't
cause a buffer overflow in the tokenizer. The original plan was to return a line
ending in '\r', which might be recognizable as a complete line and skip any '\n'
that was read afterwards. Unfortunately this didn't work, as the tokenizer only
recognizes '\n' as line ends, which in turn led to joined lines and
SyntaxErrors, so this special treatment of a split '\r\n' has been dropped. (It
can only happen with a temporarily exhausted bytestream now anyway.)
Fixes parts of SF bugs #1163244 and #1175396.
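
The behaviour described in the log message can be illustrated with a small
standalone sketch. This is not the codecs.py implementation: read_chunk() is a
hypothetical helper, the stream is a plain io.BytesIO, and the code is written
for Python 3 rather than the 2.4 branch this checkin targets. It only shows why
reading one extra byte after a trailing '\r' keeps a '\r\n' pair together.

```python
import io

def read_chunk(stream, size):
    """Read up to `size` bytes; if the chunk ends in CR, read one byte more.

    Hypothetical helper for illustration only, not part of codecs.py.
    """
    data = stream.read(size)
    if data.endswith(b"\r"):
        # The extra byte may be the "\n" of a "\r\n" pair split by `size`.
        data += stream.read(1)
    return data.decode("ascii")

stream = io.BytesIO(b"abc\r\ndef\r\n")

# A size of 4 would cut the first line right after the "\r".
chunk = read_chunk(stream, 4)
lines = chunk.splitlines(True)       # line ending stays intact
rest = stream.read().decode("ascii")

# For comparison: splitting the same data at the "\r" with no extra read
# leaves a stray "\n" that shows up as a separate, empty line.
naive = "abc\r".splitlines(True) + "\ndef\r\n".splitlines(True)

print(lines, repr(rest), naive)
```

If the underlying stream is temporarily exhausted, the extra read returns
nothing and the line still ends in a bare '\r', which is the remaining case the
log message mentions.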


Index: codecs.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/codecs.py,v
retrieving revision 1.35.2.6
retrieving revision 1.35.2.7
diff -u -d -r1.35.2.6 -r1.35.2.7

--- codecs.py	4 Apr 2005 21:56:27 -0000	1.35.2.6
+++ codecs.py	21 Apr 2005 21:53:43 -0000	1.35.2.7
@@ -230,7 +230,6 @@
         self.errors = errors
         self.bytebuffer = ""
         self.charbuffer = u""
-        self.atcr = False
 
     def decode(self, input, errors='strict'):
         raise NotImplementedError
@@ -306,18 +305,12 @@
         # If size is given, we call read() only once
         while True:
             data = self.read(readsize)
-            if self.atcr and data.startswith(u"\n"):
-                data = data[1:]
             if data:
-                self.atcr = data.endswith(u"\r")
-                # If we're at a "\r" (and are allowed to read more), read one
-                # extra character (which might be a "\n") to get a proper
-                # line ending. (If the stream is temporarily exhausted we return
-                # the wrong line ending, but at least we won't generate a bogus
-                # second line.)
-                if self.atcr and size is None:
+                # If we're at a "\r" read one extra character (which might
+                # be a "\n") to get a proper line ending. If the stream is
+                # temporarily exhausted we return the wrong line ending.
+                if data.endswith(u"\r"):
                     data += self.read(size=1, chars=1)
-                    self.atcr = data.endswith(u"\r")
 
             line += data
             lines = line.splitlines(True)
@@ -367,7 +360,6 @@
         """
         self.bytebuffer = ""
         self.charbuffer = u""
-        self.atcr = False
 
     def seek(self, offset, whence=0):
         """ Set the input stream's current position.