[Python-checkins] CVS: python/dist/src/Misc unicode.txt,3.5,3.6

Thu, 13 Apr 2000 10:12:41 -0400

Update of /projects/cvsroot/python/dist/src/Misc
In directory seahag.cnri.reston.va.us:/home/fdrake/projects/python/Misc

Modified Files:
	unicode.txt 
Log Message:

M.-A. Lemburg <mal@lemburg.com>:

Updated to version 1.4.

Index: unicode.txt
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Misc/unicode.txt,v
retrieving revision 3.5
retrieving revision 3.6
diff -C2 -r3.5 -r3.6
*** unicode.txt	2000/04/10 19:45:09	3.5
--- unicode.txt	2000/04/13 14:12:38	3.6
***************
*** 1,4 ****
  =============================================================================
!  Python Unicode Integration                            Proposal Version: 1.3
  -----------------------------------------------------------------------------

--- 1,4 ----
  =============================================================================
!  Python Unicode Integration                            Proposal Version: 1.4
  -----------------------------------------------------------------------------

***************
*** 163,166 ****
--- 163,177 ----
  as their UTF-8 equivalent strings.

+ When compared using cmp() (or PyObject_Compare()) the implementation
+ should mask TypeErrors raised during the conversion to remain in synch
+ with the string behavior. All other errors such as ValueErrors raised
+ during coercion of strings to Unicode should not be masked but passed
+ through to the user.
+ 
+ In containment tests ('a' in u'abc' and u'a' in 'abc') both sides
+ should be coerced to Unicode before applying the test. Errors occurring
+ during coercion (e.g. None in u'abc') should not be masked.
+ 
+ 
  Coercion:
  ---------
***************
*** 381,384 ****
--- 392,402 ----
          self.stream.write(data)

+     def writelines(self, list):
+ 
+         """ Writes the concatenated list of strings to the stream
+             using .write().
+         """
+         self.write(''.join(list))
+         
      def reset(self):

***************
*** 464,467 ****
--- 482,526 ----
                  return object

+     def readline(self, size=None):
+ 
+         """ Read one line from the input stream and return the
+             decoded data.
+ 
+             Note: Unlike the .readlines() method, this method inherits
+             the line breaking knowledge from the underlying stream's
+             .readline() method -- there is currently no support for
+             line breaking using the codec decoder due to lack of line
+             buffering. Subclasses should however, if possible, try to
+             implement this method using their own knowledge of line
+             breaking.
+ 
+             size, if given, is passed as size argument to the stream's
+             .readline() method.
+             
+         """
+         if size is None:
+             line = self.stream.readline()
+         else:
+             line = self.stream.readline(size)
+         return self.decode(line)[0]
+ 
+     def readlines(self, sizehint=None):
+ 
+         """ Read all lines available on the input stream
+             and return them as a list of lines.
+ 
+             Line breaks are implemented using the codec's decoder
+             method and are included in the list entries.
+             
+             sizehint, if given, is passed as size argument to the
+             stream's .read() method.
+ 
+         """
+         if sizehint is None:
+             data = self.stream.read()
+         else:
+             data = self.stream.read(sizehint)
+         return self.decode(data)[0].splitlines(1)
+ 
      def reset(self):

***************
*** 483,489 ****
          return getattr(self.stream,name)

- XXX What about .readline(), .readlines() ? These could be implemented
-     using .read() as generic functions instead of requiring their
-     implementation by all codecs. Also see Line Breaks.

  Stream codec implementors are free to combine the StreamWriter and
--- 542,545 ----
***************
*** 693,699 ****
  effect:

!   '%s':                 '%s' does str(u) for Unicode objects embedded
!                         in Python strings, so the output will be
!                         u.encode(<default encoding>)

  In case the format string is an Unicode object, all parameters are coerced
--- 749,756 ----
  effect:

!   '%s':                 For Unicode objects this will cause coercion of the
! 			whole format string to Unicode. Note that
! 			you should use a Unicode format string to start
! 			with for performance reasons.

  In case the format string is an Unicode object, all parameters are coerced
***************
*** 923,926 ****
--- 980,986 ----
  	http://www-4.ibm.com/software/developer/library/internationalization-support.html

+ IANA Character Set Names:
+ 	ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
+ 
  Encodings:

***************
*** 945,948 ****
--- 1005,1014 ----
  History of this Proposal:
  -------------------------
+ 1.4: Added note about mixed type comparisons and contains tests.
+      Changed treatment of Unicode objects in format strings (if used
+      with '%s' % u they will now cause the format string to be
+      coerced to Unicode, thus producing a Unicode object on return).
+      Added link to IANA charset names (thanks to Lars Marius Garshol).
+      Added new codec methods .readline(), .readlines() and .writelines().
  1.3: Added new "es" and "es#" parser markers
  1.2: Removed POD about codecs.open()
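
Illustration (not part of the patch): the stream methods added in 1.4,
.writelines(), .readline() and .readlines(), can be tried out with a short
sketch. It assumes the present-day codecs module, whose codecs.getwriter()
and codecs.getreader() return StreamWriter and StreamReader classes; the
shipped implementation may differ from the proposal text in details such as
buffering. The names raw, writer, reader, first and rest are chosen here for
illustration only.

```python
import codecs
import io

# Wrap an in-memory byte stream in a UTF-8 StreamWriter.
raw = io.BytesIO()
writer = codecs.getwriter("utf-8")(raw)

# .writelines() takes a list of strings and writes them via .write().
writer.writelines(["gr\u00fc\u00df\n", "welt\n"])

# Wrap the encoded bytes in the matching StreamReader.
reader = codecs.getreader("utf-8")(io.BytesIO(raw.getvalue()))

# .readline() returns one decoded line, line break included.
first = reader.readline()

# .readlines() returns the remaining decoded lines as a list,
# again with the line breaks kept in the entries.
rest = reader.readlines()

print(repr(first), rest)
```

The writer only ever sees text and the underlying stream only ever sees
encoded bytes, which is the division of labour the proposal describes for
StreamWriter and StreamReader.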