[Python-checkins] CVS: python/dist/src/Misc unicode.txt,3.5,3.6
Fred Drake
python-dev@python.org
Thu, 13 Apr 2000 10:12:41 -0400
Update of /projects/cvsroot/python/dist/src/Misc
In directory seahag.cnri.reston.va.us:/home/fdrake/projects/python/Misc
Modified Files:
unicode.txt
Log Message:
M.-A. Lemburg <mal@lemburg.com>:
Updated to version 1.4.
Index: unicode.txt
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Misc/unicode.txt,v
retrieving revision 3.5
retrieving revision 3.6
diff -C2 -r3.5 -r3.6
*** unicode.txt 2000/04/10 19:45:09 3.5
--- unicode.txt 2000/04/13 14:12:38 3.6
***************
*** 1,4 ****
=============================================================================
! Python Unicode Integration Proposal Version: 1.3
-----------------------------------------------------------------------------
--- 1,4 ----
=============================================================================
! Python Unicode Integration Proposal Version: 1.4
-----------------------------------------------------------------------------
***************
*** 163,166 ****
--- 163,177 ----
as their UTF-8 equivalent strings.
+ When compared using cmp() (or PyObject_Compare()) the implementation
+ should mask TypeErrors raised during the conversion to remain in synch
+ with the string behavior. All other errors such as ValueErrors raised
+ during coercion of strings to Unicode should not be masked but passed
+ through to the user.
+
+ In containment tests ('a' in u'abc' and u'a' in 'abc') both sides
+ should be coerced to Unicode before applying the test. Errors occurring
+ during coercion (e.g. None in u'abc') should not be masked.
+
+
Coercion:
---------
***************
*** 381,384 ****
--- 392,402 ----
self.stream.write(data)
+ def writelines(self, list):
+
+ """ Writes the concatenated list of strings to the stream
+ using .write().
+ """
+ self.write(''.join(list))
+
def reset(self):
***************
*** 464,467 ****
--- 482,526 ----
return object
+ def readline(self, size=None):
+
+ """ Read one line from the input stream and return the
+ decoded data.
+
+ Note: Unlike the .readlines() method, this method inherits
+ the line breaking knowledge from the underlying stream's
+ .readline() method -- there is currently no support for
+ line breaking using the codec decoder due to lack of line
+ buffering. Subclasses should however, if possible, try to
+ implement this method using their own knowledge of line
+ breaking.
+
+ size, if given, is passed as size argument to the stream's
+ .readline() method.
+
+ """
+ if size is None:
+ line = self.stream.readline()
+ else:
+ line = self.stream.readline(size)
+ return self.decode(line)[0]
+
+ def readlines(self, sizehint=None):
+
+ """ Read all lines available on the input stream
+ and return them as a list of lines.
+
+ Line breaks are implemented using the codec's decoder
+ method and are included in the list entries.
+
+ sizehint, if given, is passed as size argument to the
+ stream's .read() method.
+
+ """
+ if sizehint is None:
+ data = self.stream.read()
+ else:
+ data = self.stream.read(sizehint)
+ return self.decode(data)[0].splitlines(1)
+
def reset(self):
***************
*** 483,489 ****
return getattr(self.stream,name)
- XXX What about .readline(), .readlines() ? These could be implemented
- using .read() as generic functions instead of requiring their
- implementation by all codecs. Also see Line Breaks.
Stream codec implementors are free to combine the StreamWriter and
--- 542,545 ----
***************
*** 693,699 ****
effect:
! '%s': '%s' does str(u) for Unicode objects embedded
! in Python strings, so the output will be
! u.encode(<default encoding>)
In case the format string is an Unicode object, all parameters are coerced
--- 749,756 ----
effect:
! '%s': For Unicode objects this will cause coercion of the
! whole format string to Unicode. Note that
! you should use a Unicode format string to start
! with for performance reasons.
In case the format string is an Unicode object, all parameters are coerced
***************
*** 923,926 ****
--- 980,986 ----
http://www-4.ibm.com/software/developer/library/internationalization-support.html
+ IANA Character Set Names:
+ ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
+
Encodings:
***************
*** 945,948 ****
--- 1005,1014 ----
History of this Proposal:
-------------------------
+ 1.4: Added note about mixed type comparisons and contains tests.
+ Changed treatment of Unicode objects in format strings (if used
+ with '%s' % u they will now cause the format string to be
+ coerced to Unicode, thus producing a Unicode object on return).
+ Added link to IANA charset names (thanks to Lars Marius Garshol).
+ Added new codec methods .readline(), .readlines() and .writelines().
1.3: Added new "es" and "es#" parser markers
1.2: Removed POD about codecs.open()
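
The new codec methods added in 1.4 can be exercised roughly as follows.
This is only a sketch: it assumes the codecs module as it ships with
current Python 3 (codecs.getwriter() and codecs.getreader() wrapping an
in-memory io.BytesIO stream) rather than the code in this revision of the
proposal, and the sample strings are made up for illustration.

```python
import codecs
import io

# Write side: writelines() joins the strings and hands the result to
# write(), which encodes it before it reaches the underlying stream.
raw = io.BytesIO()
writer = codecs.getwriter("utf-8")(raw)
writer.writelines(["first line\n", "second line \u00e4\n"])
encoded = raw.getvalue()

# Read side: readline() returns one decoded line, readlines() returns
# the remaining decoded lines with their line breaks kept.
reader = codecs.getreader("utf-8")(io.BytesIO(encoded))
first = reader.readline()
rest = reader.readlines()
```

Note that current implementations buffer decoded data internally, so
readline() no longer depends on the underlying stream's own .readline()
as the docstring in this revision describes.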