[Python-checkins] CVS: python/dist/src/Misc unicode.txt,3.2,3.3
Guido van Rossum
python-dev@python.org
Fri, 24 Mar 2000 17:14:21 -0500 (EST)
Update of /projects/cvsroot/python/dist/src/Misc
In directory eric:/home/guido/hp/mal/py-patched/Misc
Modified Files:
unicode.txt
Log Message:
Marc-Andre Lemburg:
Attached you find the latest update of the Unicode implementation.
The patch is against the current CVS version.
It includes the fix I posted yesterday for the core dump problem
in codecs.c (was introduced by my previous patch set -- sorry),
adds more tests for the codecs and two new parser markers
"es" and "es#".
Index: unicode.txt
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Misc/unicode.txt,v
retrieving revision 3.2
retrieving revision 3.3
diff -C2 -r3.2 -r3.3
*** unicode.txt 2000/03/20 16:36:35 3.2
--- unicode.txt 2000/03/24 22:14:19 3.3
***************
*** 716,735 ****
These markers are used by the PyArg_ParseTuple() APIs:
! 'U': Check for Unicode object and return a pointer to it
! 's': For Unicode objects: auto convert them to the <default encoding>
and return a pointer to the object's <defencstr> buffer.
! 's#': Access to the Unicode object via the bf_getreadbuf buffer interface
(see Buffer Interface); note that the length relates to the buffer
length, not the Unicode string length (this may be different
depending on the Internal Format).
! 't#': Access to the Unicode object via the bf_getcharbuf buffer interface
(see Buffer Interface); note that the length relates to the buffer
length, not necessarily to the Unicode string length (this may
be different depending on the <default encoding>).
File/Stream Output:
-------------------
--- 716,840 ----
These markers are used by the PyArg_ParseTuple() APIs:
! "U": Check for Unicode object and return a pointer to it
! "s": For Unicode objects: auto convert them to the <default encoding>
and return a pointer to the object's <defencstr> buffer.
! "s#": Access to the Unicode object via the bf_getreadbuf buffer interface
(see Buffer Interface); note that the length relates to the buffer
length, not the Unicode string length (this may be different
depending on the Internal Format).
! "t#": Access to the Unicode object via the bf_getcharbuf buffer interface
(see Buffer Interface); note that the length relates to the buffer
length, not necessarily to the Unicode string length (this may
be different depending on the <default encoding>).
+ "es":
+ Takes two parameters: encoding (const char *) and
+ buffer (char **).
+
+ The input object is first coerced to Unicode in the usual way
+ and then encoded into a string using the given encoding.
+
+ On output, a buffer of the needed size is allocated and
+ returned through *buffer as NULL-terminated string.
+ The encoded may not contain embedded NULL characters.
+ The caller is responsible for free()ing the allocated *buffer
+ after usage.
+
+ "es#":
+ Takes three parameters: encoding (const char *),
+ buffer (char **) and buffer_len (int *).
+
+ The input object is first coerced to Unicode in the usual way
+ and then encoded into a string using the given encoding.
+
+ If *buffer is non-NULL, *buffer_len must be set to sizeof(buffer)
+ on input. Output is then copied to *buffer.
+
+ If *buffer is NULL, a buffer of the needed size is
+ allocated and output copied into it. *buffer is then
+ updated to point to the allocated memory area. The caller
+ is responsible for free()ing *buffer after usage.
+
+ In both cases *buffer_len is updated to the number of
+ characters written (excluding the trailing NULL-byte).
+ The output buffer is assured to be NULL-terminated.
+
+ Examples:
+
+ Using "es#" with auto-allocation:
+
+ static PyObject *
+ test_parser(PyObject *self,
+ PyObject *args)
+ {
+ PyObject *str;
+ const char *encoding = "latin-1";
+ char *buffer = NULL;
+ int buffer_len = 0;
+
+ if (!PyArg_ParseTuple(args, "es#:test_parser",
+ encoding, &buffer, &buffer_len))
+ return NULL;
+ if (!buffer) {
+ PyErr_SetString(PyExc_SystemError,
+ "buffer is NULL");
+ return NULL;
+ }
+ str = PyString_FromStringAndSize(buffer, buffer_len);
+ free(buffer);
+ return str;
+ }
+
+ Using "es" with auto-allocation returning a NULL-terminated string:
+
+ static PyObject *
+ test_parser(PyObject *self,
+ PyObject *args)
+ {
+ PyObject *str;
+ const char *encoding = "latin-1";
+ char *buffer = NULL;
+
+ if (!PyArg_ParseTuple(args, "es:test_parser",
+ encoding, &buffer))
+ return NULL;
+ if (!buffer) {
+ PyErr_SetString(PyExc_SystemError,
+ "buffer is NULL");
+ return NULL;
+ }
+ str = PyString_FromString(buffer);
+ free(buffer);
+ return str;
+ }
+
+ Using "es#" with a pre-allocated buffer:
+
+ static PyObject *
+ test_parser(PyObject *self,
+ PyObject *args)
+ {
+ PyObject *str;
+ const char *encoding = "latin-1";
+ char _buffer[10];
+ char *buffer = _buffer;
+ int buffer_len = sizeof(_buffer);
+
+ if (!PyArg_ParseTuple(args, "es#:test_parser",
+ encoding, &buffer, &buffer_len))
+ return NULL;
+ if (!buffer) {
+ PyErr_SetString(PyExc_SystemError,
+ "buffer is NULL");
+ return NULL;
+ }
+ str = PyString_FromStringAndSize(buffer, buffer_len);
+ return str;
+ }
+
File/Stream Output:
-------------------
***************
*** 838,841 ****
--- 943,947 ----
History of this Proposal:
-------------------------
+ 1.3: Added new "es" and "es#" parser markers
1.2: Removed POD about codecs.open()
1.1: Added note about comparisons and hash values. Added note about