[pypy-commit] cffi default: Document char16_t and char32_t

Fri Jun 2 03:19:42 EDT 2017

Author: Armin Rigo <arigo at tunes.org>
Branch: 
Changeset: r2963:c60281bf502f
Date: 2017-06-02 09:19 +0200
http://bitbucket.org/cffi/cffi/changeset/c60281bf502f/

Log:	Document char16_t and char32_t
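
The documentation changes below mention that a single character may need a Python unicode string of length 2, and that ``ffi.string()`` adds surrogates if necessary. The sketch below illustrates that arithmetic in plain Python 3. It does not call CFFI and is not how CFFI implements the conversion. The helper names (``utf16_units``, ``utf32_units``, ``to_surrogates``) are made up for this illustration.

```python
import struct


def utf16_units(text):
    """Return the 16-bit code units a char16_t array would hold for text."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def utf32_units(text):
    """Return the 32-bit code units a char32_t array would hold for text."""
    raw = text.encode("utf-32-le")
    return list(struct.unpack("<%dI" % (len(raw) // 4), raw))


def to_surrogates(code_point):
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need surrogates")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# A character outside the Basic Multilingual Plane (illustrative choice).
astral = u"\U0001F600"

# One char32_t holds it, but char16_t needs two code units (a surrogate pair).
units32 = utf32_units(astral)
units16 = utf16_units(astral)
pair = to_surrogates(units32[0])
```

Here ``units32`` has one element while ``units16`` has two, and ``pair`` matches ``units16``. This is the same situation in which a 16-bit unicode representation needs a string of length 2 for one ``char32_t`` value.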

diff --git a/doc/source/cdef.rst b/doc/source/cdef.rst
--- a/doc/source/cdef.rst
+++ b/doc/source/cdef.rst
@@ -178,7 +178,8 @@
 * intN_t, uintN_t (for N=8,16,32,64), intptr_t, uintptr_t, ptrdiff_t,
   size_t, ssize_t
 
-* wchar_t (if supported by the backend)
+* wchar_t (if supported by the backend).  *New in version 1.11:*
+  char16_t and char32_t.
 
 * _Bool and bool (equivalent).  If not directly supported by the C
   compiler, this is declared with the size of ``unsigned char``.
diff --git a/doc/source/ref.rst b/doc/source/ref.rst
--- a/doc/source/ref.rst
+++ b/doc/source/ref.rst
@@ -104,11 +104,13 @@
   returns a ``bytes``, not a ``str``.
 
 - If 'cdata' is a pointer or array of wchar_t, returns a unicode string
-  following the same rules.
+  following the same rules.  *New in version 1.11:* can also be
+  char16_t or char32_t.
 
-- If 'cdata' is a single character or byte or a wchar_t, returns it as a
-  byte string or unicode string.  (Note that in some situation a single
-  wchar_t may require a Python unicode string of length 2.)
+- If 'cdata' is a single character or byte or a wchar_t or charN_t,
+  returns it as a byte string or unicode string.  (Note that in some
+  situations a single wchar_t or char32_t may require a Python unicode
+  string of length 2.)
 
 - If 'cdata' is an enum, returns the value of the enumerator as a string.
   If the value is out of range, it is simply returned as the stringified
@@ -125,7 +127,7 @@
 
 - If 'cdata' is a pointer to 'wchar_t', returns a unicode string.
   ('length' is measured in number of wchar_t; it is not the size in
-  bytes.)
+  bytes.)  *New in version 1.11:* can also be char16_t or char32_t.
 
 - If 'cdata' is a pointer to anything else, returns a list, of the
   given 'length'.  (A slower way to do that is ``[cdata[i] for i in
@@ -626,10 +628,10 @@
 |   ``char``    | a string of length 1   | a string of      | int(), bool(), |
 |               | or another <cdata char>| length 1         | ``<``          |
 +---------------+------------------------+------------------+----------------+
-|  ``wchar_t``  | a unicode of length 1  | a unicode of     |                |
-|               | (or maybe 2 if         | length 1         | int(), bool(), |
-|               | surrogates) or         | (or maybe 2 if   | ``<``          |
-|               | another <cdata wchar_t>| surrogates)      |                |
+| ``wchar_t``,  | a unicode of length 1  | a unicode of     |                |
+| ``char16_t``, | (or maybe 2 if         | length 1         | int(), bool(), |
+| ``char32_t``  | surrogates) or         | (or maybe 2 if   | ``<``          |
+|               | another similar <cdata>| surrogates)      |                |
 +---------------+------------------------+------------------+----------------+
 |  ``float``,   | a float or anything on | a Python float   | float(), int(),|
 |  ``double``   | which float() works    |                  | bool(), ``<``  |
@@ -671,9 +673,9 @@
 | ``char[]``,   |                        |                  | ``-``          |
 | ``_Bool[]``   |                        |                  |                |
 +---------------+------------------------+                  +----------------+
-| ``wchar_t[]`` | same as arrays, or a   |                  | len(), iter(), |
-|               | Python unicode string  |                  | ``[]``,        |
-|               |                        |                  | ``+``, ``-``   |
+|``wchar_t[]``, | same as arrays, or a   |                  | len(), iter(), |
+|``char16_t[]``,| Python unicode string  |                  | ``[]``,        |
+|``char32_t[]`` |                        |                  | ``+``, ``-``   |
 |               |                        |                  |                |
 +---------------+------------------------+------------------+----------------+
 | structure     | a list or tuple or     | a <cdata>        | read/write     |
diff --git a/doc/source/using.rst b/doc/source/using.rst
--- a/doc/source/using.rst
+++ b/doc/source/using.rst
@@ -25,6 +25,11 @@
 unicode string to an integer, ``ord(x)`` does not work; use instead
 ``int(ffi.cast('wchar_t', x))``.
 
+*New in version 1.11:* in addition to ``wchar_t``, the C types
+``char16_t`` and ``char32_t`` work the same but with a known fixed size.
+In previous versions, this could be achieved using ``uint16_t`` and
+``int32_t`` but without automatic conversion to Python unicodes.
+
 Pointers, structures and arrays are more complex: they don't have an
 obvious Python equivalent.  Thus, they correspond to objects of type
 ``cdata``, which are printed for example as
@@ -197,9 +202,10 @@
     >>> ffi.string(x) # interpret 'x' as a regular null-terminated string
     'Hello'
 
-Similarly, arrays of wchar_t can be initialized from a unicode string,
+Similarly, arrays of wchar_t or char16_t or char32_t can be initialized
+from a unicode string,
 and calling ``ffi.string()`` on the cdata object returns the current unicode
-string stored in the wchar_t array (adding surrogates if necessary).
+string stored in the source array (adding surrogates if necessary).
 
 Note that unlike Python lists or tuples, but like C, you *cannot* index in
 a C array from the end using negative numbers.
@@ -347,7 +353,8 @@
 
     assert lib.strlen("hello") == 5
 
-You can also pass unicode strings as ``wchar_t *`` arguments.  Note that
+You can also pass unicode strings as ``wchar_t *`` or ``char16_t *`` or
+``char32_t *`` arguments.  Note that
 the C language makes no difference between argument declarations that
 use ``type *`` or ``type[]``.  For example, ``int *`` is fully
 equivalent to ``int[]`` (or even ``int[5]``; the 5 is ignored).  For CFFI,
diff --git a/doc/source/whatsnew.rst b/doc/source/whatsnew.rst
--- a/doc/source/whatsnew.rst
+++ b/doc/source/whatsnew.rst
@@ -6,6 +6,13 @@
 v1.11
 =====
 
+* Support the modern standard types ``char16_t`` and ``char32_t``.
+  These work like ``wchar_t``: they represent one unicode character, or
+  when used as ``charN_t *`` or ``charN_t[]`` they represent a unicode
+  string.  The difference with ``wchar_t`` is that they have a known,
+  fixed size.  They should work at all places that used to work with
+  ``wchar_t`` (please report an issue if I missed something).
+
 * Support the C99 types ``float _Complex`` and ``double _Complex``.
   Note that libffi doesn't support them, which means that in the ABI
   mode you still cannot call C functions that take complex numbers