[issue39087] [C API] No efficient C API to get UTF-8 string from unicode object.
Serhiy Storchaka
report at bugs.python.org
Thu Dec 19 04:43:13 EST 2019
Serhiy Storchaka <storchaka+cpython at gmail.com> added the comment:
Do you mean some concrete code? Several times I have wished for a similar feature: to get the UTF-8 cache if it exists, and otherwise to encode to UTF-8 without creating a cache.
The private _PyUnicode_UTF8() macro could help:
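    if ((s = _PyUnicode_UTF8(str))) {
        /* A cached UTF-8 representation exists: borrow it. */
        size = _PyUnicode_UTF8_LENGTH(str);
        tmpbytes = NULL;
    }
    else {
        /* No cache: encode into a temporary bytes object instead
           (the NULL check on tmpbytes is omitted here). */
        tmpbytes = _PyUnicode_AsUTF8String(str, "replace");
        s = PyBytes_AS_STRING(tmpbytes);
        size = PyBytes_GET_SIZE(tmpbytes);
    }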
but it is not even available outside of unicodeobject.c.
PyUnicode_BorrowUTF8() looks too complex for the public API. I am not sure that it will be easy to implement in PyPy. It also does not cover all use cases -- sometimes you want to convert to UTF-8 without any memory allocation at all (either use an existing buffer or raise an error if there is no cached UTF-8 or the string is not ASCII).
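
To make the "no allocation at all" case concrete, here is a rough sketch in plain C of encoding into a buffer the caller already owns and reporting failure when it does not fit. The name encode_utf8_into() and its signature are hypothetical; this is not CPython code or a proposed API, only an illustration of the shape such a function could take.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of CPython): encode `n` code points from
   `src` as UTF-8 into the caller-supplied buffer `dst` of `cap` bytes.
   Returns the number of bytes written, or -1 if the buffer is too small
   or a code point is a surrogate or above U+10FFFF. It never allocates.
   On failure, `dst` may hold a partially written prefix. */
ptrdiff_t
encode_utf8_into(const uint32_t *src, size_t n, char *dst, size_t cap)
{
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t ch = src[i];
        size_t need;

        /* Work out how many bytes this code point needs. */
        if (ch < 0x80) {
            need = 1;
        }
        else if (ch < 0x800) {
            need = 2;
        }
        else if (ch < 0x10000) {
            if (ch >= 0xD800 && ch <= 0xDFFF) {
                return -1;          /* lone surrogate */
            }
            need = 3;
        }
        else if (ch <= 0x10FFFF) {
            need = 4;
        }
        else {
            return -1;              /* outside the Unicode range */
        }

        /* Fail instead of allocating when the buffer is too small. */
        if (cap - used < need) {
            return -1;
        }

        unsigned char *p = (unsigned char *)dst + used;
        switch (need) {
        case 1:
            p[0] = (unsigned char)ch;
            break;
        case 2:
            p[0] = (unsigned char)(0xC0 | (ch >> 6));
            p[1] = (unsigned char)(0x80 | (ch & 0x3F));
            break;
        case 3:
            p[0] = (unsigned char)(0xE0 | (ch >> 12));
            p[1] = (unsigned char)(0x80 | ((ch >> 6) & 0x3F));
            p[2] = (unsigned char)(0x80 | (ch & 0x3F));
            break;
        default:
            p[0] = (unsigned char)(0xF0 | (ch >> 18));
            p[1] = (unsigned char)(0x80 | ((ch >> 12) & 0x3F));
            p[2] = (unsigned char)(0x80 | ((ch >> 6) & 0x3F));
            p[3] = (unsigned char)(0x80 | (ch & 0x3F));
            break;
        }
        used += need;
    }
    return (ptrdiff_t)used;
}
```

A caller would pass a stack or preallocated buffer and treat -1 as "does not fit or not encodable", then decide whether to fall back to an allocating path or to raise an error.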
----------
nosy: +serhiy.storchaka
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue39087>
_______________________________________