[Python-Dev] Encoding of PyFrameObject members

Xavier de Gaye xdegaye at gmail.com
Sat Feb 7 11:13:00 CET 2015


On 02/06/2015 11:48 PM, Francis Giraldeau wrote:
 > 2015-02-06 6:04 GMT-05:00 Armin Rigo:
 >
 >     Hi,
 >
 >     On 6 February 2015 at 08:24, Maciej Fijalkowski <fijall at gmail.com> wrote:
 >     > I don't think it's safe to assume f_code is properly filled by the
 >     > time you might read it, depending a bit where you find the frame
 >     > object. Are you sure it's not full of garbage?
 >
 >
 >     Yes, before discussing how to do the utf8 decoding, we should realize
 >     that it is really unsafe code starting from the line before.  From a
 >     signal handler you're only supposed to read data that was written to
 >     "volatile" fields.  So even PyEval_GetFrame(), which is done by
 >     reading the thread state's "frame" field, is not safe: this is not a
 >     volatile.  This means that the compiler is free to do crazy things
 >     like *first* write into this field and *then* initialize the actual
 >     content of the frame.  The uninitialized content may be garbage, not
 >     just NULLs.
 >
 >
 > Thanks for these comments. Of course accessing frames within a signal handler is racy. I confirm that code encoded in non-ASCII is not accessible from the utf8 buffer pointer. However, a call
 > to PyUnicode_AsUTF8() encodes the data and caches it in the unicode object. Later accesses return the byte buffer without memory allocation or re-encoding.
 >
 > I think it is possible to solve both safety problems by registering a handler with PyEval_SetProfile(). On function entry, the handler will call PyUnicode_AsUTF8() on the required frame members to
 > make sure the utf8 encoded string is available. Then, we increment the refcount of the frame and assign it to a thread local pointer. On function return, the refcount is decremented. These operations
 > occur in the normal context and they are not racy. The signal handler will use the thread local frame pointer instead of calling PyEval_GetFrame(). Does that sound good?


You could call Py_AddPendingCall() from your signal handler and access the
frame members from the function scheduled by Py_AddPendingCall().


Xavier

