[Python-ideas] improving C structs layout
Alfredo Solano
asolano at icai.es
Wed May 8 15:45:09 CEST 2013
Hi,
Interesting observation, but isn't C struct alignment platform- and
compiler-dependent? That would mean that optimizing the member
declaration order for one architecture could cause a performance hit
on another.
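
One way to check this on a given platform is to ask the compiler
directly. The following is only a minimal sketch, not code from CPython:
`struct sample` and the helper names are hypothetical, and the values
reported depend on the compiler and ABI in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical struct for illustration: pointer, int, pointer. */
struct sample {
    char *name;
    int   offset;
    void *function;
};

/* Bytes of padding the compiler placed between `offset` and `function`. */
size_t sample_hole_after_offset(void)
{
    size_t end_of_offset = offsetof(struct sample, offset) + sizeof(int);
    size_t start_of_function = offsetof(struct sample, function);

    /* Members are laid out in declaration order, so this cannot underflow. */
    assert(start_of_function >= end_of_offset);
    return start_of_function - end_of_offset;
}

/* Print the layout chosen by the current compiler and ABI. */
void sample_print_layout(void)
{
    printf("sizeof(void *) = %zu, sizeof(int) = %zu\n",
           sizeof(void *), sizeof(int));
    printf("name at %zu, offset at %zu, function at %zu\n",
           offsetof(struct sample, name),
           offsetof(struct sample, offset),
           offsetof(struct sample, function));
    printf("hole after offset: %zu bytes, total size: %zu bytes\n",
           sample_hole_after_offset(), sizeof(struct sample));
}
```

Calling `sample_print_layout()` from `main` on each target of interest
shows whether a hole exists there. On typical 64-bit (LP64) platforms the
hole is 4 bytes, while on typical 32-bit platforms there is usually none,
so a reordering that helps one may simply be neutral on the other.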
Alfredo
On 05/08/2013 02:39 PM, Charles-François Natali wrote:
> Hi,
>
> I was recently looking at the PyThreadState data structure (for issue
> #17912, but it's unimportant), and noticed that the layout of the
> members leaves some holes (due to alignment).
> While it doesn't matter too much for PyThreadState (because of
> trailing padding), I wondered whether other structures in the code
> base could benefit from a better layout.
> So I ran pahole [1], and found the following structures:
>
> $ pahole -P python
> PyMemberDef            40   32   8
> wrapperbase            56   48   8
> unicode_formatter_t   136  128   8
> _expr                  56   52   4
> _stmt                  72   68   4
> _excepthandler         40   36   4
> _node                  40   32   8
> compiler_unit         448  440   8
> tok_state             992  984   8
>
> The first column is the current size, the second column the size
> after a more judicious layout, and the third the number of bytes
> saved.
>
> For example:
> Before:
> $ pahole -C wrapperbase python
> struct wrapperbase {
>     char *       name;         /*  0  8 */
>     int          offset;       /*  8  4 */
>
>     /* XXX 4 bytes hole, try to pack */
>
>     void *       function;     /* 16  8 */
>     wrapperfunc  wrapper;      /* 24  8 */
>     char *       doc;          /* 32  8 */
>     int          flags;        /* 40  4 */
>
>     /* XXX 4 bytes hole, try to pack */
>
>     PyObject *   name_strobj;  /* 48  8 */
>
>     /* size: 56, cachelines: 1, members: 7 */
>     /* sum members: 48, holes: 2, sum holes: 8 */
>     /* last cacheline: 56 bytes */
> };
>
> After:
> $ pahole -C wrapperbase -R python
> struct wrapperbase {
>     char *       name;         /*  0  8 */
>     int          offset;       /*  8  4 */
>     int          flags;        /* 12  4 */
>     void *       function;     /* 16  8 */
>     wrapperfunc  wrapper;      /* 24  8 */
>     char *       doc;          /* 32  8 */
>     PyObject *   name_strobj;  /* 40  8 */
>
>     /* size: 48, cachelines: 1, members: 7 */
>     /* last cacheline: 48 bytes */
> }; /* saved 8 bytes! */
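
The two layouts above can be compared on any platform with a small
sketch like the one below. It is an illustration only: `PyObject_stub`
and `wrapperfunc_stub` are hypothetical stand-ins for the real CPython
types, and the struct and function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the CPython types; assumptions for illustration only. */
typedef struct PyObject_stub PyObject_stub;
typedef PyObject_stub *(*wrapperfunc_stub)(PyObject_stub *, PyObject_stub *,
                                           void *);

/* Member order as in the "Before" pahole output. */
struct wrapperbase_before {
    char             *name;
    int               offset;
    void             *function;
    wrapperfunc_stub  wrapper;
    char             *doc;
    int               flags;
    PyObject_stub    *name_strobj;
};

/* Member order as in the "After" output: the two ints are adjacent. */
struct wrapperbase_after {
    char             *name;
    int               offset;
    int               flags;
    void             *function;
    wrapperfunc_stub  wrapper;
    char             *doc;
    PyObject_stub    *name_strobj;
};

/* Bytes saved by the reordering on the current compiler and ABI. */
size_t wrapperbase_bytes_saved(void)
{
    /* Guard against unsigned underflow on an unusual ABI. */
    assert(sizeof(struct wrapperbase_after)
           <= sizeof(struct wrapperbase_before));
    return sizeof(struct wrapperbase_before)
         - sizeof(struct wrapperbase_after);
}
```

On a platform with 8-byte pointers and 4-byte ints this should match the
8 bytes pahole reports; where pointers and ints have the same size, the
saving is typically zero rather than negative.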
>
> While some of the structs above aren't worth the trouble (like
> tok_state), I think some might be interesting candidates.
> This could lead to reduced memory usage (well, of course it depends on
> the number of instances), and better cache usage/locality of
> reference.
>
> So what do you think, is it worth it?
>
> cf
>
> [1] https://github.com/acmel/dwarves