[Python-ideas] A mutable alternative to namedtuple

Andrew Barnert abarnert at yahoo.com
Thu Mar 19 12:22:14 CET 2015


On Mar 19, 2015, at 3:43 AM, Zaur Shibzukhov <szport at gmail.com> wrote:
> 
> Yes. But:
> 
> First.
> 
> >>>print(sys.getsizeof(list([])), sys.getsizeof(tuple([])))
> 64 48
> print(sys.getsizeof(list([1,2])), sys.getsizeof(tuple([1,2])))
> 104 64
> 
> 
> Second.
> 
> Tuple object allocates it's memory 1 time, list object allocates it's memory 2 time.

As I explained, list leaves room for expansion, so it doesn't have to keep reallocating with every append. A 2-element list actually has room for... off the top of my head, I think 8 elements; then it multiplies the capacity every time it runs out on an append.

In a C extension, you can create a list with whatever specific capacity you want, but that isn't exposed to Python (presumably because such a micro-optimization is rarely useful, and would complicate the API, as well as being an attractive nuisance to people who insist on optimizing the wrong things). So, if you really needed this optimization, you could implement your namedlist in C.

But what use case are you imagining where the extra 40 bytes per row matter, but the 64 bytes per row don't, and neither do the 56 or so bytes per element per row? I'm having a hard time imagining any case where using tuples would be necessary where it wouldn't be woefully insufficient.

> That is why
> 
> >>>%time for i in range(100000): tuple(i for i in range(15))
> CPU times: user 241 ms, sys: 1.75 ms, total: 243 ms
> Wall time: 243 ms
> 
> >>>%time for i in range(100000): list(i for i in range(15))
> CPU times: user 318 ms, sys: 1.94 ms, total: 320 ms
> Wall time: 321 ms
> 
> Certainly this can have or not have a value depending on use case. 

If you're going to show benchmarks, you really should use %timeit rather than %time, and also understand that "i for i in range(15)" is just going to give you a slower but equivalent iterable to just using "range(15)" in the first place.

But more importantly, what use case are you considering where this extra 0.8us for construction of each row object will matter, but the difference between a namedtuple (or, presumably, namedlist) vs. a tuple won't? 

And, even if you did, you're focusing on something that accounts for at worst 25% of the construction time, and 0% of the time for all the actual work you do with the object after construction.

Again, unless you have many millions of these, neither the memory not the construction time is going to matter--and if you do, a new type that's more like tuple isn't going to be anywhere near sufficient to make a noticeable difference, because you're optimizing the wrong part of both the memory and the time.

> 
> ---
> Zaur Shibzukhov
> 
> 
> 2015-03-19 13:07 GMT+03:00 Andrew Barnert <abarnert at yahoo.com>:
>>> On Mar 19, 2015, at 2:11 AM, Zaur Shibzukhov <szport at gmail.com> wrote:
>>> 
>>> That all right. But I want to note that `collections.namedtuple` has several properties that make them exclusive:
>>> 
>>> 1. Fast creation;
>>> 2. Minimal memory capacity;
>>> 3. Fast sequence interface;
>>> 4. Attribute access to elements via properties.
>>> 
>>> Different namedtuple alternatives has different sets of properties that make them more ore less suitable depending on use cases.
>>> 
>>> So if someone search alternative of collections.namedtuple that support assignment too then it could be constructed on top of array (actually "tuple" but with assignment support, but python seems have not such array type).
>> 
>> Again, do you not know about list?
>> 
>>> 
>>> ---
>>> Zaur Shibzukhov
>>> 
>>> 
>>> 2015-03-19 11:37 GMT+03:00 Andrew Barnert <abarnert at yahoo.com>:
>>>>> On Mar 19, 2015, at 12:04 AM, Zaur Shibzukhov <szport at gmail.com> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> вторник, 17 марта 2015 г., 20:21:01 UTC+3 пользователь Eric V. Smith написал:
>>>>>> 
>>>>>> On 03/17/2015 12:52 PM, Luciano Ramalho wrote: 
>>>>>> > Sometimes we need a simple class to hold some mutable attributes, 
>>>>>> > provide a nice repr, support == for testing, and support iterable 
>>>>>> > unpacking, so you can write: 
>>>>>> > 
>>>>>> >>>> p = Point(3, 4) 
>>>>>> >>>> x, y = p 
>>>>>> > 
>>>>>> > That's very much like the classes built by namedtuple, but mutable. 
>>>>>> 
>>>>>> https://pypi.python.org/pypi/namedlist 
>>>>>> 
>>>>>> It also adds default values to the generated constructor, which may or 
>>>>>> may not be desirable. But if used exactly like collections.namedtuple, 
>>>>>> it ignores the default values. 
>>>>>> 
>>>>>> Eric.
>>>>> Since named tuple is considered as an object that is a tuple with attribute access. 
>>>>> The mutable alternative could be considered as an array with attribute access.
>>>>> Array in this context is tuple-like object that support assign operation.
>>>>> Since python have not such object there are different approaches tomutable  named tuple alternatives.
>>>> 
>>>> Python definitely does have such an object: list. A list is effectively the same as a tuple but mutable; it's the paradigm MutableSequence while tuple is the paradigm Sequence. Under the covers they have very similar headers that both use the same storage (a C array of pointers to Python objects, in CPython), and C API functions like PySequence_Fast_GET_ITEM don't distinguish between the two.
>>>> 
>>>> However, list is resizable, and presumably a "namedlist" would not be. That makes things more complicated for both the interface (there's no is-a relationship; a type without append is not a list--and, worse, a type that has __setitem__ but can't handle slice replacement is not a list but that's very hard to detect...) and the implementation (e.g., a list reserves extra space at the end to avoid having to reallocate on every append). 
>>>> 
>>>> (Python _also_ has an array type, which is for homogenous simple types (like 32-bit int) which can store the values directly, as opposed to tuple and list, which store (pointers to) heterogenous normal Python objects.)
>>>> 
>>>>> One should note that particular property of named tuple is memory saving.
>>>>> So one can expect similar property of mutable named tuple too.
>>>> 
>>>> If you don't need to access the items by index for whatever reason, you don't need a namedtuple, and using one as a misguided misoptimization is a bad idea.
>>>> 
>>>> Besides the fact that a normal class with __slots__ is also small, and even a normal class with a dict (in newer CPython versions and PyPy) not that much bigger, besides the fact that you can eliminate the row overhead rather than just slightly reducing it by using, e.g., a 2D array, you're optimizing the wrong thing in the first place--if your rows have 9 elements, reducing the row overhead is focusing on fixing 10% of your overhead, while reducing or eliminating the element overhead by using, e.g., a 2D numpy array of low-level values fixes the 90% (along with the 10%).
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150319/3f1e1a52/attachment.html>


More information about the Python-ideas mailing list