[Python-ideas] A mutable alternative to namedtuple

Zaur Shibzukhov szport at gmail.com
Thu Mar 19 18:57:38 CET 2015


2015-03-19 14:22 GMT+03:00 Andrew Barnert <abarnert at yahoo.com>:

> On Mar 19, 2015, at 3:43 AM, Zaur Shibzukhov <szport at gmail.com> wrote:
>
> Yes. But:
>
> First.
>
> >>>print(sys.getsizeof(list([])), sys.getsizeof(tuple([])))
> 64 48
> print(sys.getsizeof(list([1,2])), sys.getsizeof(tuple([1,2])))
> 104 64
>
>
> Second.
>
> Tuple object allocates it's memory 1 time, list object allocates it's
> memory 2 time.
>
>
> As I explained, list leaves room for expansion, so it doesn't have to keep
> reallocating with every append. A 2-element list actually has room for...
> off the top of my head, I think 8 elements; then it multiplies the capacity
> every time it runs out on an append.
>

>>>for i in [4, 12, 28, 68]: print(sys.getsizeof(list(range(i))), end=' ')
120 216 360 720

> In a C extension, you can create a list with whatever specific capacity
you want, but that isn't exposed to Python
> (presumably because such a micro-optimization is rarely useful, and would
complicate the API, as well as being an
> attractive nuisance to people who insist on optimizing the wrong things).
So, if you really needed this optimization, you could
> implement your namedlist in C.

Yes you right. Actual optimization one could archive with the help of
Cython if necessary.

But what use case are you imagining where the extra 40 bytes per row
matter, but the 64 bytes per row don't, and neither do the 56 or so bytes
per element per row? I'm having a hard time imagining any case where using
tuples would be necessary where it wouldn't be woefully insufficient.


That is why

>>>%time for i in range(100000): tuple(i for i in range(15))
CPU times: user 241 ms, sys: 1.75 ms, total: 243 ms
Wall time: 243 ms

>>>%time for i in range(100000): list(i for i in range(15))
CPU times: user 318 ms, sys: 1.94 ms, total: 320 ms
Wall time: 321 ms

Certainly this can have or not have a value depending on use case.
>>>import namedlist

> If you're going to show benchmarks, you really should use %timeit rather
than %time, and also understand that "i for i in
> range(15)" is just going to give you a slower but equivalent iterable to
just using "range(15)" in the first place.

> But more importantly, what use case are you considering where this extra
0.8us for construction of each row object will
> matter, but the difference between a namedtuple (or, presumably,
namedlist) vs. a tuple won't?

>>>class Point:
    __slots__ = 'x', 'y', 'z'
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

>>>asizeof.asizeof(Point(1,2,3))
   232
>>>import collections
>>>Point2 = collections.namedtuple('Point', 'x y z')
>>>Point3 = namedlist.namedtuple('Point', 'x y z')
>>>Point4 = namedlist.namedlist('Point', 'x y z')
>>>asizeof.asizeof(Point2(1,2,3))
   120
>>>asizeof.asizeof(Point3(1,2,3))
   120
>>>asizeof.asizeof(Point4(1,2,3))
   1704

> And, even if you did, you're focusing on something that accounts for at
worst 25% of the construction time, and 0% of the
> time for all the actual work you do with the object after construction.

> Again, unless you have many millions of these, neither the memory not the
construction time is going to matter--and if you do,
> a new type that's more like tuple isn't going to be anywhere near
sufficient to make a noticeable difference, because you're
> optimizing the wrong part of both the memory and the time.

Possibly basic gain in the savings of memory, especially in the context of
long running processes, limitations on the memory on hostings, long loop +
memory fragmentation in python heap space, ...


---
*Zaur Shibzukhov*


2015-03-19 13:07 GMT+03:00 Andrew Barnert <abarnert at yahoo.com>:

> On Mar 19, 2015, at 2:11 AM, Zaur Shibzukhov <szport at gmail.com> wrote:
>
> That all right. But I want to note that `collections.namedtuple` has
> several properties that make them exclusive:
>
> 1. Fast creation;
> 2. Minimal memory capacity;
> 3. Fast sequence interface;
> 4. Attribute access to elements via properties.
>
> Different namedtuple alternatives has different sets of properties that
> make them more ore less suitable depending on use cases.
>
> So if someone search alternative of collections.namedtuple that support
> assignment too then it could be constructed on top of array (actually
> "tuple" but with assignment support, but python seems have not such array
> type).
>
>
> Again, do you not know about list?
>
>
> ---
> *Zaur Shibzukhov*
>
>
> 2015-03-19 11:37 GMT+03:00 Andrew Barnert <abarnert at yahoo.com>:
>
>> On Mar 19, 2015, at 12:04 AM, Zaur Shibzukhov <szport at gmail.com> wrote:
>>
>>
>>
>> вторник, 17 марта 2015 г., 20:21:01 UTC+3 пользователь Eric V. Smith
>> написал:
>>>
>>> On 03/17/2015 12:52 PM, Luciano Ramalho wrote:
>>> > Sometimes we need a simple class to hold some mutable attributes,
>>> > provide a nice repr, support == for testing, and support iterable
>>> > unpacking, so you can write:
>>> >
>>> >>>> p = Point(3, 4)
>>> >>>> x, y = p
>>> >
>>> > That's very much like the classes built by namedtuple, but mutable.
>>>
>>> https://pypi.python.org/pypi/namedlist
>>>
>>> It also adds default values to the generated constructor, which may or
>>> may not be desirable. But if used exactly like collections.namedtuple,
>>> it ignores the default values.
>>>
>>> Eric.
>>>
>>> Since named tuple is considered as an object that is a tuple with
>> attribute access.
>> The mutable alternative could be considered as an array with attribute
>> access.
>> Array in this context is tuple-like object that support assign operation.
>> Since python have not such object there are different approaches
>> tomutable  named tuple alternatives.
>>
>>
>> Python definitely does have such an object: list. A list is effectively
>> the same as a tuple but mutable; it's the paradigm MutableSequence while
>> tuple is the paradigm Sequence. Under the covers they have very similar
>> headers that both use the same storage (a C array of pointers to Python
>> objects, in CPython), and C API functions like PySequence_Fast_GET_ITEM
>> don't distinguish between the two.
>>
>> However, list is resizable, and presumably a "namedlist" would not be.
>> That makes things more complicated for both the interface (there's no is-a
>> relationship; a type without append is not a list--and, worse, a type that
>> has __setitem__ but can't handle slice replacement is not a list but that's
>> very hard to detect...) and the implementation (e.g., a list reserves extra
>> space at the end to avoid having to reallocate on every append).
>>
>> (Python _also_ has an array type, which is for homogenous simple types
>> (like 32-bit int) which can store the values directly, as opposed to tuple
>> and list, which store (pointers to) heterogenous normal Python objects.)
>>
>> One should note that particular property of named tuple is memory saving.
>> So one can expect similar property of mutable named tuple too.
>>
>>
>> If you don't need to access the items by index for whatever reason, you
>> don't need a namedtuple, and using one as a misguided misoptimization is a
>> bad idea.
>>
>> Besides the fact that a normal class with __slots__ is also small, and
>> even a normal class with a dict (in newer CPython versions and PyPy) not
>> that much bigger, besides the fact that you can eliminate the row overhead
>> rather than just slightly reducing it by using, e.g., a 2D array, you're
>> optimizing the wrong thing in the first place--if your rows have 9
>> elements, reducing the row overhead is focusing on fixing 10% of your
>> overhead, while reducing or eliminating the element overhead by using,
>> e.g., a 2D numpy array of low-level values fixes the 90% (along with the
>> 10%).
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150319/f540a38b/attachment-0001.html>


More information about the Python-ideas mailing list