weirdness with list()

Peter Otten __peter__ at web.de
Sun Feb 28 18:28:14 EST 2021


On 28/02/2021 23:33, Marco Sulla wrote:
> On Sun, 28 Feb 2021 at 01:19, Cameron Simpson <cs at cskk.id.au> wrote:
>> My object represents an MDAT box in an MP4 file: it is the ludicrously
>> large data box containing the raw audiovideo data; for a TV episode it
>> is often about 2GB and a movie is often 4GB to 6GB.
>> [...]
>> That length is presented via the object's __len__ method
>> [...]
>>
>> I noticed that it was stalling, and investigation revealed it was
>> stalling at this line:
>>
>>      subboxes = list(self)
>>
>> when doing the MDAT box. That box (a) has no subboxes at all and (b) has
>> a very large __len__ value.
>>
>> BUT... It also has an __iter__ method, which like any Box iterates over
>> the subboxes. For MDAT that is implemented like this:
>>
>>      def __iter__(self):
>>          yield from ()
>>
>> What I was expecting was pretty much instant construction of an empty
>> list. What I was getting was a very time consuming (10 seconds or more)
>> construction of an empty list.
> 
> I can't reproduce. Am I missing something?
> 
> marco at buzz:~$ python3
> Python 3.6.9 (default, Jan 26 2021, 15:33:00)
> [GCC 8.4.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> class A:
> ...     def __len__(self):
> ...             return 1024**3
> ...     def __iter__(self):
> ...             yield from ()
> ...
> >>> a = A()
> >>> len(a)
> 1073741824
> >>> list(a)
> []
> >>>
> 
> It takes milliseconds to run list(a)

Looks like you need at least Python 3.8 to see this. Quoting
https://docs.python.org/3/whatsnew/3.8.html:

"""
The list constructor does not overallocate the internal item buffer if 
the input iterable has a known length (the input implements __len__). 
This makes the created list 12% smaller on average. (Contributed by 
Raymond Hettinger and Pablo Galindo in bpo-33234.)
"""
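
A minimal sketch for checking on your own interpreter whether list()
consults __len__ before it iterates. The class name Probe and its
attributes are made up for illustration, and the claimed length is kept
small on purpose so nothing large gets allocated. How often __len__ is
called is a CPython implementation detail and may vary between versions,
so the script only prints the count.

```python
class Probe:
    """Hypothetical stand-in for the MDAT box: it has a length, but
    iterating over it yields nothing."""

    def __init__(self, claimed_len):
        self.claimed_len = claimed_len
        self.len_calls = 0  # how many times __len__ has been called

    def __len__(self):
        self.len_calls += 1
        return self.claimed_len

    def __iter__(self):
        # Same shape as the __iter__ quoted above: an empty generator.
        yield from ()


# Case 1: pass the object itself; list() is free to look at __len__.
probe = Probe(1000)
result = list(probe)
print("list(probe)       ->", result, "| __len__ calls:", probe.len_calls)

# Case 2: pass the iterator instead. A generator has neither __len__ nor
# __length_hint__, so list() has no size to ask for.
probe2 = Probe(1000)
result2 = list(iter(probe2))
print("list(iter(probe2)) ->", result2, "| __len__ calls:", probe2.len_calls)
```

If the first count is non-zero, list() did look at __len__. Wrapping the
object in iter(), as in the second case, is one way to keep list() from
seeing the reported length at all.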




