[New-bugs-announce] [issue26423] __len__() returns 32 bit int on windows leading to overflows

Dave Hibbitts report at bugs.python.org
Tue Feb 23 18:04:48 EST 2016


New submission from Dave Hibbitts:

__len__() always returns an int which on windows machines is tied to the size of a c long and is always 32 bits even if it's compiled for 64 bit. len() however returns an int for values less than sys.maxint and a long above that. 

Returning an int in __len__() causes it to return negative lengths for objects of size greater than sys.maxint, below you can see a quick test on how to reproduce it.

And here's an explanation from \u\Rhomboid on Reddit of why we believe the issue happens.

"You'll only see that on Windows. The issue is that, confusingly, the range of the Python int type is tied to the range of the C long type. On Windows long is always 32 bits even on x64 systems, whereas on Unix systems it's the native machine word size. You can confirm this by checking sys.maxint, which will be 2**31 - 1 even with a 64 bit interpreter on Windows.

The difference in behavior of foo.__len__ vs len(foo) is that the former goes through an attribute lookup which goes through the slot lookup stuff, finally ending in Python/typeobject.c:wrap_lenfunc(). The error is casting Py_ssize_t to long, which truncates on Windows x64 as Py_ssize_t is a proper signed 64 bit integer. And then it compounds the injury by creating a Python int object with PyInt_FromLong(), so this is hopelessly broken. In the case of len(foo), you end up in Python/bltinmodule.c:builtin_len() which skips all the attribute lookup stuff and uses the object protocol directly, calling PyObject_Size() and creating a Python object of the correct type via PyInt_FromSsize_t() which figures out whether a Python int or long is necessary.

This is definitely a bug that should be reported. In 3.x the int/long distinction is gone and all integers are Python longs, but the bogus cast to a C long still exists in wrap_lenfunc():

    return PyLong_FromLong((long)res);

That means the bug still exists even though the reason for its existence is gone! Oops. That needs to be updated to get rid of the cast and call PyLong_FromSsize_t()."

Python 2.7.8 |Anaconda 2.1.0 (64-bit)| (default, Jul  2 2014, 15:12:11) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> a = 'a'*2500000000
>>> a.__len__()
-1794967296
>>> len(a)
2500000000L
>>> a = [1]*2500000000
>>> len(a)
2500000000L
>>> a.__len__()
-1794967296

----------
components: Windows
messages: 260749
nosy: Dave Hibbitts, georg.brandl, paul.moore, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: __len__() returns 32 bit int on windows leading to overflows
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue26423>
_______________________________________


More information about the New-bugs-announce mailing list