[issue13703] Hash collision security issue

Dave Malcolm report at bugs.python.org
Sat Jan 28 06:13:40 CET 2012


Dave Malcolm <dmalcolm at redhat.com> added the comment:

On Sat, 2012-01-28 at 03:03 +0000, Benjamin Peterson wrote:
> Benjamin Peterson <benjamin at python.org> added the comment:
> 
> For the record, Barry and I agreed on what we'll be doing for stable releases [1]. David says he should have a patch soon.
> 
> [1] http://mail.python.org/pipermail/python-dev/2012-January/115892.html

I'm attaching what I've got so far (need sleep).

Attached patch is for 3.1 and adds opt-in hash randomization.

It's based on haypo's work: random-8.patch (thanks haypo!), with
additional changes as seen in my backport of that to 2.7:
http://bugs.python.org/issue13703#msg151847

* The randomization is off by default, and must be enabled by setting
a new environment variable PYTHONHASHRANDOMIZATION to a non-empty
string. (if so then, PYTHONHASHSEED also still works, if provided, in
the same way as in haypo's patch)

* All of the various "Py_hash_t" become "long" again (Py_hash_t was
added in 3.2: issue9778)

* I expanded the randomization from just PyUnicodeObject to also cover
PyBytesObject, and the types within datetime.

* It doesn't cover numeric types; see my explanation in msg151847; also
see http://bugs.python.org/issue13703#msg151870

* It doesn't yet cover the embedded copy of expat.

* I moved the hash tests from test_unicode.py to test_hash.py

* I tweaked the wording of the descriptions of the envvars in
cmdline.rst and the manpage

* I've tested it on a 32-bit box, and it successfully protects against
one set of test data (four cases: assembling then reading back items by
key for a dict vs set, bytes vs str, with 200000 distinct items of data
which all have hash() == 0 in unmodified build; each takes about 1.5
seconds on this --with-pydebug build, vs of the order of hours).

* I haven't yet benchmarked it

* Only tested on Linux (Fedora x86_64 and i686).  I don't know the
impact on windows (e.g. startup time without the envvar vs with the env
vars).

I'm seeing one failing test:
======================================================================
FAIL: test_clear_dict_in_ref_cycle (__main__.ModuleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File
"/home/david/coding/python-hg/cpython-3.1-hash-randomization/Lib/test/test_module.py", line 79, in test_clear_dict_in_ref_cycle
    self.assertEqual(destroyed, [1])
AssertionError: Lists differ: [] != [1]

----------
Added file: http://bugs.python.org/file24343/optin-hash-randomization-for-3.1-dmalcolm-2012-01-27-001.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13703>
_______________________________________
-------------- next part --------------
diff -r 73dad4940b88 Doc/using/cmdline.rst
--- a/Doc/using/cmdline.rst	Fri Jan 20 11:23:02 2012 +0000
+++ b/Doc/using/cmdline.rst	Sat Jan 28 00:03:46 2012 -0500
@@ -435,6 +435,34 @@
    import of source modules.
 
 
+.. envvar:: PYTHONHASHRANDOMIZATION
+
+   If this is set to a non-empty string, the hash() values of str, bytes
+   and datetime objects are "salted" with an unpredictable random value.
+   Although they remain constant within an individual Python process, they
+   are not predictable between repeated invocations of Python.
+
+   This is intended to provide protection against a denial-of-service
+   caused by carefully-chosen inputs that exploit the worst case performance
+   of a dict lookup, O(n^2) complexity.  See:
+   
+       http://www.ocert.org/advisories/ocert-2011-003.html
+
+   for details.
+
+   Changing hash values affects the order in which keys are retrieved from
+   a dict.  Although Python has never made guarantees about this ordering
+   (and it typically varies between 32-bit and 64-bit builds), enough
+   real-world code implicitly relies on this non-guaranteed behavior that
+   the randomization is disabled by default.
+
+.. envvar:: PYTHONHASHSEED
+
+   If this is set, it is used as a fixed seed for generating the hash() of
+   the types covererd by PYTHONHASHRANDOMIZATION.  It should be a number in
+   the range [0; 4294967295].  The value 0 overrides the other variable and
+   disables the hash random salt.
+
 .. envvar:: PYTHONIOENCODING
 
    Overrides the encoding used for stdin/stdout/stderr, in the syntax
diff -r 73dad4940b88 Include/object.h
--- a/Include/object.h	Fri Jan 20 11:23:02 2012 +0000
+++ b/Include/object.h	Sat Jan 28 00:03:46 2012 -0500
@@ -473,6 +473,12 @@
 PyAPI_FUNC(long) _Py_HashDouble(double);
 PyAPI_FUNC(long) _Py_HashPointer(void*);
 
+typedef struct {
+    long prefix;
+    long suffix;
+} _Py_HashSecret_t;
+PyAPI_DATA(_Py_HashSecret_t) _Py_HashSecret;
+
 /* Helper for passing objects to printf and the like */
 #define PyObject_REPR(obj) _PyUnicode_AsString(PyObject_Repr(obj))
 
diff -r 73dad4940b88 Include/pythonrun.h
--- a/Include/pythonrun.h	Fri Jan 20 11:23:02 2012 +0000
+++ b/Include/pythonrun.h	Sat Jan 28 00:03:46 2012 -0500
@@ -174,6 +174,8 @@
 PyAPI_FUNC(PyOS_sighandler_t) PyOS_getsig(int);
 PyAPI_FUNC(PyOS_sighandler_t) PyOS_setsig(int, PyOS_sighandler_t);
 
+/* Random */
+PyAPI_FUNC(int) _PyOS_URandom (void *buffer, Py_ssize_t size);
 
 #ifdef __cplusplus
 }
diff -r 73dad4940b88 Lib/os.py
--- a/Lib/os.py	Fri Jan 20 11:23:02 2012 +0000
+++ b/Lib/os.py	Sat Jan 28 00:03:46 2012 -0500
@@ -611,23 +611,6 @@
 except NameError: # statvfs_result may not exist
     pass
 
-if not _exists("urandom"):
-    def urandom(n):
-        """urandom(n) -> str
-
-        Return a string of n random bytes suitable for cryptographic use.
-
-        """
-        try:
-            _urandomfd = open("/dev/urandom", O_RDONLY)
-        except (OSError, IOError):
-            raise NotImplementedError("/dev/urandom (or equivalent) not found")
-        bs = b""
-        while len(bs) < n:
-            bs += read(_urandomfd, n - len(bs))
-        close(_urandomfd)
-        return bs
-
 # Supply os.popen()
 def popen(cmd, mode="r", buffering=-1):
     if not isinstance(cmd, str):
diff -r 73dad4940b88 Lib/test/regrtest.py
--- a/Lib/test/regrtest.py	Fri Jan 20 11:23:02 2012 +0000
+++ b/Lib/test/regrtest.py	Sat Jan 28 00:03:46 2012 -0500
@@ -428,6 +428,11 @@
         except ValueError:
             print("Couldn't find starting test (%s), using all tests" % start)
     if randomize:
+        hashseed = os.getenv('PYTHONHASHSEED')
+        if not hashseed:
+            os.environ['PYTHONHASHSEED'] = str(random_seed)
+            os.execv(sys.executable, [sys.executable] + sys.argv)
+            return
         random.seed(random_seed)
         print("Using random seed", random_seed)
         random.shuffle(tests)
diff -r 73dad4940b88 Lib/test/test_hash.py
--- a/Lib/test/test_hash.py	Fri Jan 20 11:23:02 2012 +0000
+++ b/Lib/test/test_hash.py	Sat Jan 28 00:03:46 2012 -0500
@@ -3,10 +3,14 @@
 #
 # Also test that hash implementations are inherited as expected
 
+import struct
 import unittest
 from test import support
+from test.script_helper import assert_python_ok
 from collections import Hashable
 
+IS_64BIT = (struct.calcsize('l') == 8)
+
 
 class HashEqualityTestCase(unittest.TestCase):
 
@@ -118,10 +122,65 @@
         for obj in self.hashes_to_check:
             self.assertEqual(hash(obj), _default_hash(obj))
 
+class RandomizationTestCase(unittest.TestCase):
+
+    # Examples of the various types having randomized hash:
+    test_reprs = [repr('abc'), repr(b'abc')]
+
+    def get_hash(self, repr_, randomization=None, seed=None):
+        env = {}
+        if randomization is not None:
+            env['PYTHONHASHRANDOMIZATION'] = str(randomization)
+        if seed is not None:
+            env['PYTHONHASHSEED'] = str(seed)
+        out = assert_python_ok(
+            '-c', 'print(hash(%s))' % repr_,
+            **env)
+        stdout = out[1].strip()
+        return int(stdout)
+
+    def test_empty_string(self):
+        self.assertEqual(hash(""), 0)
+        self.assertEqual(hash(b""), 0)
+
+    def test_null_hash(self):
+        # PYTHONHASHSEED=0 disables the randomized hash
+        if IS_64BIT:
+            known_hash_of_obj = 1453079729188098211
+        else:
+            known_hash_of_obj = -1600925533
+        for repr_ in self.test_reprs:
+            # Randomization is disabled by default:
+            self.assertEqual(self.get_hash(repr_), known_hash_of_obj)
+            
+            # If enabled, it can still be disabled by setting the seed to 0:
+            self.assertEqual(self.get_hash(repr_, randomization=1, seed=0),
+                             known_hash_of_obj)
+
+    def test_fixed_hash(self):
+        # test a fixed seed for the randomized hash
+        # Note that all types share the same values:
+        if IS_64BIT:
+            h = -4410911502303878509
+        else:
+            h = -206076799
+        for repr_ in self.test_reprs:
+            self.assertEqual(self.get_hash(repr_, randomization=1, seed=42),
+                             h)
+
+    def test_randomized_hash(self):
+        # two runs should return different hashes
+        for repr_ in self.test_reprs:
+            run1 = self.get_hash(repr_, randomization=1)
+            run2 = self.get_hash(repr_, randomization=1)
+            self.assertNotEqual(run1, run2)
+
+
 def test_main():
     support.run_unittest(HashEqualityTestCase,
                               HashInheritanceTestCase,
-                              HashBuiltinsTestCase)
+                              HashBuiltinsTestCase,
+                         RandomizationTestCase)
 
 
 if __name__ == "__main__":
diff -r 73dad4940b88 Lib/test/test_os.py
--- a/Lib/test/test_os.py	Fri Jan 20 11:23:02 2012 +0000
+++ b/Lib/test/test_os.py	Sat Jan 28 00:03:46 2012 -0500
@@ -9,6 +9,7 @@
 import sys
 import shutil
 from test import support
+from test.script_helper import assert_python_ok
 
 # Detect whether we're on a Linux system that uses the (now outdated
 # and unmaintained) linuxthreads threading library.  There's an issue
@@ -574,14 +575,33 @@
         f.close()
 
 class URandomTests(unittest.TestCase):
-    def test_urandom(self):
-        try:
-            self.assertEqual(len(os.urandom(1)), 1)
-            self.assertEqual(len(os.urandom(10)), 10)
-            self.assertEqual(len(os.urandom(100)), 100)
-            self.assertEqual(len(os.urandom(1000)), 1000)
-        except NotImplementedError:
-            pass
+    def test_urandom_length(self):
+        self.assertEqual(len(os.urandom(0)), 0)
+        self.assertEqual(len(os.urandom(1)), 1)
+        self.assertEqual(len(os.urandom(10)), 10)
+        self.assertEqual(len(os.urandom(100)), 100)
+        self.assertEqual(len(os.urandom(1000)), 1000)
+
+    def test_urandom_value(self):
+        data1 = os.urandom(16)
+        data2 = os.urandom(16)
+        self.assertNotEqual(data1, data2)
+
+    def get_urandom_subprocess(self, count):
+        code = '\n'.join((
+            'import os, sys',
+            'data = os.urandom(%s)' % count,
+            'sys.stdout.buffer.write(data)',
+            'sys.stdout.buffer.flush()'))
+        out = assert_python_ok('-c', code)
+        stdout = out[1]
+        self.assertEqual(len(stdout), 16)
+        return stdout
+
+    def test_urandom_subprocess(self):
+        data1 = self.get_urandom_subprocess(16)
+        data2 = self.get_urandom_subprocess(16)
+        self.assertNotEqual(data1, data2)
 
 class ExecTests(unittest.TestCase):
     @unittest.skipIf(USING_LINUXTHREADS,
diff -r 73dad4940b88 Lib/test/test_set.py
--- a/Lib/test/test_set.py	Fri Jan 20 11:23:02 2012 +0000
+++ b/Lib/test/test_set.py	Sat Jan 28 00:03:46 2012 -0500
@@ -734,6 +734,17 @@
         if self.repr is not None:
             self.assertEqual(repr(self.set), self.repr)
 
+    def check_repr_against_values(self):
+        text = repr(self.set)
+        self.assertTrue(text.startswith('{'))
+        self.assertTrue(text.endswith('}'))
+
+        result = text[1:-1].split(', ')
+        result.sort()
+        sorted_repr_values = [repr(value) for value in self.values]
+        sorted_repr_values.sort()
+        self.assertEqual(result, sorted_repr_values)
+
     def test_print(self):
         try:
             fo = open(support.TESTFN, "w")
@@ -892,7 +903,9 @@
         self.set    = set(self.values)
         self.dup    = set(self.values)
         self.length = 3
-        self.repr   = "{'a', 'c', 'b'}"
+
+    def test_repr(self):
+        self.check_repr_against_values()
 
 #------------------------------------------------------------------------------
 
@@ -903,7 +916,9 @@
         self.set    = set(self.values)
         self.dup    = set(self.values)
         self.length = 3
-        self.repr   = "{b'a', b'c', b'b'}"
+
+    def test_repr(self):
+        self.check_repr_against_values()
 
 #------------------------------------------------------------------------------
 
@@ -916,11 +931,13 @@
         self.set    = set(self.values)
         self.dup    = set(self.values)
         self.length = 4
-        self.repr   = "{'a', b'a', 'b', b'b'}"
-
+ 
     def tearDown(self):
         warnings.filters = self.warning_filters
 
+    def test_repr(self):
+        self.check_repr_against_values()
+
 #==============================================================================
 
 def baditer():
diff -r 73dad4940b88 Makefile.pre.in
--- a/Makefile.pre.in	Fri Jan 20 11:23:02 2012 +0000
+++ b/Makefile.pre.in	Sat Jan 28 00:03:46 2012 -0500
@@ -305,6 +305,7 @@
 		Python/pymath.o \
 		Python/pystate.o \
 		Python/pythonrun.o \
+		Python/random.o \
 		Python/structmember.o \
 		Python/symtable.o \
 		Python/sysmodule.o \
diff -r 73dad4940b88 Misc/python.man
--- a/Misc/python.man	Fri Jan 20 11:23:02 2012 +0000
+++ b/Misc/python.man	Sat Jan 28 00:03:46 2012 -0500
@@ -403,6 +403,28 @@
 If this is set to a non-empty string it is equivalent to specifying
 the \fB\-v\fP option. If set to an integer, it is equivalent to
 specifying \fB\-v\fP multiple times. 
+.IP PYTHONHASHRANDOMIZATION
+If this is set to a non-empty string, the hash() values of str, bytes and
+datetime objects are "salted" with an unpredictable random value.  Although they
+remain constant within an individual Python process, they are not predictable
+between repeated invocations of Python.
+.IP
+This is intended to provide protection against a denial of service
+caused by carefully-chosen inputs that exploit the worst case performance
+of a dict lookup, O(n^2) complexity.  See
+http://www.ocert.org/advisories/ocert-2011-003.html
+for details.
+.IP
+Changing hash values affects the order in which keys are retrieved from
+a dict.  Although Python has never made guarantees about this ordering
+(and it typically varies between 32-bit and 64-bit builds), enough
+real-world code implicitly relies on this non-guaranteed behavior that
+the randomization is disabled by default.
+.IP PYTHONHASHSEED
+If this is set, it is used as a fixed seed for generating the hash() of
+the types covererd by PYTHONHASHRANDOMIZATION.  It should be a number in
+the range [0; 4294967295].  The value 0 overrides the other variable and
+disables the hash random salt.
 .SH AUTHOR
 The Python Software Foundation: http://www.python.org/psf
 .SH INTERNET RESOURCES
diff -r 73dad4940b88 Modules/datetimemodule.c
--- a/Modules/datetimemodule.c	Fri Jan 20 11:23:02 2012 +0000
+++ b/Modules/datetimemodule.c	Sat Jan 28 00:03:46 2012 -0500
@@ -2566,10 +2566,12 @@
     register long x;
 
     p = (unsigned char *) data;
-    x = *p << 7;
+    x = _Py_HashSecret.prefix;
+    x ^= *p << 7;
     while (--len >= 0)
         x = (1000003*x) ^ *p++;
     x ^= len;
+    x ^= _Py_HashSecret.suffix;
     if (x == -1)
         x = -2;
 
diff -r 73dad4940b88 Modules/posixmodule.c
--- a/Modules/posixmodule.c	Fri Jan 20 11:23:02 2012 +0000
+++ b/Modules/posixmodule.c	Sat Jan 28 00:03:46 2012 -0500
@@ -6942,82 +6942,6 @@
 }
 #endif
 
-#ifdef MS_WINDOWS
-
-PyDoc_STRVAR(win32_urandom__doc__,
-"urandom(n) -> str\n\n\
-Return n random bytes suitable for cryptographic use.");
-
-typedef BOOL (WINAPI *CRYPTACQUIRECONTEXTA)(HCRYPTPROV *phProv,\
-              LPCSTR pszContainer, LPCSTR pszProvider, DWORD dwProvType,\
-              DWORD dwFlags );
-typedef BOOL (WINAPI *CRYPTGENRANDOM)(HCRYPTPROV hProv, DWORD dwLen,\
-              BYTE *pbBuffer );
-
-static CRYPTGENRANDOM pCryptGenRandom = NULL;
-/* This handle is never explicitly released. Instead, the operating
-   system will release it when the process terminates. */
-static HCRYPTPROV hCryptProv = 0;
-
-static PyObject*
-win32_urandom(PyObject *self, PyObject *args)
-{
-    int howMany;
-    PyObject* result;
-
-    /* Read arguments */
-    if (! PyArg_ParseTuple(args, "i:urandom", &howMany))
-        return NULL;
-    if (howMany < 0)
-        return PyErr_Format(PyExc_ValueError,
-                            "negative argument not allowed");
-
-    if (hCryptProv == 0) {
-        HINSTANCE hAdvAPI32 = NULL;
-        CRYPTACQUIRECONTEXTA pCryptAcquireContext = NULL;
-
-        /* Obtain handle to the DLL containing CryptoAPI
-           This should not fail         */
-        hAdvAPI32 = GetModuleHandle("advapi32.dll");
-        if(hAdvAPI32 == NULL)
-            return win32_error("GetModuleHandle", NULL);
-
-        /* Obtain pointers to the CryptoAPI functions
-           This will fail on some early versions of Win95 */
-        pCryptAcquireContext = (CRYPTACQUIRECONTEXTA)GetProcAddress(
-                                        hAdvAPI32,
-                                        "CryptAcquireContextA");
-        if (pCryptAcquireContext == NULL)
-            return PyErr_Format(PyExc_NotImplementedError,
-                                "CryptAcquireContextA not found");
-
-        pCryptGenRandom = (CRYPTGENRANDOM)GetProcAddress(
-                                        hAdvAPI32, "CryptGenRandom");
-        if (pCryptGenRandom == NULL)
-            return PyErr_Format(PyExc_NotImplementedError,
-                                "CryptGenRandom not found");
-
-        /* Acquire context */
-        if (! pCryptAcquireContext(&hCryptProv, NULL, NULL,
-                                   PROV_RSA_FULL, CRYPT_VERIFYCONTEXT))
-            return win32_error("CryptAcquireContext", NULL);
-    }
-
-    /* Allocate bytes */
-    result = PyBytes_FromStringAndSize(NULL, howMany);
-    if (result != NULL) {
-        /* Get random data */
-        memset(PyBytes_AS_STRING(result), 0, howMany); /* zero seed */
-        if (! pCryptGenRandom(hCryptProv, howMany, (unsigned char*)
-                              PyBytes_AS_STRING(result))) {
-            Py_DECREF(result);
-            return win32_error("CryptGenRandom", NULL);
-        }
-    }
-    return result;
-}
-#endif
-
 PyDoc_STRVAR(device_encoding__doc__,
 "device_encoding(fd) -> str\n\n\
 Return a string describing the encoding of the device\n\
@@ -7055,41 +6979,35 @@
     return Py_None;
 }
 
-#ifdef __VMS
-/* Use openssl random routine */
-#include <openssl/rand.h>
-PyDoc_STRVAR(vms_urandom__doc__,
+PyDoc_STRVAR(posix_urandom__doc__,
 "urandom(n) -> str\n\n\
-Return n random bytes suitable for cryptographic use.");
-
-static PyObject*
-vms_urandom(PyObject *self, PyObject *args)
-{
-    int howMany;
-    PyObject* result;
-
-    /* Read arguments */
-    if (! PyArg_ParseTuple(args, "i:urandom", &howMany))
-        return NULL;
-    if (howMany < 0)
+Return n pseudo-random bytes.");
+
+static PyObject *
+posix_urandom(PyObject *self, PyObject *args)
+{
+    Py_ssize_t size;
+    PyObject *result;
+    int ret;
+ 
+     /* Read arguments */
+    if (!PyArg_ParseTuple(args, "n:urandom", &size))
+        return NULL;
+    if (size < 0)
         return PyErr_Format(PyExc_ValueError,
                             "negative argument not allowed");
-
-    /* Allocate bytes */
-    result = PyBytes_FromStringAndSize(NULL, howMany);
-    if (result != NULL) {
-        /* Get random data */
-        if (RAND_pseudo_bytes((unsigned char*)
-                              PyBytes_AS_STRING(result),
-                              howMany) < 0) {
-            Py_DECREF(result);
-            return PyErr_Format(PyExc_ValueError,
-                                "RAND_pseudo_bytes");
-        }
+    result = PyBytes_FromStringAndSize(NULL, size);
+    if (result == NULL)
+        return NULL;
+
+    ret = _PyOS_URandom(PyBytes_AS_STRING(result),
+                        PyBytes_GET_SIZE(result));
+    if (ret == -1) {
+        Py_DECREF(result);
+        return NULL;
     }
     return result;
 }
-#endif
 
 static PyMethodDef posix_methods[] = {
     {"access",          posix_access, METH_VARARGS, posix_access__doc__},
@@ -7374,12 +7292,7 @@
 #ifdef HAVE_GETLOADAVG
     {"getloadavg",      posix_getloadavg, METH_NOARGS, posix_getloadavg__doc__},
 #endif
- #ifdef MS_WINDOWS
-    {"urandom", win32_urandom, METH_VARARGS, win32_urandom__doc__},
- #endif
- #ifdef __VMS
-    {"urandom", vms_urandom, METH_VARARGS, vms_urandom__doc__},
- #endif
+    {"urandom",         posix_urandom,   METH_VARARGS, posix_urandom__doc__},
     {NULL,              NULL}            /* Sentinel */
 };
 
diff -r 73dad4940b88 Objects/bytesobject.c
--- a/Objects/bytesobject.c	Fri Jan 20 11:23:02 2012 +0000
+++ b/Objects/bytesobject.c	Sat Jan 28 00:03:46 2012 -0500
@@ -899,11 +899,17 @@
     if (a->ob_shash != -1)
         return a->ob_shash;
     len = Py_SIZE(a);
+    if (len == 0) {
+        a->ob_shash = 0;
+        return 0;
+    }
     p = (unsigned char *) a->ob_sval;
-    x = *p << 7;
+    x = _Py_HashSecret.prefix;
+    x ^= *p << 7;
     while (--len >= 0)
         x = (1000003*x) ^ *p++;
     x ^= Py_SIZE(a);
+    x ^= _Py_HashSecret.suffix;
     if (x == -1)
         x = -2;
     a->ob_shash = x;
diff -r 73dad4940b88 Objects/object.c
--- a/Objects/object.c	Fri Jan 20 11:23:02 2012 +0000
+++ b/Objects/object.c	Sat Jan 28 00:03:46 2012 -0500
@@ -712,6 +712,8 @@
     return -1;
 }
 
+_Py_HashSecret_t _Py_HashSecret;
+
 long
 PyObject_Hash(PyObject *v)
 {
diff -r 73dad4940b88 Objects/unicodeobject.c
--- a/Objects/unicodeobject.c	Fri Jan 20 11:23:02 2012 +0000
+++ b/Objects/unicodeobject.c	Sat Jan 28 00:03:46 2012 -0500
@@ -7344,11 +7344,17 @@
     if (self->hash != -1)
         return self->hash;
     len = Py_SIZE(self);
+    if (len == 0) {
+        self->hash = 0;
+        return 0;
+    }
     p = self->str;
-    x = *p << 7;
+    x = _Py_HashSecret.prefix;
+    x ^= *p << 7;
     while (--len >= 0)
         x = (1000003*x) ^ *p++;
     x ^= Py_SIZE(self);
+    x ^= _Py_HashSecret.suffix;
     if (x == -1)
         x = -2;
     self->hash = x;
diff -r 73dad4940b88 PCbuild/pythoncore.vcproj
--- a/PCbuild/pythoncore.vcproj	Fri Jan 20 11:23:02 2012 +0000
+++ b/PCbuild/pythoncore.vcproj	Sat Jan 28 00:03:46 2012 -0500
@@ -1778,6 +1778,10 @@
 				RelativePath="..\Python\pythonrun.c"
 				>
 			</File>
+ 			<File
+				RelativePath="..\Python\random.c"
+				>
+			</File>
 			<File
 				RelativePath="..\Python\structmember.c"
 				>
diff -r 73dad4940b88 Python/pythonrun.c
--- a/Python/pythonrun.c	Fri Jan 20 11:23:02 2012 +0000
+++ b/Python/pythonrun.c	Sat Jan 28 00:03:46 2012 -0500
@@ -73,6 +73,7 @@
 extern void _PyUnicode_Fini(void);
 extern int _PyLong_Init(void);
 extern void PyLong_Fini(void);
+extern void _PyRandom_Init(void);
 
 #ifdef WITH_THREAD
 extern void _PyGILState_Init(PyInterpreterState *, PyThreadState *);
@@ -196,6 +197,8 @@
     if ((p = Py_GETENV("PYTHONDONTWRITEBYTECODE")) && *p != '\0')
         Py_DontWriteBytecodeFlag = add_flag(Py_DontWriteBytecodeFlag, p);
 
+    _PyRandom_Init();
+
     interp = PyInterpreterState_New();
     if (interp == NULL)
         Py_FatalError("Py_Initialize: can't make first interpreter");
diff -r 73dad4940b88 Python/random.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/Python/random.c	Sat Jan 28 00:03:46 2012 -0500
@@ -0,0 +1,284 @@
+#include "Python.h"
+#ifdef MS_WINDOWS
+#include <windows.h>
+#else
+#include <fcntl.h>
+#endif
+
+static int random_initialized = 0;
+
+#ifdef MS_WINDOWS
+typedef BOOL (WINAPI *CRYPTACQUIRECONTEXTA)(HCRYPTPROV *phProv,\
+              LPCSTR pszContainer, LPCSTR pszProvider, DWORD dwProvType,\
+              DWORD dwFlags );
+typedef BOOL (WINAPI *CRYPTGENRANDOM)(HCRYPTPROV hProv, DWORD dwLen,\
+              BYTE *pbBuffer );
+
+static CRYPTGENRANDOM pCryptGenRandom = NULL;
+/* This handle is never explicitly released. Instead, the operating
+   system will release it when the process terminates. */
+static HCRYPTPROV hCryptProv = 0;
+
+static int
+win32_urandom_init(int raise)
+{
+    HINSTANCE hAdvAPI32 = NULL;
+    CRYPTACQUIRECONTEXTA pCryptAcquireContext = NULL;
+
+    /* Obtain handle to the DLL containing CryptoAPI. This should not fail. */
+    hAdvAPI32 = GetModuleHandle("advapi32.dll");
+    if(hAdvAPI32 == NULL)
+        goto error;
+
+    /* Obtain pointers to the CryptoAPI functions. This will fail on some early
+       versions of Win95. */
+    pCryptAcquireContext = (CRYPTACQUIRECONTEXTA)GetProcAddress(
+                               hAdvAPI32, "CryptAcquireContextA");
+    if (pCryptAcquireContext == NULL)
+        goto error;
+
+    pCryptGenRandom = (CRYPTGENRANDOM)GetProcAddress(hAdvAPI32,
+                                                     "CryptGenRandom");
+    if (pCryptGenRandom == NULL)
+        goto error;
+
+    /* Acquire context */
+    if (! pCryptAcquireContext(&hCryptProv, NULL, NULL,
+                               PROV_RSA_FULL, CRYPT_VERIFYCONTEXT))
+        goto error;
+
+    return 0;
+
+error:
+    if (raise)
+        PyErr_SetFromWindowsErr(0);
+    else
+        Py_FatalError("Failed to initialize Windows random API (CryptoGen)");
+    return -1;
+}
+
+/* Fill buffer with size pseudo-random bytes generated by the Windows CryptoGen
+   API. Return 0 on success, or -1 on error. */
+static int
+win32_urandom(unsigned char *buffer, Py_ssize_t size, int raise)
+{
+    Py_ssize_t orig_size = size;
+    Py_ssize_t chunk;
+
+    if (hCryptProv == 0)
+    {
+        if (win32_urandom_init(raise) == -1)
+            return -1;
+    }
+
+    while (size > 0)
+    {
+        chunk = Py_MIN(size, INT_MAX);
+        if (!pCryptGenRandom(hCryptProv, chunk, buffer))
+        {
+            /* CryptGenRandom() failed */
+            if (raise)
+                PyErr_SetFromWindowsErr(0);
+            else
+                Py_FatalError("Failed to initialized the randomized hash "
+                        "secret using CryptoGen)");
+            return -1;
+        }
+        buffer += chunk;
+        size -= chunk;
+    }
+    return 0;
+}
+
+#else
+
+/* Read size bytes from /dev/urandom into buffer.
+   Call Py_FatalError() on error. */
+static void
+dev_urandom_noraise(char *buffer, Py_ssize_t size)
+{
+    int fd;
+    Py_ssize_t n;
+
+    assert (0 < size);
+
+    fd = open("/dev/urandom", O_RDONLY);
+    if (fd < 0)
+        Py_FatalError("Failed to open /dev/urandom");
+
+    while (0 < size)
+    {
+        do {
+            n = read(fd, buffer, (size_t)size);
+        } while (n < 0 && errno == EINTR);
+        if (n <= 0)
+        {
+            /* stop on error or if read(size) returned 0 */
+            Py_FatalError("Failed to read bytes from /dev/urandom");
+            break;
+        }
+        buffer += n;
+        size -= (Py_ssize_t)n;
+    }
+    close(fd);
+}
+
+/* Read size bytes from /dev/urandom into buffer.
+   Return 0 on success, raise an exception and return -1 on error. */
+static int
+dev_urandom_python(char *buffer, Py_ssize_t size)
+{
+    int fd;
+    Py_ssize_t n;
+
+    if (size <= 0)
+        return 0;
+
+    Py_BEGIN_ALLOW_THREADS
+    fd = open("/dev/urandom", O_RDONLY);
+    Py_END_ALLOW_THREADS
+    if (fd < 0)
+    {
+        PyErr_SetFromErrnoWithFilename(PyExc_OSError, "/dev/urandom");
+        return -1;
+    }
+
+    Py_BEGIN_ALLOW_THREADS
+    do {
+        do {
+            n = read(fd, buffer, (size_t)size);
+        } while (n < 0 && errno == EINTR);
+        if (n <= 0)
+            break;
+        buffer += n;
+        size -= (Py_ssize_t)n;
+    } while (0 < size);
+    Py_END_ALLOW_THREADS
+
+    if (n <= 0)
+    {
+        /* stop on error or if read(size) returned 0 */
+        if (n < 0)
+            PyErr_SetFromErrno(PyExc_OSError);
+        else
+            PyErr_Format(PyExc_RuntimeError,
+                         "Failed to read %zi bytes from /dev/urandom",
+                         size);
+        close(fd);
+        return -1;
+    }
+    close(fd);
+    return 0;
+}
+#endif
+
+/* Fill buffer with pseudo-random bytes generated by a linear congruent
+   generator (LCG):
+
+       x(n+1) = (x(n) * 214013 + 2531011) % 2^32
+
+   Use bits 23..16 of x(n) to generate a byte. */
+static void
+lcg_urandom(unsigned int x0, unsigned char *buffer, size_t size)
+{
+    size_t index;
+    unsigned int x;
+
+    x = x0;
+    for (index=0; index < size; index++) {
+        x *= 214013;
+        x += 2531011;
+        /* modulo 2 ^ (8 * sizeof(int)) */
+        buffer[index] = (x >> 16) & 0xff;
+    }
+}
+
+/* Fill buffer with size pseudo-random bytes, not suitable for cryptographic
+   use, from the operating random number generator (RNG).
+
+   Return 0 on success, raise an exception and return -1 on error. */
+int
+_PyOS_URandom(void *buffer, Py_ssize_t size)
+{
+    if (size < 0) {
+        PyErr_Format(PyExc_ValueError,
+                     "negative argument not allowed");
+        return -1;
+    }
+    if (size == 0)
+        return 0;
+
+#ifdef MS_WINDOWS
+    return win32_urandom((unsigned char *)buffer, size, 1);
+#else
+    return dev_urandom_python((char*)buffer, size);
+#endif
+}
+
+void
+_PyRandom_Init(void)
+{
+    char *env;
+    void *secret = &_Py_HashSecret;
+    Py_ssize_t secret_size = sizeof(_Py_HashSecret);
+
+    if (random_initialized)
+        return;
+    random_initialized = 1;
+
+    /*
+      By default, hash randomization is disabled, and only
+      enabled if PYTHONHASHRANDOMIZATION is set to non-empty
+    */
+    env = Py_GETENV("PYTHONHASHRANDOMIZATION");
+    if (!env || *env == '\0') {
+        /* Not found: disable the randomized hash: */
+        memset(secret, 0, secret_size);
+        return;
+    }
+
+    /*
+      PYTHONHASHRANDOMIZATION was found; generate a per-process secret,
+      using PYTHONHASHSEED if provided.
+    */
+
+    env = Py_GETENV("PYTHONHASHSEED");
+    if (env && *env != '\0') {
+        char *endptr = env;
+        unsigned long seed;
+        seed = strtoul(env, &endptr, 10);
+        if (*endptr != '\0'
+            || seed > 4294967295UL
+            || (errno == ERANGE && seed == ULONG_MAX))
+        {
+            Py_FatalError("PYTHONHASHSEED must be an integer "
+                          "in range [0; 4294967295]");
+        }
+        if (seed == 0) {
+            /* disable the randomized hash */
+            memset(secret, 0, secret_size);
+        }
+        else {
+            lcg_urandom(seed, (unsigned char*)secret, secret_size);
+        }
+    }
+    else {
+#ifdef MS_WINDOWS
+#if 1
+        (void)win32_urandom((unsigned char *)secret, secret_size, 0);
+#else
+        /* fast but weak RNG (fast initialization, weak seed) */
+        _PyTime_timeval t;
+        unsigned int seed;
+        _PyTime_gettimeofday(&t);
+        seed = (unsigned int)t.tv_sec;
+        seed ^= t.tv_usec;
+        seed ^= getpid();
+        lcg_urandom(seed, (unsigned char*)secret, secret_size);
+#endif
+#else /* #ifdef MS_WINDOWS */
+        dev_urandom_noraise((char*)secret, secret_size);
+#endif
+    }
+}
+


More information about the Python-bugs-list mailing list