[Python-Dev] Weird use of hash() -- will this work?

Tim Peters tim.one@home.com
Thu, 18 Jan 2001 16:53:44 -0500


[Eric S. Raymond, in search of uniqueness]
> ...
> So, how about `time.time()` + hex(hash([]))?
>
> It looks to me like this will remain unique forever, because
> another thread would have to create an object at the same memory
> address during the same millisecond to collide.

I'm afraid it's much more vulnerable than that:  Python's thread granularity
is at the bytecode level, not the statement level.  It's very easy for
thread A and B to see the same `time.time()` value, and after that
arbitrarily long amounts of time may pass before they get around to doing
the hash([]) business.  When hash() completes, the storage for [] is
immediately reclaimed under CPython, and it's again very easy for another
thread to reuse the storage.
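
A minimal sketch of the storage-reuse effect on its own, with no threads involved. It assumes a modern CPython 3, where hash([]) raises TypeError, so id(object()) stands in for it here; the helper name temp_ids is made up for illustration. Each temporary is freed before the next one is created, so the allocator is free to hand back the same address over and over. How many distinct values you see depends on the interpreter and allocator, so no particular count is promised.

```python
def temp_ids(n):
    """Return id() for n short-lived objects.

    Each object() has no remaining references once id() returns, so
    CPython reclaims it immediately and may reuse its storage for the
    next one.
    """
    return [id(object()) for _ in range(n)]


ids = temp_ids(1000)
unique = len(set(ids))
print("saw", unique, "unique ids in", len(ids), "temporaries")
```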

I'm attaching an executable test case.  It uses time.clock() because that
has much higher resolution than time.time() on Windows (better than
microsecond), but rounds it back to three decimal places to simulate
millisecond resolution.  The first three runs:

    saw 14600 unique in 30000 total
    saw 14597 unique in 30000 total
    saw 14645 unique in 30000 total

So it sucks bigtime on my box.

Better idea:  borrow the _ThreadSafeCounter class from the tail end of the
current CVS tempfile.py.  The code works whether or not threads are
available.  Then

    `time.time()` + str(_counter.get_next())

is thread-safe.  For that matter, plain old

    str(_counter.get_next())

will always be unique within a single run.  However, in either case you're
still not safe against concurrent *processes* generating the same cookies.

tempfile.py has to worry about that too, of course, so the *best* idea is to
call tempfile.mktemp() and leave it at that.  It wastes some time checking
the filesystem for a file of the same name (which, btw, goes much quicker on
Linux than on Windows).

From time to time, somebody suggests adding a uuid generator to Python.  Not
a bad idea, but nobody wants to do all the x-platform work.

like-capturing-snowflakes-ly y'rs  - tim

from threading import Thread
import time

N = 1000
NTHREADS = 30

class Worker(Thread):
    def __init__(self):
        Thread.__init__(self)

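    def run(self):
        # Candidate cookie:  the clock rounded to three decimal places (to
        # simulate millisecond resolution) plus the hash of a temporary
        # list that is freed as soon as hash() returns.
        self.generated = [`round(time.clock(), 3)` + hex(hash([]))
                          for i in range(N)]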

threads = []
for i in range(NTHREADS):
    threads.append(Worker())

for t in threads:
    t.start()

# Use the dict's keys as a set to count the distinct cookies generated.
d = {}
total = 0
for t in threads:
    t.join()
    total += len(t.generated)
    for g in t.generated:
        d[g] = 1

print "saw", len(d), "unique in", total, "total"