[Python-Dev] uuid module - byte order issue

Oren Tirosh oren at hishome.net
Thu Aug 3 22:11:16 CEST 2006


The uuid module uses network byte order regardless of the platform
byte order. On little-endian platforms such as Windows, the ".bytes"
property of UUID objects is therefore not compatible with the in-memory
layout of UUIDs (GUIDs) used by the platform:

>>> import uuid
>>> import pywintypes
>>> s = '{00112233-4455-6677-8899-aabbccddeeff}'
>>> uuid.UUID(s).bytes.encode('hex')
'00112233445566778899aabbccddeeff'
>>> str(buffer(pywintypes.IID(s))).encode('hex')
'33221100554477668899aabbccddeeff'
>>>

Ka-Ping Yee writes* that the Windows UUID generation calls are not RFC
4122 compliant and have an illegal version field. If the correct byte
order is used, however, the UUIDs generated by Windows XP are valid
version 4 (random) UUIDs:

>>> import struct
>>> parts = struct.unpack('<LHH8s', pywintypes.CreateGuid())
>>> parts[2] >> 12    # version
4
>>> ord(parts[3][0]) & 0xC0    # variant (0x80 = RFC 4122)
128
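
The same version and variant check can be run on any platform without
pywintypes. The sketch below assumes Python 3 and uses the standard
library's UUID.bytes_le attribute, which holds a UUID in the Windows GUID
layout, as a stand-in for the buffer returned by CreateGuid():

```python
import struct
import uuid

# Stand-in for a Windows GUID buffer (assumption): bytes_le stores the
# first three fields little-endian, like the in-memory GUID structure.
guid_bytes = uuid.uuid4().bytes_le

# '<LHH8s': 32-bit time_low, 16-bit time_mid, 16-bit time_hi_and_version
# (all little-endian), followed by the remaining 8 bytes unchanged.
time_low, time_mid, time_hi_version, rest = struct.unpack('<LHH8s', guid_bytes)

version = time_hi_version >> 12   # top 4 bits of time_hi_and_version
variant = rest[0] & 0xC0          # top 2 bits of clock_seq_hi_and_reserved

print(version, variant)  # 4 128
```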

In the Windows GUID structure, the first three fields (the 32-bit
time-low, the 16-bit time-mid and the 16-bit time-high-and-version)
are stored in the platform byte order, while the remaining 8 bytes
(clock sequence and node) are stored as a plain byte array.
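
Converting between the two layouts therefore only means byte-swapping the
first three fields and copying the last 8 bytes. A minimal Python 3
sketch using the struct module; the function names rfc_to_windows and
windows_to_rfc are hypothetical, chosen for illustration:

```python
import struct
import uuid


def rfc_to_windows(be_bytes):
    """Reorder a 16-byte RFC 4122 (big-endian) UUID into Windows GUID layout."""
    # Unpack the three integer fields big-endian; keep the 8-byte tail as-is.
    time_low, time_mid, time_hi_version, tail = struct.unpack('>LHH8s', be_bytes)
    # Repack the same fields little-endian; the tail is copied unchanged.
    return struct.pack('<LHH8s', time_low, time_mid, time_hi_version, tail)


def windows_to_rfc(le_bytes):
    """Inverse of rfc_to_windows()."""
    time_low, time_mid, time_hi_version, tail = struct.unpack('<LHH8s', le_bytes)
    return struct.pack('>LHH8s', time_low, time_mid, time_hi_version, tail)


u = uuid.UUID('{00112233-4455-6677-8899-aabbccddeeff}')
print(rfc_to_windows(u.bytes).hex())  # 33221100554477668899aabbccddeeff
```

The result matches the pywintypes.IID buffer shown in the first session.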

The bytes property and the bytes argument to the constructor should use
the platform byte order. It would also be nice to have explicit
little-endian and big-endian versions available on platforms of either
endianness, for compatibility with communication protocols and disk
formats.
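
For reference, in current Python versions UUID.bytes is big-endian, and
the little-endian GUID layout is available separately through the
bytes_le attribute and the bytes_le= constructor argument. A sketch of
the platform-order behaviour proposed here, built on those, might look
like this (the helper names are hypothetical, and sys.byteorder is
assumed to be the right notion of "platform byte order"):

```python
import sys
import uuid


def uuid_to_native_bytes(u):
    """Hypothetical helper: UUID bytes in the platform's native field order."""
    # bytes_le: first three fields little-endian (Windows GUID layout).
    # bytes:    all fields big-endian (RFC 4122 / network order).
    return u.bytes_le if sys.byteorder == 'little' else u.bytes


def uuid_from_native_bytes(data):
    """Hypothetical inverse of uuid_to_native_bytes()."""
    if sys.byteorder == 'little':
        return uuid.UUID(bytes_le=data)
    return uuid.UUID(bytes=data)


u = uuid.UUID('00112233-4455-6677-8899-aabbccddeeff')
assert uuid_from_native_bytes(uuid_to_native_bytes(u)) == u
```

Explicit big-endian and little-endian forms remain available as u.bytes
and u.bytes_le regardless of the host.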


There is another issue with version 1 UUID generation:
>>> len(set(uuid.uuid1() for i in range(1000)))
992

The problem is that the random clock_seq field is only 14 bits long.
If enough UUIDs are generated within the same system clock tick there
will be collisions: with only 2**14 = 16384 possible values, the
birthday bound makes a collision about 50% likely once roughly 150
UUIDs share a single tick. Suggested solution: use the full 100 ns
resolution of the time field to generate a monotonically increasing
timestamp. Whenever time.time() returns the same value as on the
previous call, the timestamp is advanced by at least 1, so every call
gets a distinct value.
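
A minimal Python 3 sketch of the suggested fix, assuming a lock-protected
module-level "last timestamp" is acceptable. The names _next_timestamp
and monotonic_uuid1 are hypothetical, and time.time_ns() is used for the
raw clock reading:

```python
import random
import threading
import time
import uuid

# Number of 100 ns intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01), as defined for RFC 4122 timestamps.
_GREGORIAN_OFFSET = 0x01B21DD213814000

_lock = threading.Lock()
_last_timestamp = None


def _next_timestamp():
    """Return a 60-bit, 100 ns timestamp that strictly increases per call."""
    global _last_timestamp
    with _lock:
        timestamp = time.time_ns() // 100 + _GREGORIAN_OFFSET
        if _last_timestamp is not None and timestamp <= _last_timestamp:
            # Clock did not advance (or went backwards): bump by one tick.
            timestamp = _last_timestamp + 1
        _last_timestamp = timestamp
        return timestamp


def monotonic_uuid1(node=None, clock_seq=None):
    """Hypothetical uuid1 variant that uses _next_timestamp()."""
    timestamp = _next_timestamp()
    if clock_seq is None:
        clock_seq = random.getrandbits(14)   # still 14 random bits
    if node is None:
        node = uuid.getnode()
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = (timestamp >> 48) & 0x0FFF
    clock_seq_low = clock_seq & 0xFF
    clock_seq_hi_variant = (clock_seq >> 8) & 0x3F
    # version=1 sets the version nibble and the RFC 4122 variant bits.
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node),
                     version=1)
```

Because the timestamp itself is unique per call within one process, the
14-bit clock_seq no longer has to carry uniqueness for UUIDs generated
within a single clock tick.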

  Oren

[*] http://mail.python.org/pipermail/python-dev/2006-June/065869.html

