Python and Schools

Beni Cherniavsky cben at techunix.technion.ac.il
Fri Apr 18 04:34:14 EDT 2003


In comp.lang.python, you wrote:
>[David Eppstein]
>> ...
>> A simple example would be an algorithm that takes as input a sequence of
>> values that are all (for whatever reason) guaranteed to be integers in
>> range(n**2), and tests whether any two of them are the same:
>>
>> def anysame(seq):
>>     allocate array A, of dimension n**2, WITHOUT INITIALIZING IT
>>         (is this possible in pure python?)
>
>It shouldn't be possible:  if you find a way to do it, it would be
>considered a security bug.  No memory is really uninitialized -- if Python
>gave you read access to n**2 bytes in constant time, it would be revealing
>to you the state left behind by some other function (or user, or process).
>If you created a file with at least n**2 bytes beforehand, then you could
>mmap the file in constant time, and that acts like an array object
>initialized to whatever bytes happen to be sitting in the file.  But the O()
>behavior of access to an mmap'ed region depends on the platform
>implementation of mmap'ed regions.
>
It's impossible in plain portable C but an operating system can do this using
memory mapping hardware.  It's possible to mark the required range of memory
pages to cause an exception and be initialized [to zeros] only when they are
accessed.  Since the algorithm only accesses a few pages, this should work
fine.  This reveals a little clarification to the original claim: the
algorithm doesn't use O(N**2) memory, it eats O(N**2) of your *address space*.

In fact, that's what mmaping /dev/zero probably does on most platforms... 
There is a catch: for large enough N, you need to mark O(N**2) pages in this
way, so it's O(N**2) time to allocate again :-(.  This can be overcome by
hierarchical page tables.  To be precise, you would need an arbitrarily deep
hierarchy rather than one of fixed depth that real computers tend to have
<wink>.  Note that such a hierarchy is actually a way of implementing a
dictionary from digital trees :-).

You would actually need infinite memory to have an unbounded hierarchy.  Which
exposes a small catch with application of asymptotic analysis once you
approach limits of address space, integer size, etc.  No algorithm takes
O(f(N)) when N approaches infinity - instead it just runs out of resources
:-)!  So many algorithms can be pervertly claimed to be O(1) on a given
architecture.  For example one can sort any array of 16 bit integers in O(1)
by just allocating a O(2**16) array counting how many times each number
appeared and then scanning the array.  Of course this is worse than any
sensible algorithm one would write - because of the huge factor.  In truth
this is not O(1) - it's O(M) where M is your address space or "integer space"
size.  But omitting such considerations can lead to funny "paradoxes"...

P.S. Cute algorithm indeed!

Beni Cherniavsky <cben at tx.technion.ac.il>





More information about the Python-list mailing list