[melbourne-pug] Lists or Arrays - When do I use what and why - Or : Confused much? I am, Now

William ML Leslie william.leslie.ttg at gmail.com
Thu Feb 10 08:19:18 CET 2011


On 7 February 2011 18:08, David Crisp <dcrisp at netspace.net.au> wrote:
> (a) More than one entry per key not allowed. Which means no duplicate key is
> allowed. When duplicate keys encountered during assignment, the last
> assignment wins.
>
> By my reading that is saying that there can only be one X value of 10  if a
> new X value of ten comes along then it will supercede the previous.
>
> Because this is 3D space these points represent,  there can actually be more
> than 1 X of Value 10 and indeed there can be more than 1 Y value of 10 as
> well.

That is why I used a dictionary of lists.  Dictionaries with sets or
lists as the value can be a very useful data structure.  This means
that points[3, 4, 5] is the list of all points just north-west-up of
(3, 4, 5).
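
A minimal sketch of that structure, assuming each point is bucketed by
the floor of its coordinates (the helper names cell_of and
bucket_points and the sample records are made up for illustration):

```python
import math
from collections import defaultdict

def cell_of(point):
    # Assumption: a point belongs to the cell named by the floor of each coordinate.
    return tuple(int(math.floor(c)) for c in point)

def bucket_points(records):
    # Dictionary of lists: each key is a cell, each value the points inside it.
    points = defaultdict(list)
    for p in records:
        points[cell_of(p)].append(p)
    return points

# Hypothetical sample data, not from the thread.
records = [(3.2, 4.7, 5.1), (3.9, 4.0, 5.5), (-0.5, 2.0, 1.25)]
points = bucket_points(records)
```

Two records can share a cell without either replacing the other,
because the key maps to a list rather than to a single point.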

If you do have different access patterns, you can build appropriate
indexes; like so:

# Iterating over the points dictionary yields its (x, y, z) keys.
index_by_x = {}
for x, y, z in points:
    index_by_x.setdefault(x, []).append((x, y, z))
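
The same idea generalises to any axis.  A rough sketch, assuming points
is a dictionary keyed by (x, y, z) tuples; build_axis_index is a
made-up helper name and the sample dictionary is illustrative only:

```python
from collections import defaultdict

def build_axis_index(points, axis):
    # Map each coordinate value on the chosen axis to the keys that share it.
    index = defaultdict(list)
    for key in points:
        index[key[axis]].append(key)
    return index

# Hypothetical sample dictionary of lists, keyed by (x, y, z).
points = {
    (3, 4, 5): [(3.2, 4.7, 5.1)],
    (3, 0, 1): [(3.5, 0.5, 1.5)],
    (7, 4, 2): [(7.1, 4.9, 2.0)],
}

index_by_x = build_axis_index(points, 0)
index_by_y = build_axis_index(points, 1)
```

Each index holds only keys, so the point lists themselves are not
copied; you look up the keys for a given x and then fetch
points[key] for each of them.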


On 8 February 2011 01:06, Ben Dyer <ben.dyer at taguchimail.com> wrote:
> To put some concrete numbers to this, I ran some test scripts on an Amazon m2.4xlarge instance (68GB RAM). Loading 100m records with uniformly distributed coordinate values in the range (-100.0, 100.0) into a 200x200x200 numpy array (with each element of the 3d array containing a 1d numpy array of records) takes 42m13s of CPU time and 3.2GB RAM. Using a similar structure based on Python dicts, lists and tuples (3d sparse matrix using dicts, with each element being a list, containing a tuple per record) has consumed 106m17s of CPU time and 10GB RAM for 42.5m records so far — and it's scaling much, much worse.

Or even a database, depending on what you expect to do with all that
data.  Besides allowing better parallel access, an appropriately
indexed relational database may reduce the amount of paging and TLB
churn you get when attempting to use such a large array mapped
directly into memory.
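
As a rough illustration of the database route, here is a sketch using
the sqlite3 module from the standard library.  The table name, index
name, helper function and sample rows are all assumptions made for the
example; a real data set of this size would need an on-disk database
and some thought about which indexes match the queries.

```python
import sqlite3

# In-memory database purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL, y REAL, z REAL)")
conn.executemany(
    "INSERT INTO points VALUES (?, ?, ?)",
    [(3.2, 4.7, 5.1), (3.9, 4.0, 5.5), (-0.5, 2.0, 1.25)],  # hypothetical rows
)
# Composite index so that range queries on the coordinates can avoid a full scan.
conn.execute("CREATE INDEX points_xyz ON points (x, y, z)")

def points_in_box(conn, lo, hi):
    # Return every point with lo <= coordinate < hi on all three axes.
    query = (
        "SELECT x, y, z FROM points "
        "WHERE x >= ? AND x < ? AND y >= ? AND y < ? AND z >= ? AND z < ? "
        "ORDER BY x, y, z"
    )
    args = (lo[0], hi[0], lo[1], hi[1], lo[2], hi[2])
    return conn.execute(query, args).fetchall()

nearby = points_in_box(conn, (3, 4, 5), (4, 5, 6))
```

The box query plays the same role as looking up points[3, 4, 5] in the
dictionary of lists, but the database decides what to keep in memory.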

-- 
William Leslie

