[CentralOH] N-dimensional array slicing/crosscutting

jep200404 at columbus.rr.com jep200404 at columbus.rr.com
Thu Dec 8 22:59:22 EST 2016


On Thu, 8 Dec 2016 21:19:23 -0500, Eric Floehr <eric at intellovations.com> wrote:

> I was wondering if anyone knows of any way in Python (functional or not),
> or a module, that would allow the following:
> 
> Let's say I have an n-dimensional array, with keys (numeric or otherwise)
> for each dimension. I'd like to be able to set up and grab the elements in
> any row in any dimension equally easily.

Here are some fast and sloppy crazy thoughts.

Your example idea showed nested dictionaries.
I fear that quickly descending into madness.
Since you want an n-dimensional array,
I think about having an n-dimensional array,
and worrying about the keys separately.
Avoid dictionaries nested n deep.

numpy arrays would constrain the data to all the same type.
I wonder about pandas' tables (or whatever they are called),
since they handle missing data well.

If you want numpy kind of access, but need different kinds of values,
one could store just object numbers which would be keys
to an object store dictionary.
That would be ugly to maintain, but likely fast.
A sentinel object number would tolerate missing values.
You have not said if the ability to cope with missing
values is important.

> 2. I always have to access a cell in the right order, and have to remember
> that order. It would be great if I could access a cell in the example above
> as table['forecast']['occurred'] or table['occurred']['forecast'] (syntax
> doesn't matter, table('forecast', 'occurred') is fine too).

Wow. That's completely crazy.
To support the keys in any order,
I would likely use table('forecast', 'occurred') argument style syntax,
but who knows, maybe with enough __method__ magic,
the table['forecast']['occurred'] syntax could be made to work.
Neverminding the syntax, I can think of a couple of crazy ugly solutions.

1. Use an n-dimensional array
   with dictionaries that go from keys to array indexes.

   If each key happens only in one dimension,
   then you could just look up which dimension it belongs to
   in a dictionary
   Perhaps a dictionary where the keys are the keys you want
   and the values are:
   a. a tuple of 
       the dimension that key is for and
       the index for that key

      If there is more than one key for the same dimension,
      raise an exception. 
      Dimensions for which there is no index are slices.
   b. an n tuple of indexes, with None or 0 for the irrelevant dimensions
      one would look up all the n-tuples for all the keys
      and just add them if 0 is used for irrelevant dimension
      If None is used for irrelevant dimension,
      populate the non None indexes.
      The dimensions for which there is no index,
      would be a slice. 0 for irrelevant dimension can't do that,
      so I'm leaning for None for irrelevant dimensions.
      If two keys want to populate the same dimension,
      then raise exception. 0 for irrelevant dimensions can't do that,
      so again, I'm leaning toward None for irrelevant dimensions.

   numpy has some pretty powerful slicing stuff.
   that might make numpy arrays attractive.
   just put a keys to numerical indexes wrapper around an numpy array

2. Use a dictionary where the keys are tuples of sorted keys.
   (this also makes me think of a big non-sql database like MongoDB)
   When looking something up, sort the keys, then combine them,
   then look up the data in the big dictionary.
   Lookups could be rather fast.
   The slowest part would be the sorting.
   Slices would be painful, requiring one to iterate through
   the whole dictionary. Hmmm, how about applying regular
   expressions to the keys? One would build the pattern
   from the available key values.

Of course, whatever crazy solution we come up with would likely
be implemented as some class with many overloaded __method__s.

> 3. It's hard to slice... in a 2-dimensional array, it's easy to get the
> cells in the outermost dict via table['forecast'].values() but how would I
> just as easily (equivalently) get table['not occurred'].values()?
> 
> Thoughts? Am I missing something obvious?

This sounds like a fun problem.


More information about the CentralOH mailing list