How to store "3D" data? (data structure question)

Graham Fawcett graham.fawcett at gmail.com
Wed Jul 20 14:51:56 EDT 2005


Sebastian Bassi wrote:
> On 20 Jul 2005 10:47:50 -0700, Graham  Fawcett <graham.fawcett at gmail.com> wrote:
> > This looks a lot like 2D data (row/column), not 3D. What's the third
> > axis? It looks, too, that you're not really interested in storage, but
> > in analysis...
>
> I think of it as 3D like this:
> 1st axis: [MARKER]Name, like TDF1, TDF2.
> 2nd axis: Allele, like 181, 188 and so on.
> 3rd axis: Line: RHA280, RHA801.
>
> I can have a star in MarkerName TDF1, Allele 181 and Line RHA280.
> I can have an empty (or none) in TDF1, Allele 181 and Line RHA801.

Okay. I think what will drive your data-structure question is the way
that you intend to use the data. Conceptually, it will always be 3D, no
matter how you model it, but trying to make a "3D data structure" is
probably not what is most efficient for your application.

If 90% of your searches are of the type, 'does TDF1/181/RHA280 have a
star?' then perhaps a dict using (name,allele,line) as a key makes most
sense:

  d = {('TDF1',181,'RHA280'):'*', ...}
  query = ('TDF1', 181, 'RHA280')
  assert query in d

Really, you don't need '*' as a value for this; just use None if you
like, since all the really useful info is in the keyspace of the dict.
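
Taking that one step further: if the values carry no information, a set
of tuples does the same job as the dict. A minimal sketch, assuming all
you need to record is whether a star is present. The name has_star and
the second entry are made up for illustration:

```python
# Set of (name, allele, line) tuples; membership means "has a star".
# Only the first entry comes from this thread; the second is made-up
# sample data.
stars = {
    ('TDF1', 181, 'RHA280'),
    ('TDF2', 188, 'RHA801'),
}

def has_star(name, allele, line):
    """Return True if this marker/allele/line combination has a star."""
    return (name, allele, line) in stars

print(has_star('TDF1', 181, 'RHA280'))  # True
print(has_star('TDF1', 181, 'RHA801'))  # False
```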

If you're always querying based on line first, then something like my
earlier 'results' dict might make sense:

  d = {'RHA280':[('TDF1',181), ...], ...}
  for name, allele in d['RHA280']:
      if allele == 181:   # or some other "query" within RHA280
          ...

You get the idea: model the data in the way that makes it most useable
to you, and/or most efficient (if this is a large data set).

But note that by picking a structure like this, you're making it easy
to do certain lookups, but possibly harder (and slower) to do ones you
hadn't thought of yet.
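
To make that trade-off concrete, here is a rough sketch of a per-line
index. The observations are made-up sample data, and I've used sets of
(name, allele) pairs rather than lists, which is my own substitution.
Looking up one line is a single dict access, but asking "which lines
carry TDF1/181?" means walking over every line:

```python
from collections import defaultdict

# Made-up sample observations as (name, allele, line) tuples.
observations = [
    ('TDF1', 181, 'RHA280'),
    ('TDF1', 188, 'RHA801'),
    ('TDF2', 181, 'RHA280'),
]

# Index by line: each line maps to the set of (name, allele) pairs it has.
by_line = defaultdict(set)
for name, allele, line in observations:
    by_line[line].add((name, allele))

# Easy: everything about one line is a single dict lookup.
alleles_in_rha280 = sorted(by_line['RHA280'])

# Harder: a query across lines has to scan every entry in the index.
lines_with_tdf1_181 = sorted(
    line for line, pairs in by_line.items() if ('TDF1', 181) in pairs
)

print(alleles_in_rha280)    # [('TDF1', 181), ('TDF2', 181)]
print(lines_with_tdf1_181)  # ['RHA280']
```

If the second kind of query turns out to be common, you would end up
keeping a second index keyed the other way around, and then have to keep
the two in sync.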

The general solution would be to drop it into a relational database and
use SQL queries. Multidimensional analysis is what relational DBs are
for, after all. A hand-written data structure will usually be more
efficient for the one task it was designed for, but the flexibility of
a relational db may help serve multiple needs, where a custom structure
may only be suitable for a few applications.
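
As a rough illustration of the relational approach, here is a sketch
using the sqlite3 module from the standard library with an in-memory
database. The table name "stars", its column names, and the rows are my
own assumptions for the example, not anything from your data:

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(':memory:')

# Hypothetical schema: one row per (name, allele, line) that has a star.
conn.execute(
    'CREATE TABLE stars (name TEXT, allele INTEGER, line TEXT, '
    'PRIMARY KEY (name, allele, line))'
)

# Made-up sample rows.
conn.executemany(
    'INSERT INTO stars VALUES (?, ?, ?)',
    [('TDF1', 181, 'RHA280'),
     ('TDF1', 188, 'RHA801'),
     ('TDF2', 181, 'RHA280')],
)

# The same table can be queried along any axis.
by_line = conn.execute(
    'SELECT name, allele FROM stars WHERE line = ? ORDER BY name, allele',
    ('RHA280',),
).fetchall()
by_allele = conn.execute(
    'SELECT name, line FROM stars WHERE allele = ? ORDER BY name, line',
    (181,),
).fetchall()

print(by_line)    # [('TDF1', 181), ('TDF2', 181)]
print(by_allele)  # [('TDF1', 'RHA280'), ('TDF2', 'RHA280')]
conn.close()
```

Neither query needed the data to be organized for it in advance, which
is the flexibility I mean.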

If you're going to roll your own structure, just keep in mind that
dict lookups are very fast in Python (constant time on average), far
more efficient than, e.g., checking for membership in a list, which
scans the elements one by one.
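
If you want to see the difference for yourself, something like the
following should do. It is only a sketch: the 10,000 keys are synthetic,
the probe is deliberately the worst case for the list, and the actual
timings will depend on your machine:

```python
import timeit

# Synthetic keys: 10,000 made-up (name, allele, line) tuples.
keys = [('TDF%d' % i, 181, 'RHA280') for i in range(10000)]
as_list = list(keys)
as_dict = dict.fromkeys(keys)  # values are None; only the keys matter

probe = keys[-1]  # last element: the list has to scan all the way

# Time 200 membership tests against each container.
list_time = timeit.timeit(lambda: probe in as_list, number=200)
dict_time = timeit.timeit(lambda: probe in as_dict, number=200)

print('list: %.6f s' % list_time)
print('dict: %.6f s' % dict_time)
```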

Graham



