Newbie help with array handling

bearophileHUGS at lycos.com bearophileHUGS at lycos.com
Thu Apr 12 04:40:32 EDT 2007


loial:
> I am new to python and am converting an awk script to python

It seems there are many people trying to convert awk code to
Python :-)


> I need to store some data in an array/table of some form
> keyvalue1, value1, value2, value3
> keyvalue2, value1,value2, value3
> keyvalue3, value1,value2,value3
> etc
> I will later need to sort in keyvalue order and also need to be able
> to check if a key already exists

The problem with multiple values is easy to manage, you just put them
into a  list:
[value1, value2, value3]
Such list can be used as the a value for a key:value pair inside a
dict, etc.

The problem is that Python doesn't have a built-in sorted dictionary
data structure. So if you need it there are some solutions:

1) If you don't need to cheek presence often, and you don't need to
remove many key:value pairs, then you can just use a list of lists
like this (a Python list is an array of references dynamic on the
right):

data = [[keyvalue1, value1, value2, value3], [keyvalue2,
value1,value2, value3], [keyvalue3, value1,value2,value3], ...]

Then you can test the presence of a key with something like:
key in (subl[0] for subl in data)

Such list data can be sorted too according to the key (untested), if
the keys are sortable objects:
from operator itemgetter
data.sort(key=itemgetter(0))


2) If you need the dict characteristics a lot, then you may use a dict
(but keys must be hashable objects):
ddata = {keyvalue1:[value1, value2, value3], keyvalue2:[value1,
value2, value3], ...}

This allows quick presence testing, insertions and removals, but it
can't be sorted according to the keys. So you may need to keep a
sorted list of the keys:
skeys = sorted(ddata)
Then you can use skeys as you need, fetching the values from ddata if/
when you need them (note that there is a bisect standard module too
that may help the management of a sorted list).
But you have to keep the two structures updated at the same time, or
re-create new skeys now and then...


3) If you want to be sure the two structures are always the same, then
you may need to find an ordered dict class around (or you can write it
yourself, but it may be too difficult for you), such implementation
usually keep a dict and a list inside, and keep them in sync. You can
probably use it as a normal dict, so its usage is rather clean and
easy.
Most of such implementations have a slow removal and add of items,
because removing/adding them from the list requires time. If you need
to perform many such operations on a lot of data, then there is a way
to do that too, using a different implementation of ordered dict, you
can start from this, that uses a double linked list structure, it's
slow, but such time is costant (note that this keeps the order of the
insertions, it doesn't sort keys according to their natural order, so
you need to modify the code):
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/498195

Bye,
bearophile




More information about the Python-list mailing list