Data mapper - need to map a dictionary of values to a model

Tue Jan 15 18:53:51 EST 2008

On Jan 14, 7:56 pm, Luke <Luke.Visin... at gmail.com> wrote:

> I am writing an order management console. I need to create an import
> system that is easy to extend. For now, I want to accept a dictionary
> of values and map them to my data model. The thing is, I need to do
> things to certain columns:
>
> - I need to filter some of the values (data comes in as YYYY-MM-
> DDTHH:MM:SS-(TIMEZONE-OFFSET) and it needs to map to Order.date as a
> YYYY-MM-DD field)
> - I need to map parts of an input column to more than one model param
> (for instance if I get a full name for input--like "John Smith"--I
> need a function to break it apart and map it to
> Order.shipping_first_name and Order.shipping_last_name)
> - Sometimes I need to do it the other way too... I need to map
> multiple input columns to one model param (If I get a shipping fee, a
> shipping tax, and a shipping discount, I need them added together and
> mapped to Order.shipping_fee)
>
> I have begun this process, but I'm finding it difficult to come up
> with a good system that is extensible and easy to understand. I won't
> always be the one writing the importers, so I'd like it to be pretty
> straight-forward. Any ideas?
>
> Oh, I should also mention that many times the data will map to several
> different models. For instance, the importer I'm writing first would
> map to 3 different models (Order, OrderItem, and OrderCharge)
>
> I am not looking for anybody to write any code for me. I'm simply
> asking for inspiration. What design patterns would you use here? Why?

The specific transformations you describe are simple to code
directly, but unless you constrain the set of possible transformations,
I don't see how this can be generalized in any useful way. It just
seems too open-ended.

The only pattern I can see here is breaking down the overall
transformation into independent steps, just like the three you
described. Given some way to specify each separate transformation,
their combination can be factored out. To illustrate, here's a trivial
example (with dicts for both input and output):

class MultiTransformer(object):
    """Combine several dict -> dict transformers into a single callable."""
    def __init__(self, *transformers):
        self._transformers = transformers

    def __call__(self, input):
        # Merge the partial output of each transformer into one dict.
        output = {}
        for t in self._transformers:
            output.update(t(input))
        return output

# Keep only the leading YYYY-MM-DD part of the timestamp.
date_transformer = lambda input: {'date' : input['date'][:10]}
# Split the full name on the first run of whitespace.
name_transformer = lambda input: dict(
                           zip(('first_name', 'last_name'),
                           input['name'].split(None, 1)))
# Add the three shipping amounts together.
fee_transformer = lambda input: {'fee' : sum([input['fee'],
                                              input['tax'],
                                              input['discount']])}
transformer = MultiTransformer(date_transformer,
                               name_transformer,
                               fee_transformer)
print(transformer(dict(date='2007-12-22 03:18:99-EST',
                       name='John Smith',
                       fee=30450.99,
                       tax=459.15,
                       discount=985)))
# output (key order may vary):
# {'date': '2007-12-22', 'fee': 31895.140000000003,
#  'first_name': 'John', 'last_name': 'Smith'}

You can see that the MultiTransformer doesn't buy you much by itself;
it just allows dividing the overall task into smaller bits that can be
documented, tested and reused separately. For anything more
sophisticated, you have to constrain which transformations are
possible. I did something similar for transforming CSV input rows
(http://pypi.python.org/pypi/csvutils/), so that it's easy to specify
1-to-{0,1} transformations but not 1-to-many or many-to-1.
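
To give a rough idea of what such a constrained 1-to-{0,1} mapping
could look like, here is a minimal sketch. It is only an illustration:
FIELD_MAP, map_fields and iso_date are hypothetical names, the sample
keys are made up, and none of this is the csvutils API. Each input key
is either dropped or mapped to exactly one output key through a
converter:

```python
# Hypothetical sketch of a 1-to-{0,1} mapping: every input key maps to
# at most one output key. None of these names come from csvutils.

def iso_date(value):
    # Keep only the leading YYYY-MM-DD part of a timestamp string.
    return value[:10]

# input key -> (output key, converter), or None to drop the column
FIELD_MAP = {
    'date': ('order_date', iso_date),
    'fee': ('shipping_fee', float),
    'notes': None,
}

def map_fields(field_map, record):
    output = {}
    for key, value in record.items():
        target = field_map.get(key)
        if target is None:
            continue  # unknown or explicitly dropped column
        name, convert = target
        output[name] = convert(value)
    return output

result = map_fields(FIELD_MAP, {'date': '2007-12-22T03:18:59-05:00',
                                'fee': '30450.99',
                                'notes': 'gift wrap'})
print(result)
```

Because every column is handled independently, a spec like this stays
easy to read and to validate, but by design it cannot express the name
split or the fee sum from your question; those still need free-form
functions.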

HTH,
George