[Python-ideas] RFC: Multiple Dispatch

Matthew Rocklin mrocklin at gmail.com
Sat Aug 16 15:58:43 CEST 2014


Here is a non-trivial example of multiple dispatch.  I want to convert data
between container types, i.e.  given

into(a, b)

I want to return something with the information content of b in a container
like a, e.g.

In [24]: into([], (1, 2, 3))
Out[24]: [1, 2, 3]

We use this abstraction heavily in Blaze, a project that tries to map
relational algebra onto a variety of libraries that can be used for
relational-algebra-like tasks.  Libraries in this scope include
sqlalchemy, pandas, numpy, pyspark, pytables, etc.

In [26]: from blaze import into

A dataframe with some test data

In [25]: df = DataFrame([[1, 'Alice',   100],
                         [2, 'Bob',    -200],
                         [3, 'Charlie', 300],
                         [4, 'Dennis',  400],
                         [5, 'Edith',  -500]],
                         columns=['id', 'name', 'amount'])

migrate list <- DataFrame

In [27]: into([], df)
Out[27]:
[[1, 'Alice', 100],
 [2, 'Bob', -200],
 [3, 'Charlie', 300],
 [4, 'Dennis', 400],
 [5, 'Edith', -500]]

migrate numpy array <- DataFrame

In [28]: into(np.ndarray(0), df)
Out[28]:
rec.array([(1, 'Alice', 100), (2, 'Bob', -200), (3, 'Charlie', 300),
       (4, 'Dennis', 400), (5, 'Edith', -500)],
      dtype=[('id', '<i8'), ('name', 'O'), ('amount', '<i8')])

In [29]: x = into(np.ndarray(0), df)  # store for later


connect to local pymongo database

In [30]: import pymongo

In [31]: db = pymongo.MongoClient().db

In [34]: into(db.my_collection, df)  # migrate mongo <- pandas
Out[34]: Collection(Database(MongoClient('localhost', 27017), u'db'), u'my_collection')

In [35]: into(db.my_collection2, x)  # migrate mongo <- numpy
Out[35]: Collection(Database(MongoClient('localhost', 27017), u'db'), u'my_collection2')

In [36]: list(db.my_collection2.find())  # verify that things transferred well
Out[36]:
[{u'_id': ObjectId('53ef6167fb5d1b34b9fd00e2'),
  u'amount': 100,
  u'id': 1,
  u'name': u'Alice'},
 {u'_id': ObjectId('53ef6167fb5d1b34b9fd00e3'),
  u'amount': -200,
  u'id': 2,
  u'name': u'Bob'},
 {u'_id': ObjectId('53ef6167fb5d1b34b9fd00e4'),
  u'amount': 300,
  u'id': 3,
  u'name': u'Charlie'},
 {u'_id': ObjectId('53ef6167fb5d1b34b9fd00e5'),
  u'amount': 400,
  u'id': 4,
  u'name': u'Dennis'},
 {u'_id': ObjectId('53ef6167fb5d1b34b9fd00e6'),
  u'amount': -500,
  u'id': 5,
  u'name': u'Edith'}]

migrate bcolz <- mongo

In [37]: into(bcolz.ctable(), db.my_collection)
Out[37]:
ctable((5,), [('amount', '<i8'), ('id', '<i8'), ('name', '<U7')])
  nbytes: 220; cbytes: 63.99 KB; ratio: 0.00
  cparams := cparams(clevel=5, shuffle=True, cname='blosclz')
[(100, 1, u'Alice') (-200, 2, u'Bob') (300, 3, u'Charlie')
 (400, 4, u'Dennis') (-500, 5, u'Edith')]

Note in this last case that the two libraries, bcolz (a compressed on-disk
storage library) and pymongo, know absolutely nothing about each other.

Many of these into definitions are very simple:

@dispatch(np.ndarray, DataFrame)
def into(a, df):
    return df.to_records(index=False)

Others rely on other definitions, or on inheritance:

@dispatch(Collection, np.ndarray)
def into(coll, x, **kwargs):
    return into(coll, into(DataFrame(), x), **kwargs)


But remembering all of the appropriate .to_foo and .from_bar methods can be
a real pain.  Collecting them all into a single abstraction cuts down
significantly on the administrative burden of data migrations.
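
For readers who have not seen the mechanism, here is a minimal sketch of
how a two-argument dispatcher like the one above could work.  It is not
the multipledispatch implementation: the registry, the MRO-based lookup,
and the toy container conversions are all assumptions made for
illustration, and the lookup order is a simplification that prefers
specificity in the first argument.

```python
from itertools import product

# Hypothetical registry: maps (type_of_a, type_of_b) -> implementation.
_registry = {}


def dispatch(type_a, type_b):
    """Register the decorated function for the signature (type_a, type_b)."""
    def register(func):
        _registry[(type_a, type_b)] = func
        # Return the shared entry point so the name `into` keeps
        # referring to the dispatcher after each decorated definition.
        return into
    return register


def into(a, b, **kwargs):
    """Call the implementation registered for the types of a and b."""
    # Walk both MROs so that subclasses fall back to definitions
    # registered for their parent classes (ultimately `object`).
    for signature in product(type(a).__mro__, type(b).__mro__):
        if signature in _registry:
            return _registry[signature](a, b, **kwargs)
    raise NotImplementedError(
        "no into() for (%s, %s)" % (type(a).__name__, type(b).__name__))


@dispatch(list, object)
def into(a, b):
    return list(b)


@dispatch(dict, list)
def into(a, b):
    return dict(b)  # expects a list of (key, value) pairs


@dispatch(set, tuple)
def into(a, b):
    # Relies on another definition, as the Collection example does.
    return set(into([], b))
```

With these definitions, into([], (1, 2, 3)) has no (list, tuple) entry, so
the lookup falls back to (list, object) through the tuple's MRO.  A real
library also has to decide what to do when two signatures match equally
well, which this sketch ignores.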



On Sat, Aug 16, 2014 at 6:34 AM, Antoine Pitrou <antoine at python.org> wrote:

> On 15/08/2014 14:01, Guido van Rossum wrote:
>
>> Please do write about non-toy examples!
>
> Are you looking for examples using the multipledispatch library, or
> multiple dispatch in general?
>
> As for multiple dispatch in general, Numba uses something which is morally
> one in order to select the right specialization of, say, an operator (for
> example to choose amongst '+ between int and int', '+ between
> numpy.datetime64 and numpy.timedelta64', '+ between numpy.timedelta64 and
> numpy.timedelta64', etc.).
>
> Regards
>
> Antoine.