[IPython-dev] Pandawash: extension to conveniently & transparently clean up data

Thomas Kluyver takowl at gmail.com
Mon Apr 21 12:30:45 EDT 2014


The result of a quick bit of hacking yesterday, pandawash is an IPython
extension to help clean up messy data in pandas dataframes.

The key feature is that it generates plain Python code which you modify to
do the data cleanup. For instance, you can use it to check that the values
in a numeric column are within a specified range. If any values fall outside
that range, it creates a new cell with the code needed to replace them;
you just set the replacement values and run the cell. This is more
convenient than finding those values and writing the code yourself, but it
leaves you with full control and a clear record of the changes, unlike more
automatic data cleaning.
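
To make the idea concrete, here is a rough sketch of the "check, then generate editable code" pattern. It is not pandawash's actual API or implementation: the function names (out_of_range, make_fix_cell), the sample data and the exact form of the generated lines are all made up for illustration, and it uses a plain dict in place of a dataframe column so that it runs without pandas installed.

```python
# Illustrative sketch only: these helpers are hypothetical and are not
# part of pandawash. A dict of {row_label: value} stands in for a column.

def out_of_range(values, low, high):
    """Return {row_label: value} for entries outside [low, high]."""
    return {label: v for label, v in values.items() if not (low <= v <= high)}


def make_fix_cell(df_name, column, bad):
    """Build the source of a cell that overwrites each offending value.

    Each generated line assigns the current (bad) value back to its own
    position, so the user only has to edit the right-hand side.
    """
    lines = ["# Set the replacement value for each row, then run this cell"]
    for label, value in bad.items():
        lines.append(f"{df_name}.loc[{label!r}, {column!r}] = {value!r}")
    return "\n".join(lines)


ages = {0: 34, 1: 212, 2: 51, 3: -4}      # made-up sample data
bad = out_of_range(ages, 0, 120)           # rows 1 and 3 are flagged
cell_source = make_fix_cell("df", "age", bad)
print(cell_source)
```

In an IPython session, a string like cell_source could be handed to get_ipython().set_next_input(cell_source) so that it appears as a new, editable input. Whether pandawash does it that way is an assumption on my part; the source code linked below is the place to check.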

Demo:
http://nbviewer.ipython.org/github/takluyver/pandawash/blob/master/Pandawash%20Demo.ipynb

Source code:
https://github.com/takluyver/pandawash

Thanks,
Thomas