[Pandas-dev] Pandas lite
Aivar Annamaa
aivar.annamaa at gmail.com
Sat Nov 18 03:21:26 EST 2017
Hi!
I'm going to teach an introduction to pandas to Python newbies, and I'm
looking for ways to simplify their view of the API and/or avoid some
of the pitfalls.
I'd like to identify a minimal set of methods/operations that is
enough for performing the most common tasks with simply-indexed data
(importing/exporting from csv/Excel, selecting rows and columns by
index, boolean indexing of the rows, creating new columns, simple
group-by and aggregations, simple plotting, maybe also simple joins) and
that has minimal potential for surprises (unexpected copies,
unexpected views, confusing warnings, differences between indexing with
lists and with tuples, etc.). Maybe even allowing only "pure"
transformations à la relational algebra? We could call it an opinionated
and restricted usage scheme for pandas.
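As a rough sketch of what such a "pure" style might look like in
practice (the sample data and column names are made up for illustration,
and the choice of operations is only one possible reading of the subset
described above):

```python
import io
import pandas as pd

# Hypothetical sample data, standing in for a csv file on disk.
csv_text = """name,city,age,salary
Ann,Tartu,31,2100
Bob,Tallinn,45,3200
Cid,Tartu,28,1800
Dee,Tallinn,39,2900
"""

people = pd.read_csv(io.StringIO(csv_text))            # importing from csv

# Every step returns a new object; nothing is modified in place.
over_30 = people[people["age"] > 30]                   # boolean indexing of the rows
named = over_30[["name", "city", "salary"]]            # selecting columns by label
with_k = named.assign(salary_k=named["salary"] / 1000) # creating a new column
by_city = with_k.groupby("city")["salary"].mean()      # simple group-by and aggregation
```

Here assign() is used instead of item assignment, so the intermediate
frames are never mutated and the copy/view question does not come up.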
The students would use this subset of the API until they gain enough
experience to meet the hairier face of pandas.
Has anybody tried marking out a subset of the pandas API, for whatever
reason?
I was also thinking about how to enforce the boundaries of this subset:
* Just suggest that students stick with it.
* Provide a static analysis tool which disallows (or warns against)
operations/tricks outside the boundaries.
* Provide a wrapper library (e.g. import pandaslite as pd) which wraps
the required pandas classes into similar classes which expose only a
subset of the pandas capabilities and perform some extra checks (e.g.
disallow duplicates in the index). When the students grow tough enough
or need more power, they would simply replace "import pandaslite as pd"
with "import pandas as pd" in their code.
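To make the static analysis option a bit more concrete, here is a
minimal sketch using the standard ast module. The whitelist and the
function name find_disallowed are hypothetical, and the check is crude:
it looks at every attribute access in the source, not only those on
pandas objects.

```python
import ast

# Hypothetical whitelist; the actual subset is still to be decided.
ALLOWED_ATTRIBUTES = {"read_csv", "to_csv", "assign", "groupby",
                      "mean", "sum", "merge", "plot"}


def find_disallowed(source):
    """Return sorted (line number, name) pairs for constructs outside the subset."""
    found = []
    for node in ast.walk(ast.parse(source)):
        # Any attribute access whose name is not whitelisted.
        if isinstance(node, ast.Attribute) and node.attr not in ALLOWED_ATTRIBUTES:
            found.append((node.lineno, node.attr))
        # Any call that passes an inplace=... keyword argument.
        if isinstance(node, ast.Call) and any(kw.arg == "inplace" for kw in node.keywords):
            found.append((node.lineno, "inplace"))
    return sorted(found)


student_code = '''import pandas as pd
df = pd.read_csv("data.csv")
df2 = df.assign(total=df["a"] + df["b"])
df.drop("a", axis=1, inplace=True)
'''

for lineno, name in find_disallowed(student_code):
    print("line %d: '%s' is outside the subset" % (lineno, name))
```

The student code is only parsed, never executed, so such a check could
run in an editor or a grading script without touching any data.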
At the moment I'm considering experimenting with the third approach.
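A very rough sketch of what the core of such a wrapper could look like
is below. The class name LiteFrame and the set of supported operations
are assumptions made for illustration only; a real pandaslite would
need to cover Series, group-by objects, plotting and so on.

```python
import io
import pandas as pd


class LiteFrame:
    """Hypothetical restricted stand-in for pandas.DataFrame."""

    def __init__(self, frame):
        if not frame.index.is_unique:
            raise ValueError("duplicate labels in the index are not allowed")
        self._frame = frame.copy()  # private copy, so no views are shared

    def __getitem__(self, key):
        # A list of column labels or a boolean Series gives a new LiteFrame.
        if isinstance(key, (list, pd.Series)):
            return LiteFrame(self._frame[key])
        if isinstance(key, str):
            return self._frame[key].copy()  # plain pandas Series, for brevity
        # Tuples and other key types are rejected instead of being guessed at.
        raise TypeError("unsupported key type: %s" % type(key).__name__)

    def assign(self, **columns):
        return LiteFrame(self._frame.assign(**columns))

    def __len__(self):
        return len(self._frame)

    def __getattr__(self, name):
        # Reached only for attributes not defined above.
        raise AttributeError("'%s' is not part of pandaslite" % name)


def read_csv(source):
    return LiteFrame(pd.read_csv(source))


df = read_csv(io.StringIO("a,b\n1,2\n3,4\n"))
big = df[df["a"] > 1]          # boolean indexing, returns a new LiteFrame
both = big.assign(c=big["a"] + big["b"])
```

Because the method names mirror pandas, code written against this
sketch should keep working after switching the import back to pandas,
at least for the operations shown here.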
I'd be glad to hear your comments!
best regards,
Aivar Annamaa