[Pandas-dev] Pandas lite

Sat Nov 18 03:21:26 EST 2017

Hi!

I'm going to teach an introduction to pandas to Python newbies, and I'm 
looking for ways to simplify their view of the API and/or avoid some 
of the pitfalls.

I'd like to identify a minimal set of methods/operations that is 
enough for performing the most common tasks with simply-indexed data 
(importing/exporting from CSV/Excel, selecting rows and columns by 
index, boolean indexing of the rows, creating new columns, simple 
group-by and aggregations, simple plotting, maybe also simple joins) and 
that has minimal potential for surprises (unexpected copies, 
unexpected views, confusing warnings, differences between indexing with 
lists and with tuples, etc.). Maybe even allowing only "pure" 
transformations à la relational algebra? We could call it an 
opinionated and restricted usage scheme for pandas.
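
As a rough illustration of such a restricted style (the data and the
column names are made up, and using only .loc and assign() is just one
possible convention, not a settled proposal):

```python
import pandas as pd

# Made-up data, purely for illustration.
df = pd.DataFrame(
    {
        "city": ["Tartu", "Tallinn", "Tartu", "Narva"],
        "sales": [10, 20, 30, 40],
    }
)

# Boolean indexing of rows plus column selection, always through .loc.
big = df.loc[df["sales"] > 15, ["city", "sales"]]

# New column via assign(), which returns a new frame and leaves
# its input untouched (a "pure" transformation).
with_tax = big.assign(sales_with_tax=big["sales"] * 1.2)

# Simple group-by and aggregation.
totals = with_tax.groupby("city")["sales"].sum()
```

Every step produces a new object and nothing is modified in place, so
the copy-vs-view question never comes up for the student.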

The students would use this subset of the API until they gain enough 
experience to meet the hairier face of pandas.

Has anybody tried marking out a subset of the pandas API for any reason?

I was also thinking about how to enforce the boundaries of this subset:

  * Just advise students to stick with it.
  * Provide a static analysis which disallows (or warns against) the
    operations/tricks outside the boundaries.
  * Provide a wrapper library (e.g. import pandaslite as pd) which wraps
    the required pandas classes in similar classes that expose only a
    subset of the pandas capabilities and perform some extra checks
    (e.g. disallow duplicates in the index). When the students grow
    tough enough or need more power, they would simply replace
    "import pandaslite as pd" with "import pandas as pd" in their code.
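
The static-analysis option could be prototyped with the standard-library
ast module. The sketch below is only an illustration: the function name
check_source and the list of flagged constructs (the .ix indexer, the
inplace= argument, assignment through chained indexing) are assumptions,
not an agreed definition of the subset.

```python
import ast

# Hypothetical blacklist of attribute names; the real list is open for debate.
DISCOURAGED_ATTRS = {"ix"}


def check_source(source):
    """Return a sorted list of (line, message) for discouraged constructs."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        # Attribute access such as df.ix
        if isinstance(node, ast.Attribute) and node.attr in DISCOURAGED_ATTRS:
            problems.append((node.lineno, "attribute ." + node.attr))
        # Any call that passes an inplace= keyword argument
        elif isinstance(node, ast.keyword) and node.arg == "inplace":
            problems.append((node.value.lineno, "inplace= argument"))
        # Assignment through chained indexing, e.g. df["a"][0] = 1
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Subscript) and isinstance(
                    target.value, ast.Subscript
                ):
                    problems.append(
                        (target.lineno, "chained indexing assignment")
                    )
    return sorted(problems)
```

A checker like this knows nothing about types, so it would also flag
these patterns on objects that are not pandas objects at all; for
beginner scripts that may be an acceptable trade-off.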

At the moment I'm considering experimenting with the third approach.
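
A very rough sketch of what such a wrapper might look like follows. The
module name pandaslite comes from the list above; everything else (the
ALLOWED whitelist, which methods are on it, the error messages) is a
hypothetical choice made only to show the delegation idea.

```python
import pandas as pd

# Hypothetical whitelist of DataFrame methods exposed to students.
ALLOWED = {"assign", "head", "sort_values", "to_csv"}


class DataFrame:
    """Restricted stand-in for pandas.DataFrame (illustrative only)."""

    def __init__(self, data=None, index=None, columns=None):
        frame = pd.DataFrame(data, index=index, columns=columns)
        # Extra check: refuse duplicate labels in the index.
        if not frame.index.is_unique:
            raise ValueError("duplicate labels in the index are not allowed")
        self._frame = frame

    def __getitem__(self, column):
        # Only plain column selection by name; return a copy so that
        # students never end up holding a view of the frame.
        if not isinstance(column, str):
            raise TypeError("only df['column_name'] is supported")
        return self._frame[column].copy()

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name not in ALLOWED:
            raise AttributeError("pandaslite does not expose %r" % name)
        method = getattr(self._frame, name)

        def wrapper(*args, **kwargs):
            result = method(*args, **kwargs)
            # Re-wrap frames so that the restrictions keep applying.
            if isinstance(result, pd.DataFrame):
                return DataFrame(result)
            return result

        return wrapper
```

Because the class and method names mirror pandas, code written against
this wrapper should keep working after switching the import, as long as
it only uses the whitelisted calls. Row selection, group-by, joins and
plotting are left out here and would each need their own wrapping.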

I'd be glad to hear your comments!

best regards,
Aivar Annamaa
