[Numpy-discussion] NumPy-Discussion Digest, Vol 161, Issue 14

Inessa Pawson albuscode at gmail.com
Fri Feb 14 17:46:45 EST 2020


Documentation is imperative for project sustainability, yet often
overlooked. Millions of NumPy stakeholders will benefit from this
initiative. Melissa, Mars and Ralf, thank you for taking a lead on this!

On Thu, Feb 13, 2020 at 3:05 AM <numpy-discussion-request at python.org> wrote:

> Send NumPy-Discussion mailing list submissions to
>         numpy-discussion at python.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://mail.python.org/mailman/listinfo/numpy-discussion
> or, via email, send a message with subject or body 'help' to
>         numpy-discussion-request at python.org
>
> You can reach the person managing the list at
>         numpy-discussion-owner at python.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of NumPy-Discussion digest..."
> Today's Topics:
>
>    1. NEP 44 - Restructuring the NumPy Documentation (Melissa Mendon?a)
>
>
>
> ---------- Forwarded message ----------
> From: "Melissa Mendonça" <melissawm at gmail.com>
> To: numpy-discussion at python.org
> Cc:
> Bcc:
> Date: Wed, 12 Feb 2020 10:55:09 -0300
> Subject: [Numpy-discussion] NEP 44 - Restructuring the NumPy Documentation
> Hi all,
>
> Please see the NEP below for a proposal to restructure the documentation
> of NumPy. The main goal here is to make the documentation more visible and
> organized, and also make contributions easier.
>
> Comments and feedback are welcome!
>
>
> See https://github.com/numpy/numpy/pull/15554 for details.
>
> Best,
>
> Melissa
>
> ----
>
> NEP 44 — Restructuring the NumPy Documentation
>
> Authors: Ralf Gommers, Melissa Mendonça, Mars Lee
> Status: Draft
> Type: Process
> Created: 2020-02-11
>
> Abstract
> ======
>
> This document proposes a restructuring of the NumPy Documentation, both in
> form and content, with the goal of making it more organized and
> discoverable for beginners and experienced users.
>
> Motivation and Scope
> =================
>
> See [here](numpy.org/devdocs) for the front page of the latest docs. The
> organization is quite confusing and illogical (e.g. user and developer docs
> are mixed). We propose the following:
>
> - Reorganizing the docs into the four categories mentioned in [1];
> - Creating dedicated sections for Tutorials and How-Tos, including
> orientation on how to create new content;
> - Adding an Explanations section for key concepts and techniques that
> require deeper descriptions, some of which will be rearranged from the
> Reference Guide.
>
> Usage and Impact
> ==============
>
> The documentation is a fundamental part of any software project,
> especially open source projects. In the case of NumPy, many beginners might
> feel demotivated by the current structure of the documentation, since it is
> difficult to discover what to learn (unless the user has a clear view of
> what to look for in the Reference docs, which is not always the case).
>
> Looking at the results of a “NumPy Tutorial” search on any search engine
> also gives an idea of the demand for this kind of content. Having official
> high-level documentation written using up-to-date content and techniques
> will certainly mean more users (and developers/contributors) are involved
> in the NumPy community.
>
> Backward compatibility
> ==================
>
> The restructuring will effectively demand a complete rewrite of links and
> some of the current content. Input from the community will be useful for
> identifying key links and pages that should not be broken.
>
> Detailed description
> ===============
>
> As discussed in the article [1], there are four categories of doc content:
> - Tutorials
> - How-to guides
> - Explanations
> - Reference guide
>
> We propose to use those categories as the ones we use (for writing and
> reviewing) whenever we add a new documentation section.
>
> The reasoning for this is that it is clearer both for
> developers/documentation writers and to users where each information should
> go, and the scope and tone of each document. For example, if explanations
> are mixed with basic tutorials, beginners might be overwhelmed and
> alienated. On the other hand, if the reference guide contains basic
> how-tos, it might be difficult for experienced users to find the
> information they need, quickly.
>
> Currently, there are many blogs and tutorials on the internet about NumPy
> or using NumPy. One of the issues with this is that if users search for
> this information and end up in an outdated (unofficial) tutorial before
> they find the current official documentation, they end up creating content
> that is confusing, especially for beginners. Having a better infrastructure
> for the documentation also aims to solve this problem by giving users
> high-level, up-to-date official documentation that can be easily updated.
>
> Status and ideas of each type of doc content
> ------------------------------------------------------------
>
> * Reference guide
>
> NumPy has a quite complete reference guide. All functions are documented,
> most have examples, and most are cross-linked well with See Also sections.
> Further improving the reference guide is incremental work that can be done
> (and is being done) by many people. There are, however, many explanations
> in the reference guide. These can be moved to a more dedicated Explanations
> section on the docs.
>
> * How-to guides
>
> NumPy does not have many how-to’s. The subclassing and array ducktyping
> section may be an example of a how-to. Others that could be added are:
> - Parallelization (controlling BLAS multithreading with threadpoolctl,
> using multiprocessing, random number generation, etc.)
> - Storing and loading data (.npy/.npz format, text formats, Zarr, HDF5,
> Bloscpack, etc.)
> - Performance (memory layout, profiling, use with Numba, Cython, or
> Pythran)
> - Writing generic code that works with NumPy, Dask, CuPy, pydata/sparse,
> etc.
>
> * Explanations
>
> There is a reasonable amount of content on fundamental NumPy concepts such
> as indexing, vectorization, broadcasting, (g)ufuncs, and dtypes. This could
> be organized better and clarified to ensure it’s really about explaining
> the concepts and not mixed with tutorial or how-to like content.
>
> There are few explanations about anything other than those fundamental
> NumPy concepts.
>
> Some examples of concepts that could be expanded:
> - Copies vs. Views;
> - BLAS and other linear algebra libraries;
> - Fancy indexing.
>
> In addition, there are many explanations in the Reference Guide, which
> should be moved to this new dedicated Explanations section.
>
> * Tutorials
>
> There’s a lot of scope for writing better tutorials. We have a new NumPy
> for absolute beginners tutorial [3] (GSoD project of Anne Bonner). In
> addition we need a number of tutorials addressing different levels of
> experience with Python and NumPy. This could be done using engaging data
> sets, ideas or stories. For example, curve fitting with polynomials and
> functions in numpy.linalg could be done with the Keeling curve (decades
> worth of CO2 concentration in air measurements) rather than with synthetic
> random data.
>
> Ideas for tutorials (these capture the types of things that make sense,
> they’re not necessarily the exact topics we propose to implement):
> - Conway’s game of life with only NumPy (note: already in Nicolas
> Rougier’s book)
> - Using masked arrays to deal with missing data in time series measurements
> - Using Fourier transforms to analyze the Keeling curve data, and
> extrapolate it.
> - Geospatial data (e.g. lat/lon/time to create maps for every year via a
> stacked array, like gridMet data)
> - Using text data and dtypes (e.g. use speeches from different people,
> shape (n_speech, n_sentences, n_words))
>
> The Preparing to Teach document [2] from the Software Carpentry Instructor
> Training materials is a nice summary of how to write effective lesson plans
> (and tutorials would be very similar). In addition to adding new tutorials,
> we also propose a How to write a tutorial document, which would help users
> contribute new high-quality content to the documentation.
>
> Data sets
> -------------
>
> Using interesting data in the NumPy docs requires giving all users access
> to that data, either inside NumPy or in a separate package. The former is
> not the best idea, since it’s hard to do without increasing the size of
> NumPy significantly. Even for SciPy there has so far been no consensus on
> this (see scipy PR 8707 on adding a new scipy.datasets subpackage).
>
> So we’ll aim for a new (pure Python) package, named numpy-datasets or
> scipy-datasets or something similar. That package can take some lessons
> from how, e.g., scikit-learn ships data sets. Small data sets can be
> included in the repo, large data sets can be accessed via a downloader
> class or function.
>
> Related Work
> ===========
>
> Some examples of documentation organization in other projects:
> - Documentation for Jupyter: https://jupyter.org/documentation
> - Documentation for Python: https://docs.python.org/3/
> - Documentation for TensorFlow: https://www.tensorflow.org/learn
>
> These projects make the intended audience for each part of the
> documentation more explicit, as well as previewing some of the content in
> each section.
>
> Implementation
> ============
>
> Besides rewriting the current documentation to some extent, it would be
> ideal to have a technical infrastructure that would allow more
> contributions from the community. For example, if Jupyter Notebooks could
> be submitted as-is as tutorials or How-Tos, this might create more
> contributors and broaden the NumPy community.
>
> Similarly, if people could download some of the documentation in Notebook
> format, this would certainly mean people would use less outdated material
> for learning NumPy.
>
> It would also be interesting if the new structure for the documentation
> makes translations easier.
>
> Currently, the documentation for NumPy can be confusing, especially for
> beginners. Our proposal is to reorganize the docs in the following
> structure:
>
> * For users:
> - Absolute Beginners Tutorial
> - main Tutorials section
> - How To’s for common tasks with NumPy
> - Reference Guide
> - Explanations
> - F2Py Guide
> - Glossary
>
> * For developers/contributors:
> - Contributor’s Guide
> - Building and extending the documentation
> - Benchmarking
> - NumPy Enhancement Proposals
>
> * Meta information
> - Reporting bugs
> - Release Notes
> - About NumPy
> - License
>
> References and Footnotes
> ====================
>
> [1] What nobody tells you about documentation.
> https://www.divio.com/blog/documentation/
> [2] Preparing to Teach (from the Software Carpentry Instructor Training
> materials).
> https://carpentries.github.io/instructor-training/15-lesson-study/index.html
> [3] NumPy for absolute beginners Tutorial by Anne Bonner.
> https://numpy.org/devdocs/user/absolute_beginners.html
>
> Copyright
> ========
>
> This document has been placed in the public domain.
>
> --
> Melissa Weber Mendonça
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>

-- 
Every good wish,
*Inessa Pawson*
Executive Director
Albus Code
inessa at albuscode.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20200215/bd06d3a4/attachment-0001.html>


More information about the NumPy-Discussion mailing list