[Matplotlib-devel] Units discussion...

Antony Lee antony.lee at berkeley.edu
Thu Feb 8 15:49:35 EST 2018


I apologize for the erroneous statements I have made regarding tests.  I
should, in fact, be well aware of test_units, having had to fight with it
when fixing PR#9774 (see the part modifying axes/_base.py).  However, my
intent (and again, I readily admit I wrote something else and that was
incorrect) was that the test is against our own mocking of a minimal unit
system, rather than an external, actually used one.  In other words, I
would much prefer actually bringing in pint as a test dependency (at least
for CI -- we can always locally skipif it), and whatever else we need to
cover all cases.  Why?  Because, for someone who is not a unit specialist,
how do I know whether your mock unit class is actually relevant and has
anything to do with "real-life" units?
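
To make the "skip locally" part concrete, here is a minimal sketch (the
class and test names are made up, and I use the stdlib unittest skip
decorator as a stand-in for a pytest skipif marker): the test only runs
when pint can be imported, and is reported as skipped otherwise.

```python
import importlib.util
import unittest

# True only if pint is importable in this environment.
HAVE_PINT = importlib.util.find_spec("pint") is not None


@unittest.skipUnless(HAVE_PINT, "pint is not installed")
class TestPintLengths(unittest.TestCase):
    """Hypothetical test exercising a real unit library instead of a mock."""

    def test_inch_to_cm(self):
        import pint  # imported lazily so this module loads without pint

        ureg = pint.UnitRegistry()
        length = 1 * ureg.inch
        # One inch is defined as exactly 2.54 cm.
        self.assertAlmostEqual(length.to(ureg.cm).magnitude, 2.54)


def run_suite():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestPintLengths)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

On CI, where pint would be installed, the same test exercises a unit
library that people actually use rather than our own mock.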

There may not be a fundamental structural deficiency in the current
converter setup in itself, but I maintain that the need to add an ad hoc
implementation to "each uniquely special snowflake" (plotting method),
rather than at well defined entry points, is less than ideal.  As you
mentioned, this may not actually be due to units, but just to Matplotlib's
general architecture, but unit support makes this more visible (... IMO).
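
To illustrate what I mean by a well defined entry point, here is a toy
sketch.  Everything in it (`Length`, `strip_units`, `unit_aware`, and the
stand-in `plot`/`scatter` functions) is hypothetical and is not how
Matplotlib currently does things; it only shows conversion happening in
one place that every plotting function goes through.

```python
from dataclasses import dataclass
from functools import wraps


@dataclass(frozen=True)
class Length:
    """Hypothetical toy quantity: a magnitude plus a unit name."""
    magnitude: float
    unit: str


# Conversion factors to a common base unit (centimeters).
_TO_CM = {"cm": 1.0, "in": 2.54}


def strip_units(value):
    """Single entry point: turn Length objects (or lists of them) into floats in cm."""
    if isinstance(value, Length):
        return value.magnitude * _TO_CM[value.unit]
    if isinstance(value, (list, tuple)):
        return [strip_units(v) for v in value]
    return value  # bare numbers pass through unchanged


def unit_aware(func):
    """Decorator that converts every argument once, before the body runs."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        args = [strip_units(a) for a in args]
        kwargs = {k: strip_units(v) for k, v in kwargs.items()}
        return func(*args, **kwargs)
    return wrapper


@unit_aware
def plot(x, y):
    """Stand-in for a plotting method; it only ever sees plain floats."""
    return list(zip(x, y))


@unit_aware
def scatter(x, y):
    """A second stand-in method that gets unit handling for free."""
    return sorted(zip(x, y))
```

With this shape, adding a new plotting function means adding one
decorator, rather than re-implementing unit handling inside the function.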

Finally, please understand the "lack of well defined use case" from the
point of view of a developer who does not use unitized data.  He sees a
bunch of rather complex code to convert units around (e.g. Line2D.recache
and everything that calls into it), and meanwhile, what is the *only*
documentation he sees on the unit system?  It is the docstring of the units
module, which is frankly less than optimal.  At that point, he just sees
the unit support code as a burden that has to be carried around.
Obviously, I totally understand that people use Matplotlib with different
use cases, and there may be things I use in Matplotlib that you couldn't
care less about.  However, as Jody mentioned some time ago, the unit system
is literally supposed to touch *any* data that comes into Matplotlib, and
can therefore hardly be ignored by any dev.  I believe this is consistent
with the call for a MEP clarifying the use cases of units.

Antony

2018-02-08 21:15 GMT+01:00 Ryan May <rmay31 at gmail.com>:

> Hi,
>
> Let me start by saying that this will probably come across as crabby, and
> I don't really mean for it to do so. I'm happy people are looking at
> improving unit support. HOWEVER, I'm concerned that those trying to push
> right now are completely ignorant of what actually exists in matplotlib and
> how the rest of the ecosystem of unit packages works, don't have personal
> use cases and are completely unclear on what others' use cases are, and seem
> to be throwing things at the wall as rapidly as possible. For instance,
> Antony:
>
> > One major point (already mentioned by others) that led, I think, to some
> > devs (including myself) being relatively dismissive about unit support is
> > the lack of well-defined use case, other than "it'd be nice if we supported
> > units" (i.e., especially from the point of view of devs who *don't* use
> > units themselves, it ends up being an ever moving target).  In particular,
> > tests on unit support ("unit unit tests"? :-)) currently only rely on the
> > old JPL unit code that ended up integrated into Matplotlib's test suite,
> > but does not test integration with the two major unit packages I am aware
> > of (pint and astropy.units).
>
> False. Until David Stansby's contribution, I wrote every line of code in:
> https://github.com/matplotlib/matplotlib/commits/master/lib/matplotlib/tests/test_units.py
> Either way, that test has literally *nothing* to do with JPL's
> implementation. (And 30s of github could have revealed this.) I added that
> code *literally* to check whether we're properly interfacing with a library
> just like pint.
>
> > Is there a smaller library that subclasses ndarray for units support?
> > I imagine we could vendorize a subset of whatever astropy or yt do.  Or
> > maybe they aren’t so huge that they would be unreasonable to make as test
> > dependencies.  yt is only 68 Mb.
>
> No. Just no. Again, I have stubbed out just fine the functionality within
> test_units.py to function just like pint--in about 25 lines. I'm happy to
> do so for an ndarray subclass-based one as well.
>
> Now, about the functionality:
>
> > What we need an example of is how the following should work.
> > ```python
> > x = np.arange(10)
> > y = x*2 * myunitclass.inch  # "inch", since "in" is a Python keyword
> > ax.plot(x, y)
> > z = x*2 * myunitclass.cm
> > ax.plot(x, z)
> > ```
>
> That currently works today, and works just fine. Same test file:
> https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/tests/test_units.py#L72-L81
> You're plotting things with the same dimensionality, so the converter
> interface can convert to the units that already exist on the axes. Done.
> I'm quite happy with it.
>
> Honestly, I'm not trying to be mean about this. But I come into an email
> thread where things are moving so fast, with factually incorrect
> information flying around, that I'm simply overwhelmed. (14 messages in 3
> hours???) I don't think email is a good place to discuss this.
>
> To be clear, I am *ecstatic* that people are looking at unit challenges,
> and I agree that the way we're implementing it in matplotlib is
> hacky--handling it uniquely for each plotting method rather than
> systematically. And I'm happy to have new voices come in and try to improve
> the situation with new ideas. But I see people railing against the current
> converter interface as if it's unused, crusty, or otherwise completely
> inadequate. The converter WORKS fine. The problem is that we have to
> hook up unit machinery individually to each plotting method, because each
> plotting method is its own special snowflake--unique and unlike any other.
> What we need is to rationalize the implementation of plots, specifically
> the data handling (missing data, units, shape, etc.), and then implementing
> units will be a sane task.
>
> Or maybe I'm wrong, and there is some structural deficiency in the current
> converter--but I'd at least like to see those arguments coming from a place
> of knowledge, not conjecture about how this thing may or may not be working
> currently, and wild speculation about how it's supposed to work. Contrary
> to the "lack of well-defined use case" idea, there are plenty--they might
> not be written down, but that doesn't mean they haven't been discussed
> before.
>
> Let's find a better venue for this discussion, one that lets everyone
> join in *together*, synchronously, and in a form where we're not guessing
> at tone.
>
> Ryan
>
> On Thu, Feb 8, 2018 at 11:48 AM, Jody Klymak <jklymak at uvic.ca> wrote:
>
>>
>>
>> On 8 Feb 2018, at 09:54, Drain, Theodore R (392P) <
>> theodore.r.drain at jpl.nasa.gov> wrote:
>>
>> I think we can help with building a better toy unit system.  Or we can
>> standardize on datetime and some existing unit package.  Whatever makes it
>> easier for people to write test cases.
>>
>>
>> For me, the problem w/ datetime is that it is not fully featured units
>> handling in that it doesn’t support multiple units.  It’s really just a
>> class of data that we have a known conversion to float for.
>>
>> What we need an example of is how the following should work.
>>
>> ```python
>> x = np.arange(10)
>> y = x*2 * myunitclass.inch  # "inch", since "in" is a Python keyword
>> ax.plot(x, y)
>> z = x*2 * myunitclass.cm
>> ax.plot(x, z)
>> ```
>>
>> So when a new feature is added, we can ask that its units support is made
>> clear.  I guess I don’t mind if those are astropy units or yt units, or
>> pint, or?? though there will be some pushback about including another test
>> dependency.
>>
>> Would pint units work?  It’s a very small dependency, but maybe not as
>> full featured or structured wildly differently from the others?
>>
>> A test suite to my mind would
>>  - test basic functionality
>>  - test mixing allowed dimensions (i.e. inches and centimeters)
>>  - test changing the axis units (so all the plotted data changes its
>> values, *or* the tick locators/formatters change their values).
>>  - test that disallowed mixed dimensions fail.
>>  - ??
>>
>> Cheers,  Jody
>>
>> ________________________________________
>> From: Jody Klymak <jklymak at uvic.ca>
>> Sent: Thursday, February 8, 2018 9:39 AM
>> To: Drain, Theodore R (392P)
>> Cc: matplotlib development list
>> Subject: Re: [Matplotlib-devel] Units discussion...
>>
>> I realize that units are "a pain", but they're hugely useful.  Just
>> plotting datetimes is going to be a pain without units (and was a huge pain
>> before the unit system).  The proposal that only Axes supports units is
>> going to cause us a massive problem as that's rarely everything that we do
>> with a plot.  I could do a survey to find all the interactions we use (and
>> that doesn't even touch the 1000's of lines of code our users have written)
>> if that would help but anything that's part of the public api (axes,
>> artists, patches, etc) is probably being used - i.e. pretty much anything
>> that's in the current user's guide is something that we use/want/need to
>> work with unitized data.
>>
>> OK, *for discussion*: A scope of work for JPL and Matplotlib might be:
>>
>> 1) develop better toy unit module that has most of the desired features
>> (maybe the existing one is fine, but please see
>> https://github.com/matplotlib/matplotlib/issues/9713 for why I’m a
>> little dismayed with the state of things).
>>
>> 2) write a developer’s guide explaining how units should be/are
>> implemented
>>        a) in matplotlib modules
>>        b) by downstream developers (this is probably adequate already).
>>
>> It sounds like what you are saying is that units should be carried to the
>> draw stage (or cache stage) for all artists.  That’s maybe fine, but as a
>> new developer, I found the units support woefully under-documented. The
>> fact that others have hacked in units support in various inconsistent ways
>> means that we need to police all this better.
>>
>> OTOH, maybe Antony and I are poor people to lead this charge, given that
>> we don’t need unit support.  But I don’t think we are being hypercritical
>> in pointing out it needs work.
>>
>> Thanks a lot,   Jody
>>
>>
>> This is kind of what I meant in my previous email about use cases. Saying
>> "just Axes has units" is basically saying the only valid unit use case is
>> create a plot one time and look at it.  You can't manipulate it, edit it,
>> or build any kind of plotting GUI application (which we have many of) once
>> the plot has been created.  The Artist classes are one of the primary API's
>> for applications.  Artists are created, edited, and manipulated if you want
>> to allow the user to modify things in a plot after it's created.    Even
>> the most basic cases like calling Line2D.set_data() wouldn't be allowed
>> with units if only Axes has unit support.
>>
>> I'm not sure I understand the statement that units are a moving target.
>> The reason it keeps popping up is that code gets added without someone
>> considering units, which then triggers bug reports that require fixing.
>> If there was a clearer policy and new code was required to have test cases
>> that cover non-unit and unit inputs, I think things would go much
>> smoother.  We'd be happy to help with submitting new test cases to cover
>> unit cases in existing code once a policy is decided on.  Maybe what's
>> needed is better documentation for developers who don't use units so they
>> can easily write a test case with units when adding/modifying functionality.
>>
>> Ted
>>
>> ________________________________________
>> From: anntzer.lee at gmail.com on behalf of Antony Lee
>> <antony.lee at berkeley.edu>
>> Sent: Thursday, February 8, 2018 8:09 AM
>> To: Drain, Theodore R (392P)
>> Cc: matplotlib development list
>> Subject: Re: [Matplotlib-devel] Units discussion...
>>
>> I'm momentarily a bit away from Matplotlib development due to real life
>> piling up, so I'll just keep this short.
>>
>> One major point (already mentioned by others) that led, I think, to some
>> devs (including myself) being relatively dismissive about unit support is
>> the lack of well-defined use case, other than "it'd be nice if we supported
>> units" (i.e., especially from the point of view of devs who *don't* use
>> units themselves, it ends up being an ever moving target). In particular,
>> tests on unit support ("unit unit tests"? :-)) currently only rely on the
>> old JPL unit code that ended up integrated into Matplotlib's test suite,
>> but does not test integration with the two major unit packages I am aware
>> of (pint and astropy.units).
>>
>> From the email of Ted it appears that these are not sufficient to
>> represent all kinds of relevant units.  In particular, I was at some point
>> hoping to completely work in deunitized data internally, *including the
>> plotting*, and rely on the fact that the deunitized and the unitized
>> data are usually linked by an affine transform, so the plotting part
>> doesn't need to convert back to unitized data and we only need to place and
>> label the ticks accordingly; however Ted mentioned relativistic units,
>> which imply the use of a non-affine transform.  So I think it would also be
>> really helpful if JPL could release some reasonably documented unit library
>> with their actual use cases (and how it differs from pint & astropy.units),
>> so that we know better what is actually needed (I believe carrying the JPL
>> unit code in our own code base is a mistake).
>>
>>
>> As for the public vs private, or rather unitized vs deunitized API
>> discussion, I believe a relatively simple and consistent line would be to
>> make Axes methods unitized and everything else deunitized (but with clear
>> ways to convert to and from unitized data when not using Axes methods).
>>
>> Antony
>>
>> 2018-02-07 16:33 GMT+01:00 Drain, Theodore R (392P) <
>> theodore.r.drain at jpl.nasa.gov>:
>> That sounds fine to me.  Our original unit prototype API actually had
>> conversions for both directions but I think the float->unit version was
>> removed (or really moved) when the ticker/formatter portion of the unit API
>> was settled on.
>>
>> Using floats/numpy arrays internally is going to be easier and faster so I
>> think that's a plus.  The biggest issue we're going to run in to is what's
>> defined as "internal" vs part of the unit API.  Some things are easy like
>> the Axes/Axis API.  But we also use low level API's like the patches.  Are
>> those unitized?  This is the pro and con of using something like Python
>> where basically everything is public.  It makes it possible to do lots of
>> things, but it's much harder to define a clear library with a specific
>> public API.
>>
>> Somewhere in the process we should write a proposal that outlines which
>> classes/methods are part of the unit api and which are going to be
>> considered internal.  I'm sure we can help with that effort.
>>
>> That also might help clarify/influence code structure - if internal
>> implementation classes are placed in a sub-package inside MPL 3.0, it
>> becomes clearer to people later on what the "official" public API is vs what
>> can be optimized to just use floats.  Obviously the devs would need to
>> decide if that kind of restructuring is worth it or not.
>>
>> Ted
>>
>> ________________________________________
>> From: David Stansby <dstansby at gmail.com>
>> Sent: Wednesday, February 7, 2018 3:42 AM
>> To: Jody Klymak
>> Cc: Drain, Theodore R (392P); matplotlib development list
>> Subject: Re: [Matplotlib-devel] Units discussion...
>>
>> Practically, I think what we are proposing is that for unit support the
>> user must supply two functions for each axis:
>>
>> *   A mapping from your unit objects to floating point numbers
>> *   A mapping from those floats back to your unit objects
>>
>> As far as I know function 2 is new, and doesn't need to be supplied at
>> the moment. Doing this would mean we can convert units as soon as they
>> enter Matplotlib, only ever have to deal with floating point numbers
>> internally, and then use the second function as late as possible when the
>> user requests stuff like e.g. the axis limits.
>>
>> Also worth noting that any major change like this will go in to
>> Matplotlib 3.0 at the earliest, so will be python 3 only.
>>
>> David
>>
>> On 7 February 2018 at 06:06, Jody Klymak <jklymak at uvic.ca> wrote:
>> Dear Ted,
>>
>> Thanks so much for engaging on this.
>>
>> Don’t worry, nothing at all is changing w/o substantial back and forth,
>> and OK from downstream users.   I actually don’t think it’ll be a huge
>> change, probably just some clean up and better documentation.
>>
>> FWIW, I’ve not personally done much programming w/ units, just been a bit
>> perplexed by their inconsistent and (to my simple mind) convoluted
>> application in the codebase.  Having experience from people who try to use
>> them every day will be absolutely key.
>>
>> Cheers,   Jody
>>
>> On Feb 6, 2018, at  14:17 PM, Drain, Theodore R (392P) <
>> theodore.r.drain at jpl.nasa.gov> wrote:
>>
>> We use units for everything in our system (in fact, we funded John Hunter
>> originally to add in a unit system so we could use MPL) so it's a crucial
>> system for us.  In our system, we have our own time classes (which handle
>> relativistic time frames as well as much higher precision representations)
>> and a custom unit system for floating point values.
>>
>> I think it's important to talk about these changes in concrete terms.  I
>> understand the words you're using,  but I'm not really clear on what the
>> real proposed changes are.  For example, the current unit API returns a
>> units.AxisInfo object so the converter can set the formatter and locators
>> to use.  Is that what you mean in the 2nd paragraph about ticks and
>> labels?  Or is that changing?
>>
>> The current unit api is pretty simple and in units.ConversionInterface.
>> Are any of these changes going to change the conversion API?  (note - I'm
>> not against changing it - I'm just not sure if there are any changes or
>> not).
>>
>> Another thing to consider:  many of the examples people use are scripts
>> which make a plot and stop.  But there are other use cases which are more
>> complicated and stress the system in different ways.  We write several GUI
>> applications (in PyQt) that use MPL for plotting.  In these cases, the user
>> is interacting with the plot to add and remove artists, change styles,
>> modify data, etc etc.  So having a good object oriented API for modifying
>> things after construction is important for this to work.  So when units are
>> involved, it can't be a "convert once at construction" and never touch
>> units again.   We are constantly adjusting limits, moving artists, etc in
>> unitized space after the plot is created.
>>
>> So in addition to the ConversionInterface API, I think there are other
>> items that would be useful to explicitly spell out.  Things like which
>> API's in MPL should accept units and which won't and which methods return
>> unitized data and which don't.   It would be nice if there was a clear
>> policy on this.  Maybe one exists and I'm not aware of it - it would be
>> helpful to repeat it in a discussion on changing the unit system.
>> Obviously I would love to have every method accept and return unitized data
>> :-).
>>
>> I bring this up because I was just working on a hover/annotation class
>> that needed to move a single annotation artist with the mouse.  To move the
>> annotation box the way I needed to, I had to set one private member
>> variable, call two set methods, use attribute assignment for one value, and
>> set one semi-public member variable - some of which worked with units and
>> some of which didn't.  I think having a clear "this kind of method
>> accepts/returns units" policy would help when people are adding new
>> accessors/methods/variables to make it more clear what kind of data is
>> acceptable in each.
>>
>> Ted
>> ps: I may be able to help with some resources to work on any unit
>> upgrades, but to make that happen I need to get a clear statement of what
>> problem is being solved and the scope of the work so I can explain to our
>> management why it's important.
>>
>> ________________________________________
>> From: Matplotlib-devel
>> <matplotlib-devel-bounces+ted.drain=jpl.nasa.gov at python.org> on behalf
>> of Jody Klymak <jklymak at uvic.ca>
>> Sent: Saturday, February 3, 2018 9:25 PM
>> To: matplotlib development list
>> Subject: [Matplotlib-devel] Units discussion...
>>
>> Hi all,
>>
>> To carry on the gitter discussion about unit handling, hopefully to lead
>> to more stringent documentation and implementation….
>>
>> In response to @anntzer I thought about the units support a bit - it
>> seems that rather than a transform, a more straightforward approach is to
>> have the converter map to float arrays in a unique way.  This float mapping
>> would be completely analogous to `date2num` in `dates`, in that it doesn’t
>> change and is perfectly invertible without matplotlib ever knowing about
>> the unit information, though the axis could store it for the tick
>> locators and formatters.  It would also have an inverse that would supply
>> data back to the user in unit-aware data (though not necessarily in the
>> unit that the user supplied. e.g. if they supply 8*in, and the
>> converter converts everything to meter floats, then the returned unitized
>> inverse would be 0.203*m, or whatever convention the converter wants to
>> supply.).
>>
>> User “unit” control, i.e. making the plot in inches instead of m, would
>> be accomplished with tick locators and formatters.  Matplotlib would never
>> directly convert between cm and inches (any more than it converts from days
>> to hours for dates), the downstream-supplied tick formatter and labeller
>> would do it.
>>
>> Each axis would only get one converter, set by the first call to the
>> axis. Subsequent calls to the axis would pass all data (including bare
>> floats) to the converter.  If the converter wants to pass bare floats then
>> it can do so.  If it wants to accept other data types then it can do so.
>> It should be possible for the user to clear or set the converter, but then
>> they should know what they are doing and why.
>>
>> What’s missing?  I don’t think this is wildly different than what we have,
>> but maybe a bit more clear.
>>
>> Cheers,   Jody
>>
>>
>>
>>
>> _______________________________________________
>> Matplotlib-devel mailing list
>> Matplotlib-devel at python.org
>> https://mail.python.org/mailman/listinfo/matplotlib-devel
>>
>> ...
>
> [Message truncated]