[TriPython] Prediction Model. Data Visualization.

Francois Dion francois.dion at gmail.com
Wed Oct 11 12:18:36 EDT 2017


Art (and list members interested in visualization),

As Dave mentioned, donut charts work best for progress toward a goal, i.e. a
percentage, like a dashboard gauge. They also suit cases where the 50% mark
is important, say a win/loss indicator for the Carolina Hurricanes against
a visiting team. Similarly, the pie chart, the ancestor of the donut chart,
is best suited to parts of a whole when you have 2 or 3 elements at most.
Beyond that, it is almost impossible to figure out the percentages and
relative importance. Bar charts do much better when there are more than 2
or 3 values.

A confusion matrix, in the simplest binary case, bins the 4 possible
outcomes of a classifier: true positive (you are part of the class and I
said so), false positive (you are not part of the class but I said you
were), true negative (you are not part of the class and I said so) and
false negative (you are part of the class but I said you were not). The
expected representation of a confusion matrix is, unsurprisingly, a matrix.
The standard way to present it is as a table (of actual against predicted),
hence the name. This has been the case since at least the 1950s (without
doing an exhaustive search, just from memory). For example, I just pulled
Mike James' "Classification Algorithms" from 1985, page 83, and there it
is. He also sums each row and column.
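
To make the four bins concrete, here is a minimal sketch in plain Python of
the binary case described above, with row and column sums added the way the
James book presents them. The function names and the sample labels are
made up for illustration; they are not from any library or from the book.

```python
def binary_confusion_matrix(actual, predicted):
    """Count the four outcomes of a binary classifier.

    Rows are the actual class, columns are the predicted class:
        [[TN, FP],
         [FN, TP]]
    """
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[int(bool(a))][int(bool(p))] += 1
    return matrix


def with_totals(matrix):
    """Append a sum to each row, then a final row of column sums."""
    rows = [row + [sum(row)] for row in matrix]
    column_sums = [sum(col) for col in zip(*rows)]
    return rows + [column_sums]


# Hypothetical labels: 1 = part of the class, 0 = not part of the class.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

cm = binary_confusion_matrix(actual, predicted)
for row in with_totals(cm):
    print(row)
```

With these made-up labels there are 3 true negatives, 1 false positive,
1 false negative and 3 true positives, and the bottom-right total is the
number of observations (8).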

But, sure, the plain text table is a bit drab if you are looking for
maximum impact. So, that's where I was suggesting a heatmap. Or you can use
the Python package Yellowbrick.

Here's an example using seaborn's heatmap (and making sure I label the
axes, else it is useless). I used cmap="Greens":

https://datasciencefrancois.tumblr.com/post/166291770900/confusion-matrix-with-a-single-color-sequential
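
For list members who want something to start from, a rough sketch of that
kind of plot follows. It is not the code behind the linked post: the counts,
class labels, figure size and the helper name plot_confusion_matrix are all
assumptions for illustration. Only seaborn's heatmap, cmap="Greens" and the
labeled axes come from the description above.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen so no display is needed

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns


def plot_confusion_matrix(cm, labels):
    """Draw a confusion matrix as a single-color sequential heatmap."""
    fig, ax = plt.subplots(figsize=(4, 3))
    sns.heatmap(
        cm,
        annot=True,        # write the count in each cell
        fmt="d",           # counts are integers
        cmap="Greens",     # single-color sequential palette
        cbar=False,
        xticklabels=labels,
        yticklabels=labels,
        ax=ax,
    )
    # Without axis labels the reader cannot tell actual from predicted.
    ax.set_xlabel("Predicted class")
    ax.set_ylabel("Actual class")
    fig.tight_layout()
    return fig, ax


# Hypothetical counts: rows are actual, columns are predicted.
cm = np.array([[50, 10],
               [5, 35]])
fig, ax = plot_confusion_matrix(cm, ["negative", "positive"])
```

To write the figure to disk, something like fig.savefig("cm.png") should do
(the file name is arbitrary).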

I've had no problem using this with technical and non-technical audiences.
I have shown CMs like the above (and a variety of other graphical and
semigraphical displays) to business folks who then proceeded to green-light
further phases of fairly large data science projects. Once they've seen one
and got it, you never have to explain it again. Without the heatmap colors,
it was super challenging to have people "get it".

Also, you might be interested in this list of books on visualization (from
my "ex-libris" series on LinkedIn):

https://www.linkedin.com/pulse/ex-libris-data-scientist-part-v-visualization-francois-dion/

In particular, Stephen Few's "Show Me the Numbers: Designing Tables and
Graphs to Enlighten" should definitely be on everyone's reading list. Along
with Cairo's "The Functional Art", it will get you started if you can't
commit to reading 1 viz book per week for the next 2 years :)

Thanks,
Francois



On Wed, Oct 11, 2017 at 8:52 AM, Art <artem.nesterenko at gmail.com> wrote:

>    Donut graph:
>    [1]https://imgur.com/a/C7r8x
>    You should be able to see it now.
>    Art Nestsiarenka
>

-- 
about.me/francois.dion - www.pyptug.org - www.3DFutureTech.info - @f_dion
<http://twitter.com/f_dion>
