[TriPython] Prediction Model. Data Visualization.
Francois Dion
francois.dion at gmail.com
Wed Oct 11 12:18:36 EDT 2017
Art (and list members interested in visualization),
As Dave mentioned, donut charts work best for progress toward a goal, i.e. a
percentage, like a dashboard gauge. They also suit cases where the 50% mark
is important, say a win/loss indicator for the Carolina Hurricanes against a
visiting team. Similarly, the pie chart, the ancestor of the donut chart, is
best suited to parts of a whole when you have 2 or 3 elements at most.
Beyond that, it is almost impossible to judge the percentages and their
relative importance. Bar charts do much better when there are more than 2 or
3 values.
A confusion matrix, in the simplest binary case, bins the 4 possible
outcomes of a classifier: true positive (you are part of the class and I
said so), false positive (you are not part of the class but I said you
were), true negative (you are not part of the class and I said so) and
false negative (you are part of the class but I said you were not).
Unsurprisingly, people expect a confusion matrix to be presented as a
matrix. The standard representation is a table of actual against predicted
classes, hence the name. This has been the case since at least the 1950s
(going from memory, without an exhaustive search). For example, I just
pulled Mike James' "Classification Algorithms" from 1985, page 83, and
there it is. He also sums each row and column.
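To make the four bins concrete, here is a minimal sketch in plain Python. It
assumes labels coded as 0 (not in the class) and 1 (in the class); the
function names and the sample labels are made up for illustration, and the
row/column sums follow the tabular layout described above.

```python
from collections import Counter


def binary_confusion_matrix(actual, predicted):
    """Return [[TN, FP], [FN, TP]]: rows are actual, columns are predicted."""
    counts = Counter(zip(actual, predicted))
    return [
        [counts[(0, 0)], counts[(0, 1)]],  # actual 0: true neg, false pos
        [counts[(1, 0)], counts[(1, 1)]],  # actual 1: false neg, true pos
    ]


def with_margins(matrix):
    """Append a sum to each row, then add a final row of column sums."""
    rows = [row + [sum(row)] for row in matrix]
    totals = [sum(col) for col in zip(*rows)]
    return rows + [totals]


# Hypothetical labels, for illustration only.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

cm = binary_confusion_matrix(actual, predicted)
for row in with_margins(cm):
    print(row)
```

The last row and the last column hold the totals, so the bottom-right cell
is the number of observations.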
But, sure, the plain text table is a bit drab if you are looking for
maximum impact. That's why I suggested a heatmap. Or you can use the Python
package Yellowbrick.
Here's an example using seaborn's heatmap (and making sure I label the
axes, else it is useless). I used cmap="Greens":
https://datasciencefrancois.tumblr.com/post/166291770900/confusion-matrix-with-a-single-color-sequential
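For readers who want something to start from, a rough sketch of such a
heatmap follows. It is not the code behind the linked post: the counts, the
class names and the helper name plot_confusion_heatmap are assumptions for
illustration, and the Agg backend is selected only so the script runs
without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run; drop this for interactive use
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns


def plot_confusion_heatmap(matrix, class_names):
    """Draw a labeled, single-color sequential heatmap of a confusion matrix."""
    fig, ax = plt.subplots(figsize=(4, 4))
    sns.heatmap(
        matrix,
        annot=True,            # write the count inside each cell
        fmt="d",               # integer formatting for the counts
        cmap="Greens",         # single-color sequential palette
        cbar=False,
        xticklabels=class_names,
        yticklabels=class_names,
        ax=ax,
    )
    # Without axis labels the reader cannot tell actual from predicted.
    ax.set_xlabel("Predicted class")
    ax.set_ylabel("Actual class")
    return ax


# Hypothetical counts: rows are actual, columns are predicted.
cm = np.array([[3, 1], [1, 3]])
ax = plot_confusion_heatmap(cm, ["negative", "positive"])
ax.figure.savefig("confusion_matrix.png", bbox_inches="tight")
```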
I've had no problem using this with technical and non-technical audiences.
I have shown CMs like the above (and a variety of other graphical and
semigraphical displays) to business folks who then proceeded to green-light
further phases of fairly large data science projects. Once they've seen one
and got it, you never have to explain it again. Without the heatmap colors,
it was super challenging to have people "get it".
Also, you might be interested in this list of books on visualization (from
my "ex-libris" series on LinkedIn):
https://www.linkedin.com/pulse/ex-libris-data-scientist-part-v-visualization-francois-dion/
In particular, Stephen Few's "Show Me the Numbers: Designing Tables and
Graphs to Enlighten" should definitely be on everyone's reading list. Along
with Cairo's "The Functional Art", it will get you started if you can't
commit to reading 1 viz book per week for the next 2 years :)
Thanks,
Francois
On Wed, Oct 11, 2017 at 8:52 AM, Art <artem.nesterenko at gmail.com> wrote:
> Donut graph:
> https://imgur.com/a/C7r8x
> You should be able to see it now.
> Art Nestsiarenka
>
--
about.me/francois.dion - www.pyptug.org - www.3DFutureTech.info - @f_dion
<http://twitter.com/f_dion>