[Numpy-discussion] Using matplotlib's prctile on masked arrays
josef.pktd at gmail.com
Tue Oct 27 09:25:21 EDT 2009
On Tue, Oct 27, 2009 at 7:56 AM, Gökhan Sever <gokhansever at gmail.com> wrote:
> Hello,
>
> Consider this sample two columns of data:
>
> 999999.9999 999999.9999
> 999999.9999 999999.9999
> 999999.9999 999999.9999
> 999999.9999 1693.9069
> 999999.9999 1676.1059
> 999999.9999 1621.5875
> 651.8040 1542.1373
> 691.0138 1650.4214
> 678.5558 1710.7311
> 621.5777 999999.9999
> 644.8341 999999.9999
> 696.2080 999999.9999
>
> Putting this data into a file, say "sample.data", and loading it with:
>
> a,b = np.loadtxt('sample.data', dtype="float").T
>
> I[16]: a
> O[16]:
> array([ 1.00000000e+06, 1.00000000e+06, 1.00000000e+06,
> 1.00000000e+06, 1.00000000e+06, 1.00000000e+06,
> 6.51804000e+02, 6.91013800e+02, 6.78555800e+02,
> 6.21577700e+02, 6.44834100e+02, 6.96208000e+02])
>
> I[17]: b
> O[17]:
> array([ 999999.9999, 999999.9999, 999999.9999, 1693.9069,
> 1676.1059, 1621.5875, 1542.1373, 1650.4214,
> 1710.7311, 999999.9999, 999999.9999, 999999.9999])
>
> ### Interestingly, the second column is loaded as it is, but the values of a
> are reformatted a little. Why could this be happening? Any idea? Anyway, back
> to masked arrays:
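The values are not changed on load. Only the printed representation differs. NumPy's array printing switches to scientific notation when the ratio of the largest to the smallest nonzero magnitude exceeds 1000 (column `a`: 999999.9999 / 621.5777 is approximately 1609), and keeps fixed-point otherwise (column `b`: 999999.9999 / 1542.1373 is approximately 648). Here is a minimal sketch. It assumes a current NumPy, and it inlines the posted rows through `io.StringIO` instead of reading a "sample.data" file:

```python
import io

import numpy as np

# The twelve rows from the post, inlined instead of reading "sample.data".
SAMPLE = """\
999999.9999 999999.9999
999999.9999 999999.9999
999999.9999 999999.9999
999999.9999 1693.9069
999999.9999 1676.1059
999999.9999 1621.5875
651.8040 1542.1373
691.0138 1650.4214
678.5558 1710.7311
621.5777 999999.9999
644.8341 999999.9999
696.2080 999999.9999
"""

a, b = np.loadtxt(io.StringIO(SAMPLE), dtype=float).T

# Both columns hold exactly the parsed values; nothing was rounded.
print(a[0] == 999999.9999, b[0] == 999999.9999)

# Max/min ratio of a is about 1609 (> 1000), so it prints in e-notation.
print(np.array2string(a))
# Max/min ratio of b is about 648 (< 1000), so it stays fixed-point.
print(np.array2string(b))
# suppress_small=True forces fixed-point notation for a as well.
print(np.array2string(a, suppress_small=True))
```

To change the display for a whole session, pass the same option to `np.set_printoptions(suppress=True)`.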
>
> I[24]: am = ma.masked_values(a, value=999999.9999)
>
> I[25]: am
> O[25]:
> masked_array(data = [-- -- -- -- -- -- 651.804 691.0138 678.5558 621.5777
> 644.8341 696.208],
> mask = [ True True True True True True False False False
> False False False],
> fill_value = 999999.9999)
>
>
> I[30]: bm = ma.masked_values(b, value=999999.9999)
>
> I[31]: am
> O[31]:
> masked_array(data = [-- -- -- -- -- -- 651.804 691.0138 678.5558 621.5777
> 644.8341 696.208],
> mask = [ True True True True True True False False False
> False False False],
> fill_value = 999999.9999)
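`ma.masked_values` masks entries that are within a floating-point tolerance of the given value (its `rtol`/`atol` arguments default to 1e-5 and 1e-8). It keeps the original data underneath the mask. The session above echoes `am` a second time rather than `bm`. The following sketch rebuilds both columns from the posted values so that each mask can be checked directly. The `SENTINEL` name is only illustrative:

```python
import numpy as np
import numpy.ma as ma

SENTINEL = 999999.9999  # missing-value marker used in the posted data

a = np.array([SENTINEL] * 6 + [651.8040, 691.0138, 678.5558,
                               621.5777, 644.8341, 696.2080])
b = np.array([SENTINEL] * 3 + [1693.9069, 1676.1059, 1621.5875,
                               1542.1373, 1650.4214, 1710.7311]
             + [SENTINEL] * 3)

# Entries within rtol/atol of the sentinel are masked; the data is kept.
am = ma.masked_values(a, SENTINEL)
bm = ma.masked_values(b, SENTINEL)

print(am.mask)  # first six entries masked
print(bm.mask)  # first three and last three entries masked
print(am.count(), bm.count())  # number of unmasked entries in each
```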
>
>
> So far so good. A few basic checks:
>
> I[33]: am/bm
> O[33]:
> masked_array(data = [-- -- -- -- -- -- 0.422662755126 0.418689311712
> 0.39664667346 -- -- --],
> mask = [ True True True True True True False False False
> True True True],
> fill_value = 999999.9999)
>
>
> I[34]: mean(am/bm)
> O[34]: 0.41266624676580849
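The quotient is masked wherever either operand is masked, so only three ratios survive, and the mean averages just those three. The following sketch checks this with the posted values. The variable names are illustrative:

```python
import numpy as np
import numpy.ma as ma

SENTINEL = 999999.9999

a = np.array([SENTINEL] * 6 + [651.8040, 691.0138, 678.5558,
                               621.5777, 644.8341, 696.2080])
b = np.array([SENTINEL] * 3 + [1693.9069, 1676.1059, 1621.5875,
                               1542.1373, 1650.4214, 1710.7311]
             + [SENTINEL] * 3)
am = ma.masked_values(a, SENTINEL)
bm = ma.masked_values(b, SENTINEL)

ratio = am / bm
# The result mask is the union (logical OR) of the operand masks.
print(np.array_equal(ratio.mask, am.mask | bm.mask))
# compressed() drops masked entries, leaving indices 6-8 only.
valid = ratio.compressed()
print(valid)
# The masked mean uses only the unmasked entries.
print(ratio.mean(), valid.mean())
```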
>
> Unfortunately, matplotlib.mlab's prctile cannot handle the result of this masked division:
>
> I[54]: prctile(am/bm, p=[5,25,50,75,95])
> O[54]:
> array([ 3.96646673e-01, 6.21577700e+02, 1.00000000e+06,
> 1.00000000e+06, 1.00000000e+06])
>
>
> This also results in wrong-looking box-and-whisker plots.
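The numbers that `prctile` returns are consistent with it working on the raw data underneath the mask rather than on the three valid ratios. That raw data covers all twelve slots, including hidden values such as 1.0e+06 and 621.5777. One workaround is to drop the masked entries with `.compressed()` before computing percentiles. The sketch below uses NumPy's own `np.percentile` as a stand-in for `prctile`, so it illustrates the idea rather than matplotlib's exact behavior:

```python
import numpy as np
import numpy.ma as ma

SENTINEL = 999999.9999

a = np.array([SENTINEL] * 6 + [651.8040, 691.0138, 678.5558,
                               621.5777, 644.8341, 696.2080])
b = np.array([SENTINEL] * 3 + [1693.9069, 1676.1059, 1621.5875,
                               1542.1373, 1650.4214, 1710.7311]
             + [SENTINEL] * 3)
ratio = ma.masked_values(a, SENTINEL) / ma.masked_values(b, SENTINEL)

percents = [5, 25, 50, 75, 95]

# np.asarray() ignores the mask, so all 12 underlying values are used,
# including whatever data sits in the masked slots.
raw = np.percentile(np.asarray(ratio), percents)

# compressed() keeps only the 3 unmasked ratios, so every percentile
# falls between the smallest and largest valid ratio.
valid = ratio.compressed()
clean = np.percentile(valid, percents)

print(raw)
print(clean)
```

Apply the same `.compressed()` step before passing the data to a plotting routine.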
>
>
> Testing further with scipy.stats functions yields expected correct results:
These should not be correct results if you use scipy.stats.scoreatpercentile:
it doesn't have correct missing-value handling; it treats NaNs or
mask/fill values as regular numbers sorted to the end.
stats.mstats.scoreatpercentile is the corresponding function for
masked arrays.
(BTW, I wasn't able to quickly copy and paste your example because
MaskedArrays don't seem to have a constructive __repr__, i.e.
no commas.)
I don't know anything about the matplotlib story.
Josef
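The "sorted to the end" behavior can be seen with plain NumPy. Sorting places NaN after every finite value, which is how a percentile routine that simply sorts and indexes ends up reporting missing entries at the high end. `np.nanpercentile`, by contrast, ignores them. In the sketch below, turning masked entries into NaN with `filled(np.nan)` is an assumption of the example, not something proposed in the thread. For masked input, the function named above, `scipy.stats.mstats.scoreatpercentile`, is the direct route:

```python
import numpy as np
import numpy.ma as ma

SENTINEL = 999999.9999

a = np.array([SENTINEL] * 6 + [651.8040, 691.0138, 678.5558,
                               621.5777, 644.8341, 696.2080])
b = np.array([SENTINEL] * 3 + [1693.9069, 1676.1059, 1621.5875,
                               1542.1373, 1650.4214, 1710.7311]
             + [SENTINEL] * 3)
ratio = ma.masked_values(a, SENTINEL) / ma.masked_values(b, SENTINEL)

# Replace masked entries with NaN to get a plain ndarray.
with_nan = ratio.filled(np.nan)

# np.sort places NaN after all finite values ("sorted to the end").
print(np.sort(with_nan))

# nanpercentile ignores NaN, matching a percentile over the valid ratios.
print(np.nanpercentile(with_nan, 50))
print(np.percentile(ratio.compressed(), 50))
```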
>
> I[55]: stats.scoreatpercentile(am/bm, per=5)
> O[55]: 0.40877012449846228
>
> I[49]: stats.scoreatpercentile(am/bm, per=25)
> O[49]:
> masked_array(data = --,
> mask = True,
> fill_value = 1e+20)
>
> I[56]: stats.scoreatpercentile(am/bm, per=95)
> O[56]:
> masked_array(data = --,
> mask = True,
> fill_value = 1e+20)
>
>
> Any confirmation?
>
> --
> Gökhan
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion