[Numpy-discussion] changed behavior of numpy.histogram

Wed Jan 23 12:47:15 EST 2008

Hi again --

You made me feel guilty about breaking your code. Here's some
suggested substitute code:

In [10]: import numpy

In [11]: a = numpy.array(('atcg', 'aaaa', 'atcg', 'actg', 'aaaa'))

In [12]: b = numpy.sort(a)

In [13]: c = numpy.unique(b)

In [14]: d = numpy.searchsorted(b, c)

In [15]: e = numpy.append(d[1:], len(a))

In [16]: f = e - d

In [17]:

In [17]: print(c)
['aaaa' 'actg' 'atcg']

In [18]: print(f)
[2 1 2]

Note that histogram also uses searchsorted internally to do its counting.
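A variant of the session above that skips the append step is sketched below. Searching the sorted array for each unique value with both side='left' and side='right' gives the start and one-past-the-end of that value's run, so their difference is the count. This is an illustration of the searchsorted idea only, not code taken from histogram itself.

```python
import numpy

a = numpy.array(('atcg', 'aaaa', 'atcg', 'actg', 'aaaa'))
b = numpy.sort(a)
c = numpy.unique(b)

# Index where each unique value's run starts in the sorted array ...
left = numpy.searchsorted(b, c, side='left')
# ... and the index one past where that run ends
right = numpy.searchsorted(b, c, side='right')
counts = right - left

print(c)       # ['aaaa' 'actg' 'atcg']
print(counts)  # [2 1 2]
```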

Personally, I think the way to go is to have a "countunique" function
which returns the unique elements of an array (regardless of their
type) together with a count of each. The above code could be a basis
for this function.
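A minimal sketch of such a function, wrapping the session above. The name countunique is hypothetical (it is not a NumPy function), and flattening the input first is an assumption of this sketch. It works for any dtype that numpy.sort can order, strings included.

```python
import numpy

def countunique(x):
    """Hypothetical helper: return (unique values, count of each value).

    The input is flattened first; any dtype numpy.sort can order works.
    """
    b = numpy.sort(numpy.asarray(x).ravel())
    c = numpy.unique(b)
    # Start index of each unique value's run in the sorted array
    d = numpy.searchsorted(b, c)
    # Each run ends where the next one starts; the last run ends at len(b)
    e = numpy.append(d[1:], len(b))
    return c, e - d

values, counts = countunique(['atcg', 'aaaa', 'atcg', 'actg', 'aaaa'])
print(values)  # ['aaaa' 'actg' 'atcg']
print(counts)  # [2 1 2]
```

Later NumPy releases (1.9 onward) added a return_counts argument to numpy.unique, so numpy.unique(x, return_counts=True) returns the same pair directly.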

I'm not sure that this should be implemented using histogram, since
I ordinarily think of histogram as a numeric function. Others may
have different opinions.
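If one does want to stay with numeric tools, one possible route (a sketch, not something proposed in this thread) is to map each string to an integer code with numpy.unique(..., return_inverse=True) and then count the codes numerically, either with numpy.bincount or with histogram itself using one unit-wide bin per code.

```python
import numpy

a = numpy.array(('atcg', 'aaaa', 'atcg', 'actg', 'aaaa'))

# uniq holds the sorted unique strings; codes[i] is the index of a[i] in uniq
uniq, codes = numpy.unique(a, return_inverse=True)

# Counting the integer codes is an ordinary numeric operation
counts = numpy.bincount(codes)

# Same counts via histogram, with bin edges 0, 1, ..., len(uniq)
hist, edges = numpy.histogram(codes, bins=numpy.arange(len(uniq) + 1))

print(uniq)    # ['aaaa' 'actg' 'atcg']
print(counts)  # [2 1 2]
```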

Cheers,

Stuart Brorson
Interactive Supercomputing, inc.
135 Beaver Street | Waltham | MA | 02452 | USA
http://www.interactivesupercomputing.com/

On Wed, 23 Jan 2008, Mark.Miller wrote:

> Greetings:  I just noticed a changed behavior of numpy.histogram.  I
> think that a recent 'fix' to the code has changed my ability to use that
> function (albeit in an unconventional manner).  I previously used the
> histogram function to obtain counts of each unique string within a
> string array.  Again, I recognize that it is not a typical use of the
> histogram function, but it did work very nicely for me.
>
> Here's an example:
>
> ###numpy 1.0.3  --works just fine
> >>> import numpy
> >>> numpy.__version__
> '1.0.3'
> >>> a=numpy.array(('atcg', 'atcg', 'aaaa', 'aaaa'))
> >>> a
> array(['atcg', 'atcg', 'aaaa', 'aaaa'],
>       dtype='|S4')
> >>> b=numpy.unique(a)
> >>> numpy.histogram(a,b)
> (array([2, 2]), array(['aaaa', 'atcg'],
>       dtype='|S4'))
> >>>
>
> ###numpy 1.0.4  --no longer functions
> >>> import numpy
> >>> numpy.__version__
> '1.0.4'
> >>> a=numpy.array(('atcg', 'atcg', 'aaaa', 'aaaa'))
> >>> a
> array(['atcg', 'atcg', 'aaaa', 'aaaa'],
>       dtype='|S4')
> >>> b=numpy.unique(a)
> >>> numpy.histogram(a,b)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/libraries/python/python-2.5.1/numpy-1.0.4-gnu/lib/python2.5/site-packages/numpy/lib/function_base.py", line 154, in histogram
>     if(any(bins[1:]-bins[:-1] < 0)):
> TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'
> >>>
>
> Is this something that can possibly be fixed (should I submit a ticket)?
> Or should I revert to some other approach for implementing the same
> idea?  It really was a nice convenience.  Or, alternatively, would some
> sort of new function along the lines of a numpy.countunique() ultimately
> be useful?
>
> Thanks,
>
> -Mark
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>