[Numpy-discussion] changed behavior of numpy.histogram
Stuart Brorson
sdb at cloud9.net
Wed Jan 23 12:47:15 EST 2008
Hi again --
You made me feel guilty about breaking your code. Here's some
suggested substitute code:
In [10]: import numpy
In [11]: a = numpy.array(('atcg', 'aaaa', 'atcg', 'actg', 'aaaa'))
In [12]: b = numpy.sort(a)
In [13]: c = numpy.unique(b)
In [14]: d = numpy.searchsorted(b, c)
In [15]: e = numpy.append(d[1:], len(a))
In [16]: f = e - d
In [17]: print(c)
['aaaa' 'actg' 'atcg']
In [18]: print(f)
[2 1 2]
Note that histogram also uses searchsorted to do its stuff.
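To illustrate the searchsorted idea on numeric data, here is a minimal sketch of counting values per bin from a list of ascending left bin edges. It is not histogram's actual source; the name counts_per_bin and the convention that the last bin is open-ended are assumptions made for this example.

```python
import numpy as np

def counts_per_bin(values, left_edges):
    """Count values per bin, given ascending left edges (hypothetical helper).

    The last bin is treated as open-ended; values below the first edge
    are not counted.
    """
    ordered = np.sort(np.asarray(values))
    # Position of the first sorted value that is >= each left edge.
    starts = np.searchsorted(ordered, left_edges)
    # Each bin ends where the next one starts; the last bin runs to the end.
    ends = np.append(starts[1:], len(ordered))
    return ends - starts

print(counts_per_bin([0.5, 1.5, 1.7, 2.2, 9.0], [0, 1, 2]))  # [1 2 2]
```

The differences between successive insertion points are the per-bin counts, which is the same trick the session above uses with the unique strings as "edges".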
Personally, I think the way to go is to have a "countunique" function
which returns a list of the unique elements of the array
(regardless of their type) and a list of their counts. The above code
could be a basis for this function.
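Wrapped up as a function, the session above might look roughly like this. This is only a sketch: countunique is a hypothetical name, not an existing NumPy function, and flattening the input with ravel() is an assumption about how multi-dimensional arrays should be handled.

```python
import numpy as np

def countunique(a):
    """Return (unique values, counts) for an array of any sortable dtype.

    Hypothetical helper built from the sort/unique/searchsorted steps above.
    """
    ordered = np.sort(np.asarray(a).ravel())
    values = np.unique(ordered)
    # Index of the first occurrence of each unique value in the sorted data.
    starts = np.searchsorted(ordered, values)
    # Each run of equal values ends where the next run starts.
    ends = np.append(starts[1:], len(ordered))
    return values, ends - starts

values, counts = countunique(np.array(['atcg', 'aaaa', 'atcg', 'actg', 'aaaa']))
print(values)  # ['aaaa' 'actg' 'atcg']
print(counts)  # [2 1 2]
```

Later NumPy releases (1.9 and newer) can return the same pair directly with numpy.unique(a, return_counts=True).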
I'm not sure that this should be implemented using histogram, since
I, at least, ordinarily think of histogram as a numeric function.
Others may have different opinions.
Cheers,
Stuart Brorson
Interactive Supercomputing, inc.
135 Beaver Street | Waltham | MA | 02452 | USA
http://www.interactivesupercomputing.com/
On Wed, 23 Jan 2008, Mark.Miller wrote:
> Greetings: I just noticed a changed behavior of numpy.histogram. I
> think that a recent 'fix' to the code has changed my ability to use that
> function (albeit in an unconventional manner). I previously used the
> histogram function to obtain counts of each unique string within a
> string array. Again, I recognize that it is not a typical use of the
> histogram function, but it did work very nicely for me.
>
> Here's an example:
>
> ###numpy 1.0.3 --works just fine
> >>> import numpy
> >>> numpy.__version__
> '1.0.3'
> >>> a=numpy.array(('atcg', 'atcg', 'aaaa', 'aaaa'))
> >>> a
> array(['atcg', 'atcg', 'aaaa', 'aaaa'],
> dtype='|S4')
> >>> b=numpy.unique(a)
> >>> numpy.histogram(a,b)
> (array([2, 2]), array(['aaaa', 'atcg'],
> dtype='|S4'))
> >>>
>
> ###numpy 1.0.4 --no longer functions
> >>> import numpy
> >>> numpy.__version__
> '1.0.4'
> >>> a=numpy.array(('atcg', 'atcg', 'aaaa', 'aaaa'))
> >>> a
> array(['atcg', 'atcg', 'aaaa', 'aaaa'],
> dtype='|S4')
> >>> b=numpy.unique(a)
> >>> numpy.histogram(a,b)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/libraries/python/python-2.5.1/numpy-1.0.4-gnu/lib/python2.5/site-packages/numpy/lib/function_base.py", line 154, in histogram
>     if(any(bins[1:]-bins[:-1] < 0)):
> TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'
> >>>
>
> Is this something that can possibly be fixed (should I submit a ticket)?
> Or should I revert to some other approaches for implementing the same
> idea? It really was a nice convenience. Or, alternately, would some
> sort of new function along the lines of a numpy.countunique() ultimately
> be useful?
>
> Thanks,
>
> -Mark