[Numpy-discussion] Boolean arrays with nulls?

Eric Wieser wieser.eric+numpy at gmail.com
Thu Apr 18 14:16:49 EDT 2019


One option here would be to use masked arrays:

import numpy as np

arr = np.ma.zeros(3, dtype=bool)
arr[0] = True
arr[1] = False
arr[2] = np.ma.masked

giving

masked_array(data=[True, False, --],
             mask=[False, False,  True],
       fill_value=True)
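
To connect this to the nan-aware code mentioned in the quoted message below, here is a minimal sketch of converting the masked array to floats on demand. The variable names (as_float, null_positions, mean_of_valid) are my own, and filling with NaN is just one possible convention for the nulls, not something the thread prescribes:

```python
import numpy as np

# Build the three-state boolean array: True, False, and a masked (null) entry.
arr = np.ma.zeros(3, dtype=bool)
arr[0] = True
arr[1] = False
arr[2] = np.ma.masked

# Storage cost: one byte for the value plus one byte for the mask per element.
bytes_per_element = arr.data.itemsize + arr.mask.itemsize

# Convert to a plain float array only when needed, with NaN standing in
# for the masked entries (an assumed convention for this sketch).
as_float = arr.astype(float).filled(np.nan)

# Ordinary nan-aware functions then work on the converted copy.
null_positions = np.isnan(as_float)
mean_of_valid = np.nanmean(as_float)
```

The masked array itself costs two bytes per element (value plus mask) rather than the eight of a float64 array, and the float copy only exists for as long as the nan-aware code needs it.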

On Thu, 18 Apr 2019 at 10:51, Stuart Reynolds <stuart at stuartreynolds.net> wrote:
>
> Thanks. I’m aware of bool arrays.
> I think the tricky part of what I’m looking for is nullability and interoperability with code that deals with nullable data (float arrays).
>
> Currently the options seem to be float arrays, or custom operations that carry (unabstracted) categorical array data representations, such as:
> 0: false
> 1: true
> 2: NULL
>
> ... which wouldn’t be compatible with algorithms that use, say, np.isnan.
> Ideally, it would be nice to have a structure that is float-like in that it’s compatible with nan-aware operations, but whose storage is just a single byte per cell (or less).
>
> Is float8 a thing?
>
>
> On Thu, Apr 18, 2019 at 9:46 AM Stefan van der Walt <stefanv at berkeley.edu> wrote:
>>
>> Hi Stuart,
>>
>> On Thu, 18 Apr 2019 09:12:31 -0700, Stuart Reynolds wrote:
>> > Is there an efficient way to represent bool arrays with null entries?
>>
>> You can use the bool dtype:
>>
>> In [5]: x = np.array([True, False, True])
>>
>> In [6]: x
>> Out[6]: array([ True, False,  True])
>>
>> In [7]: x.dtype
>> Out[7]: dtype('bool')
>>
>> You should note that this stores one True/False value per byte, so it is
>> not optimal in terms of memory use.  There is no easy way to do
>> bit-arrays with NumPy, because we use strides to determine how to move
>> from one memory location to the next.
>>
>> See also: https://www.reddit.com/r/Python/comments/5oatp5/one_bit_data_type_in_numpy/
>>
>> > What I’m hoping for is that there’s a structure that is ‘viewed’ as
>> > nan-able float data, but backed by a more efficient structure
>> > internally.
>>
>> There are good implementations of this idea, such as:
>>
>> https://github.com/ilanschnell/bitarray
>>
>> Those structures cannot typically utilize the NumPy machinery, though.
>> With the new array function interface (__array_function__), you should
>> at least be able to build something with an API close to NumPy's.
>>
>> Best regards,
>> Stéfan
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion