[Numpy-discussion] draft enum NEP

Benjamin Root ben.root at ou.edu
Thu Mar 15 20:56:36 EDT 2012


On Thursday, March 15, 2012, Nathaniel Smith <njs at pobox.com> wrote:
> On Wed, Mar 14, 2012 at 1:44 AM, Mark Wiebe <mwwiebe at gmail.com> wrote:
>> On Fri, Mar 9, 2012 at 8:55 AM, Bryan Van de Ven <bryanv at continuum.io> wrote:
>>>
>>> Hi all,
>>>
>>> I have started working on a NEP for adding an enumerated type to NumPy.
>>> It is on my GitHub:
>>>
>>>     https://github.com/bryevdv/numpy/blob/enum/doc/neps/enum.rst
>>>
>>> It is still very rough, and incomplete in places. But I would like to
>>> get feedback sooner rather than later in order to refine it. In
>>> particular there are a few questions inline in the document that I would
>>> like input on. Any comments, suggestions, questions, concerns, etc. are
>>> very welcome.
>>
>>
>> This looks like a great start to me.
>>
>> I think the open/closed enum distinction will need to be explored a little
>> bit more, because it interacts with dtype immutability/hashability. Do you
>> know if there are any examples of Python objects in the wild that
>> dynamically convert from not being hashable (i.e. raising an exception if
>> used as a dict key) to being hashable?
>
> I haven't run into any...
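No in-the-wild example came up here, but a hypothetical sketch makes the concern concrete: an object that raises TypeError when hashed while it is still open, and becomes hashable once it is sealed. The class EnumLevels and its add_level()/seal() methods are invented for illustration (loosely mirroring the draft's open/closed idea); they are not a NumPy API.

```python
class EnumLevels:
    """Hypothetical level set: unhashable while open, hashable once sealed."""

    def __init__(self, levels=()):
        self._levels = list(levels)
        self._sealed = False

    def add_level(self, name):
        # Levels may only be added while the set is still open.
        if self._sealed:
            raise TypeError("cannot add levels to a sealed EnumLevels")
        if name not in self._levels:
            self._levels.append(name)

    def seal(self):
        self._sealed = True

    def __eq__(self, other):
        if not isinstance(other, EnumLevels):
            return NotImplemented
        return self._levels == other._levels

    def __hash__(self):
        # An open set could change after being used as a dict key, which would
        # silently corrupt the dict, so refuse to hash until it is sealed.
        if not self._sealed:
            raise TypeError("unhashable: EnumLevels is still open")
        return hash(tuple(self._levels))


levels = EnumLevels(["a", "b"])
try:
    {levels: "cached"}
except TypeError as exc:
    print(exc)  # unhashable: EnumLevels is still open
levels.add_level("c")
levels.seal()
cache = {levels: "cached"}  # now accepted as a dict key
```

One wrinkle with this design: isinstance(obj, collections.abc.Hashable) only checks that __hash__ is defined, so it reports an open EnumLevels as hashable even though hashing it raises.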
>
> Thinking about it, I'm not sure I have any use case for this type
> being mutable. Maybe someone else can think of one? The first case
> that came to mind was reading a large text file, where you (1) want
> to auto-create an enum, (2) want to use a pre-allocated array, and
> (3) don't know ahead of time what the levels are:
>
>  a = np.empty(lines_in_file, dtype=np.dtype(Enum()))
>  for i, line in enumerate(f):
>    field = line.split()[0]
>    a.dtype.add_level(field)
>    a[i] = field
>  a.dtype.seal()
>
> But really this can be done just as easily and efficiently
> without a mutable dtype:
>
>  a = np.empty(lines_in_file, dtype=np.int32)
>  intern_table = {}
>  next_level = 0
>  for i, line in enumerate(f):
>    field = line.split()[0]
>    val = intern_table.setdefault(field, next_level)
>    if val == next_level:
>      next_level += 1
>    a[i] = val
>  a = a.view(dtype=np.dtype(Enum(map=intern_table)))
>
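Since the Enum dtype above is a hypothetical API from the draft, here is a runnable sketch of the same interning loop using only plain NumPy: the codes live in an ordinary int32 array and the level mapping is returned alongside it as a dict. The function name intern_first_field is made up for this example, and it uses len(intern_table) as the next free code instead of a separate next_level counter.

```python
import io

import numpy as np


def intern_first_field(f, n_lines):
    """Hypothetical helper: map the first field of each line to an int32 code.

    Returns (codes, intern_table), where intern_table maps each distinct
    field to the code it was given when first seen.
    """
    codes = np.empty(n_lines, dtype=np.int32)
    intern_table = {}
    for i, line in enumerate(f):
        field = line.split()[0]
        # len(intern_table) is evaluated before setdefault inserts anything,
        # so a new field gets the next unused code and a known field keeps its own.
        codes[i] = intern_table.setdefault(field, len(intern_table))
    return codes, intern_table


f = io.StringIO("red 1\ngreen 2\nred 3\nblue 4\n")
codes, table = intern_first_field(f, 4)
print(codes)  # [0 1 0 2]
# Invert the mapping into a levels array indexed by code, then decode.
levels = np.array(sorted(table, key=table.get))
print(levels[codes])  # ['red' 'green' 'red' 'blue']
```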
> I notice that the HDF5 C library has a concept of open versus closed
> enums, but I can't tell from the documentation at hand why this is; it
> looks like it might just be a limitation of the implementation. (Like,
> a workaround for C's lack of a standard mapping type, which makes it
> inconvenient to pass in all the mappings in to a single API call.)
>
>> It might be worth adding a section which briefly compares and contrasts the
>> proposed functionality with enums in various programming languages. Here are
>> two links I found to try and get an idea:
>>
>> MS on C# enum usage:
>> http://msdn.microsoft.com/en-us/library/cc138362.aspx
>> Wikipedia on C++ enum class:
>> http://en.wikipedia.org/wiki/C%2B%2B11#Strongly_typed_enumerations
>>
>> For example, the C# enum has a way to enable a "flags" mode, which will
>> create successive powers of 2. This may not be a feature NumPy needs, but if
>> people are finding it useful in C#, maybe it would be useful here too.
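For comparison only: Python's standard library later gained a similar "flags" mode, enum.Flag (Python 3.6 and later, so it postdates this thread), in which auto() assigns successive powers of 2 and members combine with bitwise-or. The Permission class below is a made-up example.

```python
from enum import Flag, auto


class Permission(Flag):
    # auto() on a Flag subclass assigns successive powers of two: 1, 2, 4.
    READ = auto()
    WRITE = auto()
    EXECUTE = auto()


rw = Permission.READ | Permission.WRITE
print(rw.value)                  # 3
print(Permission.WRITE in rw)    # True
print(Permission.EXECUTE in rw)  # False
```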
>
> There's also a long, ongoing debate about how to do enums in Python -- e.g.:
>  http://www.python.org/dev/peps/pep-0354/
>  http://pypi.python.org/pypi/enum/
>  http://pypi.python.org/pypi/enum_meta/
>  http://pypi.python.org/pypi/flufl.enum/
>  http://pypi.python.org/pypi/lazr.enum/
>  http://pypi.python.org/pypi/pyutilib.enum/
>  http://pypi.python.org/pypi/coding/
>
>  http://stackoverflow.com/questions/36932/whats-the-best-way-to-implement-an-enum-in-python
> I guess Guido likes flufl.enum:
>  http://mail.python.org/pipermail/python-ideas/2011-July/010909.html
>
> BUT, I'm not sure any of this is relevant at all. "Enums" are a
> programming language feature that is, first and foremost, about
> injecting names into your code's namespace. What I'm hoping to see is
> a dtype for holding categorical data, similar to an R "factor"
>  http://stat.ethz.ch/R-manual/R-devel/library/base/html/factor.html
>  https://svn.r-project.org/R/trunk/src/library/base/R/factor.R (NB:
> This is GPL code if anyone is paranoid about contamination, but also
> the most complete API description available)
> or an HDF5 "enum"
>  http://www.hdfgroup.org/HDF5/doc/H5.user/Datatypes.html#Datatypes_Enum
> I believe pandas has some functionality along these lines too, though
> I can't find it in the online docs -- hopefully Wes will fill us in.
>
> These are basically objects that act for most purposes like string
> arrays, but in which all strings are required to come from a finite,
> specified list. This list acts like some metadata attached to the
> array; its order may or may not be significant. And they're
> implemented internally as integer arrays.
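The representation described here (integer codes plus a separate list of levels) can be sketched with plain NumPy. This is not the draft's proposed dtype, only an illustration of the storage idea: np.unique with return_inverse=True yields sorted levels and per-element codes, and when level order is significant it has to be supplied explicitly, as in the second half. The level names are invented sample data.

```python
import numpy as np

values = np.array(["low", "high", "medium", "low", "high"])

# Unordered case: np.unique returns the distinct values sorted
# lexicographically (the "levels") and, with return_inverse=True, the index
# of each element's level (the integer codes).
levels, codes = np.unique(values, return_inverse=True)
print(levels)  # ['high' 'low' 'medium']
print(codes)   # [1 0 2 1 0]

# Decoding back to strings is fancy indexing of the levels by the codes.
assert (levels[codes] == values).all()

# Ordered case: supply the level order explicitly so that comparisons on the
# codes follow the meaningful order rather than alphabetical order.
ordered_levels = ["low", "medium", "high"]
lookup = {name: code for code, name in enumerate(ordered_levels)}
ordered_codes = np.array([lookup[v] for v in values], dtype=np.int8)
print(ordered_codes)  # [0 2 1 0 2]
at_least_medium = ordered_codes >= lookup["medium"]
```

Using int8 codes here assumes fewer than 128 levels; a wider integer type would be needed beyond that.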
>
> I'm not sure what it would even mean to treat this kind of data as
> "flags", since you can't take the bitwise-or of two strings...
>
> -- Nathaniel
>

I guess my problem is that this isn't _quite_ like an enum that I am
familiar with (but not quite unlike it either). Should we call it
"factor" to avoid confusion, or are there going to be too many people who
won't know what that is but would be drawn in by the name "enum"?

Just a thought.

Ben Root