From clee at spiralis.merseine.nu  Mon Apr  1 09:06:59 2002
From: clee at spiralis.merseine.nu (clee at spiralis.merseine.nu)
Date: Mon Apr  1 09:06:59 2002
Subject: [Numpy-discussion] slice question and bug
Message-ID: <20020401165852.44D3E79B@spiralis.merseine.nu>

Hello,
I'm trying to track down a segv when I do the B[:] operation on an
array, "B", that I've built as a view on external data.  During
the process I ran into the following code (Numeric-21.0):

/* {%c++%} */
     extern int PyArray_Free(PyObject *op, char *ptr) {
	 PyArrayObject *ap = (PyArrayObject *)op;
	 int i, n;

	 if (ap->nd > 2) return -1;
	 if (ap->nd == 3) {
	     n = ap->dimensions[0];
	     for (i=0; i<n; i++) {
		 free(((char **)ptr)[i]);
	     }
	 }
	 if (ap->nd >= 2) {
	     free(ptr);
	 }
	 Py_DECREF(ap);
	 return 0;
     }
/* {%c++%} */

The multiple, incompatible tests of ap->nd are the problem: the
function returns -1 whenever ap->nd > 2, so the ap->nd == 3 branch can
never run.
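
A small Python model of the quoted function's branches makes the dead code visible. It is not Numeric code. It only records which free() calls each value of ap->nd can reach, and it shows that the early return for nd > 2 makes the nd == 3 loop unreachable.

```python
def pyarray_free_branches(nd):
    """Model of the control flow in the quoted PyArray_Free.

    Returns None where the C code returns -1. Otherwise it returns the
    list of free() calls that would run. The Py_DECREF is not modelled.
    """
    if nd > 2:
        return None              # 'if (ap->nd > 2) return -1;'
    calls = []
    if nd == 3:                  # never true here: nd > 2 already returned
        calls.append("free(rows[i])")
    if nd >= 2:
        calls.append("free(ptr)")
    return calls


# Try a range of ranks and collect every free() call that is actually reachable.
reachable = set()
for nd in range(0, 8):
    result = pyarray_free_branches(nd)
    if result is not None:
        reachable.update(result)

print(sorted(reachable))
```

The model cannot tell which fix is right, deleting the nd == 3 block or changing the first test. That choice depends on how the row pointers passed in as ptr were allocated, and the quoted code does not show this.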

-chris


From clee at spiralis.merseine.nu  Mon Apr  1 10:59:02 2002
From: clee at spiralis.merseine.nu (clee at spiralis.merseine.nu)
Date: Mon Apr  1 10:59:02 2002
Subject: [Numpy-discussion] slice question and bug
In-Reply-To: <20020401165852.44D3E79B@spiralis.merseine.nu>
References: <20020401165852.44D3E79B@spiralis.merseine.nu>
Message-ID: <15528.44386.160013.936132@spiralis.merseine.nu>


clee at spiralis.merseine.nu writes:
 > 
 > Hello,
 > I'm trying to track down a segv when I do the B[:] operation on an
 > array, "B", that I've built as a view on external data.  During...
 > [snip]

To clarify my own somewhat nonsensical post: when I started composing
my message, I was trying to figure out a bug in my own code that
caused a crash while doing slice_array.  I've since fixed that bug.
However, in the process of figuring out what I was doing wrong I
was browsing the Numeric source code.  While examining
PyArray_Free(..) in arrayobject.c, I saw that it returns -1 whenever the
number of dimensions is greater than 2, yet it has code that tests for
when the number of dimensions equals 3.

So ultimately, my post is just an alert: I think there might be
some code that needs to be cleaned up.

Thanks,
 lacking-caffeine-ly yours
 -chris 


From nwagner at mecha.uni-stuttgart.de  Wed Apr  3 11:48:47 2002
From: nwagner at mecha.uni-stuttgart.de (Nils Wagner)
Date: Wed Apr  3 11:48:47 2002
Subject: [Numpy-discussion] Factorization of complex symmetric matrices
Message-ID: <3CAABF60.12D609C0@mecha.uni-stuttgart.de>

Hi,

I am looking for a suitable factorization of complex symmetric matrices.
Where can I find a proper routine?

Nils
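
For reference, one standard approach for a complex symmetric matrix is an LDL^T factorization. Here "complex symmetric" means A equals its plain transpose, which is not the same as Hermitian. LAPACK's ZSYTRF computes a pivoted (Bunch-Kaufman) version. The pure-Python sketch below is unpivoted and only illustrative. It uses the plain transpose throughout and never the conjugate. It also stops on a zero pivot, which is exactly the case that pivoting exists to handle.

```python
def ldlt_complex_symmetric(a):
    """Unpivoted LDL^T factorization of a complex symmetric matrix.

    a: square list of lists with a[i][j] == a[j][i] (plain symmetry, no
    conjugation). Returns (L, d), where L is unit lower triangular and d
    is the diagonal of D, so that a == L * diag(d) * L^T.
    """
    n = len(a)
    L = [[0j] * n for _ in range(n)]
    d = [0j] * n
    for j in range(n):
        # Pivot: a_jj minus contributions of earlier columns (no conjugates).
        dj = a[j][j] - sum(L[j][k] * L[j][k] * d[k] for k in range(j))
        if dj == 0:
            raise ZeroDivisionError("zero pivot; a pivoted method is needed")
        d[j] = dj
        L[j][j] = 1 + 0j
        for i in range(j + 1, n):
            s = a[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = s / dj
    return L, d


def reconstruct(L, d):
    """Form L * diag(d) * L^T to check the factorization."""
    n = len(d)
    return [[sum(L[i][k] * d[k] * L[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[2 + 1j, 1 - 1j, 0.5j],
     [1 - 1j, 3 + 0j, 2 + 2j],
     [0.5j, 2 + 2j, 4 - 1j]]
L, d = ldlt_complex_symmetric(A)
R = reconstruct(L, d)
print(max(abs(R[i][j] - A[i][j]) for i in range(3) for j in range(3)))
```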


From ray_drew at yahoo.co.uk  Thu Apr  4 02:27:09 2002
From: ray_drew at yahoo.co.uk (Ray Drew)
Date: Thu Apr  4 02:27:09 2002
Subject: [Numpy-discussion] RandomArray difference between Python2.1 and 2.2?
References: <20020401165852.44D3E79B@spiralis.merseine.nu> <15528.44386.160013.936132@spiralis.merseine.nu>
Message-ID: <000b01c1dbc3$65fe6100$6014000a@RDREWXP>

Hi,

Can anyone explain the following?

Python 2.1.1, Numpy version='20.2.0'

Python 2.1.1 (#20, Jul 20 2001, 01:19:29) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.8 -- press F1 for help
>>> from RandomArray import *
>>> normal(3., 1., (5,))
array([ 2.19091588,  2.44682837,  2.51790264,  4.26374364,  4.56880629])


Python 2.2, Numpy version='20.3'

Python 2.2 (#28, Dec 21 2001, 12:21:22) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.8 -- press F1 for help
>>> from RandomArray import *
>>> normal(3., 1., (5,))
array([-3.78572679, -3.63714516, -3.01228334, -4.80211985, -2.57420304])

Why am I getting negative values with Python 2.2? This happens consistently.
Any help would be appreciated.

Thanks,

Ray




From pearu at cens.ioc.ee  Thu Apr  4 18:51:36 2002
From: pearu at cens.ioc.ee (Pearu Peterson)
Date: Thu Apr  4 18:51:36 2002
Subject: [Numpy-discussion] RandomArray difference between Python2.1 and
 2.2?
In-Reply-To: <000b01c1dbc3$65fe6100$6014000a@RDREWXP>
Message-ID: <Pine.LNX.4.21.0204042359090.8013-100000@cens.ioc.ee>

On Thu, 4 Apr 2002, Ray Drew wrote:

> Python 2.2, Numpy version='20.3'
> 
> Python 2.2 (#28, Dec 21 2001, 12:21:22) [MSC 32 bit (Intel)] on win32
> Type "copyright", "credits" or "license" for more information.
> IDLE 0.8 -- press F1 for help
> >>> from RandomArray import *
> >>> normal(3., 1., (5,))
> array([-3.78572679, -3.63714516, -3.01228334, -4.80211985, -2.57420304])
> 
> Why am I getting negative values with Python 2.2? This happens consistently.
> Any help would be appreciated.

This is a bug in Numpy 20.3 and should be fixed in Numpy 21.0.

Pearu
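
A quick way to catch this kind of regression after an upgrade is to compare the sample mean of many draws with the requested mean. The sketch below uses the standard library's `random.gauss` as a stand-in sampler. To test RandomArray, one would instead pass a small wrapper around `normal` (hypothetical usage, not shown). The tolerance of 5 standard errors is an arbitrary choice.

```python
import math
import random


def looks_like_normal(sampler, mean, std, n=10000, tol_se=5.0):
    """True if the mean of n draws from sampler(mean, std) lies within
    tol_se standard errors of the requested mean."""
    draws = [sampler(mean, std) for _ in range(n)]
    sample_mean = sum(draws) / n
    standard_error = std / math.sqrt(n)
    return abs(sample_mean - mean) <= tol_se * standard_error


random.seed(0)
print(looks_like_normal(random.gauss, 3.0, 1.0))                       # a working sampler
print(looks_like_normal(lambda m, s: -random.gauss(m, s), 3.0, 1.0))   # values near -3, like the 20.3 output
```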


From kelson at fedka.ociw.edu  Fri Apr  5 08:23:49 2002
From: kelson at fedka.ociw.edu (Daniel D. Kelson)
Date: Fri Apr  5 08:23:49 2002
Subject: [Numpy-discussion] Error in MLab.py
Message-ID: <200204040140.g341e1e04422@fedka.ociw.edu>

Howdy:
  Shouldn't line 296 in MLab.py of Numeric 21.0, which currently reads:
    val = squeeze(dot(transpose(m)*conjugate(y)) / fact)
  read:
    val = squeeze(dot(transpose(m),conjugate(y)) / fact)
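
The corrected line passes two arguments to `dot`: the transpose of the data and the conjugate of `y`. It then divides the matrix product by `fact`. In the current line, `transpose(m)*conjugate(y)` is an elementwise product, so `dot` receives only one argument. The pure-Python sketch below reproduces just that expression on nested lists. The helpers are stand-ins named after Numeric's functions. The sketch assumes the data are already centred, with one observation per row and `fact = N - 1`. These are assumptions, because the surrounding MLab code is not quoted here.

```python
def transpose(m):
    """Stand-in for Numeric.transpose on a 2-D nested list."""
    return [list(col) for col in zip(*m)]


def conjugate(m):
    """Elementwise complex conjugate of a 2-D nested list."""
    return [[complex(v).conjugate() for v in row] for row in m]


def dot(a, b):
    """Matrix product of two 2-D nested lists (two arguments, like Numeric.dot)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def cov_expression(m, y, fact):
    """dot(transpose(m), conjugate(y)) / fact, the corrected form of the line."""
    return [[v / fact for v in row] for row in dot(transpose(m), conjugate(y))]


# Two centred complex observations of one variable (their mean is 0).
m = [[1j], [-1j]]
print(cov_expression(m, m, fact=len(m) - 1.0))
```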
 
Thanks,
D.Kelson
Carnegie Observatories
http://www.ociw.edu/~kelson


From DavidA at ActiveState.com  Fri Apr  5 13:47:03 2002
From: DavidA at ActiveState.com (David Ascher)
Date: Fri Apr  5 13:47:03 2002
Subject: [Numpy-discussion] Re: [Python-Dev] Array Enhancements
References: <20020405203029.19286.qmail@web12903.mail.yahoo.com> <200204052121.g35LLut20125@pcp742651pcs.reston01.va.comcast.net>
Message-ID: <3CAE1913.ECB27329@activestate.com>

Guido van Rossum wrote:

> >  I would propose the following for multi-dimensional arrays:
> >
> >    a = array.array('d', 20000, 20000)
> >
> > or:
> >
> >    a = array.xarray('d', 20000, 20000)
> 
> I just realized that multi-dimensional __getitem__ shouldn't be a big
> deal.  The question is, given the above declaration, what a[0] should
> return: the same as a[0, 0] or a copy of a[0, 0:20000] or a reference
> to a[0, 0:20000].

Or a ValueError?  In the face of ambiguity, refuse the temptation to
guess.
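
David's suggestion can be sketched in a few lines. The example below is a hypothetical 2-D wrapper backed by a flat `array.array`. It is not the `array` module's API, and it does not use Guido's proposed signature. It accepts only full `a[i, j]` indexing and raises `ValueError` for the ambiguous `a[i]`.

```python
from array import array


class Grid2D:
    """Hypothetical 2-D container over a flat, row-major array.array."""

    def __init__(self, typecode, rows, cols):
        self.rows, self.cols = rows, cols
        self._flat = array(typecode, [0] * (rows * cols))

    def _index(self, key):
        # Refuse anything but an (i, j) pair instead of guessing what a[i] means.
        if not (isinstance(key, tuple) and len(key) == 2):
            raise ValueError("index with a[i, j]; a[i] is ambiguous")
        i, j = key
        if not (0 <= i < self.rows and 0 <= j < self.cols):
            raise IndexError("index out of range")
        return i * self.cols + j

    def __getitem__(self, key):
        return self._flat[self._index(key)]

    def __setitem__(self, key, value):
        self._flat[self._index(key)] = value


a = Grid2D('d', 2, 3)
a[1, 2] = 5.0
print(a[1, 2])
```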

IIRC, this issue caused lots of problems in the numpy world. cc'ing Paul
in case he wants to jump in to fill in my rusty memory.

Why does submitting a patch to arraymodule seem an easier path than
modifying numarray or numpy to support what's needed?  I believe that
the goals of numarray aren't that different from what Scott is trying to
do (memory management APIs, etc.).

I'd like to see fewer multi-dimensional array objects, not more...

--david ascher


From jochen at unc.edu  Fri Apr  5 20:56:09 2002
From: jochen at unc.edu (Jochen =?iso-8859-1?q?K=FCpper?=)
Date: Fri Apr  5 20:56:09 2002
Subject: [Numpy-discussion] numerical integration
Message-ID: <ly6635s9xy.fsf@bock.chem.unc.edu>

The following message is a courtesy copy of an article
that has been posted to comp.lang.python.announce as well.

I have made a numerical integration package available at 
,----
| http://python.jochen-kuepper.de/integrate
`----

This is a copy of the integrate module of scipy by Travis Oliphant
plus some small changes and rearrangements to make it work standalone
(well, it needs Numeric).  All credit goes to the scipy folks,
especially Travis; all errors should be blamed on me.

Greetings,
Jochen

PS: In the long run this module will be phased out in favor of scipy,
    but for now it might be useful for someone...
-- 
Einigkeit und Recht und Freiheit                http://www.Jochen-Kuepper.de
    Liberté, Égalité, Fraternité                GnuPG key: 44BCCD8E
        Sex, drugs and rock-n-roll


From andrewm at object-craft.com.au  Sun Apr  7 23:32:07 2002
From: andrewm at object-craft.com.au (Andrew McNamara)
Date: Sun Apr  7 23:32:07 2002
Subject: [Numpy-discussion] Puzzling numpy results?
Message-ID: <20020408063157.1659D38F5B@coffee.object-craft.com.au>

The behavior I'm seeing with zero length Numeric arrays is not what I
would have expected:

    >>> from Numeric import *
    >>> array([5]) != array([])
    zeros((0,), 'l')
    >>> array([]) == array([])
    zeros((0,), 'l')
    >>> allclose(array([5]), array([]))
    1

This is with Numeric-20.3 (and Numeric-20.2.1) - is this behavior correct,
or have I stumbled across a bug?

If both sides of the comparison are arrays with a length greater than
zero, the comparisons work as expected:

    >>> array([5]) != array([6])
    array([1])
    >>> array([5, 5]) != array([6])
    array([1, 1])
    >>> array([5]) != array([5])
    array([0])

The problem came up when I was writing unittests for some Numpy code:
under some circumstances, the code under test is expected to return a
zero length array: I was somewhat surprised when I couldn't make the
test fail! 8-)
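
Until the zero-length behavior is settled, a test can avoid this trap by comparing lengths before it compares elements. Below is a minimal sketch on plain sequences. The helper name is hypothetical. With Numeric arrays, one would presumably compare `.shape` first in the same way.

```python
def arrays_equal(a, b):
    """True only if a and b have the same length and equal elements.

    Checking the length first means a non-empty sequence can never
    compare equal to an empty one, unlike the elementwise results above.
    """
    if len(a) != len(b):
        return False
    return all(x == y for x, y in zip(a, b))


print(arrays_equal([5], []))   # False
print(arrays_equal([], []))    # True
print(arrays_equal([5], [5]))  # True
```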

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/


From tchur at optushome.com.au  Mon Apr  8 13:34:16 2002
From: tchur at optushome.com.au (Tim Churches)
Date: Mon Apr  8 13:34:16 2002
Subject: [Numpy-discussion] Puzzling numpy results?
References: <20020408063157.1659D38F5B@coffee.object-craft.com.au>
Message-ID: <3CB208CE.2270C89@optushome.com.au>

Andrew McNamara wrote:
> 
> The behavior I'm seeing with zero length Numeric arrays is not what I
> would have expected:
> 
>     >>> from Numeric import *
>     >>> array([5]) != array([])
>     zeros((0,), 'l')
>     >>> array([]) == array([])
>     zeros((0,), 'l')
>     >>> allclose(array([5]), array([]))
>     1

The Numpy docs point out that == and != are implemented via the logical
ufuncs, and that:

 "The ``logical'' ufuncs also perform their operations on arrays in
elementwise fashion, just like the ``mathematical'' ones."

I think this explains the results you are seeing: if you do an
element-wise comparison of a length-one array with a zero-length array,
the Numpy recycling rule means that you should always get a zero-length
result. Note that zeros((0,),'l') is not zero, it is zero zeros. So
although the results are surprising (at least to me, and to you), I
think they are logically correct.
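
The 1-D case of this rule can be written out directly. Two lengths are compatible if they are equal or if one of them is 1. A length-1 operand is stretched to the other length, even when that length is 0. This is a simplified sketch of the rule, not Numeric's implementation. It does not explain the scalar results in the session that follows; under this rule, lengths 2 and 0 are simply incompatible.

```python
def broadcast_length(n, m):
    """Result length of an elementwise op on 1-D operands of lengths n and m."""
    if n == m:
        return n
    if n == 1:
        return m        # a length-1 operand is stretched, even to length 0
    if m == 1:
        return n
    raise ValueError("lengths %d and %d are not compatible" % (n, m))


print(broadcast_length(1, 0))   # 0: array([5]) != array([]) gives an empty result
print(broadcast_length(0, 0))   # 0
print(broadcast_length(2, 1))   # 2
```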

But if that is the case, why does this hold (which I suspect reflects
what you originally expected)?

>>> from Numeric import *
>>> array([5,6]) != array([])
1
>>> array([5,6]) == array([])
0

Tim C


From xscottg at yahoo.com  Thu Apr 11 04:32:03 2002
From: xscottg at yahoo.com (Scott Gilbert)
Date: Thu Apr 11 04:32:03 2002
Subject: [Numpy-discussion] Introduction
Message-ID: <20020411113152.98373.qmail@web12906.mail.yahoo.com>

Hello All.

I'm interested in this project, and am curious to what extent you are
willing to accept outside contributions.  I just tried to subscribe to
the developers list, but I didn't realize that required admin approval.
Hopefully it doesn't look like I was shaking the door without knocking
first.

Is this list active?  Is this the correct place to talk about Numarray?


A little about me:

My name is Scott Gilbert, and I work as a software developer for a
company called Rincon Research in Tucson, Arizona.  We do a lot of digital
signal processing/analysis, among other things. 

In the last year or so, we've started to use Python in various
capacities, and we're hoping to use it for more things.

We need a good array module for various things.  Some are similar to
what it looks like Numarray is targeted at (fft, convolutions, etc...),
and others are pretty different (providing buffers for reading data
from specialized hardware etc...)

About a week ago, I noticed that Guido over in Python developer land
was willing to accept patches to the standard array module.  As such, I
thought I would take that opportunity to try and wedge some desirements
and requirements I have into that baseline.  Bummer for me, but they
weren't exactly excited about bloating out arraymodule.c to meet my
needs, and in retrospect that does make good sense.  A number of people
suggested that this might be a better place to try and get what I need.

So here I am, poking around and wondering if I can play in your
sandbox.


If you're willing to let me contribute, my specific itches that I need
to scratch are below.  Otherwise - bummer, and I hope you all catch
crabs...  :-)


-----------------------------------

It's taken me a couple of days to understand what's going on in the
source.  I've read through the design docs, and the PEP, but it wasn't
until I tried to re-implement it that it really clicked.  My
re-implementation of the array portion of what you're doing is
attached.  There are still some holes to fill in, but it's fairly
complete and supports a whole bunch of things which yours does not
(Some of which you might even find useful: Pickling, Bit type).  I'm
pretty proud of it for only 400 lines of Python (Most of which is the
bazillion type declarations).  It's probably riddled with bugs as it's
less than a day old...

After initially thinking that you guys were getting too clever, I've
come to realize it's a pretty good design overall.  Still I have some
changes I would like to make if you'll let me.  (Both to the design and
the implementation)


-------------------------

Following your design for the Array stuff, I've been able to implement
a pretty usable array class that supports the bazillion array types I
need (Bit, Complex Integer, etc...).  This gets me past my core
requirements without polluting your world, but unfortunately my new
XArray type doesn't play so well with your UFuncs.  I think my users
will definitely want to use your UFuncs when the time comes, so I want
to remedy this situation.

The first change I would like to make is to rework your code that
verifies that an object is a "usable" array.  I think NumArray should
only check for the interface required, not the actual type hierarchy. 
By this I mean that the minimum required to be a supported array type
is that it support the correct attributes, not that it actually inherit
from NDArray:

   (quoting from your paper) something like:

       _data
       _shape
       _strides
       _byteoffset
       _aligned
       _contiguous
       _type
       _byteswap

Most of these are just integer fields, or tuples of integers.  Ignoring
_type for the moment, it appears that the interface required to be a
NumArray is much less strict than actually requiring it to derive from
NumArray.  If you allow me to change a few functions (inputarray() in
numarray.py is one small example), I could use my independent XArray
class almost as is, and moreover I can implement new array objects
(possibly as extension types) for crazy things like working with page
aligned memory, memory mapping etc...
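
The distinction here is between an `isinstance` test and a check for the attributes themselves. Below is a minimal sketch of the latter, using the attribute names quoted from the design paper. The function and the `XArrayLike` class are hypothetical and are not part of numarray.

```python
REQUIRED_ATTRS = ("_data", "_shape", "_strides", "_byteoffset",
                  "_aligned", "_contiguous", "_type", "_byteswap")


def is_usable_array(obj):
    """Accept any object that exposes the required attributes, whatever its class."""
    return all(hasattr(obj, name) for name in REQUIRED_ATTRS)


class XArrayLike:
    """Stand-in for an independent array type that does not inherit from NDArray."""

    def __init__(self):
        self._data = bytearray(16)
        self._shape = (2,)
        self._strides = (8,)
        self._byteoffset = 0
        self._aligned = 1
        self._contiguous = 1
        self._type = "Float64"
        self._byteswap = 0


print(is_usable_array(XArrayLike()))  # True
print(is_usable_array([1.0, 2.0]))    # False
```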


Well, that's almost enough.  The _type field poses a small problem of
sorts.  It looks like you don't require a _type to be derived from
NumericType, and this is a good thing since it allows me (and others)
to implement NumArray compatible arrays without actually requiring
NumArray to be present.

However, it would be nice if you declared a more comprehensive list of
typenames - even if they aren't all implemented in NumArray proper. 
Who knows, maybe the SciPy guys have a use for complex integers or bit
arrays.  If you make a reasonable canonical list, our data could be
passed back and forth even if NumArray doesn't know what to do with it.

See my attached module for the types of things I'm thinking of.  I'm
not so concerned about the "Native Types" that are in there, but I
think committing to a list of named standard types would be valuable.
(I suspect there are others who are interested in standard C types even
if the size changes between machines...)

If you were to specify a minimal interface like this in the short term,
I could begin propagating my array module to my users.  I could get my
work done now, knowing that I'll be compatible with NumArray proper
once it matures.  I'd be willing to participate in making these changes
if necessary.

Looking at the big picture, I think it's desirable that there really
only be one official standard for ND arrays in the Python world.  That
way, the various independent groups can all share their independent
work.  You guys are the heir-apparent, so to speak, from the Python
guys' point of view.

I don't know if you're trying to get all of NumArray into the Python
distribution or not, but I suspect a good interim step would be to have
a PEP that specifies what it means to be a NumArray or NDArray in
minimal terms.  Perhaps supplying an Array-only module in Python that
implements this interface.  Again, I'd be willing to help with all of
this.


-------------------------

Ok, other suggestions...

Here is the list of things that your design document indicates are
required to be a NumArray:

       _data
       _shape
       _strides
       _byteoffset
       _aligned
       _contiguous
       _type
       _byteswap

I believe that one could calculate the values for _aligned and
_contiguous from the other fields.  So they shouldn't really be part of
the interface required.  I suspect it is useful for the C
implementation of UFuncs to have this information in the NDInfo struct
though, so while I would drop them from the attribute interface, I would
delegate the task of calculating these values to getNDInfo() and/or
getNumInfo().
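
The claim that _aligned and _contiguous can be derived from the other fields can be illustrated under two assumptions that do not come from the thread. First, "contiguous" means C (row-major) order with no gaps. Second, "aligned" means that the byte offset and every stride are multiples of the item size. The function names are hypothetical.

```python
def is_contiguous(shape, strides, itemsize):
    """True if the strides describe a gap-free, row-major (C-order) layout."""
    expected = itemsize
    # Walk the dimensions from last to first. Each stride must equal itemsize
    # times the sizes of all later dimensions; length-1 dimensions are ignored.
    for dim, stride in zip(reversed(shape), reversed(strides)):
        if dim > 1 and stride != expected:
            return False
        expected *= dim
    return True


def is_aligned(byteoffset, strides, itemsize):
    """True if every element starts on a multiple of itemsize (assumed rule)."""
    if itemsize <= 0:
        return False    # e.g. the Bit type's _itemsize of 0: not byte addressable
    return byteoffset % itemsize == 0 and all(s % itemsize == 0 for s in strides)


print(is_contiguous((2, 3), (24, 8), 8))   # True
print(is_contiguous((2, 3), (48, 16), 8))  # False: every other element
print(is_aligned(0, (24, 8), 8))           # True
print(is_aligned(4, (24, 8), 8))           # False
```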

I also notice that you chose _byteswap to indicate byteswapping is
needed.  I think a better choice would be to specify the endian-ness of
the data (with an _endian attr), and have getNDInfo() and getNumInfo()
calculate the _byteswap value for the NDInfo struct.
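
If _endian names the byte order explicitly, the byteswap flag becomes a comparison against the host's order. The strings 'little' and 'big' below are an assumption, chosen to match Python's `sys.byteorder`. The reader function is hypothetical and leaves the actual swapping to the struct module.

```python
import struct
import sys


def needs_byteswap(endian):
    """True if data stored with the given byte order must be swapped on this host."""
    if endian not in ("little", "big"):
        raise ValueError("endian must be 'little' or 'big'")
    return endian != sys.byteorder


def read_float64(buffer, offset, endian):
    """Decode one 8-byte float, letting struct handle any byte swapping."""
    prefix = "<" if endian == "little" else ">"
    return struct.unpack_from(prefix + "d", buffer, offset)[0]


data = struct.pack(">d", 1.5)     # big-endian bytes, as a file might hold them
print(needs_byteswap("big") == (sys.byteorder == "little"))  # True on any host
print(read_float64(data, 0, "big"))                           # 1.5
```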

In my implementation, I came up with a slightly different list:

            self._endian
            self._offset
            self._shape
            self._stride
            self._itemtype
            self._itemsize
            self._itemformat
            self._buffer

The only real differences are that _itemsize allows me to work with
arrays of bytes without having any clue what the underlying type is (in
some cases, _itemtype is "Unknown"), and that I implemented a
"Struct" _itemtype, for which _itemformat is useful.  (It's
the same format string that the struct module in Python uses.)
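
Because _itemformat uses the struct module's format strings, one record of a "Struct" array can be decoded from the buffer, offset and stride. Below is a sketch for the 1-D case. The record layout `'<hd'` is only an example, and the function is hypothetical.

```python
import struct


def get_record(buffer, offset, stride, itemformat, index):
    """Decode record `index` of a 1-D Struct array described XArray-style."""
    codec = struct.Struct(itemformat)   # codec.size plays the role of _itemsize
    return codec.unpack_from(buffer, offset + index * stride)


fmt = "<hd"                        # little-endian short followed by a double
itemsize = struct.calcsize(fmt)    # 10 bytes: '<' means no padding
buf = bytearray(4 + 3 * itemsize)  # 4 leading bytes, so the offset is 4
for i, rec in enumerate([(1, 0.5), (2, 1.5), (3, 2.5)]):
    struct.pack_into(fmt, buf, 4 + i * itemsize, *rec)

print(get_record(buf, 4, itemsize, fmt, 1))  # (2, 1.5)
```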

Also, I specified 0 for _itemsize when the actual items aren't byte
addressable.  In my module, this only occurred with the Bit type.  I
figured specifying 0 like this could keep a UFunc that isn't Bit aware
from stepping on memory that it isn't allowed to.
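
For the Bit type, where _itemsize is 0, elements have to be addressed by bit index rather than by byte offset. Below is a minimal sketch. It assumes that bits are packed least-significant-bit first within each byte; the thread does not specify an order. Both function names are hypothetical.

```python
def get_bit(buffer, bit_offset, index):
    """Return element `index` (0 or 1) of a packed bit array starting at bit_offset."""
    pos = bit_offset + index
    return (buffer[pos // 8] >> (pos % 8)) & 1


def set_bit(buffer, bit_offset, index, value):
    """Set element `index` of a packed bit array held in a mutable buffer."""
    pos = bit_offset + index
    if value:
        buffer[pos // 8] |= 1 << (pos % 8)
    else:
        buffer[pos // 8] &= ~(1 << (pos % 8)) & 0xFF


bits = bytearray(2)            # room for 16 one-bit items
for i in (0, 3, 9):
    set_bit(bits, 0, i, 1)
print([get_bit(bits, 0, i) for i in range(12)])
```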

-------------------------

Next thought:  Memory Mapping

I really like the idea of having Python objects that map huge files a
piece at a time without using all available memory.  I've seen this in
NumArray's charter as part of the reason for breaking away from
Numeric, and I'm curious how you intend to address it.

Right now, the only requirement for _data seems to be that it implement
the PyBufferProcs.  For memory mapping something else is needed...

I haven't implemented this, so take it as just my rambling thoughts:

With the addition of 3 new, optional, attributes to the NumArray object
interface, I think this could be efficiently accomplished:

     _mapproc
     _mapmin
     _mapmax

If _mapproc is present and not None, then it points to a function whose
responsibility it is to set _mapmin and _mapmax appropriately. 
_mapproc takes one argument which is the desired byte offset into the
virtual array.  This is probably easier to describe with code:

     def _mapproc(self, offset):
         unmap_the_old_range()
         mmap_a_new_range_that_includes_byteoffset()
         self._mapmin = minimum_of_new_range()
         self._mapmax = maximum_of_new_range()

In this way, when the delta between _mapmin and _mapmax is large
enough, the UFuncs could act over a large contiguous portion of the
_data array at a time before another remapping is necessary.  If the
byteoffset that a UFunc needs to work with is outside of _mapmin and
_mapmax, it must call _mapproc to remedy the situation.
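
The _mapproc idea can be prototyped with Python's standard mmap module. The sketch below is hypothetical and is not numarray code. It keeps one fixed-size window mapped and remaps when a requested byte falls outside [_mapmin, _mapmax). mmap requires the file offset to be a multiple of mmap.ALLOCATIONGRANULARITY, so the window start is rounded down. The window size is assumed to be a positive multiple of that granularity. The sketch is not thread safe.

```python
import mmap
import os


class WindowedMap:
    """Read-only view of a large file that keeps only one window mapped at a time."""

    def __init__(self, path, window=4 * mmap.ALLOCATIONGRANULARITY):
        if window <= 0 or window % mmap.ALLOCATIONGRANULARITY:
            raise ValueError("window must be a positive multiple of ALLOCATIONGRANULARITY")
        self._file = open(path, "rb")
        self._size = os.fstat(self._file.fileno()).st_size
        self._window = window
        self._map = None
        self._mapmin = 0
        self._mapmax = 0          # empty range until the first _mapproc call

    def _mapproc(self, offset):
        """Unmap the old window and map a new one that contains byte `offset`."""
        if self._map is not None:
            self._map.close()
        gran = mmap.ALLOCATIONGRANULARITY
        start = (offset // gran) * gran
        length = min(self._window, self._size - start)
        self._map = mmap.mmap(self._file.fileno(), length,
                              access=mmap.ACCESS_READ, offset=start)
        self._mapmin, self._mapmax = start, start + length

    def byte_at(self, offset):
        """Return one byte of the file, remapping only when it is outside the window."""
        if not 0 <= offset < self._size:
            raise IndexError("offset outside file")
        if not self._mapmin <= offset < self._mapmax:
            self._mapproc(offset)
        return self._map[offset - self._mapmin]

    def close(self):
        if self._map is not None:
            self._map.close()
        self._file.close()
```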

This puts a lot of work into UFuncs that choose to support this.  I
suppose that is tough to avoid though.

Also, there are threading issues to think about here.  I don't know if
UFuncs are going to release the Global Interpreter Lock, but if they do
it's possible that multiple threads could have the same PyObject and
try to _mapproc different offsets at different times.

It is possible to implement a mutex for the NumArray without requiring
anything special from the PyObject that implements it...
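
One way to serialize remapping without requiring anything of the array object itself is a side table of locks keyed by object identity. The sketch below is hypothetical. Keying by id() assumes that registered objects stay alive while their lock is in use, because ids can be reused after an object is freed. A real implementation would also need to hold the lock while a UFunc is using the mapped range, not only while it remaps.

```python
import threading

_registry_guard = threading.Lock()
_locks = {}


def lock_for(obj):
    """Return the lock associated with obj, creating it on first use."""
    key = id(obj)
    with _registry_guard:             # protects the dictionary itself
        lock = _locks.get(key)
        if lock is None:
            lock = threading.Lock()
            _locks[key] = lock
        return lock


def ensure_mapped(array_obj, offset):
    """Remap array_obj if needed; the range check and the remap share one lock."""
    with lock_for(array_obj):
        if not array_obj._mapmin <= offset < array_obj._mapmax:
            array_obj._mapproc(offset)
```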


-----------------------------


Ok.  That's probably way too much content for an Introductory email.  I
do have more thoughts on this stuff though.  They'll just have to wait
for another time.

Nice to meet you all,
    -Scott Gilbert


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: XArray.py
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20020411/a87228d8/attachment.ksh>

From perry at stsci.edu  Thu Apr 11 09:02:05 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Apr 11 09:02:05 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <20020411113152.98373.qmail@web12906.mail.yahoo.com>
Message-ID: <JFEGLNDJEDNOMPPHDEJFMEIEDNAA.perry@stsci.edu>

Hi Scott,

I've printed out your message and will try to read and understand
it today. It may be a couple days before we can respond, so 
don't take a lack of an immediate response as disinterest.

Thanks, Perry


From jmiller at stsci.edu  Thu Apr 11 14:36:03 2002
From: jmiller at stsci.edu (Todd Miller)
Date: Thu Apr 11 14:36:03 2002
Subject: [Numpy-discussion] slice question and bug
References: <20020401165852.44D3E79B@spiralis.merseine.nu> <15528.44386.160013.936132@spiralis.merseine.nu>
Message-ID: <3CB60188.1010203@stsci.edu>

clee at spiralis.merseine.nu wrote:

>
>clee at spiralis.merseine.nu writes:
> > 
> > Hello,
> > I'm trying to track down a segv when I do the B[:] operation on an
> > array, "B", that I've built as a view on external data.  During...
> > [snip]
>
>To clarify my own somewhat nonsensical post: when I started composing
>my message, I was trying to figure out a bug in my own code that
>caused a crash while doing slice_array.  I've since fixed that bug.
>However, in the process of figuring out what I was doing wrong I
>was browsing the Numeric source code.  While examining
>PyArray_Free(..) in arrayobject.c, I saw that it returns -1 whenever the
>number of dimensions is greater than 2, yet it has code that tests for
>when the number of dimensions equals 3.
>
>So ultimately, my post is just an alert: I think there might be
>some code that needs to be cleaned up.
>
>Thanks,
> lacking-caffeine-ly yours
> -chris 
>
Looking at the code to PyArray_Free, I agree with Chris.  When it is
called to free a 2D array, I think that PyArray_Free leaks all of the
row storage, because ap->nd == 2, not 3:

/* {%c++%} */
     extern int PyArray_Free(PyObject *op, char *ptr) {
	 PyArrayObject *ap = (PyArrayObject *)op;
	 int i, n;

	 if (ap->nd > 2) return -1;
	 if (ap->nd == 3) {
	     n = ap->dimensions[0];
	     for (i=0; i<n; i++) {
		 free(((char **)ptr)[i]);
	     }
	 }
	 if (ap->nd >= 2) {
	     free(ptr);
	 }
	 Py_DECREF(ap);
	 return 0;
     }
/* {%c++%} */


Other opinions?

Todd

-- 
Todd Miller 			jmiller at stsci.edu
STSCI / SSG			(410) 338 4576


From perry at stsci.edu  Thu Apr 11 14:57:14 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Apr 11 14:57:14 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <20020411113152.98373.qmail@web12906.mail.yahoo.com>
Message-ID: <JFEGLNDJEDNOMPPHDEJFGEIHDNAA.perry@stsci.edu>

> [mailto:numpy-discussion-admin at lists.sourceforge.net]On Behalf Of Scott
> Gilbert
> Subject: [Numpy-discussion] Introduction
> 
> 
> Hello All.
> 
> I'm interested in this project, and am curious to what level you are
> willing to accept outside contribution.  I just tried to subscribe to
> the developers list, but I didn't realize that required admin approval.
>  Hopefully it doesn't look like I was shaking the door without knocking
> first.
> 
> Is this list active?  Is this the correct place to talk about Numarray?
 
Sure.
 
> 
> Following your design for the Array stuff, I've been able to implement
> a pretty usable array class that supports the bazillion array types I
> need (Bit, Complex Integer, etc...).  This gets me past my core
> requirements without polluting your world, but unfortunately my new
> XArray type doesn't play so well with your UFuncs.  I think my users
> will definitely want to use your UFuncs when the time comes, so I want
> to remedy this situation.
> 
> The first change I would like to make is to rework your code that
> verifies that an object is a "usable" array.  I think NumArray should
> only check for the interface required, not the actual type hierarchy. 
> By this I mean that the minimum required to be a supported array type
> is that it support the correct attributes, not that it actually inherit
> from NDArray:
> 
>    (quoting from your paper) something like:
> 
>        _data
>        _shape
>        _strides
>        _byteoffset
>        _aligned
>        _contiguous
>        _type
>        _byteswap
> 
> Most of these are just integer fields, or tuples of integers.  Ignoring
> _type for the moment, it appears that the interface required to be a
> NumArray is much less strict than actually requiring it to derive from
> NumArray.  If you allow me to change a few functions (inputarray() in
> numarray.py is one small example), I could use my independent XArray
> class almost as is, and moreover I can implement new array objects
> (possibly as extension types) for crazy things like working with page
> aligned memory, memory mapping etc...
> 
I guess we are not sure we understand what you mean by interface.
In particular, we don't understand why sharing the same object
attributes (the private ones you list above) is a benefit to the
code you are writing if you aren't also using the low level
implementation. The above attributes are private and nothing 
external to the Class should depend on or even know about them.
Could you elaborate on what you mean by interface and the relationship
between your arrays and numarrays?

> 
> Well, that's almost enough.  The _type field poses a small problem of
> sorts.  It looks like you don't require a _type to be derived from
> NumericType, and this is a good thing since it allows me (and others)
> to implement NumArray compatible arrays without actually requiring
> NumArray to be present.
>
What do you mean by NumArray compatible?
 
[some issues snipped since we need to understand the interface issue
first]

> I don't know if you're trying to get all of NumArray into the Python
> distribution or not, but I suspect a good interim step would be to have
> a PEP that specifies what it means to be a NumArray or NDArray in
> minimal terms.  Perhaps supplying an Array only module in Python that
> implements this interface.  Again, I'd be willing to help with all of
> this.
>
We are hoping to get numarray into the distribution [it won't be the
end of the world for us if it doesn't happen]. I'll warn you that the
PEP is out of date. We are likely to update it only after we feel
we are close to having the implementation ready for consideration
for inclusion in the standard distribution. I would refer to the
actual implementation and the design notes for the time being.
> 
> -------------------------
> 
> Ok, other suggestions...
> 
> Here is the list of things that your design document indicates are
> required to be a NumArray:
> 
>        _data
>        _shape
>        _strides
>        _byteoffset
>        _aligned
>        _contiguous
>        _type
>        _byteswap
> 
> I believe that one could calculate the values for _aligned and
> _contiguous from the other fields.  So they shouldn't really be part of
> the interface required.  I suspect it is useful for the C
> implementation of UFuncs to have this information in the NDInfo struct
> though, so while I would drop them from the attribute interface, I would
> delegate the task of calculating these values to getNDInfo() and/or
> getNumInfo().
> 
> I also notice that you chose _byteswap to indicate byteswapping is
> needed.  I think a better choice would be to specify the endian-ness of
> the data (with an _endian attr), and have getNDInfo() and getNumInfo()
> calculate the _byteswap value for the NDInfo struct.
> 
> In my implementation, I came up with a slightly different list:
> 
>             self._endian
>             self._offset
>             self._shape
>             self._stride
>             self._itemtype
>             self._itemsize
>             self._itemformat
>             self._buffer
> 
Some of the name changes are worth considering (like replacing ._byteswap
with an endian indicator, though I find _endian completely opaque as to
what it would mean--1 means what? little or big?). (BTW, we already have
_itemsize). _contiguous and _aligned are things we have been considering
changing, but I would have to think about it carefully to determine if
they really are redundant.

> The only real differences are that _itemsize allows me to work with
> arrays of bytes without having any clue what the underlying type is (in
> some cases, _itemtype is "Unknown"), and that I implemented a
> "Struct" _itemtype, for which _itemformat is useful.  (It's
> the same format string that the struct module in Python uses.)
> 
It looks like you are trying to deal with records with these "structs". 
We deal with records (efficiently) in a completely different way. Take
a look at the recarray module.

> Also, I specified 0 for _itemsize when the actual items aren't byte
> addressable.  In my module, this only occurred with the Bit type.  I
> figured specifying 0 like this could keep a UFunc that isn't Bit aware
> from stepping on memory that it isn't allowed to.
> 
Again, we aren't sure how this works with numarray.

> -------------------------
> 
> Next thought:  Memory Mapping
> 
> I really like the idea of having Python objects that map huge files a
> piece at a time without using all available memory.  I've seen this in
> NumArray's charter as part of the reason for breaking away from
> Numeric, and I'm curious how you intend to address it.
> 
> Right now, the only requirement for _data seems to be that it implement
> the PyBufferProcs.  For memory mapping something else is needed...
> 
> I haven't implemented this, so take it as just my rambling thoughts:
> 
> With the addition of 3 new, optional, attributes to the NumArray object
> interface, I think this could be efficiently accomplished:
> 
>      _mapproc
>      _mapmin
>      _mapmax
> 
> If _mapproc is present and not None, then it points to a function whose
> responsibility it is to set _mapmin and _mapmax appropriately. 
> _mapproc takes one argument which is the desired byte offset into the
> virtual array.  This is probably easier to describe with code:
> 
>      def _mapproc(self, offset):
>          unmap_the_old_range()
>          mmap_a_new_range_that_includes_byteoffset()
>          self._mapmin = minimum_of_new_range()
>          self._mapmax = maximum_of_new_range()
> 
> In this way, when the delta between _mapmin and _mapmax is large
> enough, the UFuncs could act over a large contiguous portion of the
> _data array at a time before another remapping is necessary.  If the
> byteoffset that a UFunc needs to work with is outside of _mapmin and
> _mapmax, it must call _mapproc to remedy the situation.
> 
> This puts a lot of work into UFuncs that choose to support this.  I
> suppose that is tough to avoid though.
> 
We deal with memory mapping a completely different way. It's a bit late
for me to go into it in great detail, but we wrap the standard library
mmap module with a module that lets us manage memory mapped files.
This module basically memory maps an entire file and then in effect
mallocs segments of that file as buffer objects. This allocation of
subsets is needed to ensure that overlapping memory-mapped buffers
don't happen. One can basically reserve part of the memory mapped file
as a buffer. Once that is done, nothing else can use that part of the
file for another buffer. We do not intend to handle memory maps as a
way of sequentially mapping parts of the file to provide windowed views
as your code segment above suggests. If you want a buffer that is the
whole (large) file, you just get a mapped buffer to the whole thing.
(Why wouldn't you?)

The above scheme is needed for our purposes because many of our data files
contain multiple data arrays and we need a means of creating a numarray
object for each one. Most of this machinery has already been implemented,
but we haven't released it since our I/O package (for astronomical FITS
files) is not yet at the point of being able to use it.
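The reservation idea (map the whole file once, then hand out non-overlapping segments as buffer objects) can be sketched with the standard mmap module. This is a minimal sketch, assuming the file is opened read-write; `MappedFile` and `reserve` are hypothetical names, not numarray's actual module.

```python
import mmap


class MappedFile:
    """Map a whole file once, then hand out non-overlapping buffer segments.

    Hypothetical sketch of the reservation scheme described above; the class
    and method names are illustrative, not numarray's actual module.
    """

    def __init__(self, fileobj):
        # Length 0 maps the entire file; fileobj must be opened read-write ("r+b").
        self._map = mmap.mmap(fileobj.fileno(), 0)
        self._reserved = []  # half-open (start, stop) byte ranges already handed out

    def reserve(self, start, size):
        """Return a writable buffer for [start, start + size), refusing overlaps."""
        stop = start + size
        if start < 0 or size <= 0 or stop > len(self._map):
            raise ValueError("segment lies outside the mapped file")
        for s, e in self._reserved:
            if start < e and s < stop:  # two half-open ranges intersect
                raise ValueError("segment overlaps an existing buffer")
        self._reserved.append((start, stop))
        # A memoryview slice shares memory with the map; nothing is copied.
        return memoryview(self._map)[start:stop]
```

Once a segment is reserved, any later request touching one of its bytes fails, which is the guarantee described above; an array object would then be built on each returned buffer.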

> Also, there are threading issues to think about here.  I don't know if
> UFuncs are going to release the Global Interpreter Lock, but if they do
> it's possible that multiple threads could have the same PyObject and
> try to _mapproc different offsets at different times.
> 
To tell you the truth, we haven't dealt with the threading issue much. We
think about it occasionally, but have deferred dealing with it until 
we have finished other aspects first. We do want to make it thread safe
though.

Perry Greenfield


From oliphant at ee.byu.edu  Thu Apr 11 15:47:04 2002
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Thu Apr 11 15:47:04 2002
Subject: [Numpy-discussion] slice question and bug
In-Reply-To: <3CB60188.1010203@stsci.edu>
Message-ID: <Pine.LNX.4.33L2.0204111645520.32470-100000@oliphant.ee.byu.edu>

> Looking at the code to PyArray_Free, I agree with Chris.  Called to
> free a 2D array, I think that PyArray_Free leaks all of the row storage
> because ap->nd == 2, not 3:
>
> /* {%c++%} */
>      extern int PyArray_Free(PyObject *op, char *ptr) {
> 	 PyArrayObject *ap = (PyArrayObject *)op;
> 	 int i, n;
>
> 	 if (ap->nd > 2) return -1;
> 	 if (ap->nd == 3) {
> 	     n = ap->dimensions[0];
> 	     for (i=0; i<n; i++) {
> 		 free(((char **)ptr)[i]);
> 	     }
> 	 }
> 	 if (ap->nd >= 2) {
> 	     free(ptr);
> 	 }
> 	 Py_DECREF(ap);
> 	 return 0;
>      }
> /* {%c++%} */
>
>

This has been broken since the beginning.  I believe the documentation
says as much.  I've never used it because I always think of 2-D arrays as
a block of data not as rows of pointers.

It should be fixed, but no one's ever been interested enough to do it.
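The unreachable branch is easy to confirm by modeling the control flow. The sketch below is Python, not C, and `free_actions` with its `max_nd` parameter is hypothetical. It lists the statements PyArray_Free executes for a given `nd`: as written, `nd == 3` always hits `return -1`, so the row-freeing loop never runs. Relaxing the guard to `nd > 3` is one possible fix, assuming the 3-D branch reflects the intended behavior.

```python
def free_actions(nd, max_nd=2):
    """List the statements PyArray_Free would execute for an array of rank nd.

    A Python model of the C control flow, not Numeric code.  max_nd=2 mirrors
    the guard as written; max_nd=3 models one possible (assumed) correction.
    """
    if nd > max_nd:
        return ["return -1"]  # early exit: nothing freed, no Py_DECREF
    actions = []
    if nd == 3:
        actions.append("free(ptr[i]) for each i")  # dead code when max_nd=2
    if nd >= 2:
        actions.append("free(ptr)")
    actions.append("Py_DECREF(ap)")
    return actions


for nd in (1, 2, 3):
    print(nd, free_actions(nd), free_actions(nd, max_nd=3))
```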

-Travis Oliphant


From xscottg at yahoo.com  Thu Apr 11 21:46:02 2002
From: xscottg at yahoo.com (Scott Gilbert)
Date: Thu Apr 11 21:46:02 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <JFEGLNDJEDNOMPPHDEJFGEIHDNAA.perry@stsci.edu>
Message-ID: <20020412044201.63373.qmail@web12908.mail.yahoo.com>

--- Perry Greenfield <perry at stsci.edu> wrote:
>
> I guess we are not sure we understand what you mean by interface.
> In particular, we don't understand why sharing the same object
> attributes (the private ones you list above) is a benefit to the
> code you are writing if you aren't also using the low level
> implementation. The above attributes are private and nothing 
> external to the Class should depend on or even know about them.
> Could you elaborate on what you mean by interface and the
> relationship between your arrays and numarrays?
>

There are several places in your code that check to see if you are working with
a valid type for NDArrays.  Currently this check consists of asking the
following questions:

   'Is it a tuple or list?'
   'Is it a scalar of some sort?'
   'Does it derive from our NDArray class?'

If any of these questions answer true, it does the right thing and moves on. 
If none of these is true, it raises an exception.

I suppose this is fine if you are only concerned about working with your own
implementation of an array type, but I hope you'll consider the following as a
minor change that opens up the possibility for other compatible array
implementations to work interoperably.

Instead have the code ask the following questions:

   'Is it a tuple or list?'
   'Is it a scalar of some sort?'
   'Does it support the attributes necessary to be like an NDArray object?'

This change is very similar to how you can pass in any Python object to the
"pickle.dump()" function, and if it supports the "write()" method it will be
called:

      >>> class WhoKnows:
      ...     def write(self, x):
      ...          print x
      >>>
      >>> import pickle
      >>>
      >>> w = WhoKnows()
      >>>
      >>> pickle.dump('some data', w)
      S'some data'
      p1
      .

Until reading your response above, I didn't realize that you consider your
single underscore attributes to be totally private.  In general, I try to use a
single underscore to mean protected (meaning you can use them if you REALLY
know what you are doing), hence my confusion.  With that in mind, pretend that
I suggested the following instead:

    The specification of an NDArray is that it has the following attributes

        ndarray_buffer      - a PyObject which has PyBufferProcs
        ndarray_shape       - a tuple specifying the shape of the array
        ndarray_stride      - a tuple specifying the index multipliers
        ndarray_itemsize    - an int/long stating the size of items
        ndarray_itemtype    - some representation of type 
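A check along these lines is all the consumer side would need. This is a minimal sketch in Python 3, assuming the `ndarray_*` names proposed above (they are a proposal, not an existing numarray API); `looks_like_ndarray`, `byte_offset`, and `get_float64` are hypothetical helpers, and strides are taken to be in bytes as in the example that follows.

```python
import struct

# Attribute names taken from the proposal above (a proposal, not an existing API).
NDARRAY_ATTRS = ("ndarray_buffer", "ndarray_shape", "ndarray_stride",
                 "ndarray_itemsize", "ndarray_itemtype")


def looks_like_ndarray(obj):
    """Duck-typed test: does obj carry every attribute of the proposed interface?"""
    return all(hasattr(obj, name) for name in NDARRAY_ATTRS)


def byte_offset(obj, index):
    """Byte offset of the element at `index`; strides are measured in bytes."""
    if len(index) != len(obj.ndarray_shape):
        raise IndexError("wrong number of indices")
    offset = 0
    for i, n, stride in zip(index, obj.ndarray_shape, obj.ndarray_stride):
        if not 0 <= i < n:
            raise IndexError(index)
        offset += i * stride
    return offset


def get_float64(obj, index):
    """Read one element, assuming itemtype 'Float64' in native byte order."""
    raw = memoryview(obj.ndarray_buffer).cast("B")  # view the buffer as bytes
    return struct.unpack_from("=d", raw, byte_offset(obj, index))[0]
```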

This would be a very minor change to your functions like inputarray(),
getNDInfo(), getNDArray(), but it would allow your UFuncs to work with other
implementations of arrays.  As an example similar to the pickle example above:

     import array
     class ScottArray:
         def __init__(self):
             self.ndarray_buffer   = array.array('d', [0]*100)
             self.ndarray_shape    = (10, 10)
             self.ndarray_stride   = (80, 8)
             self.ndarray_itemsize = 8
             self.ndarray_itemtype = 'Float64'

     import numarray

     n = numarray.numarray((10, 10), type='Float64')
     s = ScottArray()

     very_cool = numarray.add(n, s)


This example is kind of silly.  I mean, why wouldn't I just use numarray for
all of my array needs?  Well, that's where my world is a little different than
yours I think.  Instead of using 'array.array()' above, there are times where
I'll need to use 'whizbang.array()' to get a different PyBufferProcs supporting
object.  Or where I'll need to work with a crazy type in one part of the code,
but I'd like to pass it to an extension that combines your types and mine.

In these cases where I need "special memory" or "special types" I could try and
get you guys to accept a patch, but this would just pollute your project and
probably annoy you in general.  A better solution is to create a general
standard mechanism for implementing NDArray types, and let me make my own.


In the above example, we could have completely different NDArray
implementations working interoperably inside of one UFunc.  It seems to me that
all it really takes to be an NDArray can be specified by a list of attributes
like the one above.  (Probably need a few more attributes to be really general:
'ndarray_endian', etc...)  In the end, NDArrays are just pointers to a buffer,
and descriptors for indexing.
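To make the "pointer plus descriptors" point concrete, here is a UFunc-style loop that consults nothing but the buffer and the indexing descriptors. It is a pure-Python sketch (real code would do this in C), assuming the hypothetical `ndarray_*` attributes and Float64 items; `add_float64` is an illustrative name, not numarray's API. The two operands may even use different strides.

```python
import array
import struct


def _byte_offsets(shape, stride):
    """Yield the byte offset of every element, in row-major index order."""
    if not shape:
        yield 0
        return
    for i in range(shape[0]):
        for rest in _byte_offsets(shape[1:], stride[1:]):
            yield i * stride[0] + rest


def add_float64(a, b):
    """Elementwise a + b using only the proposed ndarray_* attributes.

    Because nothing else is consulted, a and b may come from unrelated
    array implementations.  Only 'Float64' in native byte order is handled.
    """
    if tuple(a.ndarray_shape) != tuple(b.ndarray_shape):
        raise ValueError("shape mismatch")
    if a.ndarray_itemtype != "Float64" or b.ndarray_itemtype != "Float64":
        raise TypeError("this sketch only handles Float64")
    abuf = memoryview(a.ndarray_buffer).cast("B")
    bbuf = memoryview(b.ndarray_buffer).cast("B")
    out = array.array("d")  # result is contiguous and row-major
    for oa, ob in zip(_byte_offsets(a.ndarray_shape, a.ndarray_stride),
                      _byte_offsets(b.ndarray_shape, b.ndarray_stride)):
        out.append(struct.unpack_from("=d", abuf, oa)[0] +
                   struct.unpack_from("=d", bbuf, ob)[0])
    return out
```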


I don't believe this would have any significant effect on the performance of
numarray.  (The efficient fast C code still gets a pointer to work with.)
Moreover, I'd be very willing to contribute patches to make this happen.


If you agree, and we can flesh out what this "attribute interface" should be,
then I can start distributing my own array module to the engineers where I work
without too much fear that they'll be screwed once numarray is stable and they
want to mix and match.

Code always lives a lot longer than I want it to, and if I give them something
now which doesn't work with your end product, I'll have done them a disservice.


BTW: Allowing other types to fill in as NDArrays also allows other types to
implement things like slicing as they see fit (slice and copy contiguous,
slice and copy on write, slice and copy by reference, etc...).

>
> We are hoping to get numarray into the distribution [it won't be the
> end of the world for us if it doesn't happen]. I'll warn you that the
> PEP is out of date. We are likely to update it only after we feel
> we are close to having the implementation ready for consideration 
> for including into the standard distribution. I would refer to the
> actual implementation and the design notes for the time being.
>

Yeah, I recognize that the PEP is gathering dust at the moment.  I'm not having
too much trouble following through the source and design docs.  It took me a
few days to "get it", but that's probably because I'm slower than your average
bear.  :-)

Regarding the PEP, what I would like to see happen is that if we agree that the
"attribute interface" stuff above is the right way to go about things, I would
(or we would) submit a milder interim PEP specifying what those attributes are,
how they are to be interpreted, and a simple Python module implementing a
general NDArray class for consumption.  Hopefully this PEP would specify a
canonical list of type names as well.  Then we could make updates to the other
PEP if necessary.


>
> Some of the name changes are worth considering (like replacing ._byteswap
> with an endian indicator, though I find _endian completely opaque as to
> what it would mean--1 means what? little or big?). (BTW, we already have
> _itemsize). _contiguous and _aligned are things we have been considering
> changing, but I would have to think about it carefully to determine if
> they really are redundant.
> 

It's all open for discussion, but I would propose that ndarray_endian be one
of:

    '>' - big endian
    '<' - little endian

This is how the standard Python struct module specifies endian, and I've been
trying to stay consistent with the baseline when possible.
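With struct-style codes, decoding an item only requires prefixing its format with the `ndarray_endian` value. A minimal sketch, assuming Float64 items; `native_endian` and `read_float64` are hypothetical helpers, and only the '>' / '<' prefixes themselves come from the struct module.

```python
import struct
import sys

# Hypothetical ndarray_endian values, borrowing the struct module's prefixes.
ENDIAN_CODES = {"big": ">", "little": "<"}


def native_endian():
    """The code ('>' or '<') matching this machine's byte order."""
    return ENDIAN_CODES[sys.byteorder]


def read_float64(raw, offset, ndarray_endian):
    """Decode one 8-byte float stored with the given '>' / '<' byte order."""
    if ndarray_endian not in ("<", ">"):
        raise ValueError("ndarray_endian must be '<' or '>'")
    return struct.unpack_from(ndarray_endian + "d", raw, offset)[0]
```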

>
> It looks like you are trying to deal with records with these "structs". 
> We deal with records (efficiently) in a completely different way. Take
> a look at the recarray module.
> 

Will definitely do.

I've called them structs simply because they borrow their format string from
the struct module that ships with Python.  I'm not hung up on the name, and I
wouldn't object to an alias.

Too early for me to tell if there is even a difference in the underlying
memory, but maybe we'll end up with 'structs' for my notion of things, and
'records' for yours.

>
> We deal with memory mapping a completely different way. It's a bit late
> for me to go into it in great detail, but we wrap the standard library
> mmap module with a module that lets us manage memory mapped files.
> This module basically memory maps an entire file and then in effect
> mallocs segments of that file as buffer objects. This allocation of
> subsets is needed to ensure that overlapping memory-mapped buffers
> don't happen. One can basically reserve part of the memory mapped file
> as a buffer. Once that is done, nothing else can use that part of the
> file for another buffer. We do not intend to handle memory maps as a
> way of sequentially mapping parts of the file to provide windowed views
> as your code segment above suggests. If you want a buffer that is the
> whole (large) file, you just get a mapped buffer to the whole thing.
> (Why wouldn't you?)
> 

I think the idea of taking a 500 megabyte (or 5 gigabyte) file and windowing 1
meg of actual memory at a time is pretty attractive.  Sometimes we do very large
correlations, and there just isn't enough memory to mmap the whole file (much
less two files for correlation).

Any library that doesn't want to support windowed arrays could just raise a
NotImplementedError on encountering one.

Maybe I shouldn't be calling this "memory mapping".  Even though it could be
implemented on top of mmap, truthfully I just want to support a "windowing"
interface.  If we could specify the windowing attributes and indicate the
standard usage that would be great.  Maybe:

      ndarray_window(self, offset)
      ndarray_winmin
      ndarray_winmax
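One way that trio might behave, sketched on top of the standard mmap module. The class name, the 1 MB default window, the `byte_at` accessor, and treating `ndarray_winmax` as exclusive are all assumptions; the one hard constraint from mmap itself is that a mapping's offset must be a multiple of `mmap.ALLOCATIONGRANULARITY`.

```python
import mmap
import os


class WindowedFile:
    """Read-only, windowed view of a large file (hypothetical sketch).

    ndarray_window(offset) maps a bounded window containing `offset`;
    ndarray_winmin/ndarray_winmax hold its byte range (winmax exclusive).
    """

    def __init__(self, path, window_bytes=1 << 20):
        self._file = open(path, "rb")
        self._size = os.fstat(self._file.fileno()).st_size
        gran = mmap.ALLOCATIONGRANULARITY
        # Round the window size up to a whole number of allocation units.
        self._window_bytes = max(gran, -(-window_bytes // gran) * gran)
        self._map = None
        self.ndarray_winmin = 0
        self.ndarray_winmax = 0

    def ndarray_window(self, offset):
        if not 0 <= offset < self._size:
            raise IndexError(offset)
        if self._map is not None:
            self._map.close()  # unmap the old range first
        gran = mmap.ALLOCATIONGRANULARITY
        start = offset - offset % gran  # mmap offsets must be aligned
        length = min(self._window_bytes, self._size - start)
        self._map = mmap.mmap(self._file.fileno(), length,
                              access=mmap.ACCESS_READ, offset=start)
        self.ndarray_winmin = start
        self.ndarray_winmax = start + length

    def byte_at(self, offset):
        """Return one byte, remapping only when offset leaves the window."""
        if not self.ndarray_winmin <= offset < self.ndarray_winmax:
            self.ndarray_window(offset)
        return self._map[offset - self.ndarray_winmin]

    def close(self):
        if self._map is not None:
            self._map.close()
        self._file.close()
```

A consumer walking the file sequentially remaps only once per window, so no more than one (rounded-up) window of the file is mapped at any moment.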


>
> The above scheme is needed for our purposes because many of our data files
> contain multiple data arrays and we need a means of creating a numarray
> object for each one. Most of this machinery has already been implemented,
> but we haven't released it since our I/O package (for astronomical FITS
> files) is not yet at the point of being able to use it.
> 

There is a group at my company that is using FITS for some stuff.  I don't know
enough about it to comment though...


Cheers,
    -Scott




From perry at stsci.edu  Fri Apr 12 17:44:04 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Fri Apr 12 17:44:04 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <20020412044201.63373.qmail@web12908.mail.yahoo.com>
Message-ID: <NEBBIJKBMLDBLNCEEFOCIEFGCNAA.perry@stsci.edu>

Scott Gilbert writes:
>      import array
>      class ScottArray:
>          def __init__(self):
>              self.ndarray_buffer   = array.array('d', [0]*100)
>              self.ndarray_shape    = (10, 10)
>              self.ndarray_stride   = (80, 8)
>              self.ndarray_itemsize = 8
>              self.ndarray_itemtype = 'Float64'
>
>      import numarray
>
>      n = numarray.numarray((10, 10), type='Float64')
>      s = ScottArray()
>
>      very_cool = numarray.add(n, s)
>
But why not (I may have some details wrong, I'm doing this
from memory, and I haven't worked on it myself in a bit):

import array
import numarray
import memory # comes with numarray
class ScottArray(numarray.NumArray):
    def __init__(self):
        # create necessary buffer obj
        buf = memory.writeable_buffer(array.array('d', [0]*100))
        numarray.NumArray.__init__(self, shape=(10, 10), type=numarray.Float64,
                                   buffer=buf)
        # _strides not settable from constructor yet, but currently
        # if you needed to set it:
        # self._strides = (80, 8)
        # But for this case it would be computed automatically from
        # the supplied shape


n = numarray.numarray((10, 10), type='Float64')
s = ScottArray()

maybe_not_quite_so_cool_but_just_as_functional = n + s

> This example is kind of silly.  I mean, why wouldn't I just use numarray for
> all of my array needs?  Well, that's where my world is a little different than
> yours I think.  Instead of using 'array.array()' above, there are times where
> I'll need to use 'whizbang.array()' to get a different PyBufferProcs supporting
> object.  Or where I'll need to work with a crazy type in one part of the code,
> but I'd like to pass it to an extension that combines your types and mine.
>
> In these cases where I need "special memory" or "special types" I could try and
> get you guys to accept a patch, but this would just pollute your project and
> probably annoy you in general.  A better solution is to create a general
> standard mechanism for implementing NDArray types, and let me make my own.
>
From everything I've seen so far, I don't see why you can't
just create a NumArray object directly. You can subclass it
(and use multiple inheritance if you need to subclass a different
object as well) and add whatever customized behavior you want.
You can create new kinds of objects as buffers just so long
as you satisfy the buffer interface.
>
> In the above example, we could have completely different NDArray
> implementations working interoperably inside of one UFunc.  It seems to me that
> all it really takes to be an NDArray can be specified by a list of attributes
> like the one above.  (Probably need a few more attributes to be really general:
> 'ndarray_endian', etc...)  In the end, NDArrays are just pointers to a buffer,
> and descriptors for indexing.
>
Again, why not just create an NDArray object with the appropriate
buffer object and attributes (subclassing if necessary)?

>
> I don't believe this would have any significant effect on the performance of
> numarray.  (The efficient fast C code still gets a pointer to work with.)
> Moreover, I'd be very willing to contribute patches to make this happen.
>
>
> If you agree, and we can flesh out what this "attribute interface" should be,
> then I can start distributing my own array module to the engineers where I work
> without too much fear that they'll be screwed once numarray is stable and they
> want to mix and match.
>
> Code always lives a lot longer than I want it to, and if I give them something
> now which doesn't work with your end product, I'll have done them a disservice.
>
All good in principle, but I haven't yet seen a reason to change
numarray. As far as I can tell, it provides all you need exactly
as it is. If you could give an example that demonstrated otherwise...
>
> It's all open for discussion, but I would propose that ndarray_endian be one
> of:
>
>     '>' - big endian
>     '<' - little endian
>
> This is how the standard Python struct module specifies endian, and I've been
> trying to stay consistent with the baseline when possible.
>
To tell you the truth, I'm not crazy about how the struct module
handles types or attributes. It's generally far too cryptic for
my tastes. Other than providing backward compatibility, we aren't
interested in it emulating struct.

> >
> > The above scheme is needed for our purposes because many of our data files
> > contain multiple data arrays and we need a means of creating a numarray
> > object for each one. Most of this machinery has already been implemented,
> > but we haven't released it since our I/O package (for astronomical FITS
> > files) is not yet at the point of being able to use it.
> >
>
>
I could well misunderstand, but I thought that if you mmap a file
in unix in write mode, you do not use up the virtual memory as
limited by the physical memory and the paging file. Your only
limit becomes the virtual address space available to the processor.
If the 32 bit address is your problem, you are far, far better off
using a 64-bit processor and operating system than trying to kludge up
a windowing memory mechanism. I could see a way of doing it for
ufuncs, but the numeric world (and I would think the DSP world
as well) needs far more than element-by-element array functionality.
Providing a usable C API for that kind of memory model would be
a nightmare. But I'm not sure if this or the page file is your
limitation.

Perry


From kragen at pobox.com  Sat Apr 13 00:25:01 2002
From: kragen at pobox.com (Kragen Sitaker)
Date: Sat Apr 13 00:25:01 2002
Subject: [Numpy-discussion] segfault in Numpy esxtension
Message-ID: <20020413072433.2702DBDC1@panacea.canonical.org>

(All of the below is with regard to Numeric 20.2.0.)

For a consulting client, I wrote an extension module that does the
equivalent of sum(take(a, b)), but without the temporary result in
between.  I was surprised that when I tried to .resize() the result of
this routine, I got a segmentation fault and a core dump.

It was crashing at this line in arrayobject.c:
	if (memcmp(self->descr->zero, all_zero, elsize) == 0) {

self->descr, in this case, was the type description for arrays of type
"double".  It seems that self->descr->zero was 0, as in a null
pointer, not a pointer to a location containing (double)0, and this
was causing it to crash.

It looks like the .zero fields of the type descriptions (which live in
arraytypes.c and _numpy.so) are initialized to be null pointers, and
only when the initmultiarray() function in multiarraymodule.c is run
are these pointers set to point to actual zeroes somewhere in
allocated memory.

I guess Numeric.py imports multiarray.so, which calls
initmultiarray(), so the solution for me was to make sure I import
Numeric before importing my module (or at least before resizing arrays
produced by my module).  But, to my mind, this segfault is a bug ---
importing a module that follows all the rules shouldn't put Python in
a state that's so dangerously inconsistent that innocent things like
.resize() can crash it.  Maybe the same .so file that includes the
actual data items should be responsible for initializing them ---
especially since import_array() imports _numpy without importing
multiarray.  (I assume there's a reason it wasn't done this way in the
first place.)  What do other people think?

-- 
/* By Kragen Sitaker, http://pobox.com/~kragen/puzzle4.html */
char b[2][10000],*s,*t=b,*d,*e=b+1,**p;main(int c,char**v){int n=atoi(v[1]);
strcpy(b,v[2]);while(n--){for(s=t,d=e;*s;s++){for(p=v+3;*p;p++)if(**p==*s){
strcpy(d,*p+2);d+=strlen(d);goto x;}*d++=*s;x:}s=t;t=e;e=s;*d++=0;}puts(t);}


From xscottg at yahoo.com  Sat Apr 13 03:09:04 2002
From: xscottg at yahoo.com (Scott Gilbert)
Date: Sat Apr 13 03:09:04 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <NEBBIJKBMLDBLNCEEFOCIEFGCNAA.perry@stsci.edu>
Message-ID: <20020413100823.45837.qmail@web12907.mail.yahoo.com>

--- Perry Greenfield <perry at stsci.edu> wrote:
> Scott Gilbert writes:
[...]
> >
> >      very_cool = numarray.add(n, s)
> >
> But why not (I may have some details wrong, I'm doing this
> from memory, and I haven't worked on it myself in a bit):
> 
[...]
>
> maybe_not_quite_so_cool_but_just_as_functional = n + s
>
[...]
>
> From everything I've seen so far, I don't see why you can't
> just create a NumArray object directly. You can subclass it
> (and use multiple inheritance if you need to subclass a different
> object as well) and add whatever customized behavior you want.
> You can create new kinds of objects as buffers just so long
> as you satisfy the buffer interface.
>

Your point about the optional buffer parameter to the NumArray is well
taken.  I had seen that when looking through the code, but it slipped my
mind for that example.  I could very well be wrong about some of these
other reasons too...

I have a number of reasons listed below for wanting the standard that 
Python adopts to specify only the interface and not the implementation. 
You may not find all of these persuasive, and I apologize in advance if any
looks like a criticism.  (In my limited years as a professional software
developer, I've found that the majority of people can be very defensive and
protective of their code.  I've been trying to tread lightly, but I don't
know if I'm succeeding.)

However if any of these reasons is persuasive, keep in mind that the actual
changes I'm proposing are pretty minimal in scope.  And that I'd be willing
to submit patches so as to reduce any inconvenience to you.  (Not that you
have any reason to believe I can code my way out of a box...  :-)

Ok, here's my list:

Philosophical

  You have a proposal in to the Python guys to make Numarray into the
  standard _implementation_.  I think standards like this should specify
  an _interface_, not an implementation.

Simplicity

  I can give my users a single XArray.py file, and they can be off and
  running with something that works right then and there, and it could in
  many ways be compatible with Numarray (with some slight modifications)
  when they decide they want the extra functionality of extension modules
  that you or anyone else who follows your standard provides.  But they
  don't have to compile anything until they really need to.

  Your implementation leaves me with all or nothing.  I'll have to build
  and use numarray, or I've got an in house only solution.

Expediency

  I want to see a usable standard arise quickly.  If you maintain the
  stance that we should all use the Numarray implementation, instead of
  just defining a good Numarray interface, everyone has to wait for you
  to finish things enough to get them accepted by the Python group.  Your
  implementation is complicated, and I suspect they will have many things
  that they will want you to change before they accept it into their
  baseline.  (If you think my list of suggestions is annoying, wait until
  you see theirs!)

  If a simple interface protocol were presented, along with a simple pure
  Python module that implements it, the PEP acceptance process might move
  along quickly while you take your time implementing your code.

Pragmatic

  You guys aren't finished yet, and I need to give my users an array
  module ASAP.  As such a new project, there are likely to be many bugs
  floating around in there.  I think that when you are done, you will
  probably have a very good library.  Moreover, I'm grateful that you are
  making it open source.  That's very generous of you, and the fact that
  you are tolerating this discussion is definitely appreciated.

  Still, I can't put off my projects, and I can't task you to work faster. 


  However, I do think we could agree in a very short term that your design
  for the interface is a good one.  I also think that we (or just me if you
  like) could make a much smaller PEP that would be more readily accepted.
  Then everyone in this community could proceed at their own pace - knowing
  that if we followed the simple standard we would have interoperability
  with each other.

Social

  Normally I wouldn't expect you to care about any of my special issues.
  You have your own problems to solve.  As I said above, it's generous of
  you to even offer your source code.
  
  However, you are (or at least were) trying to push for this to become a
  standard.  As such, considering how to be more general and apply to a 
  wider class of problems should be on your agenda.  If it's not, then you
  shouldn't be creating the standard.

  If you don't care about numarray becoming standard, I would like to try
  my hand at submitting the slightly modified version of your design.  It
  won't be compatible with your stuff, but hopefully others will follow
  suit.

Functionality

  Data Types

    I have needs for other types of data that you probably have little use
    for.  If I can't coerce you to make a minor change in specification, I
    really don't think I could coerce you to support brand new data types
    (complex ints is the one I've beaten to death, because I could use that
    one in the short term).  What happens when someone at my company wants
    quaternions?  I suspect that you won't have direct support for those.
    I know that numarray is supposed to be extensible, but the following
    raises an exception:

        from numarray import *

        class QuaternionType(NumericType):
            def __init__(self):
                NumericType.__init__(self, "Quaternion", 4*8, 0)

        Quaternion = QuaternionType()  # BOOM!

        q = array(shape=(10, 10), type=Quaternion)

    Maybe I'm just doing something wrong, but it looks like your code
    wants "Quaternion" to be in your (private?) typeConverters dictionary.

    Ok, try two:

        from numarray import *

        q = NDArray(shape=(10, 10), itemsize=4*8)

        if q[5][5] is None:
            print "No boom, but what can I do with it?"

    Maybe this is just a documentation problem.  On the other hand, I can
    do the following pretty readily:

        import array
        class Quat2D:
            def __init__(self, *shape):
                assert len(shape) == 2
                self._buffer = array.array('d', [0])*shape[0]*shape[1]*4
                # strides count doubles, not bytes: 4 doubles per quaternion
                self._shape, self._stride = tuple(shape), (4*shape[1], 4)
                self._itemsize = 4*8

            def __getitem__(self, sub):
                assert isinstance(sub, tuple) and len(sub) == 2
                offset = sub[0]*self._stride[0] + sub[1]*self._stride[1]
                return tuple([self._buffer[offset + i] for i in range(4)])

            def __setitem__(self, sub, val):
                assert isinstance(sub, tuple) and len(sub) == 2
                offset = sub[0]*self._stride[0] + sub[1]*self._stride[1]
                for i in range(4): self._buffer[offset + i] = val[i]
                return val

        q = Quat2D(10, 10)
        q[5, 5] = (1, 2, 3, 4)
        print q[5, 5]

    This isn't very general, but it is short, and it makes a good example.

    If they get half of their data from calculations using Numarray, and
    half from whatever I provide them, and then try to mix the results in
    an extension module that has to know about separate implementations,
    life is more complicated than it should be.

  Operations

    I'm going to have to write my own C extension modules for some high
    performance operations.  All I need to get this done is a void* pointer,
    the shape, stride, itemsize, itemtype, and maybe some other things to
    get off and running.  You have a growing framework, and you have already
    indicated that you think of your hidden variables as private.  I don't
    think I or my users should have to understand the whole UFunc framework
    and API just to create an extension that manipulates a pointer to an
    array of doubles.
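With only those descriptors, Python's standard ctypes module can already produce the raw pointer such an extension needs, with no UFunc machinery involved. A sketch, assuming the proposed `ndarray_*` attributes and native-order Float64 data; `as_c_doubles` and `trace` are hypothetical names. `ctypes.addressof(cbuf)` is the `void*` a C routine would receive.

```python
import ctypes


def as_c_doubles(obj):
    """Wrap obj.ndarray_buffer as a ctypes double array sharing the same memory.

    Assumes the hypothetical ndarray_* attributes and a writable, native-order
    Float64 buffer (array.array('d') qualifies); nothing is copied.
    """
    nbytes = memoryview(obj.ndarray_buffer).nbytes
    count = nbytes // ctypes.sizeof(ctypes.c_double)
    cbuf = (ctypes.c_double * count).from_buffer(obj.ndarray_buffer)
    return cbuf, obj.ndarray_shape, obj.ndarray_stride, obj.ndarray_itemsize


def trace(obj):
    """Sum the diagonal of a 2-D array by walking byte strides, as C code would."""
    cbuf, shape, stride, itemsize = as_c_doubles(obj)
    total = 0.0
    for i in range(min(shape)):
        total += cbuf[(i * stride[0] + i * stride[1]) // itemsize]
    return total
```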

    Arrays are simpler than UFuncs.  I consider them to be pretty separable
    parts of your design.  If you keep it this way, and it becomes the
    standard, it seems that I and everyone else will have to understand
    both parts in order to create an extension module.

Flexibility

  Numarray is going to make a choice of how to implement slicing.  My guess
  is that it will be one of "copy contiguous", "copy on write", "copy by 
  reference".  I don't know what the correct choice is, but I know that
  someone else will need something different based on context.  Things like
  UFuncs and other extension modules that do fast C level calculations
  typically don't need to concern themselves with slicing behaviour.

Design

  Your implementation would be similar to having the 'pickle' module
  require you to derive from a 'Pickleable' base class - instead of simply
  providing __getstate__ and __setstate__ methods.

  It's an artificial constraint, and those are usually bad.
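The pickle half of this analogy is easy to check: pickle uses `__getstate__`/`__setstate__` on any class that defines them, with no special base class. The `FileCursor` class below is purely illustrative.

```python
import pickle


class FileCursor:
    """Ordinary class, no special base: pickle just looks for the two hooks."""

    def __init__(self, path, offset):
        self.path = path
        self.offset = offset
        self.cache = {"scratch": [0] * 1000}  # transient state we choose not to pickle

    def __getstate__(self):
        # Only the durable fields go into the pickle.
        return {"path": self.path, "offset": self.offset}

    def __setstate__(self, state):
        self.path = state["path"]
        self.offset = state["offset"]
        self.cache = {}  # rebuilt empty after unpickling


restored = pickle.loads(pickle.dumps(FileCursor("data.bin", 4096)))
```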

>
> All good in principle, but I haven't yet seen a reason to change
> numarray. As far as I can tell, it provides all you need exactly
> as it is. If you could give an example that demonstrated otherwise...
>

Maybe you're right.  I suspect you as the author will come up with the
quick example that shows how to implement my bizarre quaternion example
above.  I'm not sure if this makes either of us right or wrong, but if
you're not buying any of this, then it's probably time for me to chalk
this up to a difference in opinion and move on.

Truthfully this is taking me pretty far from my original tack.  Originally
I had simply hoped to hack a couple of things into arraymodule.c, and here
I am now trying to get a simpler standard in place.  I'll try one last time
to convince you with the following two statements:

  - Changing such that you only require the interface is a subtle,
    but noticeable, improvement to your otherwise very good design.

  - It's not a difficult change.


If that doesn't compel you, at least I can walk away knowing I tried.  For
the volumes I've written, this will probably be my last pesky message if
you really don't want to budge on this issue.


>
> To tell you the truth, I'm not crazy about how the struct module
> handles types or attributes. It's generally far too cryptic for
> my tastes. Other than providing backward compatibility, we aren't
> interested in it emulating struct.
>

I consider it a lot like regular expressions.  I cringe when I see someone
else's, but I don't have much difficulty putting them together.

The alternative of coming up with a different specifier for records/structs
is probably a mistake now that the struct module already has its (terse)
format specification.  Once that is taken into consideration, following all
the leads of the struct module makes sense to me.

 
>
> I could well misunderstand, but I thought that if you mmap a file
> in unix in write mode, you do not use up the virtual memory as
> limited by the physical memory and the paging file. Your only
> limit becomes the virtual address space available to the processor.
>

Regarding efficiency, it depends on the implementations, which vary
greatly, and there are other subtleties.  I've already written a book
above, so I won't tire you with details.  I will say that closing a large
memory mapped file on top of NFS can be dreadful.  It probably takes the
same amount of total time or less, but from an interactive analysis point
of view it's pretty unpleasant on Tru64 at least.

Also, just mmapping the whole file puts all of the memory use at the
discretion of the OS.  I might have a gig or two to work with, but if mmap
takes them all, other threads will have to contend for memory.  The system
(application) as a whole might very well run better if I can retain some
control over this.


I'm not married to the windowing suggestion.  I think it's something to
consider, but it might not be a common enough case to try and make a
standard mechanism for.  If there isn't a way to do it without a kluge,
then I'll drop it.  Likewise if a simple strategy can't meet anyone's real
needs.


>
> If the 32 bit address is your problem, you are far, far better off
> using a 64-bit processor and operating system than trying to kludge up
> a windowing memory mechanism.
>

We don't always get to specify what platform we want to run on.  Our
customer has other needs, and sometimes hardware support for exotic devices
dictates what we'll be using.  Frequently it is on 64 bit Alphas, but
sometimes the requirement is x86 Linux, or 32 bit Solaris.

Finally, our most frustrating piece of legacy software was written in
Fortran assuming you could stuff a pointer into an INT*4 and now requires
the -taso flag to the compiler for all new code (which turns a sexy 64 bit
Alpha into a 32 bit kluge...).

Also, much of our data comes on tapes.  It's not easy to memory map those.

>
> I could see a way of doing it for
> ufuncs, but the numeric world (and I would think the DSP world
> as well) needs far more than element-by-element array functionality.
> providing a usable C-api for that kind of memory model would be
> a nightmare. But I'm not sure if this or the page file is your
> limitation.
>

I would suggest that any extension module which is not interested in this
feature simply raise an exception of some sort (NotImplementedError, say).
UFuncs could fall into this camp without any criticism from me.  All a module
would have to do is check whether the 'window_get' attribute is callable,
and if so, raise an exception.
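In Python terms, the guard described above might look like the following sketch. The function name is hypothetical, and a real extension would do the equivalent check in C:

```python
def require_plain_array(ob):
    """Hypothetical guard for a module that does not support windowing.

    Any object whose 'window_get' attribute is callable is rejected.
    """
    if callable(getattr(ob, "window_get", None)):
        raise NotImplementedError("windowed arrays are not supported here")
    return ob
```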

My proposal wasn't necessarily to map in a single element at a time.  If
the C extension was willing to work with these beasts at all, it would check to
see if the offset it wanted was between window_min and window_max.  If it
wasn't, then it would call ob.window_get(offset), and the Python object
could update window_min and window_max however it sees fit.  For instance
by remapping 10 or 20 megabytes on both sides.
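A rough Python sketch of that windowing scheme over a memory-mapped file. Only window_get, window_min and window_max come from the proposal. The class name, the 'window' attribute holding the current mapping, and the 16 MB default margin are assumptions made for illustration. The one real constraint it respects is that mmap offsets must be multiples of mmap.ALLOCATIONGRANULARITY:

```python
import mmap
import os


class WindowedFile(object):
    """Hypothetical sketch: only part of a large file is mapped at a time.

    [window_min, window_max) are the byte offsets covered by self.window.
    """

    def __init__(self, path, margin=16 * 1024 * 1024):
        self._file = open(path, "rb")
        self._size = os.fstat(self._file.fileno()).st_size
        self._margin = margin      # bytes mapped on each side of a request
        self.window = None         # current mmap object (hypothetical name)
        self.window_min = 0
        self.window_max = 0        # empty window until the first request

    def window_get(self, offset):
        """Remap so that offset lies inside [window_min, window_max)."""
        if not 0 <= offset < self._size:
            raise IndexError(offset)
        if self.window is not None:
            self.window.close()
        gran = mmap.ALLOCATIONGRANULARITY
        # mmap offsets must be a multiple of the allocation granularity.
        start = max(0, offset - self._margin) // gran * gran
        end = min(self._size, offset + self._margin)
        self.window = mmap.mmap(self._file.fileno(), end - start,
                                access=mmap.ACCESS_READ, offset=start)
        self.window_min, self.window_max = start, end

    def close(self):
        if self.window is not None:
            self.window.close()
        self._file.close()


def read_byte(ob, offset):
    """Consumer side: remap only when offset falls outside the window."""
    if not ob.window_min <= offset < ob.window_max:
        ob.window_get(offset)
    return ob.window[offset - ob.window_min]
```

The consumer never sees the whole file, only the current window, so memory use stays bounded by roughly twice the margin.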

This particular implementation would allow us to do correlations of a small
(mega sample) chunk of data against a HUGE (giga sample) file.

This might be the wrong interface, and I'm willing to listen to a better
suggestion.

It might also be too specialized a need to justify complicating a simpler
overall design.

Also, there are other uses for things like this.  It could possibly be used
to implement sparse arrays.  It's probably not the best implementation of
that, but it could hide a dict of set data points, and present it to an
extension module as a complete array.
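The same window interface could front a sparse store. The sketch below keeps a dict of set points and materializes one dense window of doubles at a time, with unset points reading as 0.0. The class and helper names, the 'window' attribute and the window size are hypothetical:

```python
from array import array


class SparseWindowed(object):
    """Hypothetical sketch: a dict of set points presented as dense doubles."""

    def __init__(self, length, points, window_size=1024):
        self.length = length
        self._points = dict(points)   # index -> value; unset reads as 0.0
        self._window_size = window_size
        self.window = array("d")      # the currently materialized window
        self.window_min = 0
        self.window_max = 0

    def window_get(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        start = index // self._window_size * self._window_size
        stop = min(self.length, start + self._window_size)
        self.window = array("d", [self._points.get(i, 0.0)
                                  for i in range(start, stop)])
        self.window_min, self.window_max = start, stop


def get_item(ob, index):
    """Consumer side: the same window check as for a memory-mapped file."""
    if not ob.window_min <= index < ob.window_max:
        ob.window_get(index)
    return ob.window[index - ob.window_min]
```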


Cheers,
    -Scott Gilbert


__________________________________________________
Do You Yahoo!?
Yahoo! Tax Center - online filing with TurboTax
http://taxes.yahoo.com/


From perry at stsci.edu  Sat Apr 13 18:43:02 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Sat Apr 13 18:43:02 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <20020413100823.45837.qmail@web12907.mail.yahoo.com>
Message-ID: <NEBBIJKBMLDBLNCEEFOCEEFHCNAA.perry@stsci.edu>

> Ok, here's my list:
>
> Philosophical
>
>   You have a proposal in to the Python guys to make Numarray into the
>   standard _implementation_.  I think standards like this should specify
>   an _interface_, not an implementation.
>
Sure (though there is often more to a standard than just an interface,
but certainly an implementation is generally not the standard). I'm
not sure why you think we imply the implementation is the standard.
We are waiting to rewrite the PEP when we are closer to having
the implementation ready, but we've been very open about the design
and have asked for input on it for a long time now.

> Simplicity
>
>   I can give my users a single XArray.py file, and they can be off and
>   running with something that works right then and there, and it could in
>   many ways be compatible with Numarray (with some slight modifications)
>   when they decide they want the extra functionality of extension modules
>   that you or anyone else who follows your standard provides.  But they
>   don't have to compile anything until they really need to.
>
>   Your implementation leaves me with all or nothing.  I'll have to build
>   and use numarray, or I've got an in house only solution.
>
Hard to comment on this.

> Expediency
>
>   I want to see a usable standard arise quickly.  If you maintain the
>   stance that we should all use the Numarray implementation, instead of
>   just defining a good Numarray interface, everyone has to wait for you
>   to finish things enough to get them accepted by the Python group.  Your
>   implementation is complicated, and I suspect they will have many things
>   that they will want you to change before they accept it into their
>   baseline.  (If you think my list of suggestions is annoying, wait until
>   you see theirs!)
>
I have the strong sense you misunderstand how the process works.
Guido will be driven in large part by the acceptance or non-acceptance
of the Numeric community. If they don't buy into it, it won't be
part of the standard. If it won't be used by many, it won't be part
of the standard. Yes, he will review the design and interface to see
if there should be a long term commitment by the Python maintainers
to have it in the standard library. We have sent him the design
documents, and we do keep him informed. He has given us feedback
about it. But for the most part, the judgement is going to be by
the Numeric community.

>   If a simple interface protocol is presented, along with a simple pure
>   Python module that implements it, the PEP acceptance process might move
>   along quickly, and you could take your time implementing your code.
>
> Pragmatic
>
>   You guys aren't finished yet, and I need to give my users an array
>   module ASAP.  As such a new project, there are likely to be many bugs
>   floating around in there.  I think that when you are done, you will
>   probably have a very good library.  Moreover, I'm grateful that you are
>   making it open source.  That's very generous of you, and the fact that
>   you are tolerating this discussion is definitely appreciated.
>
>   Still, I can't put off my projects, and I can't task you to work faster.
>
>   However, I do think we could agree in very short order that your design
>   for the interface is a good one.  I also think that we (or just me if you
>   like) could make a much smaller PEP that would be more readily accepted.
>   Then everyone in this community could proceed at their own pace, knowing
>   that if we followed the simple standard we would have interoperability
>   with each other.
>
I think we still don't understand what you need yet. More elaboration
on that later.

> Social
>
>   Normally I wouldn't expect you to care about any of my special issues.
>   You have your own problems to solve.  As I said above, it's generous of
>   you to even offer your source code.
>
>   However, you are (or at least were) trying to push for this to become a
>   standard.  As such, considering how to be more general and apply to a
>   wider class of problems should be on your agenda.  If it's not, then you
>   shouldn't be creating the standard.
>
Pleeease. Just because a library developer doesn't happen to meet your
needs doesn't mean it can't be part of the standard library. There
are plenty of modules in the standard library that could have been
made more general in some way, but there they are. The criterion is
whether it solves problems for a large community of users, not that
it is infinitely extensible or so on. Software development is full of
trade-offs and that includes limits to generalization. Sure we
can discuss whether things could be made more general or not. But
because you want it more general doesn't mean we just say "Sure, you
define everything!"

>   If you don't care about numarray becoming standard, I would like to try
>   my hand at submitting the slightly modified version of your design.  It
>   won't be compatible with your stuff, but hopefully others will follow
>   suit.
>
You are free to propose your own standard at any time. No one will
stop you from doing so.

> Functionality
>
>   Data Types
>
>     I have needs for other types of data that you probably have little use
>     for.  If I can't coerce you to make a minor change in specification, I
>     really don't think I could coerce you to support brand new data types
>     (complex ints is the one I've beaten to death, because I could use that
>
You are right on complex ints (that we won't consider them). One
could take numarray and add them if one wanted and have a more
extended version. But we won't do it, and we wouldn't support it as
part of what we maintain. It's one of those trade-offs.

>     one in the short term).  What happens when someone at my company wants
>     quaternions?  I suspect that you won't have direct support for those.
>     I know that numarray is supposed to be extensible, but the following
>     raises an exception:
>
>         from numarray import *
>
>         class QuaternionType(NumericType):
>             def __init__(self):
>                 NumericType.__init__(self, "Quaternion", 4*8, 0)
>
>         Quaternion = QuaternionType()  # BOOM!
>
>         q = array(shape=(10, 10), type=Quaternion)
>
>     Maybe I'm just doing something wrong, but it looks like your code
>     wants "Quaternion" to be in your (private?) typeConverters dictionary.
>
Yep, and there's a good reason for that. Just spend a few minutes
thinking about the role types play with array packages and how they
have traditionally been implemented. Generally speaking, it is
presumed that any two numeric types may be used in a binary operator.
So you, Scott, define your special type, Quaternions. You will need
to provide the module with all the machinery for knowing what to do with
all the other numeric types available. You may not care, but it is
a requirement that numarray (and Numeric) know what to do. If that
doesn't fit in with your needs, then you shouldn't be trying to use
it. The problem is worse than that. You supply a Quaternion type extension
to numarray, and Bob supplies a super long int type (64 bytes!) also.
Both of you have gone to the trouble of giving numarray the means of
handling all other default numarray types. But neither of you knows how
to handle the other's type. How do you solve that problem? I don't know.
If you do, let us know. Given the requirements, adding new numeric
types is not going to allow independent extensions to work with each
other. That's fairly limiting, but that's the price that is paid
for the feature.
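To make the coverage problem concrete: a binary operator needs a rule for every unordered pair of types, n(n+1)/2 pairs for n types. The sketch below only records which pairs have some rule. The type names and the three-type built-in list are illustrative, not numarray's actual types. Each independent extension covers itself against the built-ins, yet the pair between the two extensions stays uncovered:

```python
from itertools import combinations_with_replacement

# Illustrative subset of built-in types; not numarray's actual type list.
BUILTIN = ["Int32", "Float64", "Complex64"]


def uncovered_pairs():
    # Every pair of built-in types already has a coercion rule.
    covered = set(frozenset(p)
                  for p in combinations_with_replacement(BUILTIN, 2))

    def register(new_type):
        # An extension author covers the new type against every type they
        # know about: the built-ins plus the new type itself.
        for t in BUILTIN + [new_type]:
            covered.add(frozenset((new_type, t)))

    register("Quaternion")   # one extension
    register("LongInt")      # a second one, written independently

    all_types = BUILTIN + ["Quaternion", "LongInt"]
    return [p for p in combinations_with_replacement(all_types, 2)
            if frozenset(p) not in covered]


print(uncovered_pairs())  # [('Quaternion', 'LongInt')]
```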

>     Ok, try two:
>
>         from numarray import *
>
>         q = NDArray(shape=(10, 10), itemsize=4*8)
>
>         if q[5][5] is None:
>             print "No boom, but what can I do with it?"
>
>     Maybe this is just a documentation problem.  On the other hand, I can
>     do the following pretty readily:
>
>         import array
>         class Quat2D:
>             def __init__(self, *shape):
>                 assert len(shape) == 2
>                 self._buffer = array.array('d', [0])*shape[0]*shape[1]*4
>                 # row-major strides in doubles: a row holds shape[1] quaternions
>                 self._shape, self._stride = tuple(shape), (4*shape[1], 4)
>                 self._itemsize = 4*8
>
>             def __getitem__(self, sub):
>                 assert isinstance(sub, tuple) and len(sub) == 2
>                 offset = sub[0]*self._stride[0] + sub[1]*self._stride[1]
>                 return tuple([self._buffer[offset + i] for i in range(4)])
>
>             def __setitem__(self, sub, val):
>                 assert isinstance(sub, tuple) and len(sub) == 2
>                 offset = sub[0]*self._stride[0] + sub[1]*self._stride[1]
>                 for i in range(4): self._buffer[offset + i] = val[i]
>                 return val
>
>         q = Quat2D(10, 10)
>         q[5, 5] = (1, 2, 3, 4)
>         print q[5, 5]
>
>     This isn't very general, but it is short, and it makes a good example.
>
I'm not sure what it proves. If all you need is an array to store
some kind of type, be able to index and slice it, and not provide
numeric operations, by all means use the existing array module; it
does that fine. It's more work to subclass NDArray, but it can do
it too, and gives you more capabilities (you won't be able to use
index arrays or broadcasting in the array module for example). The
extra functionality comes at some price. Sure, it isn't as simple to
extend. It's your choice if it is worth it or not. If you want
to add your large quaternion array efficiently, then the array
module is worthless. Your example shows nothing about what your
real needs for the object are.

>     If they get half of their data from calculations using Numarray, and
>     half from whatever I provide them, and then try to mix the results in
>     an extension module that has to know about separate implementations,
>     life is more complicated than it should be.
>
It's how you intend to 'mix' these that I have no clue about.

>   Operations
>
>     I'm going to have to write my own C extension modules for some
>     high-performance operations.  All I need to get this done is a void*
>     pointer, the shape, stride, itemsize, itemtype, and maybe some other
>     things to get off and running.  You have a growing framework, and you
>     have already indicated that you think of your hidden variables as
>     private.  I don't think I or my users should have to understand the
>     whole UFunc framework and API just to create an extension that
>     manipulates a pointer to an array of doubles.
>
Sigh. No one said you had to understand the ufunc framework to do so.
We are working on a C API that just gives you a simple pointer (it's
actually available now, but we aren't going to tout it until we have
better documentation).

>     Arrays are simpler than UFuncs.  I consider them to be pretty separable
>     parts of your design.  If you keep it this way, and it becomes the
>     standard, it seems that I and everyone else will have to understand
>     both parts in order to create an extension module.
>
Wrong.

> Flexibility
>
>   Numarray is going to make a choice of how to implement slicing.  My
>   guess is that it will be one of "copy contiguous", "copy on write", or
>   "copy by reference".  I don't know what the correct choice is, but I
>   know that someone else will need something different based on context.
>   Things like UFuncs and other extension modules that do fast C-level
>   calculations typically don't need to concern themselves with slicing
>   behaviour.
>
And they don't.

> Design
>
>   Your implementation would be similar to having the 'pickle' module
>   require you to derive from a 'Pickleable' base class - instead of simply
>   providing __getstate__ and __setstate__ methods.
>
>   It's an artificial constraint, and those are usually bad.
>
You say. You are quite welcome to do your own implementation that
doesn't have this 'artificial' constraint. After all your text
I *still* don't understand how you intend to use the 'interface'
of the private attributes. You haven't provided any example (let
alone a compelling one) of why we should accept any object that
provides those attributes. Shouldn't the object also provide all
the public methods? Shouldn't it also provide indexing and so forth?
All in all you are talking about checking quite a few attributes
to make sure the object has the interface. And even if it does,
*why* in the world would we presume that the C functions used by
numarray would work properly with the object you provide. I
really don't have a clue as to what you are getting at here, and
without some real concrete example illustrating this point, I
don't think there is any point to continuing this discussion.
> >
> > All good in principle, but I haven't yet seen a reason to change
> > numarray. As far as I can tell, it provides all you need exactly
> > as it is. If you could give an example that demonstrated otherwise...
> >
>
> Maybe you're right.  I suspect you as the author will come up with the
> quick example that shows how to implement my bizarre quaternion example
> above.  I'm not sure if this makes either of us right or wrong, but if
> you're not buying any of this, then it's probably time for me to chalk
> this off to a difference in opinion and move on.
>
> Truthfully this is taking me pretty far from my original tack.  Originally
> I had simply hoped to hack a couple of things into arraymodule.c, and here
> I am now trying to get a simpler standard in place.  I'll try one last time
> to convince you with the following two statements:
>
>   - Changing it so that you only require the interface is a subtle,
>     but noticeable, improvement to your otherwise very good design.
>
>   - It's not a difficult change.
>
>
> If that doesn't compel you, at least I can walk away knowing I tried.  For
> the volumes I've written, this will probably be my last pesky message if
> you really don't want to budge on this issue.
>
We're not going to budge until you show us what the hell you are talking
about.
>
> The alternative of coming up with a different specifier for records/structs
> is probably a mistake now that the struct module already has its (terse)
> format specification.  Once that is taken into consideration, following all
> the leads of the struct module makes sense to me.
>
Again, you are free to do your own, or fork our numarray and
do it the way you want. Or do your own from scratch. Or whatever.
>
[...]
> Also, just mmapping the whole file puts all of the memory use at the
> discretion of the OS.  I might have a gig or two to work with, but if mmap
> takes them all, other threads will have to contend for memory.  The system
> (application) as a whole might very well run better if I can retain some
> control over this.
>
>
> I'm not married to the windowing suggestion.  I think it's something to
> consider, but it might not be a common enough case to try and make a
> standard mechanism for.  If there isn't a way to do it without a kluge,
> then I'll drop it.  Likewise if a simple strategy can't meet anyone's real
> needs.
>
You can forget our doing it. It's out of the question for us.
> >
> > If the 32 bit address is your problem, you are far, far better off
> > using a 64-bit processor and operating system than trying to kludge up
> > a windowing memory mechanism.
> >
>
> We don't always get to specify what platform we want to run on.  Our
> customer has other needs, and sometimes hardware support for exotic devices
> dictates what we'll be using.  Frequently it is on 64 bit Alphas, but
> sometimes the requirement is x86 Linux, or 32 bit Solaris.
>
> Finally, our most frustrating piece of legacy software was written in
> Fortran assuming you could stuff a pointer into an INT*4 and now requires
> the -taso flag to the compiler for all new code (which turns a sexy 64 bit
> Alpha into a 32 bit kluge...).
>
You may have customers with unreasonable demands. We don't have to
let them cause an incredible complication in the underlying machinery.
(And we won't). And we won't make it work on Windows 3.1 either.
We have to draw the line somewhere. Your customers will pay dearly
(and you will benefit :-).

> Also, much of our data comes on tapes.  It's not easy to memory map those.
>
Your point being?
> >
>
[...]

This doesn't seem to be going anywhere. If you can give us
a better idea of how your interface needs would be used,
at least we could respond to the specific issues. But we
don't understand and although we are considering some
changes, I'm not going to fold in your requests until
we do understand.

You may not be happy with the progress we are making either.
Sorry, I can't help that. If you need something sooner,
you'll need to do something else. Come up with your
own system and try to get it into Python. Take numarray
and do it the way you think it ought to be done and at
the rate you think it should be done. You're welcome to.
Take the array module and use that as a basis.

We'd like numarray to be part of the standard. We'd like
it to be the standard package in the Numeric community.
But if neither happened, we'd still be working on it.
We need it for our own work. Numeric doesn't give us
the capabilities that we need. We are using it for
our software development and it is being used to reduce
HST data now. We are continuing on this regardless.

Perry


From paul at pfdubois.com  Sat Apr 13 19:35:02 2002
From: paul at pfdubois.com (Paul F Dubois)
Date: Sat Apr 13 19:35:02 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <NEBBIJKBMLDBLNCEEFOCEEFHCNAA.perry@stsci.edu>
Message-ID: <000001c1e35c$d85a2f90$0a01a8c0@NICKLEBY>

I haven't been following this discussion (I have a product release on
Monday). But I am getting a lot of mail stacking up for numpy-developers
which will not go through unless you are one of the registered
developers mailing from your registered mail account. 

All others, please do not use numpy-developers. This is a private
channel for the official developers only.

I gather from my brief reading that someone is looking for a standard to
use now. That standard is Numeric. If you go with that now then when the
time comes to switch to Numarray, you'll be in the same boat as the
whole community and therefore in a position to profit from any
conversion tools required. You can reduce your problems to a minimum by
sticking with the Python interface where possible.

If you have some special need that Numeric is not meeting please realize
that what exists is a consensus product after a long evolution and it is
not likely to change much to meet your particular needs. There are some
areas where what is right for one set of people is wrong for the others.


From xscottg at yahoo.com  Sun Apr 14 04:20:03 2002
From: xscottg at yahoo.com (Scott Gilbert)
Date: Sun Apr 14 04:20:03 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <NEBBIJKBMLDBLNCEEFOCEEFHCNAA.perry@stsci.edu>
Message-ID: <20020414111911.2977.qmail@web12901.mail.yahoo.com>

Perry, I've been trying to be persuasive, but I think all I've 
managed to do is to be verbose and annoy you.  Please accept 
my apologies.

I really am sorry this is going as poorly as it is.  I'm doing a lousy
job of getting my point across, and I'd like to turn around the tone
this has taken.  Email always comes off as more antagonistic
than intended.

Finally, my appeal to the fact that you are proposing a standard
was heavy handed.  I guess I was trying to use that to force
you to consider my position.  It clearly backfired...

I'll try to be more to the point.


Here's what I'm proposing, and it's only a suggestion.


*** I think the requirements for being a general purpose "NDArray" 
can be specified with only the following attributes:

    __array_buffer__    - as buffer object
    __array_shape__     - as tuple of long
    __array_itemsize__  - as int

    Optionally
    __array_stride__    - as tuple of long (get from shape if None)
    __array_offset__    - as int (would default to 0 if not present)

Then anyone who implemented these could work with the same C API for
getting the pointer to memory, shape array, stride array, and item size.  

The set of operations on a pure "NDArray" is probably pretty minimal
(reshape, transpose/rotate, index arrays?).

So in order to create a full featured "NumArray", a few more attributes
are required:

    __array_itemtype__  - as string?

    Optionally
    __array_endian__    - as 1 char string?  (default to the native endian)

This brings the total up to 4 required attributes, and 3 optional ones 
for a very general purpose array data structure.  (I can think of other 
optional ones, but skip that for now.)
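A minimal sketch of how a consumer might read those attributes and apply the stated defaults (row-major strides computed from shape, offset 0, native endianness). It assumes a modern Python, so a memoryview stands in for the old buffer object. The helper names and the Doubles2D class are hypothetical; only the __array_*__ names come from the proposal:

```python
import sys
from array import array


def contiguous_strides(shape, itemsize):
    """Row-major byte strides, the default when __array_stride__ is absent."""
    strides = []
    step = itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def array_info(ob):
    """Gather what a C extension would need, applying the proposed defaults."""
    shape = tuple(ob.__array_shape__)
    itemsize = ob.__array_itemsize__
    strides = getattr(ob, "__array_stride__", None)
    if strides is None:
        strides = contiguous_strides(shape, itemsize)
    native = "<" if sys.byteorder == "little" else ">"
    return {
        "buffer": ob.__array_buffer__,
        "shape": shape,
        "itemsize": itemsize,
        "strides": tuple(strides),
        "offset": getattr(ob, "__array_offset__", 0),
        "itemtype": getattr(ob, "__array_itemtype__", None),
        "endian": getattr(ob, "__array_endian__", native),
    }


class Doubles2D(object):
    """Hypothetical pure-Python array exposing only the proposed attributes."""

    def __init__(self, rows, cols):
        self._storage = array("d", [0.0]) * (rows * cols)
        self.__array_buffer__ = memoryview(self._storage)
        self.__array_shape__ = (rows, cols)
        self.__array_itemsize__ = self._storage.itemsize
        self.__array_itemtype__ = "d"


info = array_info(Doubles2D(3, 4))
print(info["strides"], info["offset"])  # (32, 8) 0 with 8-byte doubles
```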


>
> All in all you are talking about checking quite a few attributes
> to make sure the object has the interface. And even if it does,
> *why* in the world would we presume that the C functions used by
> numarray would work properly with the object you provide.
>

Because truthfully arrays are little more than a pointer to memory.

That's like asking "why in the world would we presume memcpy() or 
qsort() would know what to do with your memory?"
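The point can be sketched in Python. Given raw memory, an offset, strides and an item format, a routine can read any element without knowing which object owns the memory. The function name is hypothetical, and the sketch assumes the item format is already known (here 'd', a native double):

```python
import struct
from array import array


def get_element(buf, offset, strides, index, fmt="d"):
    """Read one item from raw memory using only offset, strides and format.

    Like memcpy() or qsort(), it knows nothing about who owns the memory.
    """
    pos = offset + sum(i * s for i, s in zip(index, strides))
    return struct.unpack_from(fmt, buf, pos)[0]


data = array("d", range(12))          # 3 x 4 doubles, row-major
size = data.itemsize
strides = (4 * size, size)
print(get_element(data, 0, strides, (2, 1)))        # 9.0
# A transpose is the same memory with the strides swapped.
print(get_element(data, 0, strides[::-1], (1, 2)))  # 9.0
```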


>
> You haven't provided any example (let
> alone a compelling one) of why we should accept any object that
> provides those attributes.
>

Well, the UFuncs certainly should reject any object that they don't
know how to handle.  I'm currently only addressing what it takes to be
an NDArray/NumArray object.  OTOH, if I can present something to the
UFuncs that looks like a known array type, why wouldn't UFuncs
want to work with it?


Ok, so what does this buy you?  

Well, it probably doesn't buy you personally very much.  Your needs are
already being met by the current implementation.


Ok, so what does this cost you?

A few translations:

    _data       -> __array_buffer__
    _shape      -> __array_shape__
    _strides    -> __array_stride__
    _itemsize   -> __array_itemsize__
    _offset     -> __array_offset__
    _type       -> __array_itemtype__
    _byteswap   -> __array_endian__
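To suggest how small the change could be, here is a hypothetical mixin that publishes existing private fields under the proposed names as read-only properties. The private names are taken from the list above and have not been checked against numarray itself. _type and _byteswap are left out because they would need a conversion rather than a straight rename:

```python
# Private names as listed above; assumed, not checked against numarray itself.
RENAMES = {
    "__array_buffer__": "_data",
    "__array_shape__": "_shape",
    "__array_stride__": "_strides",
    "__array_itemsize__": "_itemsize",
    "__array_offset__": "_offset",
}


def _alias(private_name):
    # Read-only property that forwards to the existing private attribute.
    return property(lambda self: getattr(self, private_name))


class ArrayInterfaceMixin(object):
    """Hypothetical mixin publishing private fields under the proposed names."""


for public_name, private_name in RENAMES.items():
    setattr(ArrayInterfaceMixin, public_name, _alias(private_name))
```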

This isn't a style criticism.  I'm not just asking you to change your
names; I'm asking you to promote the names to a "standard interface",
much like the special __xxx__ names used in many places in Python.

Also requires some small changes to getNDInfo() and getNumInfo()
so that they can calculate the derived fields (contiguous, aligned,
etc...).

Also requires some changes to your scripts so that they check for
the interface rather than for inheritance.
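The difference between the two kinds of check, sketched in Python. The function name is hypothetical; the inheritance-based alternative would be an isinstance test against NDArray, which rejects any object that merely provides the same attributes:

```python
REQUIRED = ("__array_buffer__", "__array_shape__", "__array_itemsize__")


def has_ndarray_interface(ob):
    """Duck-typed check: any object with the required attributes qualifies."""
    return all(hasattr(ob, name) for name in REQUIRED)
```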


What are the benefits to anyone else?

- Describes how anyone could implement something that looks and acts
like NDArrays or NumArrays.  There are probably a lot of reasons to
want to do this.  I have some reasons that I don't think you value
too much.  I think others would have reasons which I can't imagine too.

- Allows one standard API for getting at the basics of NDArrays/NumArrays

- Allows anyone to easily implement other data types for NumArrays.
The typecode won't match any of your builtin types, but maybe other
third parties could agree on other typecodes for their crazy needs and
share modules.

- Allows me personally to distribute a separate (and simpler)
implementation of NDArrays/NumArrays right now and have the same data
objects work with yours when you're all done.  If I give the UFuncs a
pointer to memory, and the attributes above, why shouldn't it work
correctly?


>
> We're not going to budge until you show us what the hell you are talking
> about.
>

Am I doing any better?  I am trying.


>
> You are right on complex ints (that we won't consider them). One
> could take numarray and add them if one wanted and have a more
> extended version. But we won't do it, and we wouldn't support as
> being in what we maintain. It's one of those trade offs.
>

Is there a way, today, without modifying numarray, for me to use
numarray as a holder for these esoteric data types?  Is that way difficult?
Could it be easier?

I'm not asking numarray to know about my types in its core baseline.  I'm
wondering what it takes to implement new types at all.


>
> Your example shows nothing about what your
> real needs for the object are.
>

My real needs are all over the place.  Some of which you've shown me
are solvable with the current implementation of numarray.  Some of
which you've not addressed or said you won't address.


To be explicit:

Here are (at least most of) my _needs_ for array objects:

      - support a wide variety of data types (user defined)
      - have efficient storage
      - support the pickle interface for serialization
      - allow alternate sources of underlying memory
      - have an easy interface for accessing the pieces
        necessary to create C extensions (buffer, shape, stride, ...)
      - completed and reliable in the near term

Here are (at least some of) my _wants_ for array objects:

      - cooperate on some level with other standard array
        modules (once the standard is set)
      - have same API for accessing the pieces (buffer, shape,
        stride, ...) as all standard array modules will.
      - implementation in pure Python so that building extension
        modules is not required until the fast operations present
        in those modules are needed.
      - implemented from a standard that is as good as it can be

Here are (at least some of) my _whims_ for array objects:

      - has "windowing" functionality to work efficiently with
        really large files (on any modern platform).
      - alternate implementations for things such as "slicing
        behaviour" (copy on write, reference).
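Two of those slicing behaviours can be sketched in Python with the standard array module. A memoryview slice is a reference into the same memory, while slicing an array.array makes a copy. Copy on write is not shown, and the variable names are purely illustrative:

```python
from array import array

data = array("d", [0.0, 1.0, 2.0, 3.0])

by_reference = memoryview(data)[1:3]  # a view: shares data's memory
by_copy = data[1:3]                   # array slicing makes a new array

data[1] = 99.0
print(by_reference[0], by_copy[0])  # 99.0 1.0
```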


Loosely following your design, I've already written a module that meets 
my "needs", I was hoping that we could cooperate towards filling in some
of my "wants" (cooperating array modules), and I've brought up my "whims"
because I thought they were interesting possibilities for discussion.


I was going to respond to some of your other remarks, but I've probably
wasted enough of your time.  If you don't respond to this message, I'll
take that as a sign that we just aren't going to see eye to eye on any of 
this, and I won't bother you any more. 

(I'll be half surprised if you even get this message.  From the tone
of your last one, I wouldn't be shocked to find out you've already
added me to your killfile. :-)


No hard feelings,
      -Scott Gilbert




From perry at stsci.edu  Sun Apr 14 11:55:02 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Sun Apr 14 11:55:02 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <20020414111911.2977.qmail@web12901.mail.yahoo.com>
Message-ID: <NEBBIJKBMLDBLNCEEFOCIEFICNAA.perry@stsci.edu>

Hi Scott,

Just to be to the point, I'm still missing what I've been
asking for, to wit a concrete example that illustrates your
point. I'll try to address a few of your points that appear
to try to answer that and clarify what I mean by concrete
example.
>
> Here's what I'm proposing, and it's only a suggestion.
>
>
> *** I think the requirements for being a general purpose "NDArray"
> can be specified with only the following attributes:
>
>     __array_buffer__    - as buffer object
>     __array_shape__     - as tuple of long
>     __array_itemsize__  - as int
>
>     Optionally
>     __array_stride__    - as tuple of long (get from shape if None)
>     __array_offset__    - as int (would default to 0 if not present)
>
> Then anyone who implemented these could work with the same C API for
> getting the pointer to memory, shape array, stride array, and item size.
>
Then you are talking about standardizing a C-API. But I'm still
confused. If you write a class that implements these attributes,
is it your C-API that uses them, or do you mean our C-API uses
them? If you have your own C-API, then the attributes are not
relevant as an interface. If you intend to use our C-API to access
your objects, then they are. But if you want to use our C-API,
that still doesn't explain why the alternatives aren't acceptable
(namely subclassing).
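For concreteness, here is a minimal Python sketch of what "working with the same C API" could mean for the attributes quoted above. It shows one function that accepts any object carrying them and returns the pointer-and-bookkeeping record an extension needs. The attribute names come from the proposal. `ArrayInfo`, `array_info`, and `RawArray` are hypothetical names. Treating a missing `__array_stride__` as a contiguous row-major layout is an assumption that follows the "get from shape if None" note.

```python
from typing import NamedTuple, Tuple


class ArrayInfo(NamedTuple):
    """What a C-API would hand an extension: memory plus bookkeeping."""
    buffer: memoryview
    shape: Tuple[int, ...]
    itemsize: int
    strides: Tuple[int, ...]
    offset: int


def contiguous_strides(shape, itemsize):
    """Row-major byte strides: the last axis varies fastest."""
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def array_info(obj) -> ArrayInfo:
    """Extract the proposed attributes from any object that provides them."""
    try:
        buf = memoryview(obj.__array_buffer__)
        shape = tuple(int(n) for n in obj.__array_shape__)
        itemsize = int(obj.__array_itemsize__)
    except AttributeError as exc:
        raise TypeError("object does not look like an NDArray") from exc
    # Optional attributes fall back to sensible defaults.
    strides = getattr(obj, "__array_stride__", None)
    if strides is None:
        strides = contiguous_strides(shape, itemsize)
    offset = getattr(obj, "__array_offset__", 0)
    return ArrayInfo(buf, shape, itemsize, tuple(strides), int(offset))


class RawArray:
    """Hypothetical container exposing only the required attributes."""
    def __init__(self, data: bytes, shape, itemsize):
        self.__array_buffer__ = data
        self.__array_shape__ = tuple(shape)
        self.__array_itemsize__ = itemsize


info = array_info(RawArray(bytes(24), shape=(2, 3), itemsize=4))
# info.strides == (12, 4): derived because __array_stride__ is absent
```

Any class, from any package, that sets these attributes would pass through `array_info` unchanged. That duck-typed uniformity is the point of the proposal.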

>
> Because truthfully arrays are little more than a pointer to memory.
>
> That's like asking "why in the world would we presume memcpy() or
> qsort() would know what to do with your memory?"
>
Then you misunderstand Numarray. Numarrays are far more than just
a pointer to memory. You can get a pointer to memory from them,
but they entail much more than that. Numarray presumes that certain
things are possible with NumArray objects (like standard math
operations). If you want something that doesn't make such an
assumption, you should be using NDArray instead. NDArray makes
no presumptions about the contents of the memory other than that
they are arranged in memory in array fashion.
>
> >
> > You haven't provided any example (let
> > alone a compelling one) of why we should accept any object that
> > provides those attributes.
> >
>
> Well, the UFuncs certainly should reject any object that they don't
> know how to handle.  I'm currently only addressing what it takes to be
> an NDArray/NumArray object.  OTOH, if I can present something to the
> UFuncs that looks like a known array type, why wouldn't UFuncs
> want to work with it?
>
If you are presenting numarray with a type it already knows about,
why aren't you subclassing it? If you present numarray with an object
with a type it doesn't know about, then that is pointless.
Types and numarray are inextricably intertwined, and shall
remain so.
>
> - Allows me personally to distribute a separate (and simpler)
> implementation of NDArrays/NumArrays right now and have the same data
> objects work with yours when you're all done.  If I give the UFuncs a
> pointer to memory, and the attributes above, why shouldn't it work
> correctly?
>
>
> Am I doing any better?  I am trying.
>
Not really. More on that later.
>
>
> Is there a way, today, without modifying numarray, for me to use
> numarray as a holder for these esoteric data types?  Is that way
> difficult?  Could it be easier?
>
No to the first; it isn't intended to serve that purpose. If
you just need something to blindly hold values without doing
anything with them use NDArray (and you can add whatever customization
you wish regarding what methods or operators are available).

> I'm not asking numarray to know about my types in its core baseline.  I'm
> wondering what it takes to implement new types at all.
>
It's possible to extend (but not in any way that makes it
automatically usable with anyone else's extension). Currently
that sort of extension would not be hard for someone that
knows how things work. We haven't documented how to do so,
and won't for a while. It's not a high priority for us now.

**********************************************************

What I want to see is a specific example. I'm not going to
pay much attention to generalities because I'm still unclear
about how you intend to do what you say you will do. Perhaps
I'm slow, but I still don't get it.

On the one hand, you ask us to have numarray accept objects
with the same 'interface'. Well, if they are not of an existing
supported type, that's pointless since numarray won't work
properly with them. If it is an existing type, you haven't
explained why you can't use numarray directly (or alternatively,
create a numarray object that uses the same buffer yours does).
I still haven't seen a specific example that illustrates why
you cannot use subclassing or an instance of a numarray object
instead. If you need to add a new type that's possible but
you'll have to spend some time figuring out how to do that for
your own extended version. If you just want to use arrays
to hold values (of new types), then use NDArray. It doesn't
care about types. But please give a specific case. E.g., "I want
complex ints and I will develop a class that will use this to
do the following things [it doesn't have to be exhaustive or
complete, but include just enough to illustrate the point].
If the attributes were standardized then I would do this and that,
and use it with your stuff like this showing you the code
(and the behavior I expect)."

Given this I can either show you an alternate solution or
I can realize why you are right and we can discuss where
to go from there. Otherwise you are wasting your time.

Perry


From xscottg at yahoo.com  Sun Apr 14 21:10:12 2002
From: xscottg at yahoo.com (Scott Gilbert)
Date: Sun Apr 14 21:10:12 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <NEBBIJKBMLDBLNCEEFOCIEFICNAA.perry@stsci.edu>
Message-ID: <20020415040923.5808.qmail@web12903.mail.yahoo.com>

--- Perry Greenfield <perry at stsci.edu> wrote:

*** Just skim through my first few responses.  About halfway through
writing this letter, a few things hit me.  I still want to propose some
changes, but I don't think you'll find them as intrusive...


>
> >
> > Then anyone who implemented these could work with the same C API for
> > getting the pointer to memory, shape array, stride array, and item
> > size.
> >
> Then you are talking about standardizing a C-API. But I'm still
> confused. If you write a class that implements these attributes,
> is it your C-API that uses them, or do you mean our C-API uses
> them?
>

I'm not really talking about standardizing a C-API.  I'm talking about
standardizing what that C-API would have to do.  You would have your 
C-API as part of numarray proper.  And, for the short term, I would have
my own C-API as part of what I need to get done.

Both C-APIs would use the same attributes.

Why do I want my own C-API today?  Because numarray isn't done yet, and
I can't create arrays of the types I need.  I'll need a C-API to get at
my types.  It would be great if the same C-API could get at yours too.


>
> If you have your own C-API, then the attributes are not
> relevant as an interface. If you intend to use our C-API to access
> your objects, then they are. 
>

Either C-API could access anything that looks like an NDArray.


>
> >
> > Because truthfully arrays are little more than a pointer to memory.
> >
> > That's like asking "why in the world would we presume memcpy() or
> > qsort() would know what to do with your memory?"
> >
>
> Then you misunderstand Numarray. Numarrays are far more than just
> a pointer to memory. You can get a pointer to memory from them,
> but they entail much more than that. Numarray presumes that certain
> things are possible with NumArray objects (like standard math
> operations). If you want something that doesn't make such an
> assumption, you should be using NDArray instead. NDArray makes
> no presumptions about the contents of the memory other than that
> they are arranged in memory in array fashion.
>

I think I understand where you're coming from now.  

(BTW, I think some of our confusion comes from whether I'm talking about
"Numarray" or "numarray" the package versus "NumArray" and 
"NDArray" the classes.)


*** Ok, I think there is light at the end of this tunnel...

I guess what I've been arguing for all along is something a lot like
an NDArray where I can specify the typecode (and possibly other things like
'endian' etc...), and where only NDArrays need a minimal set of standardized
attributes.


With this I can create extensions that will work with anything that
looks like an NDArray.  Your NDArrays from the numarray package, and
my NDArrays of crazy types.


I'm still left in the position of having to upcast an NDArray to a
full blown NumArray if I ever want to use my NDArrays in a routine
meant solely for NumArrays.  However this conversion isn't difficult,
and I think I can do that when needed.


Important Question:  If an NDArray had a typecode (and it was a known
string), is it possible to promote it to one of the standard NumArray
types?

Lesser Question:  If an NDArray had a known typecode, is it desirable
for numarray routines to promote the NDArray to a NumArray in the same
way that the routines promote a Python list or tuple to a NumArray on
the fly?


Ok, my new proposal (again, treat it like a suggestion):

- Do you think it would be possible to standardize the set of attributes
that an object needs in order to be an NDArray?  NDArrays are simple and unlikely to
change.  I think _those_ really are just pointers to memory with array
accounting information.  We could agree on what exactly constitutes an
NDArray.

- Could this standard set of attributes optionally include the names for
the typecode, endian, (and maybe some other) attributes?


That doesn't mean that your NDArrays would have to have the typecode,
endian or whatever information.  It just means that when any class does
add a typecode, it adds it as a specially named attribute.


I realize that a large part of what I want is interoperability between
separate implementations of NDArrays.


Anything that has (_data, _shape, _itemsize, _type) is something I could
work with in an extension.  Some other fields are optional (_strides,
_byteoffset) because they have sensible defaults that can be calculated
from above in the common case.

So the only difference between what you currently have and most of what
I'm proposing is that the names of NDArray attributes become standardized.


>
> If you are presenting numarray with a type it already knows about,
> why aren't you subclassing it?
>

Since I know I'll have to create types that numarray doesn't know
about, I know I'm going to have to write a new array class (it's
already written).

It would be silly of my new array class to not implement the standard
types just because numarray _does_ know about them.

I now realize that I don't have to give my class to numarray directly. 
That didn't hit me before.  I could promote/upcast it when necessary.
The upcast-in and downcast-out thing will add up to extra work and
messier code, but it is a workaround.


>
> If you present numarray an object
> with a type it doesn't know about, then that is pointless.
> Types and numarray are inextricably intertwined, and shall
> remain so.
>

Understood.  I don't want to ruin your NumArrays.


> 
> **********************************************************
> 
> What I want to see is a specific example. I'm not going to
> pay much attention to generalities because I'm still unclear
> about how you intend to do what you say you will do. Perhaps
> I'm slow, but I still don't get it.
> 

Nope, clearly it was me that was being slow.  

There is still that bit about NDArrays that I'm trying to justify, so my
example is below.


>
> (or alternatively,
> create a numarray object that uses the same buffer yours does).
>

You're right.  This hadn't occurred to me until just a little bit ago.


>
> E.g., "I want
> complex ints and I will develop a class that will use this to
> do the following things [it doesn't have to be exhaustive or
> complete, but include just enough to illustrate the point].
> If the attributes were standardized then I would do this and that,
> and use it with your stuff like this showing you the code
> (and the behavior I expect)."
>

Here goes (somewhat hypothetical, but close to the boat I'm currently in):

Jon is our FPGA guy who makes screaming fast core files, but our FPGAs
don't do floating point.  So I have to provide his driver with ComplexInt16
data.

Jon and I write an extension module that calls his driver and reads data. 
We also write a C routine (call it "munge") that takes both ComplexInt16
data and ComplexFloat64 data.  We try it out for testing, and pass in my
arrays in both places.  We could have used Numarray for the ComplexFloat64,
but that would have meant using two array packages and two C-APIs in our
extension.  All we needed was a pointer to an array of doubles, so we stuck
with mine.

Ok, that part of development is done.  Now we present it to the application
developers.  They're happy and we're rolling.  Successful application.

Another group finds out about this and they want to use it.  They're using
numarray for a large part of their application.  In fact, they're calculating
the ComplexFloat64 half of the data that they want to pass to my "munge"
routine using numarray, and they still need to use my ComplexInt16 data to
read the FPGA.

They're going to be disappointed to find out my extension can't read
numarray data, and that they have to convert back and forth between the
two.  And as the list of routines grows, they have to keep track of whether
it is a numarray-routine, or a scottarray-routine.

It's not so bad for one simple "munge" function, but there are going to be
hundreds of functions...

I don't expect you to have much sympathy for my having to convert data back
and forth between my array types and yours, but it is an avoidable problem.


For the most part, we both agree on what parts an NDArray should have.  If
we could only agree what to name them, and that we'd stick to those names,
that would be a large part of it for me.


>
> Given this I can either show you an alternate solution or
> I can realize why you are right and we can discuss where
> to go from there. Otherwise you are wasting your time.
>


Cheers,
    -Scott




From jmiller at stsci.edu  Mon Apr 15 11:19:09 2002
From: jmiller at stsci.edu (Todd Miller)
Date: Mon Apr 15 11:19:09 2002
Subject: [Numpy-discussion] ANN: Numarray-0.3.1 and 0.3.2
Message-ID: <3CBB1955.1010800@stsci.edu>

Numarray 0.3.1 and 0.3.2
---------------------------------
Numarray is an array processing package designed to efficiently
manipulate large multi-dimensional arrays.  Numarray is modelled after
Numeric and features C code generated from Python template scripts,
the capacity to operate directly on arrays in files, and improved type
promotions.

Numarray-0.3.1 incorporates a number of bug fixes and enhancements to
the C-API, including a minimal Numeric emulation layer which makes it
easy to port simple Numeric C-extensions to numarray.  The emulation
layer is incomplete, so not all Numeric extensions will work, but
simple ones *do* with a minimal amount of effort.  See
Doc/numpy_compat for an example of convolution done using the
emulation layer.  New for Numarray-0.3.1 is the Numarray manual in PDF
and HTML formats;  other formats are available to users of the source
distribution.

Numarray-0.3.2 is a source only release to support Alpha/Tru64.  It is
essentially Numarray-0.3.1 + one portability bug fix.

WHERE
-----------
Numarray-0.3.1 Windows executable installers and the source code tarball are
here:

http://sourceforge.net/project/showfiles.php?group_id=1369

Numarray is hosted by Source Forge in the same project which hosts Numeric:

http://sourceforge.net/projects/numpy/

The web page for Numarray information is at:

http://stsdas.stsci.edu/numarray/index.html

Trackers for Numarray Bugs, Feature Requests, Support, and Patches are at
the Source Forge project for NumPy at:

http://sourceforge.net/tracker/?group_id=1369

REQUIREMENTS
--------------------------

numarray-0.3.1 requires Python 2.0 or greater.


AUTHORS, LICENSE
------------------------------

Numarray was written by Perry Greenfield, Rick White, Todd Miller, JC
Hsu, Paul Barrett, and Phil Hodge at the Space Telescope Science
Institute.  Thanks go to Jochen Kupper of the University of North
Carolina for his work on Numarray and for porting the Numarray manual
to TeX format.

Numarray is made available under a BSD-style License.  See
LICENSE.txt in the source distribution for details.

-- 
Todd Miller 			jmiller at stsci.edu 


From perry at stsci.edu  Mon Apr 15 14:20:01 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Mon Apr 15 14:20:01 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <20020415040923.5808.qmail@web12903.mail.yahoo.com>
Message-ID: <JFEGLNDJEDNOMPPHDEJFAEJGDNAA.perry@stsci.edu>

Hi Scott,

I'm not going to respond to all points but mainly concentrate on the
last section.
>

>
> Important Question:  If an NDArray had a typecode (and it was a known
> string), is it possible to promote it to one of the standard NumArray
> types?
>
I think we want to avoid NDArray having any type attribute (Some types
have subtypes and then the issue gets really messy). We leave it
to the subclass to address how types will be handled.

> Here goes (somewhat hypothetical, but close to the boat I'm currently in):
>
> Jon is our FPGA guy who makes screaming fast core files, but our FPGAs
> don't do floating point.  So I have to provide his driver with
> ComplexInt16 data.
>
> Jon and I write an extension module that calls his driver and reads data.
> We also write a C routine (call it "munge") that takes both ComplexInt16
> data and ComplexFloat64 data.  We try it out for testing, and pass in my
> arrays in both places.  We could have used Numarray for the
> ComplexFloat64, but that would have meant using two array packages and
> two C-APIs in our extension.  All we needed was a pointer to an array of
> doubles, so we stuck with mine.
>
> Ok, that part of development is done.  Now we present it to the
> application developers.  They're happy and we're rolling.  Successful
> application.
>
> Another group finds out about this and they want to use it.  They're using
> numarray for a large part of their application.  In fact, they're
> calculating the ComplexFloat64 half of the data that they want to pass to
> my "munge" routine using numarray, and they still need to use my
> ComplexInt16 data to read the FPGA.
>
> They're going to be disappointed to find out my extension can't read
> numarray data, and that they have to convert back and forth between the
> two.  And as the list of routines grows, they have to keep track of whether
> it is a numarray-routine, or a scottarray-routine.
>
> It's not so bad for one simple "munge" function, but there are going to be
> hundreds of functions...
>
> I don't expect you to have much sympathy for my having to convert data
> back and forth between my array types and yours, but it is an avoidable
> problem.
>
> For the most part, we both agree on what parts an NDArray should have.  If
> we could only agree what to name them, and that we'd stick to those names,
> that would be a large part of it for me.
>
>
I'm not sure I understand the problem in all the details I need to.
I'll restate it as best as I understand it and you can tell me if
I understood incorrectly.

You have extension modules that get complex int data from hardware.
Other processing may be done to the complex int data in that format
so it doesn't make sense to convert it to a more standard format when
reading it in. You have C extensions that carry out certain tasks
on complex data (in either complex int format or complex floats).
You have users that would like to use your routine with numarray.
(I haven't seen any specific mention of the need for ufuncs on
complex ints so I'll assume you just need complex int arrays as
containers for C programs to use.)

[If you did need to perform ufuncs on complex ints, then extending
numarray locally to handle them would be one possibility, but a little
involved at the moment (it will be a little easier later, when we
reimplement complex). Then again, maybe not: the complex stuff is
currently subclassed from numarray and, I think, not that hard to adapt
to ints, but it isn't that well done now.]

I guess my initial reaction is that you should develop a front-end
C-API that handles obtaining data buffers from different
sources.  You get to define what kinds of things it supports,
and it localizes any changes to the list of types you support, and
any dependencies on our or anyone else's API, to a small section of
code. From what I'm hearing, you don't need it to provide much
(pointer to arrays and associated information). If we are real
bozos and change the interface, it doesn't hurt you much (not that
we intend to be bozos or change the C-API willy-nilly :-)

To elaborate, you define your equivalent of our getNumInfo routine.

I don't think I've seen anything that requires explicit dependencies
on Python attributes. Sure, you could use the same attribute names
and use Python calls to get those just as our getNumInfo routine does,
but I think that is bad practice. You may find some other representation
for arrays out there that doesn't fit this model, and you may want to
work with those too, but you won't be able to get them to adopt our
scheme.
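A Python-level sketch of the front-end idea: a single function is the only code that knows how each array package stores its data, and every extension routine calls it instead of reaching into package internals. `Info`, `register_adapter`, `get_array_info`, and `ScottArray` are hypothetical names. A real version would live in C next to the extension code, much as getNumInfo does for numarray.

```python
from typing import Callable, Dict, NamedTuple, Tuple


class Info(NamedTuple):
    data: bytes
    shape: Tuple[int, ...]
    typecode: str


# Maps a Python type to a function that extracts Info from its instances.
_ADAPTERS: Dict[type, Callable[[object], Info]] = {}


def register_adapter(cls: type, adapter: Callable[[object], Info]) -> None:
    _ADAPTERS[cls] = adapter


def get_array_info(obj) -> Info:
    """Single entry point: dispatch on type, walking the class hierarchy."""
    for cls in type(obj).__mro__:
        adapter = _ADAPTERS.get(cls)
        if adapter is not None:
            return adapter(obj)
    raise TypeError(f"no adapter for {type(obj).__name__}")


class ScottArray:
    """Hypothetical in-house array type with its own field names."""
    def __init__(self, data: bytes, shape, typecode: str):
        self.payload, self.dims, self.kind = data, tuple(shape), typecode


register_adapter(ScottArray, lambda a: Info(a.payload, a.dims, a.kind))
```

If another package later renames its internals, only its one adapter changes. Every routine that calls `get_array_info` is untouched.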

You say that you don't want your users to have to convert between
the two data representations. If they are using your C extensions,
that is understandable, and avoidable, since you've written your
programs to deal with the various types. On the other hand,
unless you extend numarray, numarray clearly cannot deal with
the complex ints so conversion is necessary. But understandably,
you would like to eliminate the need for explicit conversions.
I think there is an easy way of dealing with this.

We haven't implemented this capability yet but we've been talking
about having numarray check input values to see if they have
a method "tonumarray" [not that we would choose that particular
method name, I'm just illustrating the point]. If that
method did exist, it would be called to create a numarray
from the object. Thus you could add such a method to your
class and when it is used in numarray ufuncs or in binary operations
with numarray objects, your complex ints are automatically
converted to numarray objects (presumably a complex float of
some precision). Adding this capability to numarray should be
pretty easy.
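The conversion hook described above could work roughly as follows. As the paragraph says, `tonumarray` is only a placeholder name. In this sketch, `as_array_input` stands in for numarray's input checking, a plain list of Python complex numbers stands in for a complex-float numarray, and `ComplexInt16Array` and `add` are hypothetical.

```python
from typing import List, Sequence, Tuple


class ComplexInt16Array:
    """Hypothetical container holding (real, imag) pairs of 16-bit ints."""
    def __init__(self, pairs: Sequence[Tuple[int, int]]):
        for real, imag in pairs:
            if not (-32768 <= real <= 32767 and -32768 <= imag <= 32767):
                raise ValueError("component out of Int16 range")
        self.pairs = list(pairs)

    def tonumarray(self) -> List[complex]:
        # Promote to complex floats, a type the array package understands.
        return [complex(real, imag) for real, imag in self.pairs]


def as_array_input(value):
    """Stand-in for an array package's input coercion step."""
    converter = getattr(value, "tonumarray", None)
    if callable(converter):
        return converter()              # the object converts itself
    if isinstance(value, (list, tuple)):
        return [complex(v) for v in value]
    raise TypeError(f"cannot use {type(value).__name__} as array input")


def add(a, b):
    """A ufunc-like binary operation that coerces both operands first."""
    xs, ys = as_array_input(a), as_array_input(b)
    if len(xs) != len(ys):
        raise ValueError("shape mismatch")
    return [x + y for x, y in zip(xs, ys)]
```

For example, `add(ComplexInt16Array([(1, 2), (3, -4)]), [0.5, 1j])` gives `[(1.5+2j), (3-3j)]` without the caller converting anything explicitly.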

True, the solution that I proposed doesn't protect you from making
any changes ever. But we believe we are at a stage in the project
where it is dangerous to lock ourselves into lower level details
such as the internal description of the array. We still have things
to implement and that may cause us to realize that some changes
are needed. Our C-API stuff is relatively new. It may see changes
in the near future, but likely not many related to what you need.
And we intend to shield the C-API from changes in the Python
attributes. We could change the name or contents of _byteswap and
it would not change anything in the C-API. I see premature
coupling of low level implementation details as a bad thing,
not a good thing. Any changes that are made to the API require
changes only to the corresponding routine in your C-API, and all
your C applications are shielded from any changes (save rebuilding).

If I've misunderstood your examples, please let me know.

Perry


From xscottg at yahoo.com  Mon Apr 15 15:33:10 2002
From: xscottg at yahoo.com (Scott Gilbert)
Date: Mon Apr 15 15:33:10 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <JFEGLNDJEDNOMPPHDEJFAEJGDNAA.perry@stsci.edu>
Message-ID: <20020415223223.5901.qmail@web12905.mail.yahoo.com>

Hi Perry.

Well, I don't think I've made any progress convincing you that
standardizing what it means to be an interoperable "NDArray"
would be good for me or others in the community, but I do
appreciate you letting me try.


I'll take your suggestion and make my C-API understand a superset
of array types.  I'll wait to see how the tonumarray() thing pans
out.  That might meet all of my practical concerns even if I don't
think it is as elegant a solution as defining a strong interface.


I'll just respond to the one point below.  If I had to sum up my
argument for why I think separate array implementations could 
(should) be compatible, it is buried in the answer to this question.


>
> >
> > Important Question:  If an NDArray had a typecode (and it was a known
> > string), is it possible to promote it to one of the standard NumArray
> > types?
> >
>
> I think we want to avoid NDArray having any type attribute (Some types
> have subtypes and then the issue gets really messy). We leave it
> to the subclass to address how types will be handled.
> 

Ok that's what you're currently doing, but let me rephrase the question.

  :-)


Given a "leaf type" -- something that is really well specified and very
similar on all modern platforms:

    "Int32"    - not just an arbitrary "Int"
    "Float64"  - not just an arbitrary "Float"


Do you think you could write a general purpose _function_ that converted an
"NDArray" to a full featured "NumArray"?  I know this would be in Python,
but let's pretend it's a C++ prototype to make the types clear:


NumArray NDArray_to_NumArray(NDArray nda, String typecode, Endian end) {
    if (WellKnownLeafTypecodeString(typecode)) {

        /* fill in the blanks here */

        return NumArray(result);
    }

    throw "conversion really is impossible";
}
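In Python, the "fill in the blanks" step for well-known leaf types could be little more than a table lookup plus byte-order handling. This sketch uses the standard `struct` module, whose `<` and `>` prefixes force standard sizes, so "Int32" is always 4 bytes. The typecode strings follow the ones named above. The function name, the `"little"`/`"big"` endian values, and returning nested lists instead of a real NumArray are assumptions. ComplexInt16 is deliberately missing from the table, so it takes the "impossible" branch.

```python
import math
import struct
from typing import Sequence

# Well-known leaf types mapped to struct format characters.
LEAF_FORMATS = {
    "Int8": "b", "UInt8": "B",
    "Int16": "h", "UInt16": "H",
    "Int32": "i", "UInt32": "I",
    "Float32": "f", "Float64": "d",
}
BYTE_ORDER = {"little": "<", "big": ">"}


def _nest(flat, shape):
    """Split a flat list into nested lists following `shape` (row-major)."""
    if not shape:
        return flat[0]
    if len(shape) == 1:
        return list(flat)
    step = math.prod(shape[1:])
    return [_nest(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


def ndarray_to_values(buffer, shape: Sequence[int], typecode: str,
                      endian: str = "little"):
    """Decode a contiguous row-major buffer of a leaf type into nested lists."""
    if typecode not in LEAF_FORMATS or endian not in BYTE_ORDER:
        raise ValueError("conversion really is impossible")
    shape = tuple(shape)
    count = math.prod(shape)
    fmt = f"{BYTE_ORDER[endian]}{count}{LEAF_FORMATS[typecode]}"
    if len(buffer) < struct.calcsize(fmt):
        raise ValueError("buffer is too small for the given shape")
    return _nest(list(struct.unpack_from(fmt, buffer)), shape)
```

For example, `ndarray_to_values(struct.pack(">4i", 1, 2, 3, 4), (2, 2), "Int32", "big")` returns `[[1, 2], [3, 4]]`.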


Cheers and thanks again for your time,
    -Scott Gilbert




From perry at stsci.edu  Tue Apr 16 08:15:09 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Tue Apr 16 08:15:09 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <20020415223223.5901.qmail@web12905.mail.yahoo.com>
Message-ID: <JFEGLNDJEDNOMPPHDEJFAEJKDNAA.perry@stsci.edu>

> > > Important Question:  If an NDArray had a typecode (and it was a known
> > > string), is it possible to promote it to one of the standard NumArray
> > > types?
> > >
> >
> > I think we want to avoid NDArray having any type attribute (Some types
> > have subtypes and then the issue gets really messy). We leave it
> > to the subclass to address how types will be handled.
> > 
> 
> Ok that's what you're currently doing, but let me rephrase the question.
> 
>   :-)
> 
> 
> Given a "leaf type" -- something that is really well specified and very
> similar on all modern platforms:
> 
>     "Int32"    - not just an arbitrary "Int"
>     "Float64"  - not just an arbitrary "Float"
> 
> 
> Do you think you could write a general purpose _function_ that 
> converted an
> "NDArray" to a full featured "NumArray"?  I know this would be in Python,
> but let's pretend it's a C++ prototype to make the types clear:
> 
> 
> NumArray NDArray_to_NumArray(NDArray nda, String typecode, Endian end) {
>     if (WellKnownLeafTypecodeString(typecode)) {
> 
>         /* fill in the blanks here */
> 
>         return NumArray(result);
>     }
> 
>     throw "conversion really is impossible";
> }
> 

I'm not sure I understand exactly what you are trying to do here, but
I'll try to address the question as best I can.

If one had an NDArray that happened to contain a type that numarray 
supported, yes it is possible (in fact RecArray does that sort of thing).

If your point is that in doing so one must use the private attributes
such as _strides, yes that is true. These attributes are private in 
the sense that users of instances of these objects should never have
cause to access them. But that does not mean that classes that subclass
NDArray, or any of its subclasses, should not access them. They are not
private in the sense of the class family (one reason we didn't use
__strides is that that mechanism is not usable, easily anyway, by 
subclasses). In that sense, the attributes form an interface within 
the class family. Some class extenders may need to access them, sure.
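The point about `__strides` is Python's private name mangling. Inside a class body, an attribute written with at least two leading underscores and at most one trailing underscore is renamed to `_ClassName__name`, so a subclass cannot reach it under the same spelling. A single leading underscore is only a convention. This minimal demonstration uses hypothetical class names.

```python
class Base:
    def __init__(self):
        self._strides = (12, 4)     # single underscore: a convention only
        self.__strides = (12, 4)    # stored as self._Base__strides


class Derived(Base):
    def single(self):
        return self._strides        # works: no renaming happens

    def double(self):
        return self.__strides       # compiled as self._Derived__strides


d = Derived()
print(d.single())                   # (12, 4)
try:
    d.double()
except AttributeError as exc:
    print("AttributeError:", exc)
print(d._Base__strides)             # (12, 4): reachable only via the mangled name
```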

Perry 


From omar.mekkaoui at eco.u-cergy.fr  Tue Apr 16 11:04:38 2002
From: omar.mekkaoui at eco.u-cergy.fr (mekkaoui)
Date: Tue Apr 16 11:04:38 2002
Subject: [Numpy-discussion] Extension under windows
Message-ID: <3CBC68D9.64B7C425@eco.u-cergy.fr>

Dear Numerical Python Users,

I have written an extension using GSL (GNU Scientific Library) and
Numerical Python.
This extension works fine under Linux and I would like to do the same under
Windows. For that I use Cygwin.
When I try to create the module

$ gcc -shared Example.o -o Example.pyd

I receive this message:


Example.o<.text+0x58>:Example.c: undefined reference to 'PyArg_ParseTuple'
Example.o<.text+0x15e>:Example.c: undefined reference to 'Py_BuildValue'
Example.o<.text+0x1b1>:Example.c: undefined reference to 'Py_InitModule4'
Example.o<.text+0x1c1>:Example.c: undefined reference to 'PyImport_ImportModule'
Example.o<.text+0x1db>:Example.c: undefined reference to 'PyModule_GetDict'
Example.o<.text+0x1f4>:Example.c: undefined reference to 'PyDict_GetItemString'
Example.o<.text+0x206>:Example.c: undefined reference to 'PyCObject_Type'
Example.o<.text+0x214>:Example.c: undefined reference to 'PyCObject_AsVoidPtr'

Perhaps this command is wrong.

Could anyone explain the procedure, or point me to a document which
explains it clearly?

Thanks in advance for your help

Omar


From xscottg at yahoo.com  Tue Apr 16 16:38:02 2002
From: xscottg at yahoo.com (Scott Gilbert)
Date: Tue Apr 16 16:38:02 2002
Subject: [Numpy-discussion] Introduction
Message-ID: <20020416233700.72472.qmail@web12904.mail.yahoo.com>

--- Perry Greenfield <perry at stsci.edu> wrote:
> 
> If one had an NDArray that happened to contain a type that numarray 
> supported, yes it is possible (in fact RecArray does that sort of thing).
> 
> If your point is that in doing so one must use the private attributes
> such as _strides, yes that is true.
>

My point was simply:

  = One *can* convert from (NDArray + typecode) to a full NumArray
  = You *do* already convert lists, tuples, ... to NumArrays in ufuncs
  = So you *could* convert (NDArrays + typecode) to NumArrays in ufuncs
    in the same place that checks to see if it is a list, tuple, ...

Therefore:

  = You possibly *could* standardize the attributes in an NDArray
      (buffer, typecode, shape, stride, offset, ...)
  = If you *did* standardize the attributes, then others *could*
    build UserDefinedNDArrays however they see fit and they would
    work with NumArrays


However I get the sense that the numarray module is your baby, and you
don't want to change him too much.  That's very understandable, you're a
proud parent.  Truth be told, he's a good-looking kid, and I look forward
to hanging out with him when he's all grown up.  We just have a slightly
different view on parenting, and I was hoping my kid would have an easier
time playing with yours.


Now that I've beaten that silly metaphor to death...  :-)


Cheers,
    -Scott


ps: It occurs to me, with the strong sense of encapsulation you desire,
that I could have presented this better as requesting that you specify a
set of standard *methods* instead of attributes.  Something like:

     def __array_getbuffer__(self): ...
     def __array_getoffset__(self): ...
     def __array_getshape__(self): ...
     def __array_getstrides__(self): ...
     def __array_getitemsize__(self): ...
     def __array_gettypecode__(self): ...
     def __array_getendian__(self): ...
     # Who knows what the real list would consist of...
     # We never got to discuss what a really general
     # purpose description of an NDArray would require...


Then anything which implemented those standard *methods* would be a viable
NDArray.  From my point of view it amounts to about the same thing, but I
think it's a better design and that you might like this idea more.
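With standard methods instead of attributes, a consumer can test for the whole set at once. It can then accept such objects at the same point where ufuncs already check for lists and tuples. This sketch is trimmed to four of the method names listed above. `REQUIRED_METHODS`, `looks_like_ndarray`, `MinimalArray`, and `describe` are hypothetical. Which methods belong in the required set is exactly the open question.

```python
from typing import Sequence

# Hypothetical trimmed-down method set taken from the list above.
REQUIRED_METHODS = (
    "__array_getbuffer__",
    "__array_getshape__",
    "__array_getitemsize__",
    "__array_gettypecode__",
)


def looks_like_ndarray(obj) -> bool:
    """Duck-typing check: every required method is present and callable."""
    return all(callable(getattr(obj, name, None)) for name in REQUIRED_METHODS)


class MinimalArray:
    """Hypothetical array class implementing the method-based interface."""
    def __init__(self, data: bytes, shape: Sequence[int], typecode: str,
                 itemsize: int):
        self._data, self._shape = data, tuple(shape)
        self._typecode, self._itemsize = typecode, itemsize

    def __array_getbuffer__(self):
        return memoryview(self._data)

    def __array_getshape__(self):
        return self._shape

    def __array_getitemsize__(self):
        return self._itemsize

    def __array_gettypecode__(self):
        return self._typecode


def describe(obj) -> str:
    """Where a ufunc checks for list/tuple, it could also check the interface."""
    if isinstance(obj, (list, tuple)):
        return f"sequence of {len(obj)} items"
    if looks_like_ndarray(obj):
        return f"{obj.__array_gettypecode__()} array, shape {obj.__array_getshape__()}"
    raise TypeError("not array-like")
```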


However I'm getting out of breath on this topic, and I have other things
I need to do (I'm sure this is true for you too), so if you don't see any
merit in this idea, I won't push for it any further.

Cheers again.




From perry at stsci.edu  Tue Apr 16 17:52:03 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Tue Apr 16 17:52:03 2002
Subject: [Numpy-discussion] Conclusion
In-Reply-To: <20020416233700.72472.qmail@web12904.mail.yahoo.com>
Message-ID: <NEBBIJKBMLDBLNCEEFOCCEFLCNAA.perry@stsci.edu>

After Scott's last display of his powers of persuasion,
I lack for a meaningful response. It seems appropriate to
declare this thread closed.

Besides, I've got to go change some diapers ;-) 

Perry


From paul at pfdubois.com  Wed Apr 17 07:17:07 2002
From: paul at pfdubois.com (Paul F Dubois)
Date: Wed Apr 17 07:17:07 2002
Subject: [Numpy-discussion] Extension under windows
In-Reply-To: <3CBC68D9.64B7C425@eco.u-cergy.fr>
Message-ID: <000301c1e61a$20c2a590$0a01a8c0@NICKLEBY>

You need to link with the Python library. I suggest you learn to use
distutils; it will then build and link your extension correctly on both
platforms. The file "setup.py" in the Numeric source distribution is a
good, if complicated, example. Some of the setup.py files in the
Packages area are simpler and easier to understand.

-----Original Message-----
From: numpy-discussion-admin at lists.sourceforge.net
[mailto:numpy-discussion-admin at lists.sourceforge.net] On Behalf Of
mekkaoui
Sent: Tuesday, April 16, 2002 11:09 AM
To: numpy-discussion at lists.sourceforge.net
Subject: [Numpy-discussion] Extension under windows


Dear Numerical Python Users,

I have written an extension using GSL (GNU Scientific Library) and
Numerical Python. This extension works fine under Linux and I would like
to do the same under Windows. For that I use Cygwin. When I try to create
the module with

$ gcc -shared Example.o -o Example.pyd

I receive this message :


Example.o<.text+0x58>:Example.c: undefined reference to 'PyArg_ParseTuple'
Example.o<.text+0x15e>:Example.c: undefined reference to 'Py_BuildValue'
Example.o<.text+0x1b1>:Example.c: undefined reference to 'Py_InitModule4'
Example.o<.text+0x1c1>:Example.c: undefined reference to 'PyImport_ImportModule'
Example.o<.text+0x1db>:Example.c: undefined reference to 'PyModule_GetDict'
Example.o<.text+0x1f4>:Example.c: undefined reference to 'PyDict_GetItemString'
Example.o<.text+0x206>:Example.c: undefined reference to 'PyCObject_Type'
Example.o<.text+0x214>:Example.c: undefined reference to 'PyCObject_AsVoidPtr'

Perhaps this command is wrong.

Could anyone explain the procedure, or point me to a document that
explains it clearly?

Thanks in advance for your help

Omar


_______________________________________________
Numpy-discussion mailing list Numpy-discussion at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


From magnus at hetland.org  Wed Apr 17 07:32:31 2002
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Wed Apr 17 07:32:31 2002
Subject: [Numpy-discussion] Graphs in numarray?
Message-ID: <20020417163133.F7565@idi.ntnu.no>

I'm looking at various ways of implementing graphs in Python (beyond
simple dict-based stuff -- more performance is needed). kjbuckets
looks like a nice alternative, as does the Boost Graph Library (not
sure how easy it is to use with Boost.Python) but if numarray is to
become a part of the standard library, it could be beneficial to use
that...

For dense graphs, it makes sense to use an adjacency matrix directly
in numarray, I should think. (I haven't implemented many graph
algorithms with ufuncs yet, but it seems doable...) For sparse graphs
I guess some sort of sparse array implementation would be useful,
although the archives indicate that creating such a thing isn't a core
part of the numarray project.

What do you think -- is it reasonable to use numarray for graph
algorithms? Perhaps an additional module with standard graph
algorithms would be interesting? (I'm sure I could contribute some if
there is any interest...)

And -- is there any chance of getting sparse matrices in numarray?

--
Magnus Lie Hetland                                  The Anygui Project
http://hetland.org                                  http://anygui.org


From perry at stsci.edu  Wed Apr 17 12:10:32 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Wed Apr 17 12:10:32 2002
Subject: [Numpy-discussion] Graphs in numarray?
In-Reply-To: <20020417163133.F7565@idi.ntnu.no>
Message-ID: <JFEGLNDJEDNOMPPHDEJFOEKGDNAA.perry@stsci.edu>

Hi Magnus,

On Behalf Of Magnus Lie Hetland

> 
> I'm looking at various ways of implementing graphs in Python (beyond
> simple dict-based stuff -- more performance is needed). kjbuckets
> looks like a nice alternative, as does the Boost Graph Library (not
> sure how easy it is to use with Boost.Python) but if numarray is to
> become a part of the standard library, it could be beneficial to use
> that...
> 
> For dense graphs, it makes sense to use an adjacency matrix directly
> in numarray, I should think. (I haven't implemented many graph
> algorithms with ufuncs yet, but it seems doable...) For sparse graphs
> I guess some sort of sparse array implementation would be useful,
> although the archives indicate that creating such a thing isn't a core
> part of the numarray project.
> 
First of all, it may make sense, but I should say a few words about
what scale sizes make sense. Currently numarray is implemented mostly
in Python (except for the very low-level, very simple C functions
that do the computational and indexing loops). This means it currently
has a pretty sizable overhead to set up an array operation (I'm
guessing an order of magnitude slower than Numeric). Once set up,
it is generally pretty fast, so it is pretty good for very large
data sets and very lousy for very small ones. We haven't measured
efficiency lately (we are deferring optimization until we have all
the major functionality present first), but I wouldn't be at all
surprised to find that the setup time can equal the time
to actually process ~10,000-20,000 elements (i.e., the per-element
efficiency for a 10K array is roughly half that for much larger arrays).

So if you are working with much smaller arrays than 10K, you won't
see total execution time decrease much (it was already spending 
half its time in setup, which doesn't change). We would like
to reduce this size threshold in the future, either by optimizing the
Python code, or moving some of it into C. This optimization wouldn't
be for at least a couple more months; we have more urgent features
to deal with. I doubt that we will ever surpass the current Numeric
in its performance on small arrays (though who knows, perhaps we 
can come close).
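The arithmetic behind these estimates can be checked with a simple linear cost model. This is an assumption, not a measurement. Each operation is taken to cost a fixed setup time plus a constant time per element, with the setup equal to the work for 10,000 elements as estimated above. Times are in arbitrary units, and the function names are illustrative.

```python
def op_time(n, setup, per_element):
    """Total time for one array operation on n elements under a linear cost model."""
    return setup + n * per_element


def per_element_time(n, setup, per_element):
    """Average cost per element, setup included."""
    return op_time(n, setup, per_element) / n


PER_ELEMENT = 1.0
SETUP = 10_000 * PER_ELEMENT  # assumed: setup costs as much as processing ~10K elements

# A 10K-element operation spends half its time in setup, so its per-element
# cost is twice that of a very large array (i.e. half the efficiency).
print(per_element_time(10_000, SETUP, PER_ELEMENT))     # 2.0
print(per_element_time(1_000_000, SETUP, PER_ELEMENT))  # 1.01

# Shrinking the array from 10K to 10 elements makes it only about twice as fast.
print(op_time(10_000, SETUP, PER_ELEMENT) / op_time(10, SETUP, PER_ELEMENT))
```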

> What do you think -- is it reasonable to use numarray for graph
> algorithms? Perhaps an additional module with standard graph
> algorithms would be interesting? (I'm sure I could contribute some if
> there is any interest...)
> 
Before I go further, I need to find out if the preceding has made
you gasp in horror or if the timescale is too slow for you to
accept. (This particular issue also makes me wonder if numarray would
ever be a suitable substitute for the existing array module).
What size graphs are you most concerned about as far as speed goes?

> And -- is there any chance of getting sparse matrices in numarray?
> 
Since talk is cheap, yes :-). But I doubt it would be in the "core"
and some thought would have to be given to how best to represent them.
In one sense, since the underlying storage is different than numarray
assumes for all its arrays, sparse arrays don't really share the
same underlying C machinery very well. While it certainly would be
possible to devise a class with the same interface as numarray objects,
the implementation may have to be completely different. 

On the other hand, since numarray has much better support for index
arrays, i.e., an array of indices that may be used to index another
array of values, an (index array(s), value array) pair may itself serve
as a storage model for sparse arrays. One still needs to implement
ufuncs and other functions (including simple things like indexing)
using different machinery. It is something that would be nice to have,
but I can't say when we would get around to it and don't want to
raise hopes about how quickly it would appear.
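Below is a minimal sketch of the storage model described above. It assumes nothing from numarray itself. A sparse matrix is kept as parallel row-index and column-index lists plus a list of values, which is often called coordinate or COO format. The class name `CooMatrix` and its methods are hypothetical. The sketch shows that operations such as a matrix-vector product have to be written directly against the index lists rather than reusing the dense-array machinery.

```python
class CooMatrix:
    """Sparse matrix stored as parallel (row, col, value) lists; absent entries are zero."""

    def __init__(self, shape, rows, cols, values):
        if not (len(rows) == len(cols) == len(values)):
            raise ValueError("index and value lists must have equal length")
        self.shape = shape
        self.rows = list(rows)
        self.cols = list(cols)
        self.values = list(values)

    def to_dense(self):
        """Expand into a full list-of-lists matrix (duplicate entries are summed)."""
        m, n = self.shape
        dense = [[0] * n for _ in range(m)]
        for i, j, v in zip(self.rows, self.cols, self.values):
            dense[i][j] += v
        return dense

    def matvec(self, x):
        """Compute y = A x by visiting only the stored entries."""
        y = [0] * self.shape[0]
        for i, j, v in zip(self.rows, self.cols, self.values):
            y[i] += v * x[j]
        return y


# Adjacency matrix of the directed graph 0->1, 1->2, 2->0 (3 stored entries).
A = CooMatrix((3, 3), rows=[0, 1, 2], cols=[1, 2, 0], values=[1, 1, 1])
print(A.to_dense())         # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(A.matvec([1, 2, 3]))  # [2, 3, 1]
```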

Perry


From victor at idaccr.org  Wed Apr 17 15:25:24 2002
From: victor at idaccr.org (Victor S. Miller)
Date: Wed Apr 17 15:25:24 2002
Subject: [Numpy-discussion] The right way to use results of argmax and argmin
Message-ID: <ulwuv63qwd.fsf@runner.princeton.idaccr.org>

# I'm running python 2.0 on Solaris and Numeric 21.0
# I have an m by n array called a, and
# j, an n-long list of integers in range(m), such as

j = argmax(a,0)

# If I set
z = zip(j,range(len(j)))

# and try the statement

res = take(a,z)

# python appears to hang, but if I do

res = array(map(lambda x, a=a: a[x[0], x[1]], z))

# It works.

# Is there a simpler way of doing what I want, and why does take hang?
# is it, perhaps, allocating some n by n work array (this would
# probably make things thrash like crazy)?
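A likely explanation, hedged because it has not been tested against Numeric 21.0: `take` selects whole slices along one axis. Passing the list of n (row, column) pairs therefore probably builds a result of shape (n, 2, n), about 2n*n elements, which would account for the thrashing.

Two alternatives should avoid this:

- If the goal is only the values at the argmax positions, those are simply the column maxima (`maximum.reduce(a, 0)` in Numeric).
- For a general gather of a[j[k], k], flatten the array and index it at j[k]*n + k, e.g. `take(ravel(a), j*a.shape[1] + arange(a.shape[1]))`.

The pure-Python sketch below shows the flat-index arithmetic. It uses lists of lists in place of arrays, and the helper names are hypothetical.

```python
def argmax_columns(a):
    """For each column k, the row index of its largest entry (first one on ties)."""
    m, n = len(a), len(a[0])
    return [max(range(m), key=lambda i: a[i][k]) for k in range(n)]


def gather_by_flat_index(a, j):
    """Return [a[j[k]][k] for each column k] using row-major flat indices j[k]*n + k."""
    n = len(a[0])
    flat = [x for row in a for x in row]  # the equivalent of ravel(a)
    return [flat[j[k] * n + k] for k in range(n)]


a = [[1, 9, 3],
     [4, 2, 8]]
j = argmax_columns(a)              # [1, 0, 1]
print(gather_by_flat_index(a, j))  # [4, 9, 8], the column maxima
```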


-- 
Victor S. Miller     | " ... Meanwhile, those of us who can compute can hardly
victor at idaccr.org    | be expected to keep writing papers saying 'I can do the
CCR, Princeton, NJ   | following useless calculation in 2 seconds', and indeed
    08540 USA        | what editor would publish them?"  -- Oliver Atkin


From magnus at hetland.org  Thu Apr 18 07:55:19 2002
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Thu Apr 18 07:55:19 2002
Subject: [Numpy-discussion] Graphs in numarray?
In-Reply-To: <JFEGLNDJEDNOMPPHDEJFOEKGDNAA.perry@stsci.edu>; from perry@stsci.edu on Wed, Apr 17, 2002 at 03:06:12PM -0400
References: <20020417163133.F7565@idi.ntnu.no> <JFEGLNDJEDNOMPPHDEJFOEKGDNAA.perry@stsci.edu>
Message-ID: <20020418165403.E300@idi.ntnu.no>

Perry Greenfield <perry at stsci.edu>:
[snip]
> First of all, it may make sense, but I should say a few words about
> what scale sizes make sense.
[snip]
> So if you are working with much smaller arrays than 10K, you won't
> see total execution time decrease much

In relation to what? Using dictionaries etc? Using the array module?
[snip]
> Before I go further, I need to find out if the preceding has made
> you gasp in horror or if the timescale is too slow for you to
> accept.

Hm. If you need 10000 elements before numarray pays off, I'm starting
to wonder if I can use it for anything at all. :I

> (This particular issue also makes me wonder if numarray would
> ever be a suitable substitute for the existing array module).

Indeed.

> What size graphs are you most concerned about as far as speed goes?

I'm not sure. A wide range, I should imagine. But with only 100 nodes,
I'll get 10000 entries in the adjacency matrix, so perhaps it's
worthwhile anyway?

> > And -- is there any chance of getting sparse matrices in numarray?
>
> Since talk is cheap, yes :-). But I doubt it would be in the "core"
> and some thought would have to be given to how best to represent them.
> In one sense, since the underlying storage is different than numarray
> assumes for all its arrays, sparse arrays don't really share the
> same underlying C machinery very well. While it certainly would be
> possible to devise a class with the same interface as numarray objects,
> the implementation may have to be completely different. 

Yes, I realise that.

> On the other hand, since numarray has much better support for index
> arrays, i.e., an array of indices that may be used to index another
> array of values,  index array(s), value array pair may itself serve
> as a storage model for sparse arrays.

That's an interesting idea, although I don't quite see how it would
help in the case of adjacency matrices. (You'd still need at least one
n**2 size matrix for n nodes, wouldn't you -- i.e. the index array...
Right?)

> One still needs to implement ufuncs and other functions (including
> simple things like indexing) using different machinery. It is
> something that would be nice to have, but I can't say when we would
> get around to it and don't want to raise hopes about how quickly it
> would appear.

No - no problem.

Basically, I'm looking for a platform to implement graph algorithms
that doesn't necessitate too many installed packages etc. numarray
seemed promising since it's a candidate for inclusion in the standard
library. I guess I'll just have to do some timing experiments...

> Perry

--
Magnus Lie Hetland                                  The Anygui Project
http://hetland.org                                  http://anygui.org


From perry at stsci.edu  Thu Apr 18 08:22:06 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Apr 18 08:22:06 2002
Subject: [Numpy-discussion] Graphs in numarray?
In-Reply-To: <20020418165403.E300@idi.ntnu.no>
Message-ID: <JFEGLNDJEDNOMPPHDEJFOEKJDNAA.perry@stsci.edu>


> Behalf Of Magnus Lie Hetland
> Perry Greenfield <perry at stsci.edu>:
> [snip]
> > First of all, it may make sense, but I should say a few words about
> > what scale sizes make sense.
> [snip]
> > So if you are working with much smaller arrays than 10K, you won't
> > see total execution time decrease much
> 
> In relation to what? Using dictionaries etc? Using the array module?

No, in relation to operations on a 10K array. Basically, if an operation
on a 10K array spends half its time on set up, operations on a
10 element array may only be twice as fast. I'm not making any claims
about speed in relation to any other data structure (other than Numeric).

> [snip]
> > Before I go further, I need to find out if the preceding has made
> > you gasp in horror or if the timescale is too slow for you to
> > accept.
> 
> Hm. If you need 10000 elements before numarray pays off, I'm starting
> to wonder if I can use it for anything at all. :I
> 
I didn't make clear that this threshold may improve in the future
(decrease). The corresponding threshold for Numeric is probably 
around 1000 to 2000 elements. (Likewise, operations on 10 element
Numeric arrays are only about twice as fast as for 1K arrays.)
We may be able to eventually improve numarray performance to something
in that neighborhood (if we are lucky) but I would be surprised to
do much better (though if we use caching techniques, perhaps repeated
cases of arrays of identical shape, strides, type, etc. may run
much faster on subsequent operations). As usual, performance issues
can be complicated. You have to keep in mind that Numeric and numarray
provide much richer indexing and conversion handling features than
something like the array module, and that comes at some price in
performance for small arrays.

> > (This particular issue also makes me wonder if numarray would
> > ever be a suitable substitute for the existing array module).
> 
> Indeed.
> 
> > What size graphs are you most concerned about as far as speed goes?
> 
> I'm not sure. A wide range, I should imagine. But with only 100 nodes,
> I'll get 10000 entries in the adjacency matrix, so perhaps it's
> worthwhile anyway?
> 
That's right; 100 nodes is where performance becomes competitive,
and if you are worried about cases larger than that, then
it isn't a problem. But if you are operating mostly on small graphs,
then it may not be appropriate. The corresponding threshold for Numeric
would be on the order of 30 nodes.

> > On the other hand, since numarray has much better support for index
> > arrays, i.e., an array of indices that may be used to index another
> > array of values,  index array(s), value array pair may itself serve
> > as a storage model for sparse arrays.
> 
> That's an interesting idea, although I don't quite see how it would
> help in the case of adjacency matrices. (You'd still need at least one
> n**2 size matrix for n nodes, wouldn't you -- i.e. the index array...
> Right?)
> 
Right.

> 


From magnus at hetland.org  Thu Apr 18 08:48:17 2002
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Thu Apr 18 08:48:17 2002
Subject: [Numpy-discussion] Graphs in numarray?
In-Reply-To: <JFEGLNDJEDNOMPPHDEJFOEKJDNAA.perry@stsci.edu>; from perry@stsci.edu on Thu, Apr 18, 2002 at 11:21:46AM -0400
References: <20020418165403.E300@idi.ntnu.no> <JFEGLNDJEDNOMPPHDEJFOEKJDNAA.perry@stsci.edu>
Message-ID: <20020418174733.A7072@idi.ntnu.no>

Perry Greenfield <perry at stsci.edu>:
>
[snip]
> > In relation to what? Using dictionaries etc? Using the array module?
> 
> No, in relation to operations on a 10K array. Basically, if an operation
> on a 10K array spends half its time on set up, operations on a
> 10 element array may only be twice as fast. I'm not making any claims
> about speed in relation to any other data structure (other than Numeric)
 
Aaah! Sorry to be so dense :)

But the speedup in numeric between different sizes isn't as important
to me as the speedup compared to other solutions (such as a dict-based
one) of course... If a 10 element array is only twice as fast as a 10K
array that's no problem if it's still faster than an alternative
solution (though I'm sure it might not be...)

The same goes for 10K element graphs -- the interesting point has to
be whether it's faster than various alternatives (which I'm sure it
is).

> > [snip]
> > > Before I go further, I need to find out if the preceding has made
> > > you gasp in horror or if the timescale is too slow for you to
> > > accept.
> > 
> > Hm. If you need 10000 elements before numarray pays off, I'm starting
> > to wonder if I can use it for anything at all. :I
> > 
> I didn't make clear that this threshold may improve in the future
> (decrease).

Right. Good.

And -- on small graphs performance probably won't be much of a problem
anyway. :)

> The corresponding threshold for Numeric is probably 
> around 1000 to 2000 elements. (Likewise, operations on 10 element
> Numeric arrays are only about twice as fast as for 1K arrays.)
> We may be able to eventually improve numarray performance to something
> in that neighborhood (if we are lucky) but I would be surprised to
> do much better (though if we use caching techniques, perhaps repeated
> cases of arrays of identical shape, strides, type, etc. may run
> much faster on subsequent operations). As usual, performance issues
> can be complicated. You have to keep in mind that Numeric and numarray
> provide much richer indexing and conversion handling features than
> something like the array module, and that comes at some price in
> performance for small arrays.

Of course.

I guess an alternative (for the graph situation) could be to wrap the
graphs with a common interface with various implementations, so that a
solution more optimised for small graphs could be used (in a factory
function) if the graph is small... (Not really an issue for me at the
moment, but should be easy to do, I guess.)
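Below is a minimal sketch of that factory idea, with hypothetical class and function names. Both classes expose the same `add_edge`/`has_edge`/`neighbors` interface: a dict-of-sets version for small graphs and an adjacency-matrix version for larger ones. The default cutoff of 100 nodes is an assumption borrowed from the rough break-even estimate discussed in this thread, not a measured value.

```python
class DictGraph:
    """Directed graph as a dict of successor sets; cheap for small or sparse graphs."""

    def __init__(self, n):
        self.n = n
        self.succ = {v: set() for v in range(n)}

    def add_edge(self, u, v):
        self.succ[u].add(v)

    def has_edge(self, u, v):
        return v in self.succ[u]

    def neighbors(self, u):
        return sorted(self.succ[u])


class MatrixGraph:
    """Directed graph as an n-by-n 0/1 adjacency matrix (row u holds u's out-edges)."""

    def __init__(self, n):
        self.n = n
        self.adj = [[0] * n for _ in range(n)]

    def add_edge(self, u, v):
        self.adj[u][v] = 1

    def has_edge(self, u, v):
        return self.adj[u][v] == 1

    def neighbors(self, u):
        return [v for v, bit in enumerate(self.adj[u]) if bit]


def make_graph(n, threshold=100):
    """Pick an implementation by size; the threshold is a guess to tune by timing."""
    return DictGraph(n) if n < threshold else MatrixGraph(n)
```

In a real implementation the matrix version would hold a numarray array, so that whole-row operations can be vectorized. Plain lists are used here only to keep the sketch self-contained.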

[snip]
> > I'm not sure. A wide range, I should imagine. But with only 100 nodes,
> > I'll get 10000 entries in the adjacency matrix, so perhaps it's
> > worthwhile anyway?
> > 
> That's right; 100 nodes is where performance becomes competitive,
> and if you are worried about cases larger than that, then
> it isn't a problem.

Seems probable. For smaller problems I wouldn't be thinking in terms
of numarray anyway, I think. (Just using plain Python dicts or
something similar.)

[snip]
> > > On the other hand, since numarray has much better support for index
> > > arrays, i.e., an array of indices that may be used to index another
> > > array of values,  index array(s), value array pair may itself serve
> > > as a storage model for sparse arrays.
> > 
> > That's an interesting idea, although I don't quite see how it would
> > help in the case of adjacency matrices. (You'd still need at least one
> > n**2 size matrix for n nodes, wouldn't you -- i.e. the index array...
> > Right?)
> > 
> Right.

I might as well use a full adjacency matrix, then...

So, the conclusion for now is that numarray may well be suited for
working with relatively large (100+ nodes), relatively dense graphs.

Now, the next interesting question is how much of the standard graph
algorithms can be implemented with ufuncs and array operations (which
I guess is the key to performance) and not straight for-loops... After
all, some of them are quite sequential.
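Warshall's transitive-closure algorithm is one example of a sequential algorithm that still vectorizes well. It can be written so that its innermost step is a whole-row logical OR, the kind of operation an array package could perform as a single ufunc call on a row of the adjacency matrix. The sketch below uses plain Python lists, and the function name is hypothetical. Only the outer loop over k has to stay sequential.

```python
def transitive_closure(adj):
    """Warshall's algorithm on a boolean adjacency matrix, one whole row at a time."""
    n = len(adj)
    reach = [list(row) for row in adj]  # copy so the input is left untouched
    for k in range(n):                  # sequential: each pass relies on the previous one
        row_k = reach[k]
        for i in range(n):
            if reach[i][k]:
                # Row-wide OR: everything k reaches, i now reaches too.
                # With an array package this line would be one vectorized operation.
                reach[i] = [a or b for a, b in zip(reach[i], row_k)]
    return reach


# Chain 0 -> 1 -> 2: node 0 should end up reaching node 2.
adj = [[False, True, False],
       [False, False, True],
       [False, False, False]]
print(transitive_closure(adj)[0])  # [False, True, True]
```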

--
Magnus Lie Hetland                                  The Anygui Project
http://hetland.org                                  http://anygui.org


From rob at pythonemproject.com  Thu Apr 18 09:18:31 2002
From: rob at pythonemproject.com (rob)
Date: Thu Apr 18 09:18:31 2002
Subject: [Numpy-discussion] Graphs in numarray?
References: <20020418165403.E300@idi.ntnu.no> <JFEGLNDJEDNOMPPHDEJFOEKJDNAA.perry@stsci.edu> <20020418174733.A7072@idi.ntnu.no>
Message-ID: <3CBEF151.C440DCE@pythonemproject.com>

I'm sorry I missed the original post, but the topic is important for
me.  I use the lightweight 3d volume renderer Animabob for most
everything.  The interface code is in all of the FDTD programs in my
website.  You just unwind a 3d array and scale it to +/- 128, turn it
into characters, and you have the input file.  I wish Animabob could
somehow be turned into a Python package, as in Windows you need Cygwin
to run it.  I've tried other 3d packages like OpenDX, and they seem to
be huge albatrosses.


-- 
-----------------------------
The Numeric Python EM Project

www.pythonemproject.com


From perry at stsci.edu  Thu Apr 18 10:36:19 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Apr 18 10:36:19 2002
Subject: [Numpy-discussion] Graphs in numarray?
In-Reply-To: <3CBEF151.C440DCE@pythonemproject.com>
Message-ID: <JFEGLNDJEDNOMPPHDEJFEEKMDNAA.perry@stsci.edu>


Behalf Of rob
 
> 
> I'm sorry I missed the original post, but the topic is important for
> me.  I use the lightweight 3d volume renderer Animabob for most
> everything.  The interface code is in all of the FDTD programs in my
> website.  You just unwind a 3d array and scale it to +/- 128, turn it
> into characters, and you have the input file.  I wish Animabob could
> somehow be turned into a Python package, as in Windows you need Cygwin
> to run it.  I've tried other 3d packages like OpenDX, and they seem to
> be huge albatrosses.
> 
It sounds like you are trying to do something different than Magnus, but
if what you are looking for is to scale floating or int data to byte size and
apply some character mapping, numarray (or Numeric) should be able
to do that very well. If that is all you want done, you might find
either to be overkill though (if you already wrote a C extension to
do so).
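Below is a minimal pure-Python sketch of the conversion rob describes. It assumes the renderer wants a raw stream of signed 8-bit values, one per voxel, in row-major order. Animabob's actual file format is not specified in this thread, so check it before relying on this. A signed byte only covers -128..127, so the data are scaled symmetrically to +/-127. The function name is hypothetical. With numarray or Numeric, the flatten, scale and cast steps would each become a single array operation.

```python
from array import array


def volume_to_signed_bytes(volume):
    """Unwind a nested [plane][row][col] volume and encode it as signed bytes."""
    flat = [v for plane in volume for row in plane for v in row]  # row-major order
    peak = max(abs(v) for v in flat) or 1.0  # guard against an all-zero field
    # Map [-peak, peak] onto [-127, 127] so every value fits in a signed byte.
    scaled = [int(round(127 * v / peak)) for v in flat]
    return array("b", scaled).tobytes()


field = [[[0.0, 127.0],
          [-127.0, 10.0]]]
raw = volume_to_signed_bytes(field)
print(list(array("b", raw)))  # [0, 127, -127, 10]
```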

Perry

 
From perry at stsci.edu  Thu Apr 18 10:39:03 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Apr 18 10:39:03 2002
Subject: [Numpy-discussion] Graphs in numarray?
In-Reply-To: <20020418174733.A7072@idi.ntnu.no>
Message-ID: <JFEGLNDJEDNOMPPHDEJFIEKMDNAA.perry@stsci.edu>

 
> Now, the next interesting question is how much of the standard graph
> algorithms can be implemented with ufuncs and array operations (which
> I guess is the key to performance) and not straight for-loops... After
> all, some of them are quite sequential.
> 
I'm not sure about that (not being very familiar with graph algorithms).
If you can give me some examples (perhaps off the mailing list) I could
say whether they are easily cast into ufunc or library calls. 

Perry


From paul at pfdubois.com  Fri Apr 19 07:42:23 2002
From: paul at pfdubois.com (Paul F Dubois)
Date: Fri Apr 19 07:42:23 2002
Subject: [Numpy-discussion] [ANN] Pyfort 7.1
Message-ID: <000101c1e7ae$38ac0d50$0a01a8c0@NICKLEBY>

Pyfort 7.1 is available at sf.net/projects/pyfortran.

Support for single Fortran characters was added. (Michiel de Hoon)
Corrected behavior of scalars with C routines.   (Michiel de Hoon)

Pyfort is a tool for connecting Python to Fortran.

Just to let you know, I'm working on a little tool to make it easier to
set up simple projects so that you can build and install them with less
effort. I hope to have that available soon.


From rob at pythonemproject.com  Fri Apr 19 09:00:02 2002
From: rob at pythonemproject.com (rob)
Date: Fri Apr 19 09:00:02 2002
Subject: [Numpy-discussion] Icc compiled Python
Message-ID: <3CC03E5A.9835FC0A@pythonemproject.com>

There has been some discussion on the FreeBSD Ports list about an icc-compiled
Python, which reportedly benchmarks much faster than the normal gcc-compiled
version.  I'm wondering if anyone here knows anything about it.  The
discussion can be accessed via www.geocrawler.org/ FreeBSD/
freebsd-ports.

Rob.


-- 
-----------------------------
The Numeric Python EM Project

www.pythonemproject.com


From juenglin at informatik.uni-freiburg.de  Sat Apr 20 09:59:13 2002
From: juenglin at informatik.uni-freiburg.de (Ralf Juengling)
Date: Sat Apr 20 09:59:13 2002
Subject: [Numpy-discussion] NumPy initiated reference counting
Message-ID: <1019321875.8067.141.camel@leto>

I'm currently tinkering with the following problem and would like to
hear your suggestions:

Within a C module I define a new Python type 'IM' (representing an
image). The indexing and slicing facilities of NumPy arrays would be
tailor-made for manipulating the internal data of its instances. Thus,
I could provide a method 'asarray', which creates a properly
typed array object 'a' referring to the data of an IM instance 'im':

a = im.asarray()

I could use PyArray_FromDimsAndData() to create the array instance.
Unfortunately, this wouldn't work, since 'a' would not get notified 
about the death of 'im'.
However, if I could prevent 'im' from being garbage collected before
all array instances referring to its data are deleted, it should work.

NumPy's array type uses a mechanism to prevent garbage collection
of array instances while other instances share data with them.
My idea was to use this mechanism, that is, to let the asarray
method increment im's reference count and let a->base refer to im.

Do you think this is a reliable approach?
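The ownership pattern described above can be illustrated at the Python level: the view holds a counted reference to the object that owns the memory, so the owner cannot be freed first. This is only an analogy using hypothetical `Image` and `ArrayView` classes, not Numeric's C API. `weakref` is used to observe when the owner is actually collected, which in CPython happens as soon as its last reference goes away.

```python
import weakref


class Image:
    """Hypothetical stand-in for the C type 'IM': owns a block of pixel bytes."""

    def __init__(self, width, height):
        self.pixels = bytearray(width * height)


class ArrayView:
    """Hypothetical array view; 'base' plays the role of a->base in Numeric."""

    def __init__(self, owner):
        self.base = owner                     # counted reference keeps the owner alive
        self.data = memoryview(owner.pixels)  # shares the memory, no copy


def demo():
    im = Image(4, 3)
    probe = weakref.ref(im)  # observes whether the image object still exists
    a = ArrayView(im)
    del im                   # drop the caller's own reference
    alive_while_viewed = probe() is not None
    a.data[0] = 255          # still safe: the owner has not been freed
    del a                    # the last reference to the owner goes away
    return alive_while_viewed, probe() is None


print(demo())  # (True, True) under CPython's reference counting
```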

Thanks,
Ralf
  

-- 
--------------------------------------------------------------------------
Ralf Jüngling
Institut für Informatik - Lehrstuhl für Mustererkennung & Bildverarbeitung
Georges-Köhler-Allee
Gebäude 52                                        Tel: +49-(0)761-203-8215
79110 Freiburg                                    Fax: +49-(0)761-203-8262
--------------------------------------------------------------------------


From juenglin at informatik.uni-freiburg.de  Sat Apr 20 12:22:51 2002
From: juenglin at informatik.uni-freiburg.de (Ralf Juengling)
Date: Sat Apr 20 12:22:51 2002
Subject: [Numpy-discussion] qs on NumPy
Message-ID: <1019330305.8067.158.camel@leto>

Hi,

I did not find a way in Python to check whether a Numeric array
instance is a shared array or not. Could you confirm that there is no way?

Is there work underway to make Numeric arrays subclassable?

Regards,
Ralf

-- 
--------------------------------------------------------------------------
Ralf Jüngling
Institut für Informatik - Lehrstuhl für Mustererkennung & Bildverarbeitung
Georges-Köhler-Allee
Gebäude 52                                        Tel: +49-(0)761-203-8215
79110 Freiburg                                    Fax: +49-(0)761-203-8262
--------------------------------------------------------------------------


From mok at imsb.au.dk  Tue Apr 23 04:21:03 2002
From: mok at imsb.au.dk (Morten Kjeldgaard)
Date: Tue Apr 23 04:21:03 2002
Subject: [Numpy-discussion] Graphs in numarray?
In-Reply-To: <20020417163133.F7565@idi.ntnu.no>
Message-ID: <Pine.LNX.4.33.0204231312100.12942-100000@origo.imsb.au.dk>

> simple dict-based stuff -- more performance is needed). kjbuckets
> looks like a nice alternative, as does the Boost Graph Library (not

Kjbuckets is *very* nice indeed. It is a compact and very fast
implementation. I don't see why you'd want to wrap this functionality into
NumPy, which has a very well-defined scope and an efficient implementation.
It would be a shame to bloat it with something which is discretely 
different.

I have modified kjbuckets so that it compiles and works with Python 2.x. 
You can pick it up at 

ftp://xray.imsb.au.dk/pub/birdwash/packages/Python2.1/SRPMS/python-kjbuckets-2.2-7.src.rpm

Just do "rpm --rebuild" on it.

I sent the patch to the original author, but it appears he is no longer 
maintaining it. Never mind, it works great.

/Morten

-- 
Morten Kjeldgaard   <mok at imsb.au.dk>             | Phone : +45 89 42 50 26
Institute of Molecular and Structural Biology    | Fax   : +45 86 12 31 78
Aarhus University                                | Home  : +45 86 18 81 80
Gustav Wieds Vej 10 C, DK-8000 Aarhus C, Denmark | http://imsb.au.dk/~mok


From magnus at hetland.org  Thu Apr 25 07:28:05 2002
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Thu Apr 25 07:28:05 2002
Subject: [Numpy-discussion] Graphs in numarray?
In-Reply-To: <Pine.LNX.4.33.0204231312100.12942-100000@origo.imsb.au.dk>; from mok@imsb.au.dk on Tue, Apr 23, 2002 at 01:20:04PM +0200
References: <20020417163133.F7565@idi.ntnu.no> <Pine.LNX.4.33.0204231312100.12942-100000@origo.imsb.au.dk>
Message-ID: <20020425162734.B6821@idi.ntnu.no>

Morten Kjeldgaard <mok at imsb.au.dk>:
>
> 
> > simple dict-based stuff -- more performance is needed). kjbuckets
> > looks like a nice alternative, as does the Boost Graph Library (not
> 
> Kjbuckets is *very* nice indeed.

Yes, I guess it is. But the project doesn't seem very active...

> It is a compact and very fast implementation. I don't see why you'd
> want to wrap this functionality into NumPy, which has a very
> well-defined scope and an efficient implementation.  It would be a
> shame to bloat it with something which is discretely different.

Yes, I guess you're right. There is no point in adding this sort of
thing to numarray. My motivation for using numarray in my
implementations was simply that it would mean that the necessery tools
would be (or might be in the future ;) available in the standard
distribution.

> I have modified kjbuckets so that it compiles and works with Python 2.x. 
> You can pick it up at 
> 
> ftp://xray.imsb.au.dk/pub/birdwash/packages/Python2.1/SRPMS/python-kjbuckets-2.2-7.src.rpm
> 
> Just do "rpm --rebuild" on it.
> 
> I sent the patch to the original author, but it appears he is no longer 
> maintaining it. Never mind, it works great.

Well... I do sort of mind... I'm a bit wary of using unmaintained
software. Not that I would never do it or anything... But I think it
would be a bonus to use stuff that is being actively maintained and
developed. But I guess I'll take another look at it.

(Any idea where the "kj" prefix comes from, by the way?)

> /Morten

--
Magnus Lie Hetland                                  The Anygui Project
http://hetland.org                                  http://anygui.org


From magnus at hetland.org  Thu Apr 25 07:43:10 2002
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Thu Apr 25 07:43:10 2002
Subject: [Numpy-discussion] Non-numeric arrays?
Message-ID: <20020425164228.C6821@idi.ntnu.no>

I can't find this in the docs (although I've heard it's mentioned
there)... Is support for non-numeric arrays (such as character arrays
or object pointer arrays) as in Numeric planned for numarray? (Perhaps
even supported? My version might not be the most recent...)

And what about subclasses of numeric types?

E.g:

# numarray
>>> class foo(int): pass
>>> a = array(map(foo, xrange(10)))
[...]
TypeError: Expecting a python numeric type, got a foo

# Numeric
>>> class foo(int): pass
>>> a = array(map(foo, xrange(10)))
>>> type(a[0])
<type 'int'>

Neither behaviour seems very helpful -- I guess numarray's is
cleaner... (Although in this case I think an object array could have
been nice...)

--
Magnus Lie Hetland                                  The Anygui Project
http://hetland.org                                  http://anygui.org


From jmiller at stsci.edu  Thu Apr 25 07:53:04 2002
From: jmiller at stsci.edu (Todd Miller)
Date: Thu Apr 25 07:53:04 2002
Subject: [Numpy-discussion] Non-numeric arrays?
References: <20020425164228.C6821@idi.ntnu.no>
Message-ID: <3CC81814.2010702@stsci.edu>

Magnus Lie Hetland wrote:

>I can't find this in the docs (although I've heard it's mentioned
>there)... Is support for non-numeric arrays (such as character arrays
>or object pointer arrays) as in Numeric planned for numarray? (Perhaps
>
Check out chararray for character arrays.  
Check out recarray for arrays of fixed length structs.  
To make your own non-numeric arrays, subclass NDArray.

>
>even supported? My version might not be the most recent...)
>
>And what about subclasses of numeric types?
>
>E.g:
>
> # numarray
> >>> class foo(int): pass
> >>> a = array(map(foo, xrange(10)))
> [...]
> TypeError: Expecting a python numeric type, got a foo
>
> # Numeric
> >>> class foo(int): pass
> >>> a = array(map(foo, xrange(10)))
> >>> type(a[0])
> <type 'int'>
>
>Neither behaviour seems very helpful -- I guess numarray's is
>cleaner... (Although in this case I think an object array could have
>been nice...)
>
Object arrays fall into the *eventually* category:  planned but not 
imminent.

Todd

-- 
Todd Miller 			jmiller at stsci.edu
STSCI / SSG			(410) 338 4576


From magnus at hetland.org  Thu Apr 25 07:54:02 2002
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Thu Apr 25 07:54:02 2002
Subject: [Numpy-discussion] Non-numeric arrays?
In-Reply-To: <20020425164228.C6821@idi.ntnu.no>; from magnus@hetland.org on Thu, Apr 25, 2002 at 04:42:28PM +0200
References: <20020425164228.C6821@idi.ntnu.no>
Message-ID: <20020425165304.D6821@idi.ntnu.no>

Magnus Lie Hetland <magnus at hetland.org>:
[snip]

Just a quick explanation for why I'm interested in this...

I've got a two-dimensional array of ints (or bytes, actually) that I
would like to convert to a delimited string (e.g. comma-separated).

This works in Numeric:

>>> from string import letters
>>> alphabet = array(letters)
>>> data = arange(24) # E.g...
>>> data.shape = 6, 4
>>> fields = sum(take(alphabet, data), 1)
>>> ','.join(fields)
'abcd,efgh,ijkl,mnop,qrst,uvwx'
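For readers without Numeric at hand, the same transformation can be sketched in plain Python. This is an illustration, not Magnus's code: `rows_to_delimited` is a hypothetical helper, and `string.ascii_lowercase` stands in for `string.letters`, whose contents are locale-dependent in older Pythons.

```python
import string


def rows_to_delimited(rows, alphabet=string.ascii_lowercase, sep=","):
    """Map each integer code to a letter and join each row into one field.

    Hypothetical helper: each row of integer codes becomes one string
    field, and the fields are joined with `sep`.
    """
    return sep.join("".join(alphabet[code] for code in row) for row in rows)


# Equivalent of arange(24) reshaped to 6 rows of 4.
data = [list(range(r * 4, r * 4 + 4)) for r in range(6)]
print(rows_to_delimited(data))  # abcd,efgh,ijkl,mnop,qrst,uvwx
```

The array version does the lookup (`take`) and the per-row concatenation (`sum` over axis 1) in bulk; the comprehension above spells out the same two steps.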

--
Magnus Lie Hetland                                  The Anygui Project
http://hetland.org                                  http://anygui.org


From perry at stsci.edu  Thu Apr 25 07:57:05 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Apr 25 07:57:05 2002
Subject: [Numpy-discussion] Non-numeric arrays?
In-Reply-To: <20020425164228.C6821@idi.ntnu.no>
Message-ID: <JFEGLNDJEDNOMPPHDEJFGEMMDNAA.perry@stsci.edu>

[I see Todd has already answered this, the following might add
a little more detail]

> -----Original Message-----
> From: numpy-discussion-admin at lists.sourceforge.net
> [mailto:numpy-discussion-admin at lists.sourceforge.net]On Behalf Of Magnus
> Lie Hetland
> Sent: Thursday, April 25, 2002 10:42 AM
> To: Numpy-discussion
> Subject: [Numpy-discussion] Non-numeric arrays?
> 
> 
> I can't find this in the docs (although I've heard it's mentioned
> there)... Is support for non-numeric arrays (such as character arrays
> or object pointer arrays) as in Numeric planned for numarray? (Perhaps
> even supported? My version might not be the most recent...)
> 
Yes, in fact there is a character array class included with
numarray (but not documented, I believe). For the moment,
you'll have to read the source. We developed it for use
with our I/O library, but it seemed general enough
to include with numarray.

We also plan to support arrays of Python objects. There are
various ways that this could be done and we ought to discuss
how it should be done (perhaps multiple ways). But the 
underlying machinery certainly will support it.

> And what about subclasses of numeric types?
> 
> E.g:
> 
> # numarray
> >>> class foo(int): pass
> >>> a = array(map(foo, xrange(10)))
> [...]
> TypeError: Expecting a python numeric type, got a foo
> 
> # Numeric
> >>> class foo(int): pass
> >>> a = array(map(foo, xrange(10)))
> >>> type(a[0])
> <type 'int'>
> 
> Neither behaviour seems very helpful -- I guess numarray's is
> cleaner... (Although in this case I think an object array could have
> been nice...)
> 
We haven't had much time to think about how we deal with
numeric subclasses. Certainly one would not use these
for efficiency; I can't see any simple way of making such
things go fast. But it may be possible to have such things
work with numarray ufuncs and other numeric operations
in some automatic way. I'd have to think about that. It's
not high on the priority list at the moment. (Speaking of
which, I may post in a few days.)

Thanks, Perry


From hinsen at cnrs-orleans.fr  Thu Apr 25 08:35:06 2002
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Thu Apr 25 08:35:06 2002
Subject: [Numpy-discussion] Graphs in numarray?
In-Reply-To: <20020425162734.B6821@idi.ntnu.no>
References: <20020417163133.F7565@idi.ntnu.no>
	<Pine.LNX.4.33.0204231312100.12942-100000@origo.imsb.au.dk>
	<20020425162734.B6821@idi.ntnu.no>
Message-ID: <m38z7beqwm.fsf@chinon.cnrs-orleans.fr>

Magnus Lie Hetland <magnus at hetland.org> writes:

> (Any idea where the "kj" prefix comes from, by the way?)

I asked Aaron Watters about this. The answer: k and j are the initials
of his children.

Konrad.


From jasper at peak.org  Mon Apr 29 03:14:04 2002
From: jasper at peak.org (Jasper Phillips)
Date: Mon Apr 29 03:14:04 2002
Subject: [Numpy-discussion] Multiple Linear Regression?
Message-ID: <200204291013.DAA32745@spock.peak.org>

I'm helping my wife with programming for her economics thesis, which needs
to calculate a "Multiple Linear Regression" on her data.

Does anyone know of any (preferably though not necessarily free) software
that can do this? I'm working in Python, but not limited to it as I
can relatively freely access other languages.

I'm still looking for a library written in Python, but haven't had any luck.

My second thought was Matlab, but looking over the Matlab website, I couldn't
find anything like this by a name I recognize. It looks like I might be able
to construct something out of a combination of Sparse Matrices and Linear
Regression, or perhaps the stuff for overdetermined Linear Equations?

Another option may be LAPACK routines, but I'm not familiar with those.

Does anyone here have any experience with this kind of stuff? Is there a
better place to ask? I'm about ready to take a shot at writing something
myself, but I'd really rather avoid this if it's been done before.

-Jasper


From hinsen at cnrs-orleans.fr  Mon Apr 29 05:40:03 2002
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Mon Apr 29 05:40:03 2002
Subject: [Numpy-discussion] Multiple Linear Regression?
In-Reply-To: <200204291013.DAA32745@spock.peak.org>
References: <200204291013.DAA32745@spock.peak.org>
Message-ID: <m3elgy8yxg.fsf@chinon.cnrs-orleans.fr>

Jasper Phillips <jasper at peak.org> writes:

> I'm still looking for a library written in Python, but haven't had any luck.

Numerical Python has all the basic stuff, but you need to read in and
arrange the data yourself. All linear regression problems ultimately
become least-squares problems for a system of linear equations, which
can be solved using LinearAlgebra.linear_least_squares.
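As a rough illustration of that reduction, the sketch below fits an ordinary least-squares multiple regression in pure Python by solving the normal equations (A^T A) b = A^T y, where A is the data matrix with a leading column of ones for the intercept. `mlr_fit` is a hypothetical name, not part of Numerical Python. For real data a LAPACK-backed routine such as LinearAlgebra.linear_least_squares is the better choice, because forming A^T A squares the condition number of the problem.

```python
def mlr_fit(X, y):
    """Fit y ~ b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X is a list of rows (k regressor values per observation) and y the
    list of responses. Returns [b0, b1, ..., bk]. Solves the normal
    equations by Gaussian elimination with partial pivoting.
    """
    # Design matrix with a leading column of ones for the intercept.
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Form A^T A (p x p) and A^T y (length p).
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    # Augmented matrix [A^T A | A^T y].
    M = [ata[i] + [aty[i]] for i in range(p)]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("regressors are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        s = M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))
        b[i] = s / M[i][i]
    return b


# Data generated exactly from y = 1 + 2*x1 - 3*x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [1, 3, -2, 0, -4]
coef = mlr_fit(X, y)
print([round(v, 6) for v in coef])  # [1.0, 2.0, -3.0]
```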

Konrad.


From Alexandre.Fayolle at logilab.fr  Mon Apr 29 06:20:03 2002
From: Alexandre.Fayolle at logilab.fr (Alexandre)
Date: Mon Apr 29 06:20:03 2002
Subject: [Numpy-discussion] Multiple Linear Regression?
In-Reply-To: <200204291013.DAA32745@spock.peak.org>
References: <200204291013.DAA32745@spock.peak.org>
Message-ID: <20020429131937.GE30347@orion.logilab.fr>

On Mon, Apr 29, 2002 at 03:13:44AM -0700, Jasper Phillips wrote:
> I'm helping my wife with programming for her economics thesis, which needs
> to calculate a "Multiple Linear Regression" on her data.
> 
> Does anyone know of any (preferably though not necessarily free) software
> that can do this? I'm working in Python, but not limited to it as I
> can relatively freely access other languages.
> 
> I'm still looking for a library written in Python, but haven't had any luck.
> 

I'm helping my wife with her History PhD, and have to deal with similar
stuff. I found R to be a very useful environment for statistical
computations. R is a free software clone of S-plus, which is to statistics
what Matlab is to linear algebra and automation. 

Pros: 
 - programming environment, with a high level programming language
 - extensive statistical and linalg library (using C and FORTRAN code)
 - lots of third party code available, covering a very wide range of
   situations
 - Python bindings available if you don't want to learn the Scheme-like
   language
 - Tons of documentation available
 - Excellent support through the mailing lists
 - GPL'd
 - Tons of ways to import data (ranging from CSV files to ODBC queries)
 - 2 printed books available, at Springer Verlag
 - postscript, png, wmf, X outputs, with precise control of the layout
   of the graphs and figures available for a nice colourful thesis

Cons:
 - the language can be a bit weird at times (it took me some time to get
   used to '.' being used instead of '_' and vice versa in the scoping
   and variable naming), but you can use Python to script R, thanks to
   RPython
 - it's quite a big piece of code, with a rather steep learning curve
   and you need time to get inside it
 - the documentation is aimed at professional statisticians. I had to
   dig back in my statistics courses and to buy a couple of books on
   that topic for the software to become really useful. Asking newbie
   statistician questions on the r-help mailing list is off-topic
 - the Springer Verlag books are very expensive (Modern Applied
   Statistics with S-plus costs something like 70 euros), but they are
   great

So you have a powerful tool available at your fingertips, designed to do
precisely what you need. I think it's worth taking the time to look at
it carefully. The more I get to understand the topic, the more ideas I
get for new ways of exploring the data of my wife's PhD. 

 
Alexandre Fayolle
-- 
LOGILAB, Paris (France).
http://www.logilab.com   http://www.logilab.fr  http://www.logilab.org
Narval, the first software agent available as free software (GPL).


From Alexandre.Fayolle at logilab.fr  Mon Apr 29 06:28:08 2002
From: Alexandre.Fayolle at logilab.fr (Alexandre)
Date: Mon Apr 29 06:28:08 2002
Subject: [Numpy-discussion] Multiple Linear Regression?
In-Reply-To: <20020429131937.GE30347@orion.logilab.fr>
References: <200204291013.DAA32745@spock.peak.org> <20020429131937.GE30347@orion.logilab.fr>
Message-ID: <20020429132741.GF30347@orion.logilab.fr>

On Mon, Apr 29, 2002 at 03:19:37PM +0200, Alexandre wrote:

> I'm helping my wife with her History PhD, and have to deal with similar
> stuff. I found R to be a very useful environment for statistical
> computations. R is a free software clone of S-plus, which is to statistics
> what Matlab is to linear algebra and automation. 


Whoops, I forgot to add a couple of URLs:

The R project website 
 http://www.r-project.org/
The Comprehensive R Archive Network (CRAN)
 http://cran.r-project.org/
Using R from Python
 http://rpy.sourceforge.net/
Using R from Python and Python from R (coding R extensions in Python)
 http://www.omegahat.org/RSPython/

Cheers,

Alexandre Fayolle
-- 
LOGILAB, Paris (France).
http://www.logilab.com   http://www.logilab.fr  http://www.logilab.org
Narval, the first software agent available as free software (GPL).


From cavallo at kip.uni-heidelberg.de  Mon Apr 29 10:10:20 2002
From: cavallo at kip.uni-heidelberg.de (cavallo at kip.uni-heidelberg.de)
Date: Mon Apr 29 10:10:20 2002
Subject: [Numpy-discussion] kdfio, 1.1.1
Message-ID: <Pine.LNX.4.33.0204291901350.12869-100000@modigliani.darktech.org>

Hi,
here is the URL of the latest version of kdfio, a Khoros/Cantata KDF file
importer. It's nothing special, but it seems to be working now, at least
for me ;-)
You can find it at:

http://kdfio.sourceforge.net

This is my (very) small contribution to Numerical Python:
inside, I included a way to modularize the code (and to write some
skeleton code semi-automatically) that could speed up writing new code a
little. Before making a full announcement on SourceForge I will wait a
little, just to see whether any bugs turn up.
Feel free to use/change/make what you want,

thanks to all,
antonio cavallo

PS: Khoros is available at http://www.khoral.com and it is not a free
program: there is only a free student version.


From jmiller at stsci.edu  Mon Apr 29 10:14:07 2002
From: jmiller at stsci.edu (Todd Miller)
Date: Mon Apr 29 10:14:07 2002
Subject: [Numpy-discussion] ANN: Numarray-0.3.3
Message-ID: <3CCD7F49.5030809@stsci.edu>

Numarray 0.3.3
---------------------------------
Numarray is an array processing package designed to efficiently 
manipulate large multi-dimensional arrays.  Numarray is modelled after 
Numeric and features C code generated from Python template scripts, the 
capacity to operate directly on arrays in files, and improved type 
promotions.

Numarray-0.3.3 features improved support for arrays of complex numbers, 
re-implementing complex types using generated code.  In addition to 
being faster, the new complex ufuncs are better integrated with the 
numarray type system, so operations between numarrays and complex 
scalars now work properly.  This release also fixes a problem 
experienced by RedHat Linux users installing numarray from source.

WHERE
-----------
The Numarray-0.3.3 Windows executable installers and source code tarball
are here:

http://sourceforge.net/project/showfiles.php?group_id=1369

Numarray is hosted by Source Forge in the same project which hosts Numeric:

http://sourceforge.net/projects/numpy/

The web page for Numarray information is at:

http://stsdas.stsci.edu/numarray/index.html

Trackers for Numarray Bugs, Feature Requests, Support, and Patches are at
the Source Forge project for NumPy at:

http://sourceforge.net/tracker/?group_id=1369

REQUIREMENTS
--------------------------

numarray-0.3.3 requires Python 2.0 or greater.


AUTHORS, LICENSE
------------------------------

Numarray was written by Perry Greenfield, Rick White, Todd Miller, JC
Hsu, Paul Barrett, Phil Hodge at the Space Telescope Science
Institute.  Thanks go to Jochen Kupper of the University of North
Carolina for his work on Numarray and for porting the Numarray manual
to TeX format.

Numarray is made available under a BSD-style License.  See
LICENSE.txt in the source distribution for details.

-- 
Todd Miller             jmiller at stsci.edu


From haase at msg.ucsf.edu  Mon Apr 29 11:18:15 2002
From: haase at msg.ucsf.edu (Sebastian Haase)
Date: Mon Apr 29 11:18:15 2002
Subject: [Numpy-discussion] unsigned short  support in NumPy
Message-ID: <auto-000000164030@msg.ucsf.edu>

Hi all,
I'm _very_ new to NumPy.
I was interested in using it for our project, where we acquire data from a 
CCD camera.

The problem: each pixel in the image is a 16-bit gray value. From what I
read in the documentation, there is only 8-bit (unsigned integer) support
in numpy (or should I say numericarray?).

Are there plans to add "unsigned short" (16-bit) support?
How much effort would that be?

Regards,
Sebastian Haase


-- 
                                _\\|//_
                               (' O-O ')
------------------------------ooO-(_)-Ooo--------------------------------------
Sebastian Haase
University of California, San Francisco
(415)502-4316


From rick at bioinformatics.org  Mon Apr 29 11:35:26 2002
From: rick at bioinformatics.org (Rick Ree)
Date: Mon Apr 29 11:35:26 2002
Subject: [Numpy-discussion] testing Numeric.array([0])
Message-ID: <1020105268.12239.27.camel@loco.ucdavis.edu>

Should Numeric.array([0]) test false?  This seems counterintuitive, and
is not the case for the regular python array module.

This recently caused a subtle bug for me when I wanted to find the
indices of an array that met a condition.  If only the first element met
the condition, the result was array([0]) -- a non-empty result that
evaluated false.

If this is the intended behavior, can someone tell me the reason?

thanks,
Rick


From perry at stsci.edu  Mon Apr 29 13:31:14 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Mon Apr 29 13:31:14 2002
Subject: [Numpy-discussion] unsigned short  support in NumPy
In-Reply-To: <auto-000000164030@msg.ucsf.edu>
Message-ID: <JFEGLNDJEDNOMPPHDEJFKENODNAA.perry@stsci.edu>


> Sebastian Haase writes:

> 
> Hi all,
> I'm _very_ new to NumPy.
> I was interested in using it for our project, where we acquire
> data from a CCD camera.
> 
> The Problem:  Each pixel in the image is a 16 bit gray value.
> What I read in the documentation - there is only 8 bit (unsigned
> integer) support in numpy (or should I say numericarray)
> 
> Are there plans to add a "unsigned short" (16 bit) support .
> How much effort would that be.
> 
There is a reimplementation of Numeric that we are doing that does
support unsigned ints (Unsigned Int8, Unsigned Int16 for now).
The project is not mature, but a lot of basic capability exists
now. You'll have to look it over to judge if it is usable 
for you now. The new version is called numarray
( http://stsdas.stsci.edu/numarray )
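Whichever array package ends up holding the image, the raw camera buffer first has to be decoded as unsigned 16-bit values. A minimal sketch with the standard `struct` module, assuming the frame arrives as a contiguous byte string with a known width, height and byte order (all assumptions here; `decode_u16_pixels` is a hypothetical helper):

```python
import struct


def decode_u16_pixels(raw, width, height, little_endian=True):
    """Decode a raw frame of unsigned 16-bit gray values into rows.

    Assumes `raw` holds exactly width*height pixels, 2 bytes each,
    in the given byte order.
    """
    n = width * height
    if len(raw) != 2 * n:
        raise ValueError("expected %d bytes, got %d" % (2 * n, len(raw)))
    # "H" is struct's unsigned short; "<" / ">" fix the byte order.
    fmt = ("<" if little_endian else ">") + "%dH" % n
    flat = struct.unpack(fmt, raw)
    return [list(flat[r * width:(r + 1) * width]) for r in range(height)]


raw = struct.pack("<4H", 0, 1, 40000, 65535)
print(decode_u16_pixels(raw, 2, 2))  # [[0, 1], [40000, 65535]]
```

Values above 32767 (such as 40000 here) are exactly the ones a signed 16-bit type would misread as negative, which is why an unsigned type matters for 16-bit CCD data.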

(btw, we acquire data from CCD cameras as well ;-) 

Perry


From tchur at optushome.com.au  Mon Apr 29 13:46:39 2002
From: tchur at optushome.com.au (Tim Churches)
Date: Mon Apr 29 13:46:39 2002
Subject: [Numpy-discussion] Multiple Linear Regression?
References: <200204291013.DAA32745@spock.peak.org>
Message-ID: <3CCDBB6C.8A983A5C@optushome.com.au>

Jasper Phillips wrote:
> 
> I'm helping my wife with programming for her economics thesis, which needs
> to calculate a "Multiple Linear Regression" on her data.
> 
> Does anyone know of any (preferably though not necessarily free) software
> that can do this? I'm working in Python, but not limited to it as I
> can relatively freely access other languages.

Jasper,

Use R (a free implementation of S). See http://www.r-project.org

If you are managing your data in Python and NumPy, you can "embed" R in
Python and transparently send data to it using Walter Moreira's wonderful
RPy module - see http://rpy.sf.net

Tim C


From clee at spiralis.merseine.nu  Mon Apr  1 09:06:59 2002
From: clee at spiralis.merseine.nu (clee at spiralis.merseine.nu)
Date: Mon Apr  1 09:06:59 2002
Subject: [Numpy-discussion] slice question and bug
Message-ID: <20020401165852.44D3E79B@spiralis.merseine.nu>

Hello,
I'm trying to track down a segv when I do the B[:] operation on an
array, "B", that I've built as a view on external data.  During
the process I ran into the following code (Numeric-21.0):

/* {%c++%} */
     extern int PyArray_Free(PyObject *op, char *ptr) {
	 PyArrayObject *ap = (PyArrayObject *)op;
	 int i, n;

	 if (ap->nd > 2) return -1;
	 if (ap->nd == 3) {
	     n = ap->dimensions[0];
	     for (i=0; i<n; i++) {
		 free(((char **)ptr)[i]);
	     }
	 }
	 if (ap->nd >= 2) {
	     free(ptr);
	 }
	 Py_DECREF(ap);
	 return 0;
     }
/* {%c++%} */

The multiple, incompatible tests of ap->nd are the problem.  

-chris


From clee at spiralis.merseine.nu  Mon Apr  1 10:59:02 2002
From: clee at spiralis.merseine.nu (clee at spiralis.merseine.nu)
Date: Mon Apr  1 10:59:02 2002
Subject: [Numpy-discussion] slice question and bug
In-Reply-To: <20020401165852.44D3E79B@spiralis.merseine.nu>
References: <20020401165852.44D3E79B@spiralis.merseine.nu>
Message-ID: <15528.44386.160013.936132@spiralis.merseine.nu>


clee at spiralis.merseine.nu writes:
 > 
 > Hello,
 > I'm trying to track down a segv when I do the B[:] operation on an
 > array, "B", that I've built as a view on external data.  During...
 > [snip]

To clarify my own somewhat nonsensical post: when I started composing
my message, I was trying to figure out a bug in my own code that
caused a crash while doing slice_array.  I've since fixed that bug.
However, in the process of figuring out what I was doing wrong I
was browsing the Numeric source code.  While examining
PyArray_Free(..) in arrayobject.c, I saw that it returns -1 whenever the
number of dimensions is greater than 2, yet it has code that tests for
when the number of dimensions equals 3.  That branch can never run.

So ultimately, my post is just an alert that there may be
some code that needs to be cleaned up. 

Thanks,
 lacking-caffeine-ly yours
 -chris 


From nwagner at mecha.uni-stuttgart.de  Wed Apr  3 11:48:47 2002
From: nwagner at mecha.uni-stuttgart.de (Nils Wagner)
Date: Wed Apr  3 11:48:47 2002
Subject: [Numpy-discussion] Factorization of complex symmetric matrices
Message-ID: <3CAABF60.12D609C0@mecha.uni-stuttgart.de>

Hi,

I am looking for a suitable factorization of complex symmetric matrices.
Where can I find a proper routine ?
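A complex symmetric matrix (A equal to its plain transpose, not its conjugate transpose) admits the factorization A = L D L^T, with L unit lower triangular and D diagonal, whenever no zero pivot arises. The pure-Python sketch below shows the unpivoted recurrences; `ldlt_complex_symmetric` is a hypothetical helper. LAPACK's ZSYTRF computes a pivoted (Bunch-Kaufman) version of this factorization, which is what one would want for general matrices.

```python
def ldlt_complex_symmetric(A):
    """Unpivoted LDL^T factorization of a complex symmetric matrix.

    Returns (L, d) with L unit lower triangular and d the diagonal of D,
    such that A[i][j] == sum_k L[i][k] * d[k] * L[j][k]. Note the plain
    transpose: no complex conjugation anywhere.
    """
    n = len(A)
    L = [[0j] * n for _ in range(n)]
    d = [0j] * n
    for j in range(n):
        L[j][j] = 1 + 0j
        # Pivot: A_jj minus contributions of earlier columns.
        d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        if abs(d[j]) < 1e-14:
            raise ZeroDivisionError("zero pivot; a pivoted method is needed")
        for i in range(j + 1, n):
            s = A[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = s / d[j]
    return L, d


A = [[2 + 1j, 1 - 1j, 0.5j],
     [1 - 1j, 3 + 0j, 2 + 2j],
     [0.5j, 2 + 2j, 1 - 1j]]
L, d = ldlt_complex_symmetric(A)
```

Solving A x = b then costs two triangular solves and a diagonal scaling, as with a Cholesky factor.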

Nils


From ray_drew at yahoo.co.uk  Thu Apr  4 02:27:09 2002
From: ray_drew at yahoo.co.uk (Ray Drew)
Date: Thu Apr  4 02:27:09 2002
Subject: [Numpy-discussion] RandomArray difference between Python2.1 and 2.2?
References: <20020401165852.44D3E79B@spiralis.merseine.nu> <15528.44386.160013.936132@spiralis.merseine.nu>
Message-ID: <000b01c1dbc3$65fe6100$6014000a@RDREWXP>

Hi,

Can anyone explain the following?

Python 2.1.1, Numpy version='20.2.0'

Python 2.1.1 (#20, Jul 20 2001, 01:19:29) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.8 -- press F1 for help
>>> from RandomArray import *
>>> normal(3., 1., (5,))
array([ 2.19091588,  2.44682837,  2.51790264,  4.26374364,  4.56880629])


Python 2.2, Numpy version='20.3'

Python 2.2 (#28, Dec 21 2001, 12:21:22) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.8 -- press F1 for help
>>> from RandomArray import *
>>> normal(3., 1., (5,))
array([-3.78572679, -3.63714516, -3.01228334, -4.80211985, -2.57420304])

Why am I getting negative values with Python 2.2? This happens consistently.
Any help would be appreciated.

Thanks,

Ray




From pearu at cens.ioc.ee  Thu Apr  4 18:51:36 2002
From: pearu at cens.ioc.ee (Pearu Peterson)
Date: Thu Apr  4 18:51:36 2002
Subject: [Numpy-discussion] RandomArray difference between Python2.1 and 2.2?
In-Reply-To: <000b01c1dbc3$65fe6100$6014000a@RDREWXP>
Message-ID: <Pine.LNX.4.21.0204042359090.8013-100000@cens.ioc.ee>

On Thu, 4 Apr 2002, Ray Drew wrote:

> Python 2.2, Numpy version='20.3'
> 
> Python 2.2 (#28, Dec 21 2001, 12:21:22) [MSC 32 bit (Intel)] on win32
> Type "copyright", "credits" or "license" for more information.
> IDLE 0.8 -- press F1 for help
> >>> from RandomArray import *
> >>> normal(3., 1., (5,))
> array([-3.78572679, -3.63714516, -3.01228334, -4.80211985, -2.57420304])
> 
> Why am I getting negative values with Python 2.2? This happens consistently.
> Any help would be appreciated.

This is a bug in Numpy 20.3 and should be fixed in Numpy 21.0.

Pearu


From kelson at fedka.ociw.edu  Fri Apr  5 08:23:49 2002
From: kelson at fedka.ociw.edu (Daniel D. Kelson)
Date: Fri Apr  5 08:23:49 2002
Subject: [Numpy-discussion] Error in MLab.py
Message-ID: <200204040140.g341e1e04422@fedka.ociw.edu>

Howdy:
  Shouldn't line 296 in MLab.py of Numeric 21.0, which currently reads:
    val = squeeze(dot(transpose(m)*conjugate(y)) / fact)
  read:
    val = squeeze(dot(transpose(m),conjugate(y)) / fact)
 
Thanks,
D.Kelson
Carnegie Observatories
http://www.ociw.edu/~kelson


From DavidA at ActiveState.com  Fri Apr  5 13:47:03 2002
From: DavidA at ActiveState.com (David Ascher)
Date: Fri Apr  5 13:47:03 2002
Subject: [Numpy-discussion] Re: [Python-Dev] Array Enhancements
References: <20020405203029.19286.qmail@web12903.mail.yahoo.com> <200204052121.g35LLut20125@pcp742651pcs.reston01.va.comcast.net>
Message-ID: <3CAE1913.ECB27329@activestate.com>

Guido van Rossum wrote:

> >  I would propose the following for multi-dimensional arrays:
> >
> >    a = array.array('d', 20000, 20000)
> >
> > or:
> >
> >    a = array.xarray('d', 20000, 20000)
> 
> I just realized that multi-dimensional __getitem__ shouldn't be a big
> deal.  The question is, given the above declaration, what a[0] should
> return: the same as a[0, 0] or a copy of a[0, 0:20000] or a reference
> to a[0, 0:20000].

Or a ValueError?  In the face of ambiguity, refuse the temptation to
guess.
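To make the "refuse to guess" option concrete, here is a minimal sketch of a two-dimensional container whose `__getitem__` accepts only a full `a[i, j]` index and raises ValueError for `a[i]`. `Array2D` is a hypothetical class for illustration, not the standard array module or numarray; it handles only non-negative integer indices (no slices, no negative indices).

```python
class Array2D(object):
    """Hypothetical row-major 2-D container that refuses partial indexing."""

    def __init__(self, rows, cols, fill=0.0):
        self.rows, self.cols = rows, cols
        self._data = [fill] * (rows * cols)

    def _offset(self, key):
        # a[i, j] arrives as the tuple (i, j); a[i] arrives as a bare int.
        if not (isinstance(key, tuple) and len(key) == 2):
            raise ValueError("expected two indices, e.g. a[i, j]")
        if not all(isinstance(k, int) for k in key):
            raise TypeError("only integer indices are supported in this sketch")
        i, j = key
        if not (0 <= i < self.rows and 0 <= j < self.cols):
            raise IndexError("index out of range")
        return i * self.cols + j

    def __getitem__(self, key):
        return self._data[self._offset(key)]

    def __setitem__(self, key, value):
        self._data[self._offset(key)] = value


a = Array2D(2, 3)
a[1, 2] = 5.0
print(a[1, 2])  # 5.0
```

Either of the other answers (a copy of the row, or a view onto it) could be added later by accepting an int key; the point of raising is that it leaves that choice open.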

IIRC, this issue caused lots of problems in the numpy world. cc'ing Paul
in case he wants to jump in to fill in my rusty memory.

Why does submitting a patch to arraymodule seem an easier path than
modifying numarray or numpy to support what's needed?  I believe that
the goals of numarray aren't that different from what Scott is trying to
do (memory management APIs, etc.).

I'd like to see fewer multi-dimensional array objects, not more...

--david ascher


From jochen at unc.edu  Fri Apr  5 20:56:09 2002
From: jochen at unc.edu (Jochen Küpper)
Date: Fri Apr  5 20:56:09 2002
Subject: [Numpy-discussion] numerical integration
Message-ID: <ly6635s9xy.fsf@bock.chem.unc.edu>

The following message is a courtesy copy of an article
that has been posted to comp.lang.python.announce as well.

I have made a numerical integration package available at 
,----
| http://python.jochen-kuepper.de/integrate
`----

This is a copy of the integrate module of scipy by Travis Oliphant
plus some small changes and rearrangements to make it work standalone
(well, it needs Numeric).  All credits go to the scipy folks,
esp. Travis, all errors should be blamed on me.

Greetings,
Jochen

PS: In the long run this module will be phased out in favor of scipy,
    but for now it might be useful for someone...
-- 
Einigkeit und Recht und Freiheit                http://www.Jochen-Kuepper.de
    Liberté, égalité, Fraternité                GnuPG key: 44BCCD8E
        Sex, drugs and rock-n-roll


From andrewm at object-craft.com.au  Sun Apr  7 23:32:07 2002
From: andrewm at object-craft.com.au (Andrew McNamara)
Date: Sun Apr  7 23:32:07 2002
Subject: [Numpy-discussion] Puzzling numpy results?
Message-ID: <20020408063157.1659D38F5B@coffee.object-craft.com.au>

The behavior I'm seeing with zero length Numeric arrays is not what I
would have expected:

    >>> from Numeric import *
    >>> array([5]) != array([])
    zeros((0,), 'l')
    >>> array([]) == array([])
    zeros((0,), 'l')
    >>> allclose(array([5]), array([]))
    1

This is with Numeric-20.3 (and Numeric-20.2.1) - is this behavior correct,
or have I stumbled across a bug?

If both sides of the comparison are arrays with a length greater than
zero, the comparisons work as expected:

    >>> array([5]) != array([6])
    array([1])
    >>> array([5, 5]) != array([6])
    array([1, 1])
    >>> array([5]) != array([5])
    array([0])

The problem came up when I was writing unittests for some Numpy code:
under some circumstances, the code under test is expected to return a
zero length array: I was somewhat surprised when I couldn't make the
test fail! 8-)
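One way to make such a test able to fail is to compare lengths before elements, because any "all elements equal" reduction over an empty comparison result is vacuously true (presumably why allclose returned 1 above). A plain-Python sketch; `assert_arrays_equal` is a hypothetical helper, and with Numeric arrays one would compare `shape` rather than `len`:

```python
def assert_arrays_equal(actual, expected):
    """Fail unless both sequences have the same length and equal elements.

    The length check comes first: without it, comparing a non-empty
    result against an empty expectation can pass vacuously.
    """
    if len(actual) != len(expected):
        raise AssertionError("length mismatch: %d != %d"
                             % (len(actual), len(expected)))
    for i, (x, y) in enumerate(zip(actual, expected)):
        if x != y:
            raise AssertionError("element %d differs: %r != %r" % (i, x, y))


# A naive elementwise check passes vacuously: zip() of [5] and [] is empty.
print(all(x == y for x, y in zip([5], [])))  # True
```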

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/


From tchur at optushome.com.au  Mon Apr  8 13:34:16 2002
From: tchur at optushome.com.au (Tim Churches)
Date: Mon Apr  8 13:34:16 2002
Subject: [Numpy-discussion] Puzzling numpy results?
References: <20020408063157.1659D38F5B@coffee.object-craft.com.au>
Message-ID: <3CB208CE.2270C89@optushome.com.au>

Andrew McNamara wrote:
> 
> The behavior I'm seeing with zero length Numeric arrays is not what I
> would have expected:
> 
>     >>> from Numeric import *
>     >>> array([5]) != array([])
>     zeros((0,), 'l')
>     >>> array([]) == array([])
>     zeros((0,), 'l')
>     >>> allclose(array([5]), array([]))
>     1

The Numpy docs point out that == and != are implemented via the logical
ufuncs, and that:

 "The ``logical'' ufuncs also perform their operations on arrays in
elementwise fashion, just like the ``mathematical'' ones."

I think this explains the results you are seeing: if you do an
element-wise comparison of a length-one array with a zero-length array,
the Numpy recycling rule means that you should always get a zero-length
result. Note that zeros((0,),'l') is not zero; it is zero zeros. So
although the results are surprising (at least to me, and to you), I think
they are logically correct.

But, if that is the case, why does this hold (which I suspect reflects
what you originally expected)?:

>>> from Numeric import *
>>> array([5,6]) != array([])
1
>>> array([5,6]) == array([])
0

Tim C


From xscottg at yahoo.com  Thu Apr 11 04:32:03 2002
From: xscottg at yahoo.com (Scott Gilbert)
Date: Thu Apr 11 04:32:03 2002
Subject: [Numpy-discussion] Introduction
Message-ID: <20020411113152.98373.qmail@web12906.mail.yahoo.com>

Hello All.

I'm interested in this project, and am curious to what level you are
willing to accept outside contributions.  I just tried to subscribe to
the developers list, but I didn't realize that required admin approval.
Hopefully it doesn't look like I was shaking the door without knocking
first.

Is this list active?  Is this the correct place to talk about Numarray?


A little about me:

My name is Scott Gilbert, and I work as a software developer for a
company called Rincon Research in Tucson, Arizona.  We do a lot of digital
signal processing/analysis among other things. 

In the last year or so, we've started to use Python in various
capacities, and we're hoping to use it for more things.

We need a good array module for various things.  Some are similar to
what it looks like Numarray is targeted at (fft, convolutions, etc...),
and others are pretty different (providing buffers for reading data
from specialized hardware etc...)

About a week ago, I noticed that Guido over in Python developer land
was willing to accept patches to the standard array module.  As such, I
thought I would take that opportunity to try and wedge some desirements
and requirements I have into that baseline.  Bummer for me, but they
weren't exactly excited about bloating out arraymodule.c to meet my
needs, and in retrospect that does make good sense.  A number of people
suggested that this might be a better place to try and get what I need.

So here I am, poking around and wondering if I can play in your
sandbox.


If you're willing to let me contribute, my specific itches that I need
to scratch are below.  Otherwise - bummer, and I hope you all catch
crabs...  :-)


-----------------------------------

It's taken me a couple of days to understand what's going on in the
source.  I've read through the design docs, and the PEP, but it wasn't
until I tried to re-implement it that it really clicked.  My
re-implementation of the array portion of what you're doing is
attached.  There are still some holes to fill in, but it's fairly
complete and supports a whole bunch of things which yours does not
(Some of which you might even find useful: Pickling, Bit type).  I'm
pretty proud of it for only 400 lines of Python (Most of which is the
bazillion type declarations).  It's probably riddled with bugs as it's
less than a day old...

After initially thinking that you guys were getting too clever, I've
come to realize it's a pretty good design overall.  Still I have some
changes I would like to make if you'll let me.  (Both to the design and
the implementation)


-------------------------

Following your design for the Array stuff, I've been able to implement
a pretty usable array class that supports the bazillion array types I
need (Bit, Complex Integer, etc...).  This gets me past my core
requirements without polluting your world, but unfortunately my new
XArray type doesn't play so well with your UFuncs.  I think my users
will definitely want to use your UFuncs when the time comes, so I want
to remedy this situation.

The first change I would like to make is to rework your code that
verifies that an object is a "usable" array.  I think NumArray should
only check for the interface required, not the actual type hierarchy. 
By this I mean that the minimum required to be a supported array type
is that it support the correct attributes, not that it actually inherit
from NDArray:

   (quoting from your paper) something like:

       _data
       _shape
       _strides
       _byteoffset
       _aligned
       _contiguous
       _type
       _byteswap

Most of these are just integer fields, or tuples of integers.  Ignoring
_type for the moment, it appears that the interface required to be a
NumArray is much less strict than actually requiring it to derive from
NumArray.  If you allow me to change a few functions (inputarray() in
numarray.py is one small example), I could use my independent XArray
class almost as is, and moreover I can implement new array objects
(possibly as extension types) for crazy things like working with page
aligned memory, memory mapping etc...


Well, that's almost enough.  The _type field poses a small problem of
sorts.  It looks like you don't require a _type to be derived from
NumericType, and this is a good thing since it allows me (and others)
to implement NumArray compatible arrays without actually requiring
NumArray to be present.

However, it would be nice if you declared a more comprehensive list of
typenames - even if they aren't all implemented in NumArray proper. 
Who knows, maybe the SciPy guys have a use for complex integers or bit
arrays.  If you make a reasonable canonical list, our data could be
passed back and forth even if NumArray doesn't know what to do with it.

See my attached module for the types of things I'm thinking of.  I'm
not so concerned about the "Native Types" that are in there, but I
think it's worth committing to a list of named standard types.  (I suspect there are
others that are interested in standard C types even if the size changes
between machines...)
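A canonical list could be as simple as a table that maps each agreed name to a fixed-size layout. The sketch below is only an illustration: the type names and the use of `struct` format codes are assumptions, not names that NumArray defines. The `<` prefix makes `struct` use its standard sizes, so an item's size does not change between machines.

```python
import struct

# Hypothetical canonical type names mapped to struct format codes.
# A complex type is stored as a (real, imaginary) pair of its component type.
CANONICAL_TYPES = {
    "Int8": "b", "UInt8": "B",
    "Int16": "h", "UInt16": "H",
    "Int32": "i", "UInt32": "I",
    "Int64": "q", "UInt64": "Q",
    "Float32": "f", "Float64": "d",
    "ComplexInt16": "hh",    # complex integer: two Int16 parts
    "ComplexFloat32": "ff",  # two Float32 parts
    "ComplexFloat64": "dd",  # two Float64 parts
}

def canonical_itemsize(typename):
    """Size in bytes of one item, using struct's standard (machine-independent) sizes."""
    return struct.calcsize("<" + CANONICAL_TYPES[typename])

def decode_item(typename, raw, endian="<"):
    """Decode one item; two-part (complex) types come back as a Python complex."""
    parts = struct.unpack(endian + CANONICAL_TYPES[typename], raw)
    return complex(*parts) if len(parts) == 2 else parts[0]
```

Even if NumArray never implements `ComplexInt16`, agreeing on its name and layout would let two packages exchange such data.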

If you were to specify a minimal interface like this in the short term,
I could begin propagating my array module to my users.  I could get my
work done now, knowing that I'll be compatible with NumArray proper
once it matures.  I'd be willing to participate in making these changes
if necessary.

Looking at the big picture, I think it's desirable that there really
only be one official standard for ND arrays in the Python world.  That
way, the various independent groups can all share their independent
work.  You guys are the heir-apparent, so to speak, from the Python
guys' point of view.

I don't know if you're trying to get all of NumArray into the Python
distribution or not, but I suspect a good interim step would be to have
a PEP that specifies what it means to be a NumArray or NDArray in
minimal terms.  Perhaps supplying an Array only module in Python that
implements this interface.  Again, I'd be willing to help with all of
this.


-------------------------

Ok, other suggestions...

Here is the list of things that your design document indicates are
required to be a NumArray:

       _data
       _shape
       _strides
       _byteoffset
       _aligned
       _contiguous
       _type
       _byteswap

I believe that one could calculate the values for _aligned and
_contiguous from the other fields.  So they shouldn't really be part of
the interface required.  I suspect it is useful for the C
implementation of UFuncs to have this information in the NDInfo struct
though, so while I would drop them from attribute interface, I would
delegate the task of calculating these values to getNDInfo() and/or
getNumInfo().
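As a rough illustration of that claim, both flags can be derived from the shape, strides, byte offset and item size. This is a sketch under stated assumptions: "contiguous" is taken to mean C (row-major) order, the buffer's base address is assumed to be suitably aligned (as malloc'd memory is), and an item is assumed to need alignment equal to its size. The function names are hypothetical, not numarray's.

```python
def is_contiguous(shape, strides, itemsize):
    """True if the strides describe a C-order (row-major) contiguous layout."""
    expected = itemsize
    # Walk from the last (fastest-varying) axis outward.
    for dim, stride in zip(reversed(shape), reversed(strides)):
        if dim > 1 and stride != expected:   # length-1 axes may have any stride
            return False
        expected *= dim
    return True

def is_aligned(byteoffset, strides, itemsize):
    """True if every element address is a multiple of itemsize.

    Assumes the buffer's base address is itself suitably aligned and that an
    item needs alignment equal to its size (an assumption, not a rule).
    """
    if itemsize <= 1:
        return True
    return byteoffset % itemsize == 0 and all(s % itemsize == 0 for s in strides)
```

For a 10x10 array of 8-byte floats, strides of `(80, 8)` are contiguous, while `(8, 80)` (a transposed view) is not.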

I also notice that you chose _byteswap to indicate byteswapping is
needed.  I think a better choice would be to specify the endian-ness of
the data (with an _endian attr), and have getNDInfo() and getNumInfo()
calculate the _byteswap value for the NDInfo struct.
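The byteswap flag then falls out of comparing the data's declared byte order with the machine's. A minimal sketch, assuming the endian attribute holds `'<'` (little) or `'>'` (big) as in Python's `struct` module; `needs_byteswap` is a hypothetical helper, not part of numarray.

```python
import sys

def needs_byteswap(endian):
    """Return True when data in the given byte order must be swapped on this machine.

    endian is assumed to be '<' (little-endian) or '>' (big-endian).
    """
    if endian not in ("<", ">"):
        raise ValueError("endian must be '<' or '>', got %r" % (endian,))
    native = "<" if sys.byteorder == "little" else ">"
    return endian != native
```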

In my implementation, I came up with a slightly different list:

            self._endian
            self._offset
            self._shape
            self._stride
            self._itemtype
            self._itemsize
            self._itemformat
            self._buffer

The differences are minor.  First, _itemsize allows me to work with
arrays of bytes without having any clue what the underlying type is (in
some cases, _itemtype is "Unknown".)  Second, I implemented a
"Struct" _itemtype, and _itemformat is useful for this case.  (It's
the same format string that the struct module in Python uses.)
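For the "Struct" item type, the format string alone is enough to recover both the item size and the field values. A small sketch using only the standard `struct` module; `iter_struct_items` is a hypothetical helper, and the records are assumed to be packed back to back with an explicit byte-order prefix such as `<` (which also disables padding).

```python
import struct

def iter_struct_items(buffer, itemformat, count, offset=0):
    """Yield count records laid out back to back, decoded with a struct format.

    The item size is derived from the format itself via struct.calcsize.
    """
    itemsize = struct.calcsize(itemformat)
    for i in range(count):
        yield struct.unpack_from(itemformat, buffer, offset + i * itemsize)
```

With `itemformat = "<hd"` each record is a 2-byte integer followed by an 8-byte float, so `struct.calcsize` gives an item size of 10.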

Also, I specified 0 for _itemsize when the actual items aren't byte
addressable.  In my module, this only occurred with the Bit type.  I
figured specifying 0 like this could keep a UFunc that isn't Bit aware
from stepping on memory that it isn't allowed to.

-------------------------

Next thought:  Memory Mapping

I really like the idea of having Python objects that map huge files a
piece at a time without using all of the available memory.  I've seen this in
NumArray's charter as part of the reason for breaking away from
Numeric, and I'm curious how you intend to address it.

Right now, the only requirement for _data seems to be that it implement
the PyBufferProcs.  For memory mapping something else is needed...

I haven't implemented this, so take it as just my rambling thoughts:

With the addition of 3 new, optional, attributes to the NumArray object
interface, I think this could be efficiently accomplished:

     _mapproc
     _mapmin
     _mapmax

If _mapproc is present and not None, then it points to a function whose
responsibility it is to set _mapmin and _mapmax appropriately. 
_mapproc takes one argument which is the desired byte offset into the
virtual array.  This is probably easier to describe with code:

     def _mapproc(self, offset):
         unmap_the_old_range()
         mmap_a_new_range_that_includes_byteoffset()
         self._mapmin = minimum_of_new_range()
         self._mapmax = maximum_of_new_range()

In this way, when the delta between _mapmin and _mapmax is large
enough, the UFuncs could act over a large contiguous portion of the
_data array at a time before another remapping is necessary.  If the
byteoffset that a UFunc needs to work with is outside of _mapmin and
_mapmax, it must call _mapproc to remedy the situation.
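The remapping idea can be tried out with the standard `mmap` module. The sketch below is hypothetical: `WindowedFile` and `byte_at` are invented names, `_mapmax` is treated as an exclusive bound, and each window starts at a multiple of `mmap.ALLOCATIONGRANULARITY` because `mmap` requires its offset to be aligned that way.

```python
import mmap
import os

class WindowedFile:
    """Read-only view of a large file that keeps only one window mapped at a time."""

    def __init__(self, f, window_size=1 << 20):
        self._fileno = f.fileno()
        self._length = os.fstat(self._fileno).st_size
        gran = mmap.ALLOCATIONGRANULARITY
        # Round the window up to a whole number of allocation units.
        self._window_size = max(gran, ((window_size + gran - 1) // gran) * gran)
        self._map = None
        self._mapmin = self._mapmax = 0   # mapped byte range is [_mapmin, _mapmax)

    def _mapproc(self, offset):
        """Unmap the old range and map a new one that includes offset."""
        if not 0 <= offset < self._length:
            raise IndexError("offset %d outside file" % offset)
        if self._map is not None:
            self._map.close()
        start = offset - offset % mmap.ALLOCATIONGRANULARITY
        length = min(self._window_size, self._length - start)
        self._map = mmap.mmap(self._fileno, length,
                              access=mmap.ACCESS_READ, offset=start)
        self._mapmin, self._mapmax = start, start + length

    def byte_at(self, offset):
        """Return one byte, remapping only when offset leaves the current window."""
        if self._map is None or not self._mapmin <= offset < self._mapmax:
            self._mapproc(offset)
        return self._map[offset - self._mapmin]

    def close(self):
        if self._map is not None:
            self._map.close()
            self._map = None
```

A real UFunc would process a whole run of bytes between `_mapmin` and `_mapmax` per remap rather than one byte at a time; `byte_at` just keeps the sketch short.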

This puts a lot of work into UFuncs that choose to support this.  I
suppose that is tough to avoid though.

Also, there are threading issues to think about here.  I don't know if
UFuncs are going to release the Global Interpreter Lock, but if they do
it's possible that multiple threads could have the same PyObject and
try to _mapproc different offsets at different times.

It is possible to implement a mutex for the NumArray without requiring
anything special from the PyObject that implements it...
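One way to get that mutex without asking anything special of the buffer object is for the array wrapper to own a `threading.Lock` and do the window check, the remap and the work on the mapped bytes all while holding it. A hypothetical sketch: `LockedWindow` and `use_window` are invented names, and `_mapproc` here only moves a pair of bounds instead of really remapping memory.

```python
import threading

class LockedWindow:
    """Hypothetical array wrapper that serializes all window remapping."""

    def __init__(self, window_size):
        self._lock = threading.Lock()
        self._window_size = window_size
        self._mapmin = self._mapmax = 0
        self.remaps = 0

    def _mapproc(self, offset):
        # Stand-in for real unmapping/mapping: just move the window bounds.
        start = offset - offset % self._window_size
        self._mapmin, self._mapmax = start, start + self._window_size
        self.remaps += 1

    def use_window(self, offset, work):
        """Call work(mapmin, mapmax) with a window covering offset, under the lock.

        Holding the lock for the work itself keeps another thread from
        remapping the window while this thread is still reading it.
        """
        with self._lock:
            if not self._mapmin <= offset < self._mapmax:
                self._mapproc(offset)
            return work(self._mapmin, self._mapmax)
```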


-----------------------------


Ok.  That's probably way too much content for an Introductory email.  I
do have more thoughts on this stuff though.  They'll just have to wait
for another time.

Nice to meet you all,
    -Scott Gilbert


__________________________________________________
Do You Yahoo!?
Yahoo! Tax Center - online filing with TurboTax
http://taxes.yahoo.com/
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: XArray.py
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20020411/a87228d8/attachment-0001.ksh>

From perry at stsci.edu  Thu Apr 11 09:02:05 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Apr 11 09:02:05 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <20020411113152.98373.qmail@web12906.mail.yahoo.com>
Message-ID: <JFEGLNDJEDNOMPPHDEJFMEIEDNAA.perry@stsci.edu>

Hi Scott,

I've printed out your message and will try to read and understand
it today. It may be a couple days before we can respond, so 
don't take a lack of an immediate response as disinterest.

Thanks, Perry


From jmiller at stsci.edu  Thu Apr 11 14:36:03 2002
From: jmiller at stsci.edu (Todd Miller)
Date: Thu Apr 11 14:36:03 2002
Subject: [Numpy-discussion] slice question and bug
References: <20020401165852.44D3E79B@spiralis.merseine.nu> <15528.44386.160013.936132@spiralis.merseine.nu>
Message-ID: <3CB60188.1010203@stsci.edu>

clee at spiralis.merseine.nu wrote:

>
>clee at spiralis.merseine.nu writes:
> > 
> > Hello,
> > I'm trying to track down a segv when I do the B[:] operation on an
> > array, "B", a that I've built in as a view on external data.  During...
> > [snip]
>
>To clarify my own somewhat non-sensical post: When I started composing
>my message, I was trying to figure out a bug in my own code that
>caused a crash while doing slice_array.  I've since fixed that bug.
>However, in the process of figuring out what I was doing wrong I
>was browsing the Numeric source code.  While examining
>PyArray_Free(..) in arrayobject.c, I saw that returns -1 whenever the
>number of dimensions is greater than 2, yet it has code that tests for
>when the number of dimensions equals 3.
>
>So ultimately, my post is just an alert that I think there might be
>some code that needs to be cleaned up. 
>
>Thanks,
> lacking-caffeine-ly yours
> -chris 
>
>_______________________________________________
>Numpy-discussion mailing list
>Numpy-discussion at lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/numpy-discussion
>
Looking at the code to PyArray_Free, I agree with Chris.  Called to free a 2D
array, I think that PyArray_Free leaks all of the row storage because
ap->nd == 2, not 3:

/* {%c++%} */
     extern int PyArray_Free(PyObject *op, char *ptr) {
	 PyArrayObject *ap = (PyArrayObject *)op;
	 int i, n;

	 if (ap->nd > 2) return -1;
	 if (ap->nd == 3) {
	     n = ap->dimensions[0];
	     for (i=0; i<n; i++) {
		 free(((char **)ptr)[i]);
	     }
	 }
	 if (ap->nd >= 2) {
	     free(ptr);
	 }
	 Py_DECREF(ap);
	 return 0;
     }
/* {%c++%} */


Other opinions?

Todd

-- 
Todd Miller 			jmiller at stsci.edu
STSCI / SSG			(410) 338 4576


From perry at stsci.edu  Thu Apr 11 14:57:14 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Apr 11 14:57:14 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <20020411113152.98373.qmail@web12906.mail.yahoo.com>
Message-ID: <JFEGLNDJEDNOMPPHDEJFGEIHDNAA.perry@stsci.edu>

> [mailto:numpy-discussion-admin at lists.sourceforge.net]On Behalf Of Scott
> Gilbert
> Subject: [Numpy-discussion] Introduction
> 
> 
> Hello All.
> 
> I'm interested in this project, and am curious to what level you are
> willing to accept outside contribution.  I just tried to subscribe to
> the developers list, but I didn't realize that required admin approval.
>  Hopefully it doesn't look like I was shaking the door without knocking
> first.
> 
> Is this list active?  Is this the correct place to talk about Numarray?
 
Sure.
 
> 
> Following your design for the Array stuff, I've been able to implement
> a pretty usable array class that supports the bazillion array types I
> need (Bit, Complex Integer, etc...).  This gets me past my core
> requirements without polluting your world, but unfortunately my new
> XArray type doesn't play so well with your UFuncs.  I think my users
> will definitely want to use your UFuncs when the time comes, so I want
> to remedy this situation.
> 
> The first change I would like to make is to rework your code that
> verifies that an object is a "usable" array.  I think NumArray should
> only check for the interface required, not the actual type hierarchy. 
> By this I mean that the minimum required to be a supported array type
> is that it support the correct attributes, not that it actually inherit
> from NDArray:
> 
>    (quoting from your paper) something like:
> 
>        _data
>        _shape
>        _strides
>        _byteoffset
>        _aligned
>        _contiguous
>        _type
>        _byteswap
> 
> Most of these are just integer fields, or tuples of integers.  Ignoring
> _type for the moment, it appears that the interface required to be a
> NumArray is much less strict than actually requiring it to derive from
> NumArray.  If you allow me to change a few functions (inputarray() in
> numarray.py is one small example), I could use my independent XArray
> class almost as is, and moreover I can implement new array objects
> (possibly as extension types) for crazy things like working with page
> aligned memory, memory mapping etc...
> 
I guess we are not sure we understand what you mean by interface.
In particular, we don't understand why sharing the same object
attributes (the private ones you list above) is a benefit to the
code you are writing if you aren't also using the low level
implementation. The above attributes are private and nothing 
external to the Class should depend on or even know about them.
Could you elaborate on what you mean by interface and the relationship
between your arrays and numarrays?

> 
> Well, that's almost enough.  The _type field poses a small problem of
> sorts.  It looks like you don't require a _type to be derived from
> NumericType, and this is a good thing since it allows me (and others)
> to implement NumArray compatible arrays without actually requiring
> NumArray to be present.
>
What do you mean by NumArray compatible?
 
[some issues snipped since we need to understand the interface issue
first]

> I don't know if you're trying to get all of NumArray into the Python
> distribution or not, but I suspect a good interim step would be to have
> a PEP that specifies what it means to be a NumArray or NDArray in
> minimal terms.  Perhaps supplying an Array only module in Python that
> implements this interface.  Again, I'd be willing to help with all of
> this.
>
We are hoping to get numarray into the distribution [it won't be the
end of the world for us if it doesn't happen]. I'll warn you that the
PEP is out of date. We are likely to update it only after we feel
we are close to having the implementation ready for consideration 
for inclusion in the standard distribution. I would refer to the
actual implementation and the design notes for the time being.
> 
> -------------------------
> 
> Ok, other suggestions...
> 
> Here is the list of things that your design document indicates are
> required to be a NumArray:
> 
>        _data
>        _shape
>        _strides
>        _byteoffset
>        _aligned
>        _contiguous
>        _type
>        _byteswap
> 
> I believe that one could calculate the values for _aligned and
> _contiguous from the other fields.  So they shouldn't really be part of
> the interface required.  I suspect it is useful for the C
> implementation of UFuncs to have this information in the NDInfo struct
> though, so while I would drop them from attribute interface, I would
> delegate the task of calculating these values to getNDInfo() and/or
> getNumInfo().
> 
> I also notice that you chose _byteswap to indicate byteswapping is
> needed.  I think a better choice would be to specify the endian-ness of
> the data (with an _endian attr), and have getNDInfo() and getNumInfo()
> calculate the _byteswap value for the NDInfo struct.
> 
> In my implementation, I came up with a slightly different list:
> 
>             self._endian
>             self._offset
>             self._shape
>             self._stride
>             self._itemtype
>             self._itemsize
>             self._itemformat
>             self._buffer
> 
Some of the name changes are worth considering (like replacing ._byteswap
with an endian indicator, though I find _endian completely opaque as to
what it would mean--1 means what? little or big?). (BTW, we already have
_itemsize). _contiguous and _aligned are things we have been considering
changing, but I would have to think about it carefully to determine if
they really are redundant.

> The differences are minor.  First, _itemsize allows me to work with
> arrays of bytes without having any clue what the underlying type is (in
> some cases, _itemtype is "Unknown".)  Second, I implemented a
> "Struct" _itemtype, and _itemformat is useful for this case.  (It's
> the same format string that the struct module in Python uses.)
> 
It looks like you are trying to deal with records with these "structs". 
We deal with records (efficiently) in a completely different way. Take
a look at the recarray module.

> Also, I specified 0 for _itemsize when the actual items aren't byte
> addressable.  In my module, this only occurred with the Bit type.  I
> figured specifying 0 like this could keep a UFunc that isn't Bit aware
> from stepping on memory that it isn't allowed to.
> 
Again, we aren't sure how this works with numarray.

> -------------------------
> 
> Next thought:  Memory Mapping
> 
> I really like the idea of having Python objects that map huge files a
> piece at a time without using all of the available memory.  I've seen this in
> NumArray's charter as part of the reason for breaking away from
> Numeric, and I'm curious how you intend to address it.
> 
> Right now, the only requirement for _data seems to be that it implement
> the PyBufferProcs.  For memory mapping something else is needed...
> 
> I haven't implemented this, so take it as just my rambling thoughts:
> 
> With the addition of 3 new, optional, attributes to the NumArray object
> interface, I think this could be efficiently accomplished:
> 
>      _mapproc
>      _mapmin
>      _mapmax
> 
> If _mapproc is present and not None, then it points to a function whose
> responsibility it is to set _mapmin and _mapmax appropriately. 
> _mapproc takes one argument which is the desired byte offset into the
> virtual array.  This is probably easier to describe with code:
> 
>      def _mapproc(self, offset):
>          unmap_the_old_range()
>          mmap_a_new_range_that_includes_byteoffset()
>          self._mapmin = minimum_of_new_range()
>          self._mapmax = maximum_of_new_range()
> 
> In this way, when the delta between _mapmin and _mapmax is large
> enough, the UFuncs could act over a large contiguous portion of the
> _data array at a time before another remapping is necessary.  If the
> byteoffset that a UFunc needs to work with is outside of _mapmin and
> _mapmax, it must call _mapproc to remedy the situation.
> 
> This puts a lot of work into UFuncs that choose to support this.  I
> suppose that is tough to avoid though.
> 
We deal with memory mapping a completely different way. It's a bit late
for me to go into it in great detail, but we wrap the standard library
mmap module with a module that lets us manage memory mapped files.
This module basically memory maps an entire file and then in effect
mallocs segments of that file as buffer objects. This allocation of
subsets is needed to ensure that overlapping memory-mapped buffers
don't happen. One can basically reserve part of the memory mapped file
as a buffer. Once that is done, nothing else can use that part of the
file for another buffer. We do not intend to handle memory maps as a
way of sequentially mapping parts of the file to provide windowed views
as your code segment above suggests. If you want a buffer that is the
whole (large) file, you just get a mapped buffer to the whole thing.
(Why wouldn't you?)

The above scheme is needed for our purposes because many of our data files
contain multiple data arrays and we need a means of creating a numarray
object for each one. Most of this machinery has already been implemented,
but we haven't released it since our I/O package (for astronomical FITS
files) is not yet at the point of being able to use it.
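A rough sketch of that reservation scheme, using the standard `mmap` module. `MappedFileAllocator` and `reserve` are hypothetical names, not the API of the numarray memory-mapping module described above; the sketch maps the whole file once and refuses any request that overlaps an existing reservation.

```python
import mmap

class MappedFileAllocator:
    """Hypothetical: map a whole file once, then hand out non-overlapping
    byte ranges of it as writable buffer objects."""

    def __init__(self, f):
        self._map = mmap.mmap(f.fileno(), 0)   # length 0 maps the whole file
        self._reserved = []                    # (start, stop) pairs already handed out

    def reserve(self, start, nbytes):
        """Return a memoryview over bytes [start, start + nbytes) of the file."""
        stop = start + nbytes
        if start < 0 or nbytes <= 0 or stop > len(self._map):
            raise ValueError("segment %d:%d is outside the file" % (start, stop))
        for s, e in self._reserved:
            if start < e and s < stop:         # half-open ranges intersect
                raise ValueError("segment %d:%d overlaps %d:%d" % (start, stop, s, e))
        self._reserved.append((start, stop))
        return memoryview(self._map)[start:stop]
```

Each returned view supports the buffer interface, so it can serve as the data buffer for a separate array object while writes go straight through to the file.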

> Also, there are threading issues to think about here.  I don't know if
> UFuncs are going to release the Global Interpreter Lock, but if they do
> it's possible that multiple threads could have the same PyObject and
> try to _mapproc different offsets at different times.
> 
To tell you the truth, we haven't dealt with the threading issue much. We
think about it occasionally, but have deferred dealing with it until 
we have finished other aspects first. We do want to make it thread safe
though.

Perry Greenfield


From oliphant at ee.byu.edu  Thu Apr 11 15:47:04 2002
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Thu Apr 11 15:47:04 2002
Subject: [Numpy-discussion] slice question and bug
In-Reply-To: <3CB60188.1010203@stsci.edu>
Message-ID: <Pine.LNX.4.33L2.0204111645520.32470-100000@oliphant.ee.byu.edu>

> Looking at the code to PyArray_Free, I agree with Chris.  Called to free a 2D
> array, I think that PyArray_Free leaks all of the row storage because
> ap->nd == 2, not 3:
>
> /* {%c++%} */
>      extern int PyArray_Free(PyObject *op, char *ptr) {
> 	 PyArrayObject *ap = (PyArrayObject *)op;
> 	 int i, n;
>
> 	 if (ap->nd > 2) return -1;
> 	 if (ap->nd == 3) {
> 	     n = ap->dimensions[0];
> 	     for (i=0; i<n; i++) {
> 		 free(((char **)ptr)[i]);
> 	     }
> 	 }
> 	 if (ap->nd >= 2) {
> 	     free(ptr);
> 	 }
> 	 Py_DECREF(ap);
> 	 return 0;
>      }
> /* {%c++%} */
>
>

This has been broken since the beginning.  I believe the documentation
says as much.  I've never used it because I always think of 2-D arrays as
a block of data not as rows of pointers.

It should be fixed, but no one's ever been interested enough to do it.

-Travis Oliphant


From xscottg at yahoo.com  Thu Apr 11 21:46:02 2002
From: xscottg at yahoo.com (Scott Gilbert)
Date: Thu Apr 11 21:46:02 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <JFEGLNDJEDNOMPPHDEJFGEIHDNAA.perry@stsci.edu>
Message-ID: <20020412044201.63373.qmail@web12908.mail.yahoo.com>

--- Perry Greenfield <perry at stsci.edu> wrote:
>
> I guess we are not sure we understand what you mean by interface.
> In particular, we don't understand why sharing the same object
> attributes (the private ones you list above) is a benefit to the
> code you are writing if you aren't also using the low level
> implementation. The above attributes are private and nothing 
> external to the Class should depend on or even know about them.
> Could you elaborate on what you mean by interface and the
> relationship between your arrays and numarrays?
>

There are several places in your code that check to see if you are working with
a valid type for NDArrays.  Currently this check consists of asking the
following questions:

   'Is it a tuple or list?'
   'Is it a scalar of some sort?'
   'Does it derive from our NDArray class?'

If any of these questions answer true, it does the right thing and moves on. 
If none of these is true, it raises an exception.

I suppose this is fine if you are only concerned about working with your own
implementation of an array type, but I hope you'll consider the following as a
minor change that opens up the possibility for other compatible array
implementations to work interoperably.

Instead have the code ask the following questions:

   'Is it a tuple or list?'
   'Is it a scalar of some sort?'
   'Does it support the attributes necessary to be like an NDArray object?'

This change is very similar to how you can pass in any Python object to the
"pickle.dump()" function, and if it supports the "write()" method it will be
called:

      >>> class WhoKnows:
      ...     def write(self, x):
      ...          print x
      >>>
      >>> import pickle
      >>>
      >>> w = WhoKnows()
      >>>
      >>> pickle.dump('some data', w)
      S'some data'
      p1
      .

Until reading your response above, I didn't realize that you consider your
single underscore attributes to be totally private.  In general, I try to use a
single underscore to mean protected (meaning you can use them if you REALLY
know what you are doing), hence my confusion.  With that in mind, pretend that
I suggested the following instead:

    The specification of an NDArray is that it has the following attributes

        ndarray_buffer      - a PyObject which has PyBufferProcs
        ndarray_shape       - a tuple specifying the shape of the array
        ndarray_stride      - a tuple specifying the index multipliers
        ndarray_itemsize    - an int/long stating the size of items
        ndarray_itemtype    - some representation of type 
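The three questions translate almost directly into code. In this sketch, `looks_like_ndarray` and `classify_input` are hypothetical helpers; only the attribute names come from the proposal, and the scalar check is simplified to Python's built-in number types.

```python
REQUIRED_ATTRIBUTES = ("ndarray_buffer", "ndarray_shape", "ndarray_stride",
                       "ndarray_itemsize", "ndarray_itemtype")

def looks_like_ndarray(obj):
    """Duck-typed check: does obj carry every attribute of the proposed interface?"""
    return all(hasattr(obj, name) for name in REQUIRED_ATTRIBUTES)

def classify_input(obj):
    """Answer the three questions in order, raising if none applies."""
    if isinstance(obj, (tuple, list)):
        return "sequence"
    if isinstance(obj, (int, float, complex)):
        return "scalar"
    if looks_like_ndarray(obj):
        return "ndarray"
    raise TypeError("%r is not usable as an array" % (obj,))
```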

This would be a very minor change to your functions like inputarray(),
getNDInfo(), getNDArray(), but it would allow your UFuncs to work with other
implementations of arrays.  As an example similar to the pickle example above:

     import array
     class ScottArray:
         def __init__(self):
             self.ndarray_buffer   = array.array('d', [0]*100)
             self.ndarray_shape    = (10, 10)
             self.ndarray_stride   = (80, 8)
             self.ndarray_itemsize = 8
             self.ndarray_itemtype = 'Float64'

     import numarray

     n = numarray.numarray((10, 10), type='Float64')
     s = ScottArray()

     very_cool = numarray.add(n, s)


This example is kind of silly.  I mean, why wouldn't I just use numarray for
all of my array needs?  Well, that's where my world is a little different than
yours I think.  Instead of using 'array.array()' above, there are times where
I'll need to use 'whizbang.array()' to get a different PyBufferProcs supporting
object.  Or where I'll need to work with a crazy type in one part of the code,
but I'd like to pass it to an extension that combines your types and mine.

In these cases where I need "special memory" or "special types" I could try and
get you guys to accept a patch, but this would just pollute your project and
probably annoy you in general.  A better solution is to create a general
standard mechanism for implementing NDArray types, and let me make my own.


In the above example, we could have completely different NDArray
implementations working interoperably inside of one UFunc.  It seems to me that
all it really takes to be an NDArray can be specified by a list of attributes
like the one above.  (Probably need a few more attributes to be really general:
'ndarray_endian', etc...)  In the end, NDArrays are just pointers to a buffer,
and descriptors for indexing.
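Since an NDArray here is just a buffer plus indexing descriptors, any consumer can locate an element by summing index times stride. The sketch below is hypothetical (`get_item`, `add_2d`, `SimpleArray` and the small type table are invented, and only native byte order is handled), but it shows a generic routine working on any object that carries the `ndarray_*` attributes.

```python
import array
import struct

# Hypothetical type-name table; only native byte order is handled here.
_FORMATS = {"Float64": "d", "Int32": "i"}

def get_item(obj, index):
    """Read one element from any object exposing the ndarray_* attributes."""
    shape, stride = obj.ndarray_shape, obj.ndarray_stride
    if len(index) != len(shape):
        raise IndexError("wrong number of indices")
    offset = 0
    for i, n, step in zip(index, shape, stride):
        if not 0 <= i < n:
            raise IndexError(index)
        offset += i * step   # byte offset = sum of index * stride
    fmt = "=" + _FORMATS[obj.ndarray_itemtype]
    return struct.unpack_from(fmt, obj.ndarray_buffer, offset)[0]

def add_2d(a, b):
    """Element-wise sum of two 2-D duck-typed arrays, returned as nested lists."""
    if a.ndarray_shape != b.ndarray_shape:
        raise ValueError("shape mismatch")
    rows, cols = a.ndarray_shape
    return [[get_item(a, (r, c)) + get_item(b, (r, c)) for c in range(cols)]
            for r in range(rows)]

class SimpleArray:
    """Minimal object carrying only the proposed attributes."""
    def __init__(self, values, shape, stride):
        self.ndarray_buffer = array.array("d", values)
        self.ndarray_shape = shape
        self.ndarray_stride = stride
        self.ndarray_itemsize = 8
        self.ndarray_itemtype = "Float64"
```

For example, `SimpleArray(range(6), (2, 3), (24, 8))` and `SimpleArray(range(6), (3, 2), (8, 24))` describe the same six doubles as a 2x3 array and as its 3x2 transpose; `get_item` reads element `(1, 2)` of the first and `(2, 1)` of the second from the same byte offset, 40.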


I don't believe this would have any significant effect on the performance of
numarray.  (The efficient fast C code still gets a pointer to work with.)
Moreover, I'd be very willing to contribute patches to make this happen.


If you agree, and we can flesh out what this "attribute interface" should be,
then I can start distributing my own array module to the engineers where I work
without too much fear that they'll be screwed once numarray is stable and they
want to mix and match.

Code always lives a lot longer than I want it to, and if I give them something
now which doesn't work with your end product, I'll have done them a disservice.


BTW: Allowing other types to fill in as NDArrays also allows other types to
implement things like slicing as they see fit (slice and copy contiguous,
slice and copy on write, slice and copy by reference, etc...).

>
> We are hoping to get numarray into the distribution [it won't be the
> end of the world for us if it doesn't happen]. I'll warn you that the
> PEP is out of date. We are likely to update it only after we feel
> we are close to having the implementation ready for consideration 
> for inclusion in the standard distribution. I would refer to the
> actual implementation and the design notes for the time being.
>

Yeah, I recognize that the PEP is gathering dust at the moment.  I'm not having
too much trouble following through the source and design docs.  It took me a
few days to "get it", but that's probably because I'm slower than your average
bear.  :-)

Regarding the PEP, what I would like to see happen is that if we agree that the
"attribute interface" stuff above is the right way to go about things, I would
(or we would) submit a milder interim PEP specifying what those attributes are,
how they are to be interpreted, and a simple Python module implementing a
general NDArray class for consumption.  Hopefully this PEP would specify a
canonical list of type names as well.  Then we could make updates to the other
PEP if necessary.


>
> Some of the name changes are worth considering (like replacing ._byteswap
> with an endian indicator, though I find _endian completely opaque as to
> what it would mean--1 means what? little or big?). (BTW, we already have
> _itemsize). _contiguous and _aligned are things we have been considering
> changing, but I would have to think about it carefully to determine if
> they really are redundant.
> 

It's all open for discussion, but I would propose that ndarray_endian be one
of:

    '>' - big endian
    '<' - little endian

This is how the standard Python struct module specifies endian, and I've been
trying to stay consistent with the baseline when possible.
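Using the `struct` convention means the endian attribute can be passed straight through as a format prefix. A minimal sketch with a hypothetical `decode_items` helper, showing the same bytes read as big-endian and as little-endian 32-bit integers:

```python
import struct

def decode_items(raw, endian, code, count):
    """Decode count items of struct type code, honoring a '<' or '>' endian flag."""
    if endian not in ("<", ">"):
        raise ValueError("endian must be '<' or '>'")
    return struct.unpack("%s%d%s" % (endian, count, code), raw)

raw = struct.pack(">2i", 1, 256)   # two big-endian 32-bit integers
```

Read with `'>'`, `raw` gives back `(1, 256)`; read with the wrong flag `'<'`, the same eight bytes decode as `(16777216, 65536)`.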

>
> It looks like you are trying to deal with records with these "structs". 
> We deal with records (efficiently) in a completely different way. Take
> a look at the recarray module.
> 

Will definitely do.

I've called them structs simply because they borrow their format string from
the struct module that ships with Python.  I'm not hung up on the name, and I
wouldn't object to an alias.

Too early for me to tell if there is even a difference in the underlying
memory, but maybe we'll end up with 'structs' for my notion of things, and
'records' for yours.

>
> We deal with memory mapping a completely different way. It's a bit late
> for me to go into it in great detail, but we wrap the standard library
> mmap module with a module that lets us manage memory mapped files.
> This module basically memory maps an entire file and then in effect
> mallocs segments of that file as buffer objects. This allocation of
> subsets is needed to ensure that overlapping memory-mapped buffers
> don't happen. One can basically reserve part of the memory mapped file
> as a buffer. Once that is done, nothing else can use that part of the
> file for another buffer. We do not intend to handle memory maps as a
> way of sequentially mapping parts of the file to provide windowed views
> as your code segment above suggests. If you want a buffer that is the
> whole (large) file, you just get a mapped buffer to the whole thing.
> (Why wouldn't you?)
> 

I think the idea of taking a 500 megabyte (or 5 gigabyte) file and windowing 1
meg of actual memory at a time is pretty attractive.  Sometimes we do very large
correlations, and there just isn't enough memory to mmap the whole file (much
less two files for correlation).

Any library that doesn't want to support this business could just raise a
NotImplementedError on encountering such objects.

Maybe I shouldn't be calling this "memory mapping".  Even though it could be
implemented on top of mmap, truthfully I just want to support a "windowing"
interface.  If we could specify the windowing attributes and indicate the
standard usage that would be great.  Maybe:

      ndarray_window(self, offset)
      ndarray_winmin
      ndarray_winmax
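On the consuming side, a routine only has to check whether the next offset is inside `[ndarray_winmin, ndarray_winmax)` and call `ndarray_window` when it is not. A hypothetical sketch: the attribute names follow the proposal above, `ndarray_winmax` is assumed to be exclusive, `ndarray_buffer` is assumed to hold only the current window, and `WindowedBytes` stands in for a real file-backed producer.

```python
class WindowedBytes:
    """Hypothetical producer exposing a large byte string through a small window."""

    def __init__(self, data, window):
        self._data = data
        self._window = window
        self.remaps = 0
        self.ndarray_buffer = b""
        self.ndarray_winmin = self.ndarray_winmax = 0   # window covers [winmin, winmax)

    def ndarray_window(self, offset):
        if not 0 <= offset < len(self._data):
            raise IndexError(offset)
        start = offset - offset % self._window
        self.ndarray_buffer = self._data[start:start + self._window]
        self.ndarray_winmin = start
        self.ndarray_winmax = start + len(self.ndarray_buffer)
        self.remaps += 1


def windowed_sum(obj, nbytes):
    """Sum the first nbytes bytes, asking for a new window only when needed."""
    if not hasattr(obj, "ndarray_window"):
        raise NotImplementedError("object does not support windowing")
    total = 0
    offset = 0
    while offset < nbytes:
        if not obj.ndarray_winmin <= offset < obj.ndarray_winmax:
            obj.ndarray_window(offset)
        stop = min(nbytes, obj.ndarray_winmax)
        lo, hi = offset - obj.ndarray_winmin, stop - obj.ndarray_winmin
        total += sum(obj.ndarray_buffer[lo:hi])   # process the whole run at once
        offset = stop
    return total
```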


>
> The above scheme is needed for our purposes because many of our data files
> contain multiple data arrays and we need a means of creating a numarray
> object for each one. Most of this machinery has already been implemented,
> but we haven't released it since our I/O package (for astronomical FITS
> files) is not yet at the point of being able to use it.
> 

There is a group at my company that is using FITS for some stuff.  I don't know
enough about it to comment though...


Cheers,
    -Scott




From perry at stsci.edu  Fri Apr 12 17:44:04 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Fri Apr 12 17:44:04 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <20020412044201.63373.qmail@web12908.mail.yahoo.com>
Message-ID: <NEBBIJKBMLDBLNCEEFOCIEFGCNAA.perry@stsci.edu>

Scott Gilbert writes:
>      import array
>      class ScottArray:
>          def __init__(self):
>              self.ndarray_buffer   = array.array('d', [0]*100)
>              self.ndarray_shape    = (10, 10)
>              self.ndarray_stride   = (80, 8)
>              self.ndarray_itemsize = 8
>              self.ndarray_itemtype = 'Float64'
>
>      import numarray
>
>      n = numarray.numarray((10, 10), type='Float64')
>      s = ScottArray()
>
>      very_cool = numarray.add(n, s)
>
But why not (I may have some details wrong, I'm doing this
from memory, and I haven't worked on it myself in a bit):

import array
import numarray
import memory # comes with numarray
class ScottArray(numarray.NumArray):
    def __init__(self):
        # create necessary buffer obj
        buf = memory.writeable_buffer(array.array('d', [0]*100))
        numarray.NumArray.__init__(self, shape=(10, 10), type=numarray.Float64,
                                   buffer=buf)
        # _strides not settable from constructor yet, but currently
        # if you needed to set it:
        # self._strides = (80, 8)
        # But for this case it would be computed automatically from
        # the supplied shape


n = numarray.numarray((10, 10), type='Float64')
s = ScottArray()

maybe_not_quite_so_cool_but_just_as_functional = n + s

> This example is kind of silly.  I mean, why wouldn't I just use numarray for
> all of my array needs?  Well, that's where my world is a little different than
> yours I think.  Instead of using 'array.array()' above, there are times where
> I'll need to use 'whizbang.array()' to get a different PyBufferProcs supporting
> object.  Or where I'll need to work with a crazy type in one part of the code,
> but I'd like to pass it to an extension that combines your types and mine.
>
> In these cases where I need "special memory" or "special types" I could try and
> get you guys to accept a patch, but this would just pollute your project and
> probably annoy you in general.  A better solution is to create a general
> standard mechanism for implementing NDArray types, and let me make my own.
>
From everything I've seen so far, I don't see why you can't
just create a NumArray object directly. You can subclass it
(and use multiple inheritance if you need to subclass a different
object as well) and add whatever customized behavior you want.
You can create new kinds of objects as buffers just so long
as you satisfy the buffer interface.
>
> In the above example, we could have completely different NDArray
> implementations working interoperably inside of one UFunc.  It seems to me that
> all it really takes to be an NDArray can be specified by a list of attributes
> like the one above.  (Probably need a few more attributes to be really general:
> 'ndarray_endian', etc...)  In the end, NDArrays are just pointers to a buffer,
> and descriptors for indexing.
>
Again, why not just create an NDArray object with the appropriate
buffer object and attributes (subclassing if necessary).

>
> I don't believe this would have any significant effect on the performance of
> numarray.  (The efficient fast C code still gets a pointer to work with.)
> Moreover, I'd be very willing to contribute patches to make this happen.
>
>
> If you agree, and we can flesh out what this "attribute interface" should be,
> then I can start distributing my own array module to the engineers where I work
> without too much fear that they'll be screwed once numarray is stable and they
> want to mix and match.
>
> Code always lives a lot longer than I want it to, and if I give them something
> now which doesn't work with your end product, I'll have done them a disservice.
>
All good in principle, but I haven't yet seen a reason to change
numarray. As far as I can tell, it provides all you need exactly
as it is. If you could give an example that demonstrated otherwise...
>
> It's all open for discussion, but I would propose that ndarray_endian be one
> of:
>
>     '>' - big endian
>     '<' - little endian
>
> This is how the standard Python struct module specifies endian, and I've been
> trying to stay consistent with the baseline when possible.
>
To tell you the truth, I'm not crazy about how the struct module
handles types or attributes. It's generally far too cryptic for
my tastes. Other than providing backward compatibility, we aren't
interested in it emulating struct.

> >
> > The above scheme is needed for our purposes because many of our data files
> > contain multiple data arrays and we need a means of creating a numarray
> > object for each one. Most of this machinery has already been implemented,
> > but we haven't released it since our I/O package (for astronomical FITS
> > files) is not yet at the point of being able to use it.
> >
>
>
I could well misunderstand, but I thought that if you mmap a file
in Unix in write mode, you do not use up the virtual memory as
limited by the physical memory and the paging file. Your only
limit becomes the virtual address space available to the processor.
If the 32 bit address is your problem, you are far, far better off
using a 64-bit processor and operating system than trying to kludge up
a windowing memory mechanism. I could see a way of doing it for
ufuncs, but the numeric world (and I would think the DSP world
as well) needs far more than element-by-element array functionality.
Providing a usable C API for that kind of memory model would be
a nightmare. But I'm not sure if this or the page file is your
limitation.

Perry


From kragen at pobox.com  Sat Apr 13 00:25:01 2002
From: kragen at pobox.com (Kragen Sitaker)
Date: Sat Apr 13 00:25:01 2002
Subject: [Numpy-discussion] segfault in Numpy esxtension
Message-ID: <20020413072433.2702DBDC1@panacea.canonical.org>

(All of the below is with regard to Numeric 20.2.0.)

For a consulting client, I wrote an extension module that does the
equivalent of sum(take(a, b)), but without the temporary result in
between.  I was surprised that when I tried to .resize() the result of
this routine, I got a segmentation fault and a core dump.

It was crashing at this line in arrayobject.c:
	if (memcmp(self->descr->zero, all_zero, elsize) == 0) {

self->descr, in this case, was the type description for arrays of type
"double".  It seems that self->descr->zero was 0, as in a null
pointer, not a pointer to a location containing (double)0, and this
was causing it to crash.

It looks like the .zero fields of the type descriptions (which live in
arraytypes.c and _numpy.so) are initialized to be null pointers, and
only when the initmultiarray() function in multiarraymodule.c is run
are these pointers set to point to actual zeroes somewhere in
allocated memory.

I guess Numeric.py imports multiarray.so, which calls
initmultiarray(), so the solution for me was to make sure I import
Numeric before importing my module (or at least before resizing arrays
produced by my module).  But, to my mind, this segfault is a bug ---
importing a module that follows all the rules shouldn't put Python in
a state that's so dangerously inconsistent that innocent things like
.resize() can crash it.  Maybe the same .so file that includes the
actual data items should be responsible for initializing them ---
especially since import_array() imports _numpy without importing
multiarray.  (I assume there's a reason it wasn't done this way in the
first place.)  What do other people think?

-- 
/* By Kragen Sitaker, http://pobox.com/~kragen/puzzle4.html */
char b[2][10000],*s,*t=b,*d,*e=b+1,**p;main(int c,char**v){int n=atoi(v[1]);
strcpy(b,v[2]);while(n--){for(s=t,d=e;*s;s++){for(p=v+3;*p;p++)if(**p==*s){
strcpy(d,*p+2);d+=strlen(d);goto x;}*d++=*s;x:}s=t;t=e;e=s;*d++=0;}puts(t);}


From xscottg at yahoo.com  Sat Apr 13 03:09:04 2002
From: xscottg at yahoo.com (Scott Gilbert)
Date: Sat Apr 13 03:09:04 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <NEBBIJKBMLDBLNCEEFOCIEFGCNAA.perry@stsci.edu>
Message-ID: <20020413100823.45837.qmail@web12907.mail.yahoo.com>

--- Perry Greenfield <perry at stsci.edu> wrote:
> Scott Gilbert writes:
[...]
> >
> >      very_cool = numarray.add(n, s)
> >
> But why not (I may have some details wrong, I'm doing this
> from memory, and I haven't worked on it myself in a bit):
> 
[...]
>
> maybe_not_quite_so_cool_but_just_as_functional = n + s
>
[...]
>
> From everything I've seen so far, I don't see why you can't
> just create a NumArray object directly. You can subclass it
> (and use multiple inheritance if you need to subclass a different
> object as well) and add whatever customized behavior you want.
> You can create new kinds of objects as buffers just so long
> as you satisfy the buffer interface.
>

Your point about the optional buffer parameter to the NumArray is well
taken.  I had seen that when looking through the code, but it slipped my
mind for that example.  I could very well be wrong about some of these
other reasons too...

I have a number of reasons listed below for wanting the standard that 
Python adopts to specify only the interface and not the implementation. 
You may not find all of these persuasive, and I apologize in advance if any
looks like a criticism.  (In my limited years as a professional software
developer, I've found that the majority of people can be very defensive and
protective of their code.  I've been trying to tread lightly, but I don't
know if I'm succeeding.)

However if any of these reasons is persuasive, keep in mind that the actual
changes I'm proposing are pretty minimal in scope.  And that I'd be willing
to submit patches so as to reduce any inconvenience to you.  (Not that you
have any reason to believe I can code my way out of a box...  :-)

Ok, here's my list:

Philosophical

  You have a proposal in to the Python guys to make Numarray into the
  standard _implementation_.  I think standards like this should specify
  an _interface_, not an implementation.

Simplicity

  I can give my users a single XArray.py file, and they can be off and
  running with something that works right then and there, and it could in
  many ways be compatible with Numarray (with some slight modifications)
  when they decide they want the extra functionality of extension modules
  that you or anyone else who follows your standard provides.  But they
  don't have to compile anything until they really need to.

  Your implementation leaves me with all or nothing.  I'll have to build
  and use numarray, or I've got an in house only solution.

Expediency

  I want to see a usable standard arise quickly.  If you maintain the
  stance that we should all use the Numarray implementation, instead of
  just defining a good Numarray interface, everyone has to wait for you
  to finish things enough to get them accepted by the Python group.  Your
  implementation is complicated, and I suspect they will have many things
  that they will want you to change before they accept it into their
  baseline.  (If you think my list of suggestions is annoying, wait until
  you see theirs!)

  If a simple interface protocol is presented, along with a simple pure
  Python module that implements it, the PEP acceptance process might move
  along quickly, and you could take your time with implementing your code.

Pragmatic

  You guys aren't finished yet, and I need to give my users an array
  module ASAP.  Since it is such a new project, there are likely to be many bugs
  floating around in there.  I think that when you are done, you will
  probably have a very good library.  Moreover, I'm grateful that you are
  making it open source.  That's very generous of you, and the fact that
  you are tolerating this discussion is definitely appreciated.

  Still, I can't put off my projects, and I can't task you to work faster. 


  However, I do think we could agree in a very short term that your design
  for the interface is a good one.  I also think that we (or just me if you
  like) could make a much smaller PEP that would be more readily accepted.
  Then everyone in this community could proceed at their own pace - knowing
  that if we followed the simple standard we would have interoperability
  with each other.

Social

  Normally I wouldn't expect you to care about any of my special issues.
  You have your own problems to solve.  As I said above, it's generous of
  you to even offer your source code.
  
  However, you are (or at least were) trying to push for this to become a
  standard.  As such, considering how to be more general and apply to a 
  wider class of problems should be on your agenda.  If it's not, then you
  shouldn't be creating the standard.

  If you don't care about numarray becoming standard, I would like to try
  my hand at submitting the slightly modified version of your design.  It
  won't be compatible with your stuff, but hopefully others will follow
  suit.

Functionality

  Data Types

    I have needs for other types of data that you probably have little use
    for.  If I can't coerce you to make a minor change in specification, I
    really don't think I could coerce you to support brand new data types
    (complex ints is the one I've beaten to death, because I could use that
    one in the short term).  What happens when someone at my company wants
    quaternions?  I suspect that you won't have direct support for those.
    I know that numarray is supposed to be extensible, but the following
    raises an exception:

        from numarray import *

        class QuaternionType(NumericType):
            def __init__(self):
                NumericType.__init__(self, "Quaternion", 4*8, 0)

        Quaternion = QuaternionType()  # BOOM!

        q = array(shape=(10, 10), type=Quaternion)

    Maybe I'm just doing something wrong, but it looks like your code
    wants "Quaternion" to be in your (private?) typeConverters dictionary.

    Ok, try two:

        from numarray import *

        q = NDArray(shape=(10, 10), itemsize=4*8)

        if q[5][5] is None:
            print "No boom, but what can I do with it?"

    Maybe this is just a documentation problem.  On the other hand, I can
    do the following pretty readily:

        import array
        class Quat2D:
            def __init__(self, *shape):
                assert len(shape) == 2
                self._buffer = array.array('d', [0])*shape[0]*shape[1]*4
                self._shape, self._stride = tuple(shape), (4*shape[1], 4)
                self._itemsize = 4*8

            def __getitem__(self, sub):
                assert isinstance(sub, tuple) and len(sub) == 2
                offset = sub[0]*self._stride[0] + sub[1]*self._stride[1]
                return tuple([self._buffer[offset + i] for i in range(4)])

            def __setitem__(self, sub, val):
                assert isinstance(sub, tuple) and len(sub) == 2
                offset = sub[0]*self._stride[0] + sub[1]*self._stride[1]
                for i in range(4): self._buffer[offset + i] = val[i]
                return val

        q = Quat2D(10, 10)
        q[5, 5] = (1, 2, 3, 4)
        print q[5, 5]

    This isn't very general, but it is short, and it makes a good example.

    If they get half of their data from calculations using Numarray, and
    half from whatever I provide them, and then try to mix the results in
    an extension module that has to know about separate implementations,
    life is more complicated than it should be.

  Operations

    I'm going to have to write my own C extension modules for some high
    performance operations.  All I need to get this done is a void* pointer,
    the shape, stride, itemsize, itemtype, and maybe some other things to
    get off and running.  You have a growing framework, and you have already
    indicated that you think of your hidden variables as private.  I don't
    think I or my users should have to understand the whole UFunc framework
    and API just to create an extension that manipulates a pointer to an
    array of doubles.

    Arrays are simpler than UFuncs.  I consider them to be pretty separable
    parts of your design.  If you keep it this way, and it becomes the
    standard, it seems that I and everyone else will have to understand
    both parts in order to create an extension module.
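    A minimal sketch of the attribute-only interface argued for above, assuming
    the attribute names from the earlier ScottArray example (ndarray_buffer,
    ndarray_shape, ndarray_stride, ndarray_itemsize, ndarray_itemtype).  The
    XArray class and the add_float64 and byte_offset functions are
    hypothetical, and the consumer is written in Python rather than C; it only
    illustrates that a raw byte buffer plus shape and byte strides is enough to
    locate every element, whichever implementation produced the array.

```python
import array
import struct


class XArray:
    """Hypothetical pure-Python provider of the proposed ndarray_* attributes."""

    def __init__(self, rows, cols):
        self.ndarray_buffer = array.array('d', [0.0] * (rows * cols))
        self.ndarray_shape = (rows, cols)
        self.ndarray_itemsize = 8
        self.ndarray_stride = (cols * 8, 8)   # byte strides, row-major layout
        self.ndarray_itemtype = 'Float64'


def byte_offset(ob, index):
    """Byte offset of element `index`, computed only from ndarray_stride."""
    return sum(i * s for i, s in zip(index, ob.ndarray_stride))


def add_float64(a, b):
    """Elementwise a + b for any two 2-D Float64 objects exposing the attributes."""
    if a.ndarray_shape != b.ndarray_shape:
        raise ValueError("shape mismatch")
    if a.ndarray_itemtype != 'Float64' or b.ndarray_itemtype != 'Float64':
        raise NotImplementedError("only Float64 is handled in this sketch")
    rows, cols = a.ndarray_shape
    # View each buffer as raw bytes, the way a C extension would see a void*.
    abuf = memoryview(a.ndarray_buffer).cast('B')
    bbuf = memoryview(b.ndarray_buffer).cast('B')
    out = XArray(rows, cols)
    obuf = memoryview(out.ndarray_buffer).cast('B')
    for r in range(rows):
        for c in range(cols):
            x, = struct.unpack_from('=d', abuf, byte_offset(a, (r, c)))
            y, = struct.unpack_from('=d', bbuf, byte_offset(b, (r, c)))
            struct.pack_into('=d', obuf, byte_offset(out, (r, c)), x + y)
    return out


a = XArray(2, 3)
b = XArray(2, 3)
a.ndarray_buffer[4] = 1.5   # element (1, 1)
b.ndarray_buffer[4] = 2.0
c = add_float64(a, b)
print(c.ndarray_buffer[4])  # 3.5
```

    Because add_float64 never looks at the concrete class, any object that
    supplies the same attributes (including a strided, non-contiguous one)
    would work with it unchanged.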

Flexibility

  Numarray is going to make a choice of how to implement slicing.  My guess
  is that it will be one of "copy contiguous", "copy on write", "copy by 
  reference".  I don't know what the correct choice is, but I know that
  someone else will need something different based on context.  Things like
  UFuncs and other extension modules that do fast C level calculations
  typically don't need to concern themselves with slicing behaviour.
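  To make two of those policies concrete, here is a small sketch using
  Python's built-in bytearray and memoryview as stand-ins (this is not
  numarray): a by-reference slice shares memory with its parent, while a
  contiguous copy does not.  Copy on write is not shown.  In both cases the
  consumer only needs the buffer, the offset, and the strides.

```python
# Parent buffer holding the ten bytes 0..9.
parent = bytearray(range(10))

# "Copy by reference": a memoryview slice aliases the parent's memory.
ref_slice = memoryview(parent)[2:8:2]   # elements 2, 4, 6, stride of 2 bytes

# "Copy contiguous": bytes() materializes an independent contiguous copy.
copy_slice = bytes(parent[2:8:2])

parent[4] = 99
print(list(ref_slice))    # [2, 99, 6]  -> change is visible through the view
print(list(copy_slice))   # [2, 4, 6]   -> the copy kept the old value

# The view carries the stride a C-level consumer would need to walk it.
print(ref_slice.strides)  # (2,)
```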

Design

  Your implementation would be similar to having the 'pickle' module
  require you to derive from a 'Pickleable' base class - instead of simply
  providing __getstate__ and __setstate__ methods.

  It's an artificial constraint, and those are usually bad.
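  The pickle analogy can be shown directly: pickle works with any class that
  supplies __getstate__ and __setstate__, with no required base class.  The
  trimmed-down Quat2D class below is hypothetical and only illustrates the
  protocol.

```python
import array
import pickle


class Quat2D:
    """Hypothetical 2-D quaternion grid; no special base class required."""

    def __init__(self, rows, cols):
        self._shape = (rows, cols)
        self._buffer = array.array('d', [0.0] * (rows * cols * 4))

    def __getstate__(self):
        # pickle only cares that this method exists, not about the hierarchy.
        return {'shape': self._shape, 'data': self._buffer.tolist()}

    def __setstate__(self, state):
        self._shape = state['shape']
        self._buffer = array.array('d', state['data'])


q = Quat2D(2, 2)
q._buffer[0:4] = array.array('d', [1, 2, 3, 4])
r = pickle.loads(pickle.dumps(q))
print(r._shape, r._buffer[0:4].tolist())   # (2, 2) [1.0, 2.0, 3.0, 4.0]
```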

>
> All good in principle, but I haven't yet seen a reason to change
> numarray. As far as I can tell, it provides all you need exactly
> as it is. If you could give an example that demonstrated otherwise...
>

Maybe you're right.  I suspect you as the author will come up with the
quick example that shows how to implement my bizarre quaternion example
above.  I'm not sure if this makes either of us right or wrong, but if
you're not buying any of this, then it's probably time for me to chalk
this up to a difference in opinion and move on.

Truthfully this is taking me pretty far from my original tack.  Originally
I had simply hoped to hack a couple of things into arraymodule.c, and here
I am now trying to get a simpler standard in place.  I'll try one last time
to convince you with the following two statements:

  - Changing such that you only require the interface is a subtle,
    but noticeable, improvement to your otherwise very good design.

  - It's not a difficult change.


If that doesn't compel you, at least I can walk away knowing I tried.  For
the volumes I've written, this will probably be my last pesky message if
you really don't want to budge on this issue.


>
> To tell you the truth, I'm not crazy about how the struct module
> handles types or attributes. It's generally far too cryptic for
> my tastes. Other than providing backward compatibility, we aren't
> interested in it emulating struct.
>

I consider it a lot like regular expressions.  I cringe when I see someone
else's, but I don't have much difficulty putting them together.

The alternative of coming up with a different specifier for records/structs
is probably a mistake now that the struct module already has its (terse)
format specification.  Once that is taken into consideration, following all
the leads of the struct module makes sense to me.
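For reference, this is how the struct module's byte-order prefixes combine
with its type codes; a proposed ndarray_endian of '<' or '>' would presumably
be used the same way.  With an explicit prefix, struct uses standard sizes and
no alignment padding.

```python
import struct

value = 1.5
little = struct.pack('<d', value)   # little-endian IEEE 754 double
big = struct.pack('>d', value)      # big-endian IEEE 754 double

# Same eight bytes, opposite order.
assert little == big[::-1]

# Decoding with the matching prefix recovers the value.
print(struct.unpack('<d', little)[0], struct.unpack('>d', big)[0])   # 1.5 1.5

# A record: one 4-byte int followed by one 8-byte double.
# With '<' there is no alignment padding, so the record is 12 bytes.
print(struct.calcsize('<id'))                # 12
record = struct.pack('<id', 7, 2.5)
print(struct.unpack('<id', record))          # (7, 2.5)
```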

 
>
> I could well misunderstand, but I thought that if you mmap a file
> in unix in write mode, you do not use up the virtual memory as
> limited by the physical memory and the paging file. Your only
> limit becomes the virtual address space available to the processor.
>

Regarding efficiency, it depends on the implementations, which vary
greatly, and there are other subtleties.  I've already written a book
above, so I won't tire you with details.  I will say that closing a large
memory mapped file on top of NFS can be dreadful.  It probably takes the
same amount of total time or less, but from an interactive analysis point
of view it's pretty unpleasant on Tru64 at least.

Also, just mmaping the whole file puts all of the memory use at the
discretion of the OS.  I might have a gig or two to work with, but if mmap
takes them all, other threads will have to contend for memory.  The system
(application) as a whole might very well run better if I can retain some
control over this.


I'm not married to the windowing suggestion.  I think it's something to
consider, but it might not be a common enough case to try and make a
standard mechanism for.  If there isn't a way to do it without a kluge,
then I'll drop it.  Likewise if a simple strategy can't meet anyone's real
needs.


>
> If the 32 bit address is your problem, you are far, far better off
> using a 64-bit processor and operating system than trying to kludge up
> a windowing memory mechanism.
>

We don't always get to specify what platform we want to run on.  Our
customer has other needs, and sometimes hardware support for exotic devices
dictate what we'll be using.  Frequently it is on 64 bit Alphas, but
sometimes the requirement is x86 Linux, or 32 bit Solaris.

Finally, our most frustrating piece of legacy software was written in
Fortran assuming you could stuff a pointer into an INT*4 and now requires
the -taso flag to the compiler for all new code (which turns a sexy 64 bit
Alpha into a 32 bit kluge...).

Also, much of our data comes on tapes.  It's not easy to memory map those.

>
> I could see a way of doing it for
> ufuncs, but the numeric world (and I would think the DSP world
> as well) needs far more than element-by-element array functionality.
> providing a usable C-api for that kind of memory model would be
> a nightmare. But I'm not sure if this or the page file is your
> limitation.
>

I would suggest that any extension module which is not interested in this
feature simply raise an exception such as NotImplementedError.  UFuncs could
fall into this camp without any criticism from me.  All it would have to do
is check if the 'window_get' attribute is a callable, and punt an
exception. 

My proposal wasn't necessarily to map in a single element at a time.  If
the C extension was willing to work these beasts at all, it would check to
see if the offset it wanted was between window_min and window_max.  If it
wasn't, then it would call ob.window_get(offset), and the Python object
could update window_min and window_max however it sees fit.  For instance
by remapping 10 or 20 megabytes on both sides.

This particular implementation would allow us to do correlations of a small
(mega sample) chunk of data against a HUGE (giga sample) file.
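A rough sketch of that protocol, assuming the attribute names used in this
paragraph (window_min, window_max, window_get).  The WindowedFile class, the
window_buffer attribute, the read_double consumer, and the choice to map some
context before the requested offset and a fixed span after it are all
hypothetical.  It maps only part of a file with mmap, whose offset argument
must be a multiple of mmap.ALLOCATIONGRANULARITY.

```python
import mmap
import os
import struct
import tempfile


class WindowedFile:
    """Hypothetical provider: keeps only one window of a large file mapped."""

    def __init__(self, path, span=1 << 20):
        self._file = open(path, 'rb')
        self._size = os.fstat(self._file.fileno()).st_size
        self._span = span
        self.window_buffer = None
        self.window_min = self.window_max = 0   # byte range [min, max), empty

    def window_get(self, offset):
        """Remap so that byte `offset` falls inside [window_min, window_max)."""
        if not 0 <= offset < self._size:
            raise IndexError(offset)
        gran = mmap.ALLOCATIONGRANULARITY
        start = max(0, offset - self._span // 2)
        start -= start % gran                   # mmap offsets must be aligned
        end = min(self._size, offset + self._span)
        if self.window_buffer is not None:
            self.window_buffer.close()
        self.window_buffer = mmap.mmap(self._file.fileno(), end - start,
                                       access=mmap.ACCESS_READ, offset=start)
        self.window_min, self.window_max = start, end

    def close(self):
        if self.window_buffer is not None:
            self.window_buffer.close()
            self.window_buffer = None
        self._file.close()


def read_double(ob, index):
    """Consumer side: call window_get only when the item is outside the window."""
    offset = index * 8
    if not (ob.window_min <= offset and offset + 8 <= ob.window_max):
        ob.window_get(offset)
    return struct.unpack_from('<d', ob.window_buffer, offset - ob.window_min)[0]


# Usage: a 400,000-byte file of little-endian doubles, mapped a window at a time.
values = [i * 0.5 for i in range(50000)]
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack('<%dd' % len(values), *values))

w = WindowedFile(tmp.name, span=1 << 16)
first = read_double(w, 10)
far = read_double(w, 40000)
window = (w.window_min, w.window_max)
w.close()
os.remove(tmp.name)
print(first, far)   # 5.0 20000.0
```

The consumer never sees the whole file, only whichever window the provider
chose, so the provider keeps control over how much memory is mapped at once.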

This might be the wrong interface, and I'm willing to listen to a better
suggestion.

It might also be too specialized a need to justify complicating a simpler
overall design.

Also, there are other uses for things like this.  It could possibly be used
to implement sparse arrays.  It's probably not the best implementation of
that, but it could hide a dict of set data points, and present it to an
extension module as a complete array.
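Along the same lines, a hypothetical sparse provider could keep only the
explicitly set points in a dict and fill a small dense window on demand.  The
SparseWindow class and get function are illustrative only, and unlike the
byte offsets above this sketch uses element indices for the window bounds; it
is not an efficient sparse format.

```python
import array


class SparseWindow:
    """Hypothetical 1-D sparse array presented through a dense window."""

    def __init__(self, length, window_items=1024):
        self.length = length
        self.points = {}                        # index -> value for set points
        self._window_items = window_items
        self.window_buffer = array.array('d')
        self.window_min = self.window_max = 0   # element indices, [min, max)

    def __setitem__(self, index, value):
        self.points[index] = value
        if self.window_min <= index < self.window_max:
            self.window_buffer[index - self.window_min] = value

    def window_get(self, index):
        """Materialize the dense block of window_items containing `index`."""
        if not 0 <= index < self.length:
            raise IndexError(index)
        start = index - index % self._window_items
        stop = min(self.length, start + self._window_items)
        self.window_buffer = array.array(
            'd', (self.points.get(i, 0.0) for i in range(start, stop)))
        self.window_min, self.window_max = start, stop


def get(ob, index):
    """Consumer side: the same bounds check as for a file-backed window."""
    if not ob.window_min <= index < ob.window_max:
        ob.window_get(index)
    return ob.window_buffer[index - ob.window_min]


s = SparseWindow(10**9)
s[123456789] = 3.0
print(get(s, 123456789), get(s, 5))   # 3.0 0.0
```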


Cheers,
    -Scott Gilbert




From perry at stsci.edu  Sat Apr 13 18:43:02 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Sat Apr 13 18:43:02 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <20020413100823.45837.qmail@web12907.mail.yahoo.com>
Message-ID: <NEBBIJKBMLDBLNCEEFOCEEFHCNAA.perry@stsci.edu>

> Ok, here's my list:
>
> Philosophical
>
>   You have a proposal in to the Python guys to make Numarray into the
>   standard _implementation_.  I think standards like this should specify
>   an _interface_, not an implementation.
>
Sure (though there is often more to a standard than just an interface,
but certainly an implementation is generally not the standard). I'm
not sure why you think we imply the implementation is the standard.
We are waiting to rewrite the PEP when we are closer to having
the implementation ready, but we've been very open about the design
and have asked for input on it for a long time now.

> Simplicity
>
>   I can give my users a single XArray.py file, and they can be off and
>   running with something that works right then and there, and it could in
>   many ways be compatible with Numarray (with some slight modifications)
>   when they decide they want the extra functionality of extension modules
>   that you or anyone else who follows your standard provides.  But they
>   don't have to compile anything until they really need to.
>
>   Your implementation leaves me with all or nothing.  I'll have to build
>   and use numarray, or I've got an in house only solution.
>
Hard to comment on this.

> Expediency
>
>   I want to see a usable standard arise quickly.  If you maintain the
>   stance that we should all use the Numarray implementation, instead of
>   just defining a good Numarray interface, everyone has to wait for you
>   to finish things enough to get them accepted by the Python group.  Your
>   implementation is complicated, and I suspect they will have many things
>   that they will want you to change before they accept it into their
>   baseline.  (If you think my list of suggestions is annoying, wait until
>   you see theirs!)
>
I have the strong sense you misunderstand how the process works.
Guido will be driven in large part by the acceptance or non-acceptance
of the Numeric community. If they don't buy into it, it won't be
part of the standard. If it won't be used by many, it won't be part
of the standard. Yes, he will review the design and interface to see
if there should be a long term commitment by the Python maintainers
to have it in the standard library. We have sent him the design
documents, and we do keep him informed. He  has given us feedback
about it. But for the most part, the judgement is going to be by
the Numeric community.

>   If a simple interface protocol is presented, along with a simple pure
>   Python module that implements it, the PEP acceptance process might move
>   along quickly, and you could take your time with implementing your code.
>
> Pragmatic
>
>   You guys aren't finished yet, and I need to give my users an array
>   module ASAP.  Since it is such a new project, there are likely to be many bugs
>   floating around in there.  I think that when you are done, you will
>   probably have a very good library.  Moreover, I'm grateful that you are
>   making it open source.  That's very generous of you, and the fact that
>   you are tolerating this discussion is definitely appreciated.
>
>   Still, I can't put off my projects, and I can't task you to work faster.
>
>
>   However, I do think we could agree in a very short term that your design
>   for the interface is a good one.  I also think that we (or just me if you
>   like) could make a much smaller PEP that would be more readily accepted.
>   Then everyone in this community could proceed at their own pace - knowing
>   that if we followed the simple standard we would have interoperability
>   with each other.
>
I think we still don't understand what you need yet. More elaboration
on that later.

> Social
>
>   Normally I wouldn't expect you to care about any of my special issues.
>   You have your own problems to solve.  As I said above, it's generous of
>   you to even offer your source code.
>
>   However, you are (or at least were) trying to push for this to become a
>   standard.  As such, considering how to be more general and apply to a
>   wider class of problems should be on your agenda.  If it's not, then you
>   shouldn't be creating the standard.
>
Pleeease. Just because a library developer doesn't happen to meet your
needs doesn't mean it can't be part of the standard library. There
are plenty of modules in the standard library that could have been
made more general in some way, but there they are. The criterion is
whether it solves problems for a large community of users, not that
it is infinitely extensible or so on. Software development is full of
trade-offs and that includes limits to generalization. Sure we
can discuss whether things could be made more general or not. But
because you want it more general doesn't mean we just say "Sure, you
define everything!"

>   If you don't care about numarray becoming standard, I would like to try
>   my hand at submitting the slightly modified version of your design.  It
>   won't be compatible with your stuff, but hopefully others will follow
>   suit.
>
You are free to propose your own standard at any time. No one will
stop you from doing so.

> Functionality
>
>   Data Types
>
>     I have needs for other types of data that you probably have little use
>     for.  If I can't coerce you to make a minor change in specification, I
>     really don't think I could coerce you to support brand new data types
>     (complex ints is the one I've beaten to death, because I could use that
>
You are right on complex ints (that we won't consider them). One
could take numarray and add them if one wanted and have a more
extended version. But we won't do it, and we wouldn't support it as
part of what we maintain. It's one of those trade offs.

>     one in the short term).  What happens when someone at my company wants
>     quaternions?  I suspect that you won't have direct support for those.
>     I know that numarray is supposed to be extensible, but the following
>     raises an exception:
>
>         from numarray import *
>
>         class QuaternionType(NumericType):
>             def __init__(self):
>                 NumericType.__init__(self, "Quaternion", 4*8, 0)
>
>         Quaternion = QuaternionType()  # BOOM!
>
>         q = array(shape=(10, 10), type=Quaternion)
>
>     Maybe I'm just doing something wrong, but it looks like your code
>     wants "Quaternion" to be in your (private?) typeConverters dictionary.
>
Yep, and there's a good reason for that. Just spend a few minutes
thinking about the role types play with array packages and how they
have traditionally been implemented. Generally speaking, it is
presumed that any two numeric types may be used in a binary operator.
So you, Scott, define your special type, Quaternions. You will need
to provide the module all the machinery for knowing what to do with
all the other numeric types available. You may not care, but it is
a requirement that numarray (and Numeric) know what to do. If that
doesn't fit in with your needs, then you shouldn't be trying to use
it. The problem is worse than that. You supply a Quaternion type extension
to numarray, and Bob supplies a super long int type (64 bytes!) also.
Both of you have gone to the trouble of giving numarray the means of
handling all other default numarray types. But neither extension knows
how to handle the other. How do you solve that problem? I don't know.
If you do, let us know. Given the requirements, adding new numeric
types is not going to allow independent extensions to work with each
other. That's fairly limiting, but that's the price that is paid
for the feature.
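Perry's point about independent type extensions can be made concrete with a
toy coercion table.  This is a hypothetical sketch, not numarray's actual
machinery: each extension registers result types for itself and the built-in
types it knows about, so a pair that neither extension anticipated simply has
no rule.  The type names, including Int512 for Bob's 64-byte integer, are
illustrative.

```python
from itertools import combinations_with_replacement

BUILTIN = ['Int32', 'Float64', 'Complex64']   # ordered from lowest to highest

# Result type for each unordered pair of operand types.
coerce = {}
for a, b in combinations_with_replacement(BUILTIN, 2):
    coerce[frozenset((a, b))] = max(a, b, key=BUILTIN.index)


def register(new_type):
    """An extension teaches the table about itself and the built-in types only."""
    coerce[frozenset((new_type,))] = new_type
    for t in BUILTIN:
        coerce[frozenset((new_type, t))] = new_type


def result_type(a, b):
    try:
        return coerce[frozenset((a, b))]
    except KeyError:
        raise TypeError('no coercion rule for %s and %s' % (a, b)) from None


register('Quaternion')   # Scott's extension
register('Int512')       # Bob's extension, written independently

print(result_type('Float64', 'Quaternion'))   # Quaternion
print(result_type('Int32', 'Int512'))         # Int512
try:
    result_type('Quaternion', 'Int512')
except TypeError as exc:
    print(exc)   # no coercion rule for Quaternion and Int512
```

With n types the table needs n*(n+1)/2 entries, and the cross terms between
separately written extensions belong to neither author.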

>     Ok, try two:
>
>         from numarray import *
>
>         q = NDArray(shape=(10, 10), itemsize=4*8)
>
>         if q[5][5] is None:
>             print "No boom, but what can I do with it?"
>
>     Maybe this is just a documentation problem.  On the other hand, I can
>     do the following pretty readily:
>
>         import array
>         class Quat2D:
>             def __init__(self, *shape):
>                 assert len(shape) == 2
>                 self._buffer = array.array('d', [0])*shape[0]*shape[1]*4
>                 self._shape, self._stride = tuple(shape), (4*shape[1], 4)
>                 self._itemsize = 4*8
>
>             def __getitem__(self, sub):
>                 assert isinstance(sub, tuple) and len(sub) == 2
>                 offset = sub[0]*self._stride[0] + sub[1]*self._stride[1]
>                 return tuple([self._buffer[offset + i] for i in range(4)])
>
>             def __setitem__(self, sub, val):
>                 assert isinstance(sub, tuple) and len(sub) == 2
>                 offset = sub[0]*self._stride[0] + sub[1]*self._stride[1]
>                 for i in range(4): self._buffer[offset + i] = val[i]
>                 return val
>
>         q = Quat2D(10, 10)
>         q[5, 5] = (1, 2, 3, 4)
>         print q[5, 5]
>
>     This isn't very general, but it is short, and it makes a good example.
>
I'm not sure what it proves. If all you need is an array to store
some kind of type, be able to index and slice it, and not provide
numeric operations, by all means use the existing array module, it
does that fine. It's more work to subclass NDArray, but it can do
it too, and gives you more capabilities (you won't be able to use
index arrays or broadcasting in the array module for example). The
extra functionality comes at some price. Sure, it isn't as simple to
extend. It's your choice if it is worth it or not. If you want
to add your large quaternion array efficiently, then the array
module is worthless. Your example shows nothing about what your
real needs for the object are.

>     If they get half of their data from calculations using Numarray, and
>     half from whatever I provide them, and then try to mix the results in
>     an extension module that has to know about separate implementations,
>     life is more complicated than it should be.
>
It's how you intend to 'mix' these that I have no clue about.

>   Operations
>
>     I'm going to have to write my own C extension modules for some high
>     performance operations.  All I need to get this done is a void* pointer,
>     the shape, stride, itemsize, itemtype, and maybe some other things to
>     get off and running.  You have a growing framework, and you have already
>     indicated that you think of your hidden variables as private.  I don't
>     think I or my users should have to understand the whole UFunc framework
>     and API just to create an extension that manipulates a pointer to an
>     array of doubles.
>
Sigh. No one said you had to understand the ufunc framework to do so.
We are working on a C API that just gives you a simple pointer (it's
actually available now, but we aren't going to tout it until we have
better documentation).

>     Arrays are simpler than UFuncs.  I consider them to be pretty separable
>     parts of your design.  If you keep it this way, and it becomes the
>     standard, it seems that I and everyone else will have to understand
>     both parts in order to create an extension module.
>
Wrong.

> Flexibility
>
>   Numarray is going to make a choice of how to implement slicing.  My guess
>   is that it will be one of "copy contiguous", "copy on write", "copy by
>   reference".  I don't know what the correct choice is, but I know that
>   someone else will need something different based on context.  Things like
>   UFuncs and other extension modules that do fast C level calculations
>   typically don't need to concern themselves with slicing behaviour.
>
And they don't.

> Design
>
>   Your implementation would be similar to having the 'pickle' module
>   require you to derive from a 'Pickleable' base class - instead of simply
>   providing __getstate__ and __setstate__ methods.
>
>   It's an artificial constraint, and those are usually bad.
>
You say. You are quite welcome to do your own implementation that
doesn't have this 'artificial' constraint. After all your text
I *still* don't understand how you intend to use the 'interface'
of the private attributes. You haven't provided any example (let
alone a compelling one) of why we should accept any object that
provides those attributes. Shouldn't the object also provide all
the public methods? Shouldn't it also provide indexing and so forth?
All in all you are talking about checking quite a few attributes
to make sure the object has the interface. And even if it does,
*why* in the world would we presume that the C functions used by
numarray would work properly with the object you provide. I
really don't have a clue as to what you are getting at here, and
without some real concrete example illustrating this point, I
don't think there is any point to continuing this discussion.
> >
> > All good in principle, but I haven't yet seen a reason to change
> > numarray. As far as I can tell, it provides all you need exactly
> > as it is. If you could give an example that demonstrated otherwise...
> >
>
> Maybe you're right.  I suspect you as the author will come up with the
> quick example that shows how to implement my bizarre quaternion example
> above.  I'm not sure if this makes either of us right or wrong, but if
> you're not buying any of this, then it's probably time for me to chalk
> this up to a difference in opinion and move on.
>
> Truthfully this is taking me pretty far from my original tack.  Originally
> I had simply hoped to hack a couple of things into arraymodule.c, and here
> I am now trying to get a simpler standard in place.  I'll try one last time
> to convince you with the following two statements:
>
>   - Changing such that you only require the interface is a subtle,
>     but noticeable, improvement to your otherwise very good design.
>
>   - It's not a difficult change.
>
>
> If that doesn't compel you, at least I can walk away knowing I tried.  For
> the volumes I've written, this will probably be my last pesky message if
> you really don't want to budge on this issue.
>
We're not going to budge until you show us what the hell you are talking
about.
>
> The alternative of coming up with a different specifier for records/structs
> is probably a mistake now that the struct module already has its (terse)
> format specification.  Once that is taken into consideration, following all
> the leads of the struct module makes sense to me.
>
Again, you are free to do your own, or fork our numarray and
do it the way you want. Or do your own from scratch. Or whatever.
>
[...]
> Also, just mmaping the whole file puts all of the memory use at the
> discretion of the OS.  I might have a gig or two to work with, but if mmap
> takes them all, other threads will have to contend for memory.  The system
> (application) as a whole might very well run better if I can retain some
> control over this.
>
>
> I'm not married to the windowing suggestion.  I think it's something to
> consider, but it might not be a common enough case to try and make a
> standard mechanism for.  If there isn't a way to do it without a kluge,
> then I'll drop it.  Likewise if a simple strategy can't meet anyone's real
> needs.
>
You can forget our doing it. It's out of the question for us.
> >
> > If the 32 bit address is your problem, you are far, far better off
> > using a 64-bit processor and operating system than trying to kludge up
> > a windowing memory mechanism.
> >
>
> We don't always get to specify what platform we want to run on.  Our
> customer has other needs, and sometimes hardware support for exotic devices
> dictates what we'll be using.  Frequently it is on 64 bit Alphas, but
> sometimes the requirement is x86 Linux, or 32 bit Solaris.
>
> Finally, our most frustrating piece of legacy software was written in
> Fortran assuming you could stuff a pointer into an INT*4 and now requires
> the -taso flag to the compiler for all new code (which turns a sexy 64 bit
> Alpha into a 32 bit kluge...).
>
You may have customers with unreasonable demands. We don't have to
let them cause an incredible complication in the underlying machinery.
(And we won't). And we won't make it work on Windows 3.1 either.
We have to draw the line somewhere. Your customers will pay dearly
(and you will benefit :-).

> Also, much of our data comes on tapes.  It's not easy to memory map those.
>
Your point being?
> >
>
[...]

This doesn't seem to be going anywhere. If you can give us
a better idea of how your interface needs would be used,
at least we could respond to the specific issues. But we
don't understand and although we are considering some
changes, I'm not going to fold in your requests until
we do understand.

You may not be happy with the progress we are making either.
Sorry, I can't help that. If you need something sooner,
you'll need to do something else. Come up with your
own system and try to get it into Python. Take numarray
and do it the way you think it ought to be done and at
the rate you think it should be done. You're welcome to.
Take the array module and use that as a basis.

We'd like numarray to be part of the standard. We'd like
it to be the standard package in the Numeric community.
But if neither happened, we'd still be working on it.
We need it for our own work. Numeric doesn't give us
the capabilities that we need. We are using it for
our software development and it is being used to reduce
HST data now. We are continuing on this regardless.

Perry


From paul at pfdubois.com  Sat Apr 13 19:35:02 2002
From: paul at pfdubois.com (Paul F Dubois)
Date: Sat Apr 13 19:35:02 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <NEBBIJKBMLDBLNCEEFOCEEFHCNAA.perry@stsci.edu>
Message-ID: <000001c1e35c$d85a2f90$0a01a8c0@NICKLEBY>

I haven't been following this discussion (I have a product release on
Monday). But I am getting a lot of mail stacking up for numpy-developers
which will not go through unless you are one of the registered
developers mailing from your registered mail account. 

All others, please do not use numpy-developers. This is a private
channel for the official  developers only.

I gather from my brief reading that someone is looking for a standard to
use now. That standard is Numeric. If you go with that now then when the
time comes to switch to Numarray, you'll be in the same boat as the
whole community and therefore likely to be able to profit from any
conversion tools required. You can reduce your problems to a minimum by
sticking with the Python interface where possible.

If you have some special need that Numeric is not meeting please realize
that what exists is a consensus product after a long evolution and it is
not likely to change much to meet  your particular needs. There are some
areas where what is right for one set of people is wrong for the others.


From xscottg at yahoo.com  Sun Apr 14 04:20:03 2002
From: xscottg at yahoo.com (Scott Gilbert)
Date: Sun Apr 14 04:20:03 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <NEBBIJKBMLDBLNCEEFOCEEFHCNAA.perry@stsci.edu>
Message-ID: <20020414111911.2977.qmail@web12901.mail.yahoo.com>

Perry, I've been trying to be persuasive, but I think all I've 
managed to do is to be verbose and annoy you.  Please accept 
my apologies.

I really am sorry this is going as poorly as it is.  I'm doing a lousy
job of getting my point across, and I'd like to turn around the tone
this has taken.  Email always comes off as more antagonistic
than intended.

Finally, my appeal to the fact that you are proposing a standard
was heavy handed.  I guess I was trying to use that to force
you to consider my position.  It clearly backfired...

I'll try to be more to the point.


Here's what I'm proposing, and it's only a suggestion.


*** I think the requirements for being a general purpose "NDArray" 
can be specified with only the following attributes:

    __array_buffer__    - as buffer object
    __array_shape__     - as tuple of long
    __array_itemsize__  - as int

    Optionally
    __array_stride__    - as tuple of long (get from shape if None)
    __array_offset__    - as int (would default to 0 if not present)

Then anyone who implemented these could work with the same C API for
getting the pointer to memory, shape array, stride array, and item size.  
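A minimal Python sketch of what a consumer of these attributes might do, assuming the attribute names proposed above, strides measured in bytes, and a C-contiguous (row-major) default when __array_stride__ is absent. None of this is numarray code; `default_strides`, `ndarray_info` and `Minimal` are hypothetical names used only for illustration, and a bytearray stands in for the buffer object.

```python
def default_strides(shape, itemsize):
    """Byte strides for a C-contiguous (row-major) layout."""
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def ndarray_info(obj):
    """Gather what a C-API would hand an extension, from any object that
    exposes the proposed attributes, regardless of its class."""
    buf = obj.__array_buffer__
    shape = tuple(obj.__array_shape__)
    itemsize = obj.__array_itemsize__
    strides = getattr(obj, "__array_stride__", None)
    if strides is None:                            # optional: derive from shape
        strides = default_strides(shape, itemsize)
    offset = getattr(obj, "__array_offset__", 0)   # optional: default 0
    return buf, shape, tuple(strides), itemsize, offset


class Minimal(object):
    """Hypothetical class supplying only the three required attributes."""
    def __init__(self, rows, cols, itemsize):
        self.__array_buffer__ = bytearray(rows * cols * itemsize)
        self.__array_shape__ = (rows, cols)
        self.__array_itemsize__ = itemsize


print(ndarray_info(Minimal(3, 4, 8))[1:])   # ((3, 4), (32, 8), 8, 0)
```

Because the lookup is by attribute name rather than by isinstance(), any class that sets these names is accepted.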

The set of operations on a pure "NDArray" is probably pretty minimal
(reshape, transpose/rotate, index arrays?).

So in order to create a full featured "NumArray", a few more attributes
are required:

    __array_itemtype__  - as string?

    Optionally
    __array_endian__    - as 1 char string?  (default to the native endian)

This brings the total up to 4 required attributes, and 3 optional ones 
for a very general purpose array data structure.  (I can think of other 
optional ones, but skip that for now.)
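As a rough illustration of how an optional __array_endian__ could default to the native byte order, here is a Python sketch. It assumes, hypothetically, that __array_itemtype__ holds an array-module typecode such as 'd', that __array_endian__ is '<' or '>', and that the data is 1-D and contiguous; `Tagged` and `native_values` are made-up names.

```python
import array
import struct
import sys

NATIVE = "<" if sys.byteorder == "little" else ">"


class Tagged(object):
    """Hypothetical holder exposing the proposed attributes for 1-D float64 data."""
    def __init__(self, raw, endian=None):
        self.__array_buffer__ = raw
        self.__array_shape__ = (len(raw) // 8,)
        self.__array_itemsize__ = 8
        self.__array_itemtype__ = "d"
        if endian is not None:          # leave the attribute out to mean "native"
            self.__array_endian__ = endian


def native_values(obj):
    """Decode the items, byte-swapping only when the stored order is not native."""
    if obj.__array_itemtype__ != "d":
        raise TypeError("this sketch only understands 'd' (float64) items")
    endian = getattr(obj, "__array_endian__", NATIVE)
    values = array.array("d")
    values.frombytes(bytes(obj.__array_buffer__))
    if endian != NATIVE:
        values.byteswap()
    return values.tolist()


# Big-endian data decodes correctly on either kind of host.
big = Tagged(struct.pack(">2d", 1.5, -2.0), endian=">")
print(native_values(big))   # [1.5, -2.0]
```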


>
> All in all you are talking about checking quite a few attributes
> to make sure the object has the interface. And even if it does,
> *why* in the world would we presume that the C functions used by
> numarray would work properly with the object you provide.
>

Because truthfully arrays are little more than a pointer to memory.

That's like asking "why in the world would we presume memcpy() or 
qsort() would know what to do with your memory?"


>
> You haven't provided any example (let
> alone a compelling one) of why we should accept any object that
> provides those attributes.
>

Well, the UFuncs certainly should reject any object that they don't
know how to handle.  I'm currently only addressing what it takes to be
an NDArray/NumArray object.  OTOH, if I can present something to the
UFuncs that looks like a known array type, why wouldn't UFuncs
want to work with it?


Ok, so what does this buy you?  

Well, it probably doesn't buy you personally very much.  Your needs are
already being met by the current implementation.


Ok, so what does this cost you?

A few translations:

    _data       -> __array_buffer__
    _shape      -> __array_shape__
    _strides    -> __array_stride__
    _itemsize   -> __array_itemsize__
    _offset     -> __array_offset__
    _type       -> __array_itemtype__
    _byteswap   -> __array_endian__

This isn't a style criticism.  I'm not just asking you to change your
names; I'm asking to promote the names to a "standard interface", much
like the special names (such as __getstate__ and __setstate__) that
already serve as standard interfaces in many places in Python.

Also requires some small changes to getNDInfo() and getNumInfo()
so that they can calculate the derived fields (contiguous, aligned,
etc...).
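The derived fields can be computed from the standard attributes alone. One plausible way is shown in this Python sketch, assuming byte strides; numarray's actual getNDInfo()/getNumInfo() are C functions and may use different rules. Alignment also depends on the buffer's base address, which this sketch cannot see, so it only checks the offset and strides.

```python
def is_contiguous(shape, strides, itemsize):
    """True when the byte strides describe a C-contiguous (row-major) layout.
    Axes of length 1 are skipped because their stride is never used."""
    expected = itemsize
    for extent, stride in zip(reversed(shape), reversed(strides)):
        if extent > 1 and stride != expected:
            return False
        expected *= extent
    return True


def is_aligned(offset, strides, itemsize, alignment=None):
    """True when the first item and every step keep items on multiples of
    `alignment` (defaulting to the item size, a common but not universal
    rule). The buffer's own base address must be checked separately."""
    if alignment is None:
        alignment = itemsize
    return offset % alignment == 0 and all(s % alignment == 0 for s in strides)


print(is_contiguous((3, 4), (32, 8), 8))   # True: row-major 3x4 of float64
print(is_contiguous((4, 3), (8, 32), 8))   # False: its transpose
print(is_aligned(4, (32, 8), 8))           # False: starts mid-item
```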

Also requires some changes to your scripts so that they check for
the interface rather than the inheritance.


What are the benefits to anyone else?

- Describes how anyone could implement something that looks and acts
like NDArrays or NumArrays.  There are probably a lot of reasons to
want to do this.  I have some reasons that I don't think you value
too much.  I think others would have reasons which I can't imagine too.

- Allows one standard API for getting at the basics of NDArrays/NumArrays

- Allows anyone to easily implement other data types for NumArrays.
The typecode won't match any of your builtin types, but maybe other
third parties could agree on other typecodes for their crazy needs and
share modules.

- Allows me personally to distribute a separate (and simpler)
implementation of NDArrays/NumArrays right now and have the same data
objects work with yours when you're all done.  If I give the UFuncs a
pointer to memory, and the attributes above, why shouldn't it work
correctly?


>
> We're not going to budge until you show us what the hell you are talking
> about.
>

Am I doing any better?  I am trying.


>
> You are right on complex ints (that we won't consider them). One
> could take numarray and add them if one wanted and have a more
> extended version. But we won't do it, and we wouldn't support as
> being in what we maintain. It's one of those trade offs.
>

Is there a way, today, without modifying numarray, for me to use
numarray as a holder for these esoteric data types?  Is that way difficult?
Could it be easier?

I'm not asking numarray to know about my types in its core baseline.  I'm
wondering what it takes to implement new types at all.


>
> Your example shows nothing about what your
> real needs for the object are.
>

My real needs are all over the place.  Some of which you've shown me
are solvable with the current implementation of numarray.  Some of
which you've not addressed or said you won't address.


To be explicit:

Here are (at least most of) my _needs_ for array objects:

      - support a wide variety of data types (user defined)
      - have efficient storage
      - support the pickle interface for serialization
      - allow alternate sources of underlying memory
      - have an easy interface for accessing the pieces
        necessary to create C extensions (buffer, shape, stride, ...)
      - completed and reliable in the near term
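On the pickle requirement: pickle itself works through a protocol rather than a base class, which is the analogy drawn earlier in this thread. The Python sketch below uses a hypothetical `PickleableArray`. Storing raw bytes assumes the pickle is read back on a machine with the same byte order, and array.array objects are in fact picklable on their own, so the explicit methods are only there to show the protocol.

```python
import array
import pickle


class PickleableArray(object):
    """Array-like holder that supports pickle through the protocol methods
    alone; no 'Pickleable' base class is needed."""
    def __init__(self, shape, typecode="d"):
        count = 1
        for extent in shape:
            count *= extent
        self.shape = tuple(shape)
        self.data = array.array(typecode, [0]) * count

    def __getstate__(self):
        # Raw bytes plus the accounting information needed to rebuild.
        return {"shape": self.shape, "typecode": self.data.typecode,
                "raw": self.data.tobytes()}

    def __setstate__(self, state):
        # Called on unpickling instead of __init__.
        self.shape = state["shape"]
        self.data = array.array(state["typecode"])
        self.data.frombytes(state["raw"])


a = PickleableArray((2, 3))
a.data[4] = 2.5
b = pickle.loads(pickle.dumps(a))
print(b.shape, b.data.tolist())   # (2, 3) [0.0, 0.0, 0.0, 0.0, 2.5, 0.0]
```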

Here are (at least some of) my _wants_ for array objects:

      - cooperate on some level with other standard array
        modules (once the standard is set)
      - have same API for accessing the pieces (buffer, shape,
        stride, ...) as all standard array modules will.
      - implementation in pure Python so that building extension
        modules is not required until the fast operations present
        in those modules are required.
      - implemented from a standard that is as good as it can be

Here are (at least some of) my _whims_ for array objects:

      - has "windowing" functionality to work efficiently with
        really large files (on any modern platform).
      - alternate implementations for things such as "slicing
        behaviour" (copy on write, reference).
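For the windowing idea, Python's own mmap module can already map just part of a file. That gives some control over how much address space a large file consumes. This is a rough sketch: `read_window` is a hypothetical helper, and for simplicity it returns a copy of the window, whereas a real windowing array would keep the map open and expose it through a buffer. The offset passed to mmap must be a multiple of mmap.ALLOCATIONGRANULARITY, hence the rounding.

```python
import mmap
import os
import tempfile


def read_window(path, start, length):
    """Map only [start, start + length) of a file instead of the whole file."""
    gran = mmap.ALLOCATIONGRANULARITY
    aligned = (start // gran) * gran      # mmap offsets must be multiples of this
    skip = start - aligned
    with open(path, "rb") as f:
        m = mmap.mmap(f.fileno(), skip + length, access=mmap.ACCESS_READ,
                      offset=aligned)
        try:
            return m[skip:skip + length]  # slicing copies the window out
        finally:
            m.close()


# Small demonstration on a temporary file.
payload = bytes(range(256)) * 400
fd, path = tempfile.mkstemp()
os.write(fd, payload)
os.close(fd)
window = read_window(path, 70000, 16)
head = read_window(path, 0, 10)
os.remove(path)
print(window == payload[70000:70016])   # True
```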


Loosely following your design, I've already written a module that meets 
my "needs", I was hoping that we could cooperate towards filling in some
of my "wants" (cooperating array modules), and I've brought up my "whims"
because I thought they were interesting possibilities for discussion.


I was going to respond to some of your other remarks, but I've probably
wasted enough of your time.  If you don't respond to this message, I'll
take that as a sign that we just aren't going to see eye to eye on any of 
this, and I won't bother you any more. 

(I'll be half surprised if you even get this message.  From the tone
of your last one, I wouldn't be shocked to find out you've already
added me to your killfile. :-)


No hard feelings,
      -Scott Gilbert


__________________________________________________
Do You Yahoo!?
Yahoo! Tax Center - online filing with TurboTax
http://taxes.yahoo.com/


From perry at stsci.edu  Sun Apr 14 11:55:02 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Sun Apr 14 11:55:02 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <20020414111911.2977.qmail@web12901.mail.yahoo.com>
Message-ID: <NEBBIJKBMLDBLNCEEFOCIEFICNAA.perry@stsci.edu>

Hi Scott,

Just to be to the point, I'm still missing what I've been
asking for, to wit a concrete example that illustrates your
point. I'll try to address a few of your points that appear
to try to answer that and clarify what I mean by concrete
example.
>
> Here's what I'm proposing, and it's only a suggestion.
>
>
> *** I think the requirements for being a general purpose "NDArray"
> can be specified with only the following attributes:
>
>     __array_buffer__    - as buffer object
>     __array_shape__     - as tuple of long
>     __array_itemsize__  - as int
>
>     Optionally
>     __array_stride__    - as tuple of long (get from shape if None)
>     __array_offset__    - as int (would default to 0 if not present)
>
> Then anyone who implemented these could work with the same C API for
> getting the pointer to memory, shape array, stride array, and item size.
>
Then you are talking about standardizing a C-API. But I'm still
confused. If you write a class that implements these attributes,
is it your C-API that uses them, or do you mean our C-API uses
them? If you have your own C-API, then the attributes are not
relevant as an interface. If you intend to use our C-API to access
your objects, then they are. But if you want to use our C-API,
that still doesn't explain why the alternatives aren't acceptable
(namely subclassing).

>
> Because truthfully arrays are little more than a pointer to memory.
>
> That's like asking "why in the world would we presume memcpy() or
> qsort() would know what to do with your memory?"
>
Then you misunderstand Numarray. Numarrays are far more than just
a pointer to memory. You can get a pointer to memory from them,
but they entail much more than that. Numarray presumes that certain
things are possible with NumArray objects (like standard math
operations). If you want something that doesn't make such an
assumption, you should be using NDArray instead. NDArray makes
no presumptions about the contents of the memory other than
they are arranged in memory in array fashion.
>
> >
> > You haven't provided any example (let
> > alone a compelling one) of why we should accept any object that
> > provides those attributes.
> >
>
> Well, the UFuncs certainly should reject any object that they don't
> know how to handle.  I'm currently only addressing what it takes to be
> an NDArray/NumArray object.  OTOH, if I can present something to the
> UFuncs that looks like a known array type, why wouldn't UFuncs
> want to work with it?
>
If you are presenting numarray with a type it already knows about,
why aren't you subclassing it? If you present numarray an object
with a type it doesn't know about, then that is pointless.
Types and numarray are inextricably intertwined, and shall
remain so.
>
> - Allows me personally to distribute a separate (and simpler)
> implementation of NDArrays/NumArrays right now and have the same data
> objects work with yours when you're all done.  If I give the UFuncs a
> pointer to memory, and the attributes above, why shouldn't it work
> correctly?
>
>
> Am I doing any better?  I am trying.
>
Not really. More on that later.
>
>
> Is there a way, today, without modifying numarray, for me to use
> numarray as a holder for these esoteric data types?  Is that way
> difficult?  Could it be easier?
>
No to the first, it isn't intended to serve that purpose. If
you just need something to blindly hold values without doing
anything with them use NDArray (and you can add whatever customization
you wish regarding what methods or operators are available).

> I'm not asking numarray to know about my types in its core baseline.  I'm
> wondering what it takes to implement new types at all.
>
It's possible to extend (but not in any way that makes it
automatically usable with anyone else's extension). Currently
that sort of extension would not be hard for someone that
knows how things work. We haven't documented how to do so,
and won't for a while. It's not a high priority for us now.

**********************************************************

What I want to see is a specific example. I'm not going to
pay much attention to generalities because I'm still unclear
about how you intend to do what you say you will do. Perhaps
I'm slow, but I still don't get it.

On the one hand, you ask us to have numarray accept objects
with the same 'interface'. Well, if they are not of an existing
supported type, that's pointless since numarray won't work
properly with them. If it is an existing type, you haven't
explained why you can't use numarray directly (or alternatively,
create a numarray object that uses the same buffer yours does).
I still haven't seen a specific example that illustrates why
you cannot use subclassing or an instance of a numarray object
instead. If you need to add a new type that's possible but
you'll have to spend some time figuring out how to do that for
your own extended version. If you just want to use arrays
to hold values (of new types), then use NDArray. It doesn't
care about types. But please give a specific case. E.g., "I want
complex ints and I will develop a class that will use this to
do the following things [it doesn't have to be exhaustive or
complete, but include just enough to illustrate the point].
If the attributes were standardized then I would do this and that,
and use it with your stuff like this showing you the code
(and the behavior I expect)."

Given this I can either show you an alternate solution or
I can realize why you are right and we can discuss where
to go from there. Otherwise you are wasting your time.

Perry


From xscottg at yahoo.com  Sun Apr 14 21:10:12 2002
From: xscottg at yahoo.com (Scott Gilbert)
Date: Sun Apr 14 21:10:12 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <NEBBIJKBMLDBLNCEEFOCIEFICNAA.perry@stsci.edu>
Message-ID: <20020415040923.5808.qmail@web12903.mail.yahoo.com>

--- Perry Greenfield <perry at stsci.edu> wrote:

*** Just skim through my first few responses.  About half way through
writing this letter, a few things hit me.  I still want to propose some
changes, but I don't think you'll find them as intrusive...


>
> >
> > Then anyone who implemented these could work with the same C API for
> > getting the pointer to memory, shape array, stride array, and item
> > size.
> >
> Then you are talking about standardizing a C-API. But I'm still
> confused. If you write a class that implements these attributes,
> is it your C-API that uses them, or do you mean our C-API uses
> them?
>

I'm not really talking about standardizing a C-API.  I'm talking about
standardizing what that C-API would have to do.  You would have your 
C-API as part of numarray proper.  And, for the short term, I would have
my own C-API as part of what I need to get done.

Both C-API's would use the same attributes.

Why do I want my own C-API today?  Because numarray isn't done yet, and
I can't create arrays of the types I need.  I'll need a C-API to get at
my types.  It would be great if the same C-API could get at yours too.


>
> If you have your own C-API, then the attributes are not
> relevant as an interface. If you intend to use our C-API to access
> your objects, then they are. 
>

Either C-API could access anything that looks like an NDArray.


>
> >
> > Because truthfully arrays are little more than a pointer to memory.
> >
> > That's like asking "why in the world would we presume memcpy() or
> > qsort() would know what to do with your memory?"
> >
>
> Then you misunderstand Numarray. Numarrays are far more than just
> a pointer to memory. You can get a pointer to memory from them,
> but they entail much more than that. Numarray presumes that certain
> things are possible with NumArray objects (like standard math
> operations). If you want something that doesn't make such an
> assumption, you should be using NDArray instead. NDArray makes
> no presumptions about the contents of the memory other than
> they are arranged in memory in array fashion.
>

I think I understand where you're coming from now.  

(BTW, I think some of our confusion comes from when I'm talking about
"Numarray" or "numarray" the package versus "NumArray" and 
"NDArray" the classes.)


*** Ok, I think there is light at the end of this tunnel...

I guess what I've been arguing for all along is something a lot like
an NDArray where I can specify the typecode (and possibly other things like
'endian' etc...), and that only NDArrays need a minimal set of standardized
attributes.


With this I can create extensions that will work with anything that
looks like an NDArray.  Your NDArrays from the numarray package, and
my NDArrays of crazy types.


I'm still left in the position of having to upcast an NDArray to a
full blown NumArray if I ever want to use my NDArrays in a routine
meant solely for NumArrays.  However this conversion isn't difficult,
and I think I can do that when needed.


Important Question:  If an NDArray had a typecode (and it was a known
string), is it possible to promote it to one of the standard NumArray
types?

Lesser Question:  If an NDArray had a known typecode, is it desirable
for numarray routines to promote the NDArray to a NumArray in the same
way that the routines promote a Python list or tuple to a NumArray on
the fly?
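One way to picture that on-the-fly promotion uses Python's array module as a stand-in for NumArray; this is not numarray's API. A routine accepts a list or tuple as it does today. It also accepts any object whose __array_itemtype__ names a type it knows, and copies the bytes into its own type. `promote` and `Wrapped` are hypothetical, and the sketch assumes 1-D, contiguous, native-byte-order data.

```python
import array

KNOWN_TYPECODES = ("b", "h", "i", "l", "f", "d")   # types the "full" class handles


def promote(obj):
    """Return an array.array for a sequence or a recognised interface object."""
    if isinstance(obj, (list, tuple)):
        return array.array("d", obj)               # like promoting a Python list
    code = getattr(obj, "__array_itemtype__", None)
    if code not in KNOWN_TYPECODES:
        raise TypeError("cannot promote item type %r" % (code,))
    result = array.array(code)
    if result.itemsize != obj.__array_itemsize__:
        raise TypeError("item size does not match native %r" % (code,))
    start = getattr(obj, "__array_offset__", 0)
    stop = start + obj.__array_shape__[0] * result.itemsize
    result.frombytes(bytes(obj.__array_buffer__)[start:stop])
    return result


class Wrapped(object):
    """Hypothetical NDArray-like object carrying a typecode."""
    def __init__(self, typecode, values):
        data = array.array(typecode, values)
        self.__array_buffer__ = data.tobytes()
        self.__array_shape__ = (len(data),)
        self.__array_itemsize__ = data.itemsize
        self.__array_itemtype__ = typecode


print(promote(Wrapped("h", [1, -2, 3])).tolist())   # [1, -2, 3]
print(promote([1, 2]).tolist())                      # [1.0, 2.0]
```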


Ok, my new proposal (again, treat it like a suggestion):

- Do you think it would be possible to standardize the set of attributes
that it requires to be an NDArray?  NDArrays are simple and unlikely to
change.  I think _those_ really are just pointers to memory with array
accounting information.  We could agree on what exactly constitutes an
NDArray.

- Could this standard set of attributes optionally include the names for
the typecode, endian, (and maybe some other) attributes?


That doesn't mean that your NDArrays would have to have the typecode,
endian or whatever information.  It just means that when any class does
add a typecode, it adds it as a specially named attribute.


I realize that a large part of what I want is interoperability between
separate implementations of NDArrays.


Anything that has (_data, _shape, _itemsize, _type) is something I could
work with in an extension.  Some other fields are optional (_strides,
_byteoffset) because they have sensible defaults that can be calculated
from above in the common case.

So the only difference between what you currently have and most of what
I'm proposing is that the names of NDArray attributes become standardized.


>
> If you are presenting numarray with a type it already knows about,
> why aren't you subclassing it?
>

Since I know I'll have to create types that numarray doesn't know
about, I know I'm going to have to write a new array class (it's
already written).

It would be silly of my new array class to not implement the standard
types just because numarray _does_ know about them.

I now realize that I don't have to give my class to numarray directly. 
That didn't hit me before.  I could promote/upcast it when necessary.
The upcast-in and downcast-out thing will add up to extra work and
messier code, but it is a workaround.


>
> If you present numarray an object
> with a type it doesn't know about, then that is pointless.
> Types and numarray are inextricably intertwined, and shall
> remain so.
>

Understood.  I don't want to ruin your NumArrays.


> 
> **********************************************************
> 
> What I want to see is a specific example. I'm not going to
> pay much attention to generalities because I'm still unclear
> about how you intend to do what you say you will do. Perhaps
> I'm slow, but I still don't get it.
> 

Nope, clearly it was me that was being slow.  

There is still that bit about NDArrays that I'm trying to justify, so my
example is below.


>
> (or alternatively,
> create a numarray object that uses the same buffer yours does).
>

You're right.  This hadn't occurred to me until just a little bit ago.
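The idea of a second array object over the same buffer can be shown without numarray at all. This Python sketch uses memoryview as a stand-in for "a numarray object that uses the same buffer". numarray's own constructor for that is not shown, and nothing here is its API. Two differently shaped views share one block of memory, so writes through one are visible through the other, and nothing is copied.

```python
import array

# The memory "my" array class owns: six float64 items, initially zero.
storage = array.array("d", [0.0] * 6)

# A 1-D writable view and a 2x3 view of the same bytes; neither copies.
# memoryview.cast has to pass through a byte format ('B') to change shape.
flat = memoryview(storage)
grid = memoryview(storage).cast("B").cast("d", (2, 3))

flat[0] = 1.25        # write through one view ...
storage[5] = 7.5      # ... or through the owner
print(grid.tolist())  # [[1.25, 0.0, 0.0], [0.0, 0.0, 7.5]]
```

While such views exist, the owner cannot be resized (array.array raises BufferError), which is the usual price of sharing memory this way.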


>
> E.g., "I want
> complex ints and I will develop a class that will use this to
> do the following things [it doesn't have to be exhaustive or
> complete, but include just enough to illustrate the point].
> If the attributes were standardized then I would do this and that,
> and use it with your stuff like this showing you the code
> (and the behavior I expect)."
>

Here goes (somewhat hypothetical, but close to the boat I'm currently in):

Jon is our FPGA guy who makes screaming fast core files, but our FPGAs
don't do floating point.  So I have to provide his driver with ComplexInt16
data.
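The struct module came up earlier as a way to describe record items. In its notation, a ComplexInt16 item is two signed 16-bit integers. Below is a minimal Python sketch. The little-endian '<' prefix and the (real, imaginary) ordering are assumptions about what such a driver produces, and `to_complex` is a hypothetical helper.

```python
import struct

COMPLEX_INT16 = struct.Struct("<hh")     # (real, imag), 4 bytes, no padding
COMPLEX_FLOAT64 = struct.Struct("<dd")   # (real, imag), 16 bytes


def to_complex(raw, item=COMPLEX_INT16):
    """Decode a buffer of packed (real, imag) pairs into Python complex numbers."""
    return [complex(re, im) for re, im in item.iter_unpack(raw)]


raw = COMPLEX_INT16.pack(1, -2) + COMPLEX_INT16.pack(-32768, 32767)
print(COMPLEX_INT16.size, COMPLEX_FLOAT64.size)   # 4 16
print(to_complex(raw))                            # [(1-2j), (-32768+32767j)]
```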

Jon and I write an extension module that calls his driver and reads data. 
We also write a C routine (call it "munge") that takes both ComplexInt16
data, and ComplexFloat64 data.  We try it out for testing, and pass in my
arrays in both places.  We could have used Numarray for the ComplexFloat64,
but that meant we had to use two array packages, and use two C-APIs in our
extension.  All we needed was a pointer to an array of doubles, so we stuck
with mine.

Ok, that part of development is done.  Now we present it to the application
developers.  They're happy and we're rolling.  Successful application.

Another group finds out about this and they want to use it.  They're using
numarray for a large part of their application.  In fact, they're calculating
the ComplexFloat64 half of the data that they want to pass to my "munge"
routine using numarray, and they still need to use my ComplexInt16 data to
read the FPGA.

They're going to be disappointed to find out my extension can't read
numarray data, and that they have to convert back and forth between the
two.  And as the list of routines grows, they have to keep track of whether
it is a numarray-routine, or a scottarray-routine.

It's not so bad for one simple "munge" function, but there are going to be
hundreds of functions...

I don't expect you to have much sympathy for my having to convert data back
and forth between my array types and yours, but it is an avoidable problem.
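To make the scenario concrete, here is a Python sketch of a "munge" that reads only the proposed attribute names, so it neither knows nor cares which package built its arguments. `ScottArray`, `OtherArray`, `as_items` and this `munge` are all hypothetical. The data is real-valued for brevity, and memoryview stands in for what a shared C-API would hand an extension: a typed pointer, with no copy.

```python
import array


class ScottArray(object):
    """Hypothetical array class from one package."""
    def __init__(self, typecode, values):
        data = array.array(typecode, values)
        self.__array_buffer__ = data
        self.__array_shape__ = (len(data),)
        self.__array_itemsize__ = data.itemsize
        self.__array_itemtype__ = typecode


class OtherArray(object):
    """An unrelated class from another package that uses the same names."""
    def __init__(self, values):
        self.__array_buffer__ = array.array("d", values).tobytes()
        self.__array_shape__ = (len(values),)
        self.__array_itemsize__ = 8
        self.__array_itemtype__ = "d"


def as_items(obj):
    """Typed, zero-copy view built only from the standard attributes."""
    view = memoryview(obj.__array_buffer__).cast("B").cast(obj.__array_itemtype__)
    if len(view) != obj.__array_shape__[0]:
        raise ValueError("buffer does not match the declared shape")
    return view


def munge(samples, weights):
    """Scale each sample by a weight, whichever classes supplied them."""
    s, w = as_items(samples), as_items(weights)
    if len(s) != len(w):
        raise ValueError("length mismatch")
    return [x * y for x, y in zip(s, w)]


print(munge(ScottArray("h", [1, 2, 3]), OtherArray([0.5, 1.0, 2.0])))
# [0.5, 2.0, 6.0]
```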


For the most part, we both agree on what parts an NDArray should have.  If
we could only agree what to name them, and that we'd stick to those names,
that would be a large part of it for me.


>
> Given this I can either show you an alternate solution or
> I can realize why you are right and we can discuss where
> to go from there. Otherwise you are wasting your time.
>


Cheers,
    -Scott




From jmiller at stsci.edu  Mon Apr 15 11:19:09 2002
From: jmiller at stsci.edu (Todd Miller)
Date: Mon Apr 15 11:19:09 2002
Subject: [Numpy-discussion] ANN: Numarray-0.3.1 and 0.3.2
Message-ID: <3CBB1955.1010800@stsci.edu>

Numarray 0.3.1 and 0.3.2
---------------------------------
Numarray is an array processing package designed to efficiently
manipulate large multi-dimensional arrays.  Numarray is modelled after
Numeric and features c-code generated from python template scripts,
the capacity to operate directly on arrays in files, and improved type
promotions.

Numarray-0.3.1 incorporates a number of bug fixes and enhancements to
the C-API, including a minimal Numeric emulation layer which makes it
easy to port simple Numeric C-extensions to numarray.  The emulation
layer is incomplete, so not all Numeric extensions will work, but
simple ones *do* with a minimal amount of effort.  See
Doc/numpy_compat for an example of convolution done using the
emulation layer.  New for Numarray-0.3.1 is the Numarray manual in PDF
and HTML formats; other formats are available to users of the source
distribution.

Numarray-0.3.2 is a source only release to support Alpha/Tru64.  It is
essentially Numarray-0.3.1 + one portability bug fix.

WHERE
-----------
Numarray-0.3.1 Windows executable installers and the source code tarball are
here:

http://sourceforge.net/project/showfiles.php?group_id=1369

Numarray is hosted by Source Forge in the same project which hosts Numeric:

http://sourceforge.net/projects/numpy/

The web page for Numarray information is at:

http://stsdas.stsci.edu/numarray/index.html

Trackers for Numarray Bugs, Feature Requests, Support, and Patches are at
the Source Forge project for NumPy at:

http://sourceforge.net/tracker/?group_id=1369

REQUIREMENTS
--------------------------

numarray-0.3.1 requires Python 2.0 or greater.


AUTHORS, LICENSE
------------------------------

Numarray was written by Perry Greenfield, Rick White, Todd Miller, JC
Hsu, Paul Barrett, Phil Hodge at the Space Telescope Science
Institute.  Thanks go to Jochen Kupper of the University of North
Carolina for his work on Numarray and for porting the Numarray manual
to TeX format.

Numarray is made available under a BSD-style License.  See
LICENSE.txt in the source distribution for details.

-- 
Todd Miller 			jmiller at stsci.edu 


From perry at stsci.edu  Mon Apr 15 14:20:01 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Mon Apr 15 14:20:01 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <20020415040923.5808.qmail@web12903.mail.yahoo.com>
Message-ID: <JFEGLNDJEDNOMPPHDEJFAEJGDNAA.perry@stsci.edu>

Hi Scott,

I'm not going to respond to all points but mainly concentrate on the
last section.
>

>
> Important Question:  If an NDArray had a typecode (and it was a known
> string), is it possible to promote it to one of the standard NumArray
> types?
>
I think we want to avoid NDArray having any type attribute (Some types
have subtypes and then the issue gets really messy). We leave it
to the subclass to address how types will be handled.

> Here goes (somewhat hypothetical, but close to the boat I'm currently in):
>
> Jon is our FPGA guy who makes screaming fast core files, but our FPGAs
> don't do floating point.  So I have to provide his driver with
> ComplexInt16 data.
>
> Jon and I write an extension module that calls his driver and reads data.
> We also write a C routine (call it "munge") that takes both ComplexInt16
> data, and ComplexFloat64 data.  We try it out for testing, and pass in my
> arrays in both places.  We could have used Numarray for the ComplexFloat64,
> but that meant we had to use two array packages, and use two C-APIs in our
> extension.  All we needed was a pointer to an array of doubles, so we stuck
> with mine.
>
> Ok, that part of development is done.  Now we present it to the application
> developers.  They're happy and we're rolling.  Successful application.
>
> Another group finds out about this and they want to use it.  They're using
> numarray for a large part of their application.  In fact, they're calculating
> the ComplexFloat64 half of the data that they want to pass to my "munge"
> routine using numarray, and they still need to use my ComplexInt32 data to
> read the FPGA.
>
> They're going to be disappointed to find out my extension can't read
> numarray data, and that they have to convert back and forth between the
> two.  And as the list of routines grows, they have to keep track of whether
> it is a numarray-routine, or a scottarray-routine.
>
> It's not so bad for one simple "munge" function, but there are going to be
> hundreds of functions...
>
> I don't expect you to have much sympathy for my having to convert data back
> and forth between my array types and yours, but it is an avoidable problem.
>
>
>
> For the most part, we both agree on what parts an NDArray should have.  If
> we could only agree what to name them, and that we'd stick to those names,
> that would be a large part of it for me.
>
>
I'm not sure I understand the problem in all the details I need to.
I'll restate it as best as I understand it and you can tell me if
I understood incorrectly.

You have extension modules that get complex int data from hardware.
Other processing may be done to the complex int data in that format
so it doesn't make sense to convert it to a more standard format when
reading it in. You have C extensions that carry out certain tasks
on complex data (in either complex int format or complex floats).
You have users that would like to use your routine with numarray.
(I haven't seen any specific mention of the need for ufuncs on
complex ints so I'll assume you just need complex int arrays as
containers for C programs to use.)

[If you did need to perform ufuncs on complex ints, then extending
numarray locally to handle them would be one possibility, but a little
involved at the moment (a little easier later when we reimplement
complex), then again, maybe not, the complex stuff is currently
subclassed from numarray and not that hard to adapt to ints I think,
but it isn't that well done now].

I guess my initial reaction is that you should develop a front-end
C-API that handles obtaining data buffers from different
sources.  You get to define what kinds of things it supports,
and it localizes any dependencies on our or anyone else's API (and
any changes to the list of types you support) to a small section of
code. From what I'm hearing, you don't need it to provide much
(pointer to arrays and associated information). If we are real
bozos and change the interface, it doesn't hurt you much (not that
we intend to be bozos or change the C-API willy nilly :-)

To elaborate: you define your own equivalent of our getNumInfo routine.

I don't think I've seen anything that requires explicit dependencies
on Python attributes. Sure, you could use the same attribute names
and use Python calls to get those just as our getNumInfo routine does,
but I think that is bad practice. You may find some other representation
for arrays out there that doesn't fit this model and you may want to
work with those also and you won't be able to get them to adopt our
scheme.
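A minimal Python sketch of such a front end follows, assuming nothing about numarray's real C-API: the names `BufferInfo`, `register_adapter` and `get_buffer_info` are hypothetical, and the one adapter shown handles the standard library's `array.array` purely as an example source. Each supported array package gets one small adapter, so a change in any package's interface touches only that adapter.

```python
from array import array
from dataclasses import dataclass

@dataclass
class BufferInfo:
    """Everything a C routine needs: raw bytes plus a description of them."""
    data: bytes
    shape: tuple
    typecode: str  # hypothetical leaf-type name such as "Int32" or "Float64"

# Maps a source type to a function that extracts a BufferInfo from it.
_ADAPTERS = {}

def register_adapter(source_type, adapter):
    _ADAPTERS[source_type] = adapter

def get_buffer_info(obj):
    """Single entry point; only the adapters know about other packages."""
    for source_type, adapter in _ADAPTERS.items():
        if isinstance(obj, source_type):
            return adapter(obj)
    raise TypeError(f"no adapter registered for {type(obj).__name__}")

# Example adapter for the standard library's array.array (1-d only).
# Only two typecodes are mapped; "i" assumes a 4-byte C int.
_ARRAY_TYPECODES = {"i": "Int32", "d": "Float64"}

def _from_std_array(a):
    return BufferInfo(a.tobytes(), (len(a),), _ARRAY_TYPECODES[a.typecode])

register_adapter(array, _from_std_array)
```

Supporting numarray or any other package would then mean registering one more adapter; the C routines only ever see a `BufferInfo`.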

You say that you don't want your users to have to convert between
the two data representations. If they are using your C extensions
that is understandable, and avoidable since you've written your
programs to deal with the various types. On the other hand,
unless you extend numarray, numarray clearly cannot deal with
the complex ints so conversion is necessary. But understandably,
you would like to eliminate the need for explicit conversions.
I think there is an easy way of dealing with this.

We haven't implemented this capability yet but we've been talking
about having numarray check input values to see if they have
a method "tonumarray" [not that we would choose that particular
method name, I'm just illustrating the point]. If that
method did exist, it would be called to create a numarray
from the object. Thus you could add such a method to your
class and when it is used in numarray ufuncs or in binary operations
with numarray objects, your complex ints are automatically
converted to numarray objects (presumably a complex float of
some precision). Adding this capability to numarray should be
pretty easy.
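To illustrate the hook described above, here is a plain-Python sketch. Perry stresses that `tonumarray` is only a placeholder name; `ComplexInt16Array`, `coerce` and `add` are hypothetical, and the "numarray" result is modeled as a list of Python complex numbers rather than a real numarray object.

```python
from array import array

class ComplexInt16Array:
    """Hypothetical container: interleaved (real, imag) short integers."""

    def __init__(self, pairs):
        self._data = array("h")  # signed short, 16 bits on common platforms
        for re, im in pairs:
            self._data.extend((re, im))

    def tonumarray(self):
        # Placeholder name from the discussion: convert to complex floats.
        d = self._data
        return [complex(d[k], d[k + 1]) for k in range(0, len(d), 2)]

def coerce(obj):
    """Stand-in for the input check a ufunc might perform."""
    if hasattr(obj, "tonumarray"):
        return obj.tonumarray()
    if isinstance(obj, (list, tuple)):
        return [complex(x) for x in obj]
    raise TypeError(f"cannot convert {type(obj).__name__}")

def add(a, b):
    """Toy binary operation that coerces both operands first."""
    return [x + y for x, y in zip(coerce(a), coerce(b))]
```

With this design the array package needs to know only the method name, not the internal layout of the foreign class.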

True, the solution that I proposed doesn't protect you from making
any changes ever. But we believe we are at a stage in the project
where it is dangerous to lock ourselves into lower level details
such as the internal description of the array. We still have things
to implement and that may cause us to realize that some changes
are needed. Our C-API stuff is relatively new. It may see changes
in the near future, but likely not many related to what you need.
And we intend to shield the C-API from changes in the Python
attributes. We could change the name or contents of _byteswap and
it would not change anything in the C-API. I see premature
coupling of low level implementation details as a bad thing,
not a good thing. Any changes made to our API would require
changes only to the corresponding routine in your C-API, and all
your C applications are shielded from any changes (save rebuilding).

If I've misunderstood your examples, please let me know.

Perry


From xscottg at yahoo.com  Mon Apr 15 15:33:10 2002
From: xscottg at yahoo.com (Scott Gilbert)
Date: Mon Apr 15 15:33:10 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <JFEGLNDJEDNOMPPHDEJFAEJGDNAA.perry@stsci.edu>
Message-ID: <20020415223223.5901.qmail@web12905.mail.yahoo.com>

Hi Perry.

Well, I don't think I've made any progress convincing you that
standardizing what it means to be an interoperable "NDArray"
would be good for me or others in the community, but I do
appreciate you letting me try.


I'll take your suggestion and make my C-API understand a superset
of array types.  I'll wait to see how the tonumarray() thing pans
out.  That might meet all of my practical concerns even if I don't
think it is as elegant a solution as defining a strong interface.


I'll just respond to the one point below.  If I had to sum up my
argument for why I think separate array implementations could 
(should) be compatible, it is buried in the answer to this question.


>
> >
> > Important Question:  If an NDArray had a typecode (and it was a known
> > string), is it possible to promote it to one of the standard NumArray
> > types?
> >
>
> I think we want to avoid NDArray having any type attribute (Some types
> have subtypes and then the issue gets really messy). We leave it
> to the subclass to address how types will be handled.
> 

Ok that's what you're currently doing, but let me rephrase the question.

  :-)


Given a "leaf type" -- something that is really well specified and very
similar on all modern platforms:

    "Int32"    - not just an arbitrary "Int"
    "Float64"  - not just an arbitrary "Float"


Do you think you could write a general purpose _function_ that converted an
"NDArray" to a full featured "NumArray"?  I know this would be in Python,
but let's pretend it's a C++ prototype to make the types clear:


NumArray NDArray_to_NumArray(NDArray nda, String typecode, Endian end) {
    if (WellKnownLeafTypecodeString(typecode)) {

        /* fill in the blanks here */

        return NumArray(result);
    }

    throw "conversion really is impossible";
}
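One way the blanks could be filled in, sketched in Python with the standard `struct` module and assuming strides are given in bytes: for well-specified leaf types, a buffer, offset, shape, strides, typecode and byte order are enough to read every element. The typecode names follow the discussion; `ndarray_to_list` is hypothetical and returns a flat row-major list instead of a real NumArray.

```python
import struct
from itertools import product

# Leaf types that are well specified on all common platforms.
_LEAF_FORMATS = {"Int16": "h", "Int32": "i", "Float32": "f", "Float64": "d"}
_ENDIAN = {"little": "<", "big": ">"}

def ndarray_to_list(buffer, offset, shape, strides, typecode, endian):
    """Read a strided n-d buffer into a flat, row-major list of values."""
    if typecode not in _LEAF_FORMATS:
        raise ValueError("conversion really is impossible")
    # "<" or ">" also forces standard sizes (Int32 is exactly 4 bytes).
    fmt = _ENDIAN[endian] + _LEAF_FORMATS[typecode]
    result = []
    # Visit every index tuple in row-major (C) order.
    for index in product(*(range(n) for n in shape)):
        pos = offset + sum(i * s for i, s in zip(index, strides))
        result.append(struct.unpack_from(fmt, buffer, pos)[0])
    return result
```

Swapping shape and strides reads the same bytes as a transpose without copying, which is one reason the description carries strides at all.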


Cheers and thanks again for your time,
    -Scott Gilbert




From perry at stsci.edu  Tue Apr 16 08:15:09 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Tue Apr 16 08:15:09 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <20020415223223.5901.qmail@web12905.mail.yahoo.com>
Message-ID: <JFEGLNDJEDNOMPPHDEJFAEJKDNAA.perry@stsci.edu>

> > > Important Question:  If an NDArray had a typecode (and it was a known
> > > string), is it possible to promote it to one of the standard NumArray
> > > types?
> > >
> >
> > I think we want to avoid NDArray having any type attribute (Some types
> > have subtypes and then the issue gets really messy). We leave it
> > to the subclass to address how types will be handled.
> > 
> 
> Ok that's what you're currently doing, but let me rephrase the question.
> 
>   :-)
> 
> 
> Given a "leaf type" -- something that is really well specified and very
> similar on all modern platforms:
> 
>     "Int32"    - not just an arbitrary "Int"
>     "Float64"  - not just an arbitrary "Float")
> 
> 
> Do you think you could write a general purpose _function_ that 
> converted an
> "NDArray" to a full featured "NumArray"?  I know this would be in Python,
> but let's pretend it's a C++ prototype to make the types clear:
> 
> 
> NumArray NDArray_to_NumArray(NDArray nda, String typecode, Endian end) {
>     if (WellKnownLeafTypecodeString(typecode)) {
> 
>         /* fill in the blanks here */
> 
>         return NumArray(result)
>     }
> 
>     throw "conversion really is impossible";
> }
> 

I'm not sure I understand exactly what you are trying to do here, but
I'll try to address the question as best I can.

If one had an NDArray that happened to contain a type that numarray 
supported, yes it is possible (in fact RecArray does that sort of thing).

If your point is that in doing so one must use the private attributes
such as _strides, yes that is true. These attributes are private in 
the sense that users of instances of these objects should never have
cause to access them. But that does not mean that classes that subclass
NDArray or any of its subclasses should not access them. They are not
private within the class family (one reason we didn't use
__strides is that that mechanism is not (easily) usable by
subclasses). In that sense, the attributes form an interface within
the class family. Some class extenders may need to access them, sure.
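The distinction Perry draws rests on Python's name mangling: a double-underscore attribute such as `__strides` is renamed per class, so subclasses cannot reach it under the same name, while `_strides` is private only by convention. A toy sketch follows; these `NDArray` and `NumArray` classes are stand-ins, not numarray's real ones, and the type attribute lives in the subclass as described earlier in the thread.

```python
class NDArray:
    def __init__(self, shape, strides):
        self._shape = shape        # shared with subclasses by convention
        self._strides = strides
        self.__secret = strides    # stored as self._NDArray__secret

class NumArray(NDArray):
    def __init__(self, shape, strides, typecode):
        super().__init__(shape, strides)
        self._type = typecode      # type handling lives in the subclass

    def strides_seen_by_subclass(self):
        return self._strides       # fine: single underscore is not mangled

    def secret_seen_by_subclass(self):
        return self.__secret       # looks up self._NumArray__secret instead
```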

Perry 


From omar.mekkaoui at eco.u-cergy.fr  Tue Apr 16 11:04:38 2002
From: omar.mekkaoui at eco.u-cergy.fr (mekkaoui)
Date: Tue Apr 16 11:04:38 2002
Subject: [Numpy-discussion] Extension under windows
Message-ID: <3CBC68D9.64B7C425@eco.u-cergy.fr>

Dear Numerical Python Users,

I have written an extension using GSL (GNU Scientific Library) and
Numerical Python.
This extension works fine under Linux and I would like to do the same under
Windows. For that I use Cygwin.
When I try to create the module

$ gcc -shared Example.o -o Example.pyd

I receive this message :


Example.o<.text+0x58>:Example.c: undefined reference to 'PyArg_ParseTuple'
Example.o<.text+0x15e>:Example.c: undefined reference to 'Py_BuildValue'
Example.o<.text+0x1b1>:Example.c: undefined reference to 'Py_InitModule4'
Example.o<.text+0x1c1>:Example.c: undefined reference to 'PyImport_ImportModule'
Example.o<.text+0x1db>:Example.c: undefined reference to 'PyModule_GetDict'
Example.o<.text+0x1f4>:Example.c: undefined reference to 'PyDict_GetItemString'
Example.o<.text+0x206>:Example.c: undefined reference to 'PyCObject_Type'
Example.o<.text+0x214>:Example.c: undefined reference to 'PyCObject_AsVoidPtr'

Perhaps this command is wrong.

Perhaps someone could explain, or point me to a document that explains the
procedure clearly?

Thanks in advance for your help

Omar


From xscottg at yahoo.com  Tue Apr 16 16:38:02 2002
From: xscottg at yahoo.com (Scott Gilbert)
Date: Tue Apr 16 16:38:02 2002
Subject: [Numpy-discussion] Introduction
Message-ID: <20020416233700.72472.qmail@web12904.mail.yahoo.com>

--- Perry Greenfield <perry at stsci.edu> wrote:
> 
> If one had an NDArray that happened to contain a type that numarray 
> supported, yes it is possible (in fact RecArray does that sort of thing).
> 
> If your point is that in doing so one must use the private attributes
> such as _strides, yes that is true.
>

My point was simply:

  = One *can* convert from (NDArray + typecode) to a full NumArray
  = You *do* already convert lists, tuples, ... to NumArrays in ufuncs
  = So you *could* convert (NDArrays + typecode) to NumArrays in ufuncs
    in the same place that checks to see if it is a list, tuple, ...

Therefore:

  = You possibly *could* standardize the attributes in an NDArray
      (buffer, typecode, shape, stride, offset, ...)
  = If you *did* standardize the attributes, then others *could*
    build UserDefinedNDArrays however they see fit and they would
    work with NumArrays


However I get the sense that the numarray module is your baby, and you
don't want to change him too much.  That's very understandable, you're a
proud parent.  Truth be told, he's a good looking kid, and I look forward
to hanging out with him when he's all grown up.  We just have a little
different view on parenting, and I was hoping my kid would have an easier
time playing with yours.


Now that I've beaten that silly metaphor to death...  :-)


Cheers,
    -Scott


ps: It occurs to me, with the strong sense of encapsulation you desire,
that I could have presented this better as requesting that you specify a
set of standard *methods* instead of attributes.  Something like:

     def __array_getbuffer__(self): ...
     def __array_getoffset__(self): ...
     def __array_getshape__(self): ...
     def __array_getstrides__(self): ...
     def __array_getitemsize__(self): ...
     def __array_gettypecode__(self): ...
     def __array_getendian__(self): ...
     # Who knows what the real list would consist of...
     # We never got to discuss what a really general
     # purpose description of an NDArray would require...


Then anything which implemented those standard *methods* would be a viable
NDArray.  From my point of view it amounts to about the same thing, but I
think it's a better design and that you might like this idea more.


However I'm getting out of breath on this topic, and I have other things
I need to do (I'm sure this is true for you too), so if you don't see any
merit in this idea, I won't push for it any further.

Cheers again.




From perry at stsci.edu  Tue Apr 16 17:52:03 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Tue Apr 16 17:52:03 2002
Subject: [Numpy-discussion] Conclusion
In-Reply-To: <20020416233700.72472.qmail@web12904.mail.yahoo.com>
Message-ID: <NEBBIJKBMLDBLNCEEFOCCEFLCNAA.perry@stsci.edu>

After Scott's last display of his powers of persuasion,
I lack for a meaningful response. It seems appropriate to
declare this thread closed.

Besides, I've got to go change some diapers ;-) 

Perry


From paul at pfdubois.com  Wed Apr 17 07:17:07 2002
From: paul at pfdubois.com (Paul F Dubois)
Date: Wed Apr 17 07:17:07 2002
Subject: [Numpy-discussion] Extension under windows
In-Reply-To: <3CBC68D9.64B7C425@eco.u-cergy.fr>
Message-ID: <000301c1e61a$20c2a590$0a01a8c0@NICKLEBY>

You need to link with the Python library. I suggest you learn to use
distutils and then it will load for you correctly on both platforms. The
file "setup.py" in the Numeric source distribution is a good if
complicated example. Some of the setup.py files in the Packages area are
simpler and easier to understand.

-----Original Message-----
From: numpy-discussion-admin at lists.sourceforge.net
[mailto:numpy-discussion-admin at lists.sourceforge.net] On Behalf Of
mekkaoui
Sent: Tuesday, April 16, 2002 11:09 AM
To: numpy-discussion at lists.sourceforge.net
Subject: [Numpy-discussion] Extension under windows


Dear Numerical Python Users,

I have writen an extension using  GSL (Gnu Scientific Library) and
Numerical Python. This extension work fine under Linux and I would to do
the same under Windows. For that I use Cygwin. When I would create the
module

$ gcc -shared Example.o -o Example.pyd

I receive this message :


Example.o<.text+0x58>:Example.c: undefined reference to
'PyArg_ParseTuple'
Example.o<.text+0x15e>:Example.c: undefined reference to 'Py_BuildValue'

Example.o<.text+0x1b1>:Example.c: undefined reference to
'Py_InitModule4'
Example.o<.text+0x1c1>:Example.c: undefined reference to
'PyImport_ImportModule'
Example.o<.text+0x1db>:Example.c: undefined reference to
'PyModule_GetDict'
Example.o<.text+0x1f4>:Example.c: undefined reference to
'PyDict_GetItemString'
Example.o<.text+0x206>:Example.c: undefined reference to
'PyCObject_Type'
Example.o<.text+0x214>:Example.c: undefined reference to
'PyCObject_AsVoidPtr'

Perhaps this command is wrong.

Perhaps, anyone could explain or show me a document which explain the
procedure clearly ?

Thanks in advance for your help

Omar


_______________________________________________
Numpy-discussion mailing list Numpy-discussion at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


From magnus at hetland.org  Wed Apr 17 07:32:31 2002
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Wed Apr 17 07:32:31 2002
Subject: [Numpy-discussion] Graphs in numarray?
Message-ID: <20020417163133.F7565@idi.ntnu.no>

I'm looking at various ways of implementing graphs in Python (beyond
simple dict-based stuff -- more performance is needed). kjbuckets
looks like a nice alternative, as does the Boost Graph Library (not
sure how easy it is to use with Boost.Python) but if numarray is to
become a part of the standard library, it could be beneficial to use
that...

For dense graphs, it makes sense to use an adjacency matrix directly
in numarray, I should think. (I haven't implemented many graph
algorithms with ufuncs yet, but it seems doable...) For sparse graphs
I guess some sort of sparse array implementation would be useful,
although the archives indicate that creating such a thing isn't a core
part of the numarray project.
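A small sketch of the dense approach, using plain Python lists in place of numarray arrays so it runs anywhere: the boolean matrix product is the kind of whole-array operation that would map onto ufuncs or matrix multiplication in an array package. All function names are hypothetical.

```python
def adjacency_matrix(n, edges):
    """Dense n x n 0/1 matrix for a directed graph."""
    m = [[0] * n for _ in range(n)]
    for u, v in edges:
        m[u][v] = 1
    return m

def boolean_product(a, b):
    """Entry (i, j) is 1 if some k has a[i][k] and b[k][j] set."""
    n = len(a)
    return [[int(any(a[i][k] and b[k][j] for k in range(n))) for j in range(n)]
            for i in range(n)]

def reachable_within_two_steps(n, edges):
    a = adjacency_matrix(n, edges)
    two = boolean_product(a, a)
    # Element-wise OR: reachable in one step or in two steps.
    return [[a[i][j] | two[i][j] for j in range(n)] for i in range(n)]
```

Note that the dense form stores n*n entries, so 100 nodes already mean a 10,000-element matrix.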

What do you think -- is it reasonable to use numarray for graph
algorithms? Perhaps an additional module with standard graph
algorithms would be interesting? (I'm sure I could contribute some if
there is any interest...)

And -- is there any chance of getting sparse matrices in numarray?

--
Magnus Lie Hetland                                  The Anygui Project
http://hetland.org                                  http://anygui.org


From perry at stsci.edu  Wed Apr 17 12:10:32 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Wed Apr 17 12:10:32 2002
Subject: [Numpy-discussion] Graphs in numarray?
In-Reply-To: <20020417163133.F7565@idi.ntnu.no>
Message-ID: <JFEGLNDJEDNOMPPHDEJFOEKGDNAA.perry@stsci.edu>

Hi Magnus,

On Behalf Of Magnus Lie Hetland

> 
> I'm looking at various ways of implementing graphs in Python (beyond
> simple dict-based stuff -- more performance is needed). kjbuckets
> looks like a nice alternative, as does the Boost Graph Library (not
> sure how easy it is to use with Boost.Python) but if numarray is to
> become a part of the standard library, it could be beneficial to use
> that...
> 
> For dense graphs, it makes sense to use an adjacency matrix directly
> in numarray, I should think. (I haven't implemented many graph
> algorithms with ufuncs yet, but it seems doable...) For sparse graphs
> I guess some sort of sparse array implementation would be useful,
> although the archives indicate that creating such a thing isn't a core
> part of the numarray project.
> 
First of all, it may make sense, but I should say a few words about
what scale sizes make sense. Currently numarray is implemented mostly
in Python (excepting the very low level, very simple C functions
that do the computational and indexing loops). This means it currently
has a pretty sizable overhead to set up an array operation (I'm
guessing an order of magnitude slower than Numeric). Once set up,
it generally is pretty fast. So it is pretty good for very large
data sets. Very lousy for very small ones. We haven't measured
efficiency lately (we are deferring optimization until we have all
the major functionality present first), but I wouldn't be at all
surprised to find that the set up time can be equal to the time
to actually process ~10,000-20,000 elements (i.e., for a 10K array roughly
half the time goes to setup, so the time spent per element is about twice
that for much larger arrays).

So if you are working with much smaller arrays than 10K, you won't
see total execution time decrease much (it was already spending 
half its time in setup, which doesn't change). We would like
to reduce this size threshold in the future, either by optimizing the
Python code, or moving some of it into C. This optimization wouldn't
be for at least a couple more months; we have more urgent features
to deal with. I doubt that we will ever surpass the current Numeric
in its performance on small arrays (though who knows, perhaps we 
can come close).
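The numbers above fit a simple model: total time equals a fixed setup cost plus a constant cost per element. Taking Perry's rough guess that setup costs about as much as processing 10,000 elements (an estimate, not a measurement), a few lines of Python reproduce the arithmetic, in units of the per-element cost:

```python
def total_time(n, setup_elems):
    """Time in units of the per-element cost: setup plus one unit per element."""
    return setup_elems + n

def per_element_time(n, setup_elems):
    return total_time(n, setup_elems) / n

SETUP = 10_000  # Perry's rough guess for numarray, in element-equivalents
```

Under this model each element of a 10K array costs about twice the large-array asymptote (half the time is setup), and a 10-element operation is only about twice as fast as a 10K-element one.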

> What do you think -- is it reasonable to use numarray for graph
> algorithms? Perhaps an additional module with standard graph
> algorithms would be interesting? (I'm sure I could contribute some if
> there is any interest...)
> 
Before I go further, I need to find out if the preceding has made
you gasp in horror or if the timescale is too slow for you to
accept. (This particular issue also makes me wonder if numarray would
ever be a suitable substitute for the existing array module).
What size graphs are you most concerned about as far as speed goes?

> And -- is there any chance of getting sparse matrices in numarray?
> 
Since talk is cheap, yes :-). But I doubt it would be in the "core"
and some thought would have to be given to how best to represent them.
In one sense, since the underlying storage is different than numarray
assumes for all its arrays, sparse arrays don't really share the
same underlying C machinery very well. While it certainly would be
possible to devise a class with the same interface as numarray objects,
the implementation may have to be completely different. 

On the other hand, since numarray has much better support for index
arrays, i.e., an array of indices that may be used to index another
array of values, an (index array(s), value array) pair may itself serve
as a storage model for sparse arrays. One still needs to implement
ufuncs and other functions (including simple things like indexing)
using different machinery. It is something that would be nice to have,
but I can't say when we would get around to it and don't want to
raise hopes about how quickly it would appear.
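A minimal pure-Python sketch of that storage model, with hypothetical names: parallel row and column index arrays plus a value array (often called coordinate or COO form). Storage grows with the number of stored entries rather than with n*n, and, as Perry notes, even simple indexing needs its own machinery (here a linear scan).

```python
class CooMatrix:
    """Sparse matrix as parallel index arrays plus a value array (COO form)."""

    def __init__(self, shape, rows, cols, values):
        assert len(rows) == len(cols) == len(values)
        self.shape = shape
        self.rows, self.cols, self.values = list(rows), list(cols), list(values)

    def __getitem__(self, ij):
        # Indexing needs its own machinery: scan the index arrays.
        i, j = ij
        for r, c, v in zip(self.rows, self.cols, self.values):
            if r == i and c == j:
                return v
        return 0

    def matvec(self, x):
        """y = A x, touching only the stored entries."""
        y = [0] * self.shape[0]
        for r, c, v in zip(self.rows, self.cols, self.values):
            y[r] += v * x[c]
        return y

    def out_degree(self):
        """Row sums: for a 0/1 adjacency matrix, each node's out-degree."""
        return self.matvec([1] * self.shape[1])
```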

Perry


From victor at idaccr.org  Wed Apr 17 15:25:24 2002
From: victor at idaccr.org (Victor S. Miller)
Date: Wed Apr 17 15:25:24 2002
Subject: [Numpy-discussion] The right way to use results of argmax and argmin
Message-ID: <ulwuv63qwd.fsf@runner.princeton.idaccr.org>

# I'm running python 2.0 on Solaris and Numeric 21.0
#I have an m by n array -- called a and have
# j an n long list of integers in range(m), such as

j = argmax(a,0)

# If I set
z = zip(j,range(len(j)))

# and try the statement

res = take(a,z)

# python appears to hang, but if I do

res = array(map(lambda x,a=a: a[x[0],x[1]], z))

# It works.

# Is there a simpler way of doing what I want, and why does take hang?
# is it, perhaps, allocating some n by n work array (this would
# probably make things thrash like crazy)?
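On the question of a simpler way: each wanted value is a[j[k], k], and flattening the matrix turns that into a single index j[k]*n + k per column, with no list of pairs. The sketch below uses plain Python lists so it runs without Numeric; in Numeric the analogous expression would presumably be `take(ravel(a), j*n + arange(n))`, untested here. If only the values are wanted, they are simply the column maxima, which `maximum.reduce(a, 0)` should give directly. As for the hang, `take(a, z)` most likely treats the n-by-2 list `z` as an index array along axis 0, producing an n x 2 x n result, which matches the suspected n by n work array.

```python
def argmax_columns(a):
    """Row index of the largest entry in each column (like argmax(a, 0))."""
    m, n = len(a), len(a[0])
    return [max(range(m), key=lambda i: a[i][k]) for k in range(n)]

def column_maxima(a, j):
    """Pick a[j[k]][k] for every column k via flat indices j[k]*n + k."""
    n = len(a[0])
    flat = [x for row in a for x in row]           # like ravel(a)
    return [flat[j[k] * n + k] for k in range(n)]  # like take(flat, ...)
```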


-- 
Victor S. Miller     | " ... Meanwhile, those of us who can compute can hardly
victor at idaccr.org    | be expected to keep writing papers saying 'I can do the
CCR, Princeton, NJ   | following useless calculation in 2 seconds', and indeed
    08540 USA        | what editor would publish them?"  -- Oliver Atkin


From magnus at hetland.org  Thu Apr 18 07:55:19 2002
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Thu Apr 18 07:55:19 2002
Subject: [Numpy-discussion] Graphs in numarray?
In-Reply-To: <JFEGLNDJEDNOMPPHDEJFOEKGDNAA.perry@stsci.edu>; from perry@stsci.edu on Wed, Apr 17, 2002 at 03:06:12PM -0400
References: <20020417163133.F7565@idi.ntnu.no> <JFEGLNDJEDNOMPPHDEJFOEKGDNAA.perry@stsci.edu>
Message-ID: <20020418165403.E300@idi.ntnu.no>

Perry Greenfield <perry at stsci.edu>:
[snip]
> First of all, it may make sense, but I should say a few words about
> what scale sizes make sense.
[snip]
> So if you are working with much smaller arrays than 10K, you won't
> see total execution time decrease much

In relation to what? Using dictionaries etc? Using the array module?
[snip]
> Before I go further, I need to find out if the preceeding has made
> you gasp in horror or if the timescale is too slow for you to
> accept.

Hm. If you need 10000 elements before numarray pays off, I'm starting
to wonder if I can use it for anything at all. :I

> (This particular issue also makes me wonder if numarray would
> ever be a suitable substitute for the existing array module).

Indeed.

> What size graphs are you most concerned about as far as speed goes?

I'm not sure. A wide range, I should imagine. But with only 100 nodes,
I'll get 10000 entries in the adjacency matrix, so perhaps it's
worthwhile anyway?

> > And -- is there any chance of getting sparse matrices in numarray?
>
> Since talk is cheap, yes :-). But I doubt it would be in the "core"
> and some thought would have to be given to how best to represent them.
> In one sense, since the underlying storage is different than numarray
> assumes for all its arrays, sparse arrays don't really share the
> same underlying C machinery very well. While it certainly would be
> possible to devise a class with the same interface as numarray objects,
> the implementation may have to be completely different. 

Yes, I realise that.

> On the other hand, since numarray has much better support for index
> arrays, i.e., an array of indices that may be used to index another
> array of values,  index array(s), value array pair may itself serve
> as a storage model for sparse arrays.

That's an interesting idea, although I don't quite see how it would
help in the case of adjacency matrices. (You'd still need at least one
n**2 size matrix for n nodes, wouldn't you -- i.e. the index array...
Right?)

> One still needs to implement ufuncs and other functions (including
> simple things like indexing) using different machinery. It is
> something that would be nice to have, but I can't say when we would
> get around to it and don't want to raise hopes about how quickly it
> would appear.

No - no problem.

Basically, I'm looking for a platform to implement graph algorithms
that doesn't necessitate too many installed packages etc. numarray
seemed promising since it's a candidate for inclusion in the standard
library. I guess I'll just have to do some timing experiments...

> Perry

--
Magnus Lie Hetland                                  The Anygui Project
http://hetland.org                                  http://anygui.org


From perry at stsci.edu  Thu Apr 18 08:22:06 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Apr 18 08:22:06 2002
Subject: [Numpy-discussion] Graphs in numarray?
In-Reply-To: <20020418165403.E300@idi.ntnu.no>
Message-ID: <JFEGLNDJEDNOMPPHDEJFOEKJDNAA.perry@stsci.edu>


> Behalf Of Magnus Lie Hetland
> Perry Greenfield <perry at stsci.edu>:
> [snip]
> > First of all, it may make sense, but I should say a few words about
> > what scale sizes make sense.
> [snip]
> > So if you are working with much smaller arrays than 10K, you won't
> > see total execution time decrease much
> 
> In relation to what? Using dictionaries etc? Using the array module?

No, in relation to operations on a 10K array. Basically, if an operation
on a 10K array spends half its time on set up, operations on a
10 element array may only be twice as fast. I'm not making any claims
about speed in relation to any other data structure (other than Numeric).

> [snip]
> > Before I go further, I need to find out if the preceeding has made
> > you gasp in horror or if the timescale is too slow for you to
> > accept.
> 
> Hm. If you need 10000 elements before numarray pays off, I'm starting
> to wonder if I can use it for anything at all. :I
> 
I didn't make clear that this threshold may improve in the future
(decrease). The corresponding threshold for Numeric is probably 
around 1000 to 2000 elements. (Likewise, operations on 10 element
Numeric arrays are only about twice as fast as for 1K arrays)
We may be able to eventually improve numarray performance to something 
in that neighborhood (if we are lucky) but I would be surprised to
do much better (though if we use caching techniques, perhaps repeated
cases of arrays of identical shape, strides, type, etc. may run
much faster on subsequent operations). As usual, performance issues
can be complicated. You have to keep in mind that Numeric and numarray
provide much richer indexing and conversion handling feature than 
something like the array module, and that comes at some price in
performance for small arrays.
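The caching idea could look roughly like the following sketch, which memoizes per-operation setup keyed on (shape, strides, type) with functools.lru_cache. The function plan_operation and its return value are hypothetical stand-ins, not numarray internals:

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def plan_operation(shape, strides, typecode):
    """Hypothetical setup step whose result depends only on array layout."""
    # Stand-in for the real setup work: count the elements and record
    # the layout the inner loop would have to handle.
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return {"n": n_elements, "strides": strides, "type": typecode}

# Two operations on arrays with identical shape, strides and type:
plan_operation((100, 100), (800, 8), "d")  # computed (cache miss)
plan_operation((100, 100), (800, 8), "d")  # reused (cache hit)
info = plan_operation.cache_info()
print(info.hits, info.misses)
```

The key must contain everything the setup depends on; any layout difference produces a separate cache entry.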

> > (This particular issue also makes me wonder if numarray would
> > ever be a suitable substitute for the existing array module).
> 
> Indeed.
> 
> > What size graphs are you most concerned about as far as speed goes?
> 
> I'm not sure. A wide range, I should imagine. But with only 100 nodes,
> I'll get 10000 entries in the adjacency matrix, so perhaps it's
> worthwhile anyway?
> 
That's right, 100 nodes is where performance becomes competitive,
and if you are worried about cases larger than that, then
it isn't a problem. But if you are operating mostly on small graphs,
then it may not be appropriate. The corresponding threshold for Numeric
would be on the order of 30 nodes.
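Because an n-node graph has an n*n adjacency matrix, the break-even node count is roughly the square root of the element threshold. A small check using the rough thresholds above (about 10,000 elements for numarray, 1,000 to 2,000 for Numeric) reproduces the 100-node and "order of 30 nodes" figures; the helper min_nodes_for is hypothetical:

```python
import math

def min_nodes_for(threshold_elements):
    """Smallest node count n whose n*n adjacency matrix reaches the threshold."""
    n = math.isqrt(threshold_elements)
    if n * n < threshold_elements:
        n += 1  # round up when the threshold is not a perfect square
    return n

print(min_nodes_for(10_000))  # numarray estimate: 100
print(min_nodes_for(1_000))   # Numeric, low estimate: 32
print(min_nodes_for(2_000))   # Numeric, high estimate: 45
```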

> > On the other hand, since numarray has much better support for index
> > arrays, i.e., an array of indices that may be used to index another
> array of values, an (index array(s), value array) pair may itself serve
> > as a storage model for sparse arrays.
> 
> That's an interesting idea, although I don't quite see how it would
> help in the case of adjacency matrices. (You'd still need at least one
> n**2 size matrix for n nodes, wouldn't you -- i.e. the index array...
> Right?)
> 
Right.
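A minimal pure-Python sketch of the index-array/value-array storage idea, with parallel lists standing in for numarray index arrays (the helper names are hypothetical). Storage grows with the number of stored entries, so for a dense graph it approaches n*n entries, which is the point Magnus raises:

```python
def to_index_value(matrix):
    """Keep only nonzero entries as parallel (rows, cols, vals) lists."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(matrix):
        for j, v in enumerate(row):
            if v:
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return rows, cols, vals

def lookup(rows, cols, vals, i, j):
    """Linear scan for entry (i, j); missing entries are implicitly zero."""
    for r, c, v in zip(rows, cols, vals):
        if r == i and c == j:
            return v
    return 0

adjacency = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]  # edges 0 -> 1 and 1 -> 2
rows, cols, vals = to_index_value(adjacency)
print(rows, cols, vals)               # [0, 1] [1, 2] [1, 1]
print(lookup(rows, cols, vals, 1, 2))  # 1
```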

> 


From magnus at hetland.org  Thu Apr 18 08:48:17 2002
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Thu Apr 18 08:48:17 2002
Subject: [Numpy-discussion] Graphs in numarray?
In-Reply-To: <JFEGLNDJEDNOMPPHDEJFOEKJDNAA.perry@stsci.edu>; from perry@stsci.edu on Thu, Apr 18, 2002 at 11:21:46AM -0400
References: <20020418165403.E300@idi.ntnu.no> <JFEGLNDJEDNOMPPHDEJFOEKJDNAA.perry@stsci.edu>
Message-ID: <20020418174733.A7072@idi.ntnu.no>

Perry Greenfield <perry at stsci.edu>:
>
[snip]
> > In relation to what? Using dictionaries etc? Using the array module?
> 
> No, in relation to operations on a 10K array. Basically, if an operation
> on a 10K array spends half its time on set up, operations on a
> 10 element array may only be twice as fast. I'm not making any claims
> about speed in relation to any other data structure (other than Numeric)
 
Aaah! Sorry to be so dense :)

But the speedup in numeric between different sizes isn't as important
to me as the speedup compared to other solutions (such as a dict-based
one) of course... If a 10 element array is only twice as fast as a 10K
array that's no problem if it's still faster than an alternative
solution (though I'm sure it might not be...)

The same goes for 10K element graphs -- the interesting point has to
be whether it's faster than various alternatives (which I'm sure it
is).

> > [snip]
> > > Before I go further, I need to find out if the preceding has made
> > > you gasp in horror or if the timescale is too slow for you to
> > > accept.
> > 
> > Hm. If you need 10000 elements before numarray pays off, I'm starting
> > to wonder if I can use it for anything at all. :I
> > 
> I didn't make clear that this threshold may improve in the future
> (decrease).

Right. Good.

And -- on small graphs performance probably won't be much of a problem
anyway. :)

> The corresponding threshold for Numeric is probably
> around 1000 to 2000 elements. (Likewise, operations on 10 element
> Numeric arrays are only about twice as fast as for 1K arrays.)
> We may be able to eventually improve numarray performance to something
> in that neighborhood (if we are lucky) but I would be surprised to
> do much better (though if we use caching techniques, perhaps repeated
> cases of arrays of identical shape, strides, type, etc. may run
> much faster on subsequent operations). As usual, performance issues
> can be complicated. You have to keep in mind that Numeric and numarray
> provide much richer indexing and conversion handling features than
> something like the array module, and that comes at some price in
> performance for small arrays.

Of course.

I guess an alternative (for the graph situation) could be to wrap the
graphs with a common interface with various implementations, so that a
solution more optimised for small graphs could be used (in a factory
function) if the graph is small... (Not really an issue for me at the
moment, but should be easy to do, I guess.)
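The factory-function idea might look like this sketch. DictGraph, MatrixGraph and make_graph are hypothetical names, the bytearray matrix is only a stand-in for an array-package adjacency matrix, and the 100-node cutoff is an assumption borrowed from Perry's estimate:

```python
class DictGraph:
    """Adjacency sets in a dict: cheap for small or sparse graphs."""
    def __init__(self, n):
        self.adj = {v: set() for v in range(n)}

    def add_edge(self, u, v):
        self.adj[u].add(v)

    def has_edge(self, u, v):
        return v in self.adj[u]


class MatrixGraph:
    """Flat n*n 0/1 adjacency matrix, standing in for an array-backed version."""
    def __init__(self, n):
        self.n = n
        self.m = bytearray(n * n)  # row-major, all zeros initially

    def add_edge(self, u, v):
        self.m[u * self.n + v] = 1

    def has_edge(self, u, v):
        return bool(self.m[u * self.n + v])


SMALL_GRAPH_LIMIT = 100  # assumed cutoff, not a measured value

def make_graph(n):
    """Pick an implementation by size; both share the same interface."""
    cls = DictGraph if n < SMALL_GRAPH_LIMIT else MatrixGraph
    return cls(n)

g = make_graph(10)   # small: DictGraph
h = make_graph(200)  # large: MatrixGraph
h.add_edge(3, 150)
print(type(g).__name__, type(h).__name__, h.has_edge(3, 150))
```

Callers only use add_edge and has_edge, so the representation can change later without touching them.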

[snip]
> > I'm not sure. A wide range, I should imagine. But with only 100 nodes,
> > I'll get 10000 entries in the adjacency matrix, so perhaps it's
> > worthwhile anyway?
> > 
> That's right, 100 nodes is where performance becomes competitive,
> and if you are worried about cases larger than that, then
> it isn't a problem.

Seems probable. For smaller problems I wouldn't be thinking in terms
of numarray anyway, I think. (Just using plain Python dicts or
something similar.)

[snip]
> > > On the other hand, since numarray has much better support for index
> > > arrays, i.e., an array of indices that may be used to index another
> > > array of values, an (index array(s), value array) pair may itself serve
> > > as a storage model for sparse arrays.
> > 
> > That's an interesting idea, although I don't quite see how it would
> > help in the case of adjacency matrices. (You'd still need at least one
> > n**2 size matrix for n nodes, wouldn't you -- i.e. the index array...
> > Right?)
> > 
> Right.

I might as well use a full adjacency matrix, then...

So, the conclusion for now is that numarray may well be suited for
working with relatively large (100+ nodes), relatively dense graphs.

Now, the next interesting question is how much of the standard graph
algorithms can be implemented with ufuncs and array operations (which
I guess is the key to performance) and not straight for-loops... After
all, some of them are quite sequential.
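As one example of how far array operations can go, transitive closure (Warshall's algorithm) keeps a sequential loop over intermediate nodes, but each row update inside it is a whole-row elementwise OR. This sketch uses plain lists; the marked line is the part an array package could replace with a single elementwise operation:

```python
def transitive_closure(adj):
    """Warshall's algorithm on a 0/1 adjacency matrix (list of lists)."""
    n = len(adj)
    reach = [row[:] for row in adj]  # copy so the input is not modified
    for k in range(n):               # inherently sequential over k
        row_k = reach[k]
        for i in range(n):
            if reach[i][k]:
                # Vectorizable step: reach[i] |= reach[k], elementwise
                reach[i] = [a | b for a, b in zip(reach[i], row_k)]
    return reach

chain = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]  # edges 0 -> 1 -> 2
closure = transitive_closure(chain)
print(closure)  # [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
```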

--
Magnus Lie Hetland                                  The Anygui Project
http://hetland.org                                  http://anygui.org


From rob at pythonemproject.com  Thu Apr 18 09:18:31 2002
From: rob at pythonemproject.com (rob)
Date: Thu Apr 18 09:18:31 2002
Subject: [Numpy-discussion] Graphs in numarray?
References: <20020418165403.E300@idi.ntnu.no> <JFEGLNDJEDNOMPPHDEJFOEKJDNAA.perry@stsci.edu> <20020418174733.A7072@idi.ntnu.no>
Message-ID: <3CBEF151.C440DCE@pythonemproject.com>

I'm sorry I missed the original post, but the topic is important for
me.  I use the lightweight 3d volume renderer Animabob for most
everything.  The interface code is in all of the FDTD programs in my
website.  You just unwind a 3d array and scale it to +/- 128, turn it
into characters, and you have the input file.  I wish Animabob could
somehow be turned into a Python package, as in Windows you need Cygwin
to run it.  I've tried other 3d packages like OpenDX, and they seem to
be huge albatrosses.


-- 
-----------------------------
The Numeric Python EM Project

www.pythonemproject.com


From perry at stsci.edu  Thu Apr 18 10:36:19 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Apr 18 10:36:19 2002
Subject: [Numpy-discussion] Graphs in numarray?
In-Reply-To: <3CBEF151.C440DCE@pythonemproject.com>
Message-ID: <JFEGLNDJEDNOMPPHDEJFEEKMDNAA.perry@stsci.edu>


Behalf Of rob
 
> 
> I'm sorry I missed the original post, but the topic is important for
> me.  I use the lightweight 3d volume renderer Animabob for most
> everything.  The interface code is in all of the FDTD programs in my
> website.  You just unwind a 3d array and scale it to +/- 128, turn it
> into characters, and you have the input file.  I wish Animabob could
> somehow be turned into a Python package, as in Windows you need Cygwin
> to run it.  I've tried other 3d packages like OpenDX, and they seem to
> be huge albatrosses.
> 
It sounds like you are trying to do something different from Magnus, but
if what you are looking to do is scale floating-point or integer data to
byte size and apply some character mapping, numarray (or Numeric) should
be able to do that very well. If that is all you want done, though, you
might find either to be overkill (if you already wrote a C extension to
do so).
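A minimal sketch of the conversion rob describes, in pure Python. Animabob's actual file layout is not described here, so this assumes raw signed bytes in row-major order, and it maps the data's own minimum and maximum onto the signed byte range -128..127:

```python
def volume_to_bytes(volume):
    """Flatten a 3-D nested list of numbers and scale it into signed bytes."""
    flat = [x for plane in volume for row in plane for x in row]  # "unwind"
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero for constant data
    # Map [lo, hi] linearly onto [-128, 127] and round to the nearest integer.
    scaled = [round((x - lo) / span * 255) - 128 for x in flat]
    # bytes() only accepts 0..255, so store the signed values as two's complement.
    return bytes(v & 0xFF for v in scaled)

volume = [[[0.0, 1.0]], [[0.5, 1.0]]]  # toy 2 x 1 x 2 volume
data = volume_to_bytes(volume)
print(list(data))  # [128, 127, 0, 127], i.e. signed -128, 127, 0, 127
```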

Perry

 
From perry at stsci.edu  Thu Apr 18 10:39:03 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Apr 18 10:39:03 2002
Subject: [Numpy-discussion] Graphs in numarray?
In-Reply-To: <20020418174733.A7072@idi.ntnu.no>
Message-ID: <JFEGLNDJEDNOMPPHDEJFIEKMDNAA.perry@stsci.edu>

 
> Now, the next interesting question is how much of the standard graph
> algorithms can be implemented with ufuncs and array operations (which
> I guess is the key to performance) and not straight for-loops... After
> all, some of them are quite sequential.
> 
I'm not sure about that (not being very familiar with graph algorithms).
If you can give me some examples (perhaps off the mailing list) I could
say whether they are easily cast into ufunc or library calls. 

Perry


From paul at pfdubois.com  Fri Apr 19 07:42:23 2002
From: paul at pfdubois.com (Paul F Dubois)
Date: Fri Apr 19 07:42:23 2002
Subject: [Numpy-discussion] [ANN] Pyfort 7.1
Message-ID: <000101c1e7ae$38ac0d50$0a01a8c0@NICKLEBY>

Pyfort 7.1 is available at sf.net/projects/pyfortran.

Support for single Fortran characters was added. (Michiel de Hoon)
Corrected behavior of scalars with C routines.   (Michiel de Hoon)

Pyfort is a tool for connecting Python to Fortran.

Just to let you know, I'm working on a little tool to make it easier to
set up simple projects so that you can build and install them with less
effort. I hope to have that available soon.


From rob at pythonemproject.com  Fri Apr 19 09:00:02 2002
From: rob at pythonemproject.com (rob)
Date: Fri Apr 19 09:00:02 2002
Subject: [Numpy-discussion] Icc compiled Python
Message-ID: <3CC03E5A.9835FC0A@pythonemproject.com>

There has been some discussion on the FreeBSD Ports list about an Icc
compiled Python, which benchmarks much faster than the normal
gcc-compiled version.  I'm wondering if anyone here knows anything about it.  The
discussion can be accessed via www.geocrawler.org/ FreeBSD/
freebsd-ports.

Rob.


-- 
-----------------------------
The Numeric Python EM Project

www.pythonemproject.com


From juenglin at informatik.uni-freiburg.de  Sat Apr 20 09:59:13 2002
From: juenglin at informatik.uni-freiburg.de (Ralf Juengling)
Date: Sat Apr 20 09:59:13 2002
Subject: [Numpy-discussion] NumPy initiated reference counting
Message-ID: <1019321875.8067.141.camel@leto>

I'm currently tinkering with the following problem and would like to
hear your suggestions:

Within a C module I define a new Python type 'IM' (representing an
image). The indexing and slicing facilities of NumPy arrays are
tailor-made for manipulating the internal data of its instances. Thus,
I could provide a method 'asarray', which creates a properly
typed array object 'a' referring to the data of an IM instance 'im':

a = im.asarray()

I could use PyArray_FromDimsAndData() to create the array instance.
Unfortunately, this wouldn't work, since 'a' would not get notified 
about the death of 'im'.
However, if I could prevent 'im' from being garbage collected before
all array instances referring to its data are deleted, it should work.

NumPy's array type uses a mechanism to prevent garbage collection
of array instances if there are other instances that share data with
them. My idea was to use this mechanism, that is, to let the asarray
method increment im's reference count and let a->base refer to im.

Do you think this is a reliable approach?
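The proposal is a C-level ownership pattern; the sketch below shows the same idea at the Python level only, without the Numeric C API. Image and ArrayView are hypothetical names: the view holds a strong reference to its owner in a base attribute, so the owner cannot be freed while any view exists. The final check relies on CPython's immediate reference-count cleanup:

```python
import weakref

class Image:
    """Hypothetical stand-in for the C-level 'IM' type: owns a pixel buffer."""
    def __init__(self, width, height):
        self.pixels = bytearray(width * height)

    def asarray(self):
        # Analogue of "INCREF im and set a->base = im": the view keeps
        # a strong reference to its owner.
        return ArrayView(self)

class ArrayView:
    def __init__(self, base):
        self.base = base                     # strong reference to the owner
        self.data = memoryview(base.pixels)  # shares the buffer, no copy

im = Image(4, 4)
probe = weakref.ref(im)  # observes when the image object is actually freed
a = im.asarray()
del im
print(probe() is not None)  # True: a.base keeps the image alive
a.data[0] = 255             # safe: the buffer still exists
del a
print(probe() is None)      # True in CPython: the last reference is gone
```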

Thanks,
Ralf
  

-- 
--------------------------------------------------------------------------
Ralf Jüngling
Institut für Informatik - Lehrstuhl für Mustererkennung & Bildverarbeitung
Georges-Köhler-Allee
Gebäude 52                                        Tel: +49-(0)761-203-8215
79110 Freiburg                                    Fax: +49-(0)761-203-8262
--------------------------------------------------------------------------


From juenglin at informatik.uni-freiburg.de  Sat Apr 20 12:22:51 2002
From: juenglin at informatik.uni-freiburg.de (Ralf Juengling)
Date: Sat Apr 20 12:22:51 2002
Subject: [Numpy-discussion] qs on NumPy
Message-ID: <1019330305.8067.158.camel@leto>

Hi,

I did not find a way in Python to check whether a Numeric array 
instance is a shared array or not. Could you confirm: there is no way.

Is there work underway to make Numeric arrays subclassable?

Regards,
Ralf

-- 
--------------------------------------------------------------------------
Ralf Jüngling
Institut für Informatik - Lehrstuhl für Mustererkennung & Bildverarbeitung
Georges-Köhler-Allee
Gebäude 52                                        Tel: +49-(0)761-203-8215
79110 Freiburg                                    Fax: +49-(0)761-203-8262
--------------------------------------------------------------------------


From mok at imsb.au.dk  Tue Apr 23 04:21:03 2002
From: mok at imsb.au.dk (Morten Kjeldgaard)
Date: Tue Apr 23 04:21:03 2002
Subject: [Numpy-discussion] Graphs in numarray?
In-Reply-To: <20020417163133.F7565@idi.ntnu.no>
Message-ID: <Pine.LNX.4.33.0204231312100.12942-100000@origo.imsb.au.dk>

> simple dict-based stuff -- more performance is needed). kjbuckets
> looks like a nice alternative, as does the Boost Graph Library (not

Kjbuckets is *very* nice indeed. It is a compact and very fast
implementation. I don't see why you'd want to wrap this functionality into
NumPy, which has a very well-defined scope and an efficient implementation.
It would be a shame to bloat it with something which is discretely 
different.

I have modified kjbuckets so that it compiles and works with Python 2.x. 
You can pick it up at 

ftp://xray.imsb.au.dk/pub/birdwash/packages/Python2.1/SRPMS/python-kjbuckets-2.2-7.src.rpm

Just do "rpm --rebuild" on it.

I sent the patch to the original author, but it appears he is no longer 
maintaining it. Never mind, it works great.

/Morten

-- 
Morten Kjeldgaard   <mok at imsb.au.dk>             | Phone : +45 89 42 50 26
Institute of Molecular and Structural Biology    | Fax   : +45 86 12 31 78
Aarhus University                                | Home  : +45 86 18 81 80
Gustav Wieds Vej 10 C, DK-8000 Aarhus C, Denmark | http://imsb.au.dk/~mok


From magnus at hetland.org  Thu Apr 25 07:28:05 2002
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Thu Apr 25 07:28:05 2002
Subject: [Numpy-discussion] Graphs in numarray?
In-Reply-To: <Pine.LNX.4.33.0204231312100.12942-100000@origo.imsb.au.dk>; from mok@imsb.au.dk on Tue, Apr 23, 2002 at 01:20:04PM +0200
References: <20020417163133.F7565@idi.ntnu.no> <Pine.LNX.4.33.0204231312100.12942-100000@origo.imsb.au.dk>
Message-ID: <20020425162734.B6821@idi.ntnu.no>

Morten Kjeldgaard <mok at imsb.au.dk>:
>
> 
> > simple dict-based stuff -- more performance is needed). kjbuckets
> > looks like a nice alternative, as does the Boost Graph Library (not
> 
> Kjbuckets is *very* nice indeed.

Yes, I guess it is. But the project doesn't seem very active...

> It is a compact and very fast implementation. I don't see why you'd
> want to wrap this functionality into NumPy, which has a very
> well-defined scope and an efficient implementation.  It would be a
> shame to bloat it with something which is discretely different.

Yes, I guess you're right. There is no point in adding this sort of
thing to numarray. My motivation for using numarray in my
implementations was simply that it would mean that the necessary tools
would be (or might be in the future ;) available in the standard
distribution.

> I have modified kjbuckets so that it compiles and works with Python 2.x. 
> You can pick it up at 
> 
> ftp://xray.imsb.au.dk/pub/birdwash/packages/Python2.1/SRPMS/python-kjbuckets-2.2-7.src.rpm
> 
> Just do "rpm --rebuild" on it.
> 
> I sent the patch to the original author, but it appears he is no longer 
> maintaining it. Never mind, it works great.

Well... I do sort of mind... I'm a bit wary of using unmaintained
software. Not that I would never do it or anything... But I think it
would be a bonus to use stuff that is being actively maintained and
developed. But I guess I'll take another look at it.

(Any idea where the "kj" prefix comes from, by the way?)

> /Morten

--
Magnus Lie Hetland                                  The Anygui Project
http://hetland.org                                  http://anygui.org


From magnus at hetland.org  Thu Apr 25 07:43:10 2002
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Thu Apr 25 07:43:10 2002
Subject: [Numpy-discussion] Non-numeric arrays?
Message-ID: <20020425164228.C6821@idi.ntnu.no>

I can't find this in the docs (although I've heard it's mentioned
there)... Is support for non-numeric arrays (such as character arrays
or object pointer arrays) as in Numeric planned for numarray? (Perhaps
even supported? My version might not be the most recent...)

And what about subclasses of numeric types?

E.g:

# numarray
>>> class foo(int): pass
>>> a = array(map(foo, xrange(10)))
[...]
TypeError: Expecting a python numeric type, got a foo

# Numeric
>>> class foo(int): pass
>>> a = array(map(foo, xrange(10)))
>>> type(a[0])
<type 'int'>

Neither behaviour seems very helpful -- I guess numarray's is
cleaner... (Although in this case I think an object array could have
been nice...)

--
Magnus Lie Hetland                                  The Anygui Project
http://hetland.org                                  http://anygui.org


From jmiller at stsci.edu  Thu Apr 25 07:53:04 2002
From: jmiller at stsci.edu (Todd Miller)
Date: Thu Apr 25 07:53:04 2002
Subject: [Numpy-discussion] Non-numeric arrays?
References: <20020425164228.C6821@idi.ntnu.no>
Message-ID: <3CC81814.2010702@stsci.edu>

Magnus Lie Hetland wrote:

>I can't find this in the docs (although I've heard it's mentioned
>there)... Is support for non-numeric arrays (such as character arrays
>or object pointer arrays) as in Numeric planned for numarray? (Perhaps
>
Check out chararray for character arrays.  
Check out recarray for arrays of fixed length structs.  
To make your own non-numeric arrays, subclass NDArray.

>
>even supported? My version might not be the most recent...)
>
>And what about subclasses of numeric types?
>
>E.g:
>
># numarray
>>>> class foo(int): pass
>>>> a = array(map(foo, xrange(10)))
>[...]
>TypeError: Expecting a python numeric type, got a foo
>
># Numeric
>>>> class foo(int): pass
>>>> a = array(map(foo, xrange(10)))
>>>> type(a[0])
><type 'int'>
>
>Neither behaviour seems very helpful -- I guess numarray's is
>cleaner... (Although in this case I think an object array could have
>been nice...)
>
Object arrays fall into the *eventually* category:  planned but not 
imminent.

Todd

-- 
Todd Miller 			jmiller at stsci.edu
STSCI / SSG			(410) 338 4576


From magnus at hetland.org  Thu Apr 25 07:54:02 2002
From: magnus at hetland.org (Magnus Lie Hetland)
Date: Thu Apr 25 07:54:02 2002
Subject: [Numpy-discussion] Non-numeric arrays?
In-Reply-To: <20020425164228.C6821@idi.ntnu.no>; from magnus@hetland.org on Thu, Apr 25, 2002 at 04:42:28PM +0200
References: <20020425164228.C6821@idi.ntnu.no>
Message-ID: <20020425165304.D6821@idi.ntnu.no>

Magnus Lie Hetland <magnus at hetland.org>:
[snip]

Just a quick explanation for why I'm interested in this...

I've got a two-dimensional array of ints (or bytes, actually), that I
would like to convert to a delimited string (e.g. comma-separated).

This works in Numeric:

>>> from Numeric import *
>>> from string import letters
>>> alphabet = array(letters)
>>> data = arange(24) # E.g...
>>> data.shape = 6, 4
>>> fields = sum(take(alphabet, data), 1)
>>> ','.join(fields)
'abcd,efgh,ijkl,mnop,qrst,uvwx'
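For comparison, the same conversion can be written without an array package in modern Python. This sketch substitutes string.ascii_lowercase for string.letters (an assumption: both begin with 'a' to 'z' in the default locale):

```python
from string import ascii_lowercase

# Rebuild the 6x4 example with plain lists: values 0..23, four per row.
data = [list(range(row * 4, row * 4 + 4)) for row in range(6)]

# Map each integer to a letter (like take(alphabet, data)), join each row
# into one field (like summing characters along axis 1), then join the rows.
fields = [''.join(ascii_lowercase[i] for i in row) for row in data]
result = ','.join(fields)
print(result)  # abcd,efgh,ijkl,mnop,qrst,uvwx
```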

--
Magnus Lie Hetland                                  The Anygui Project
http://hetland.org                                  http://anygui.org


From perry at stsci.edu  Thu Apr 25 07:57:05 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Apr 25 07:57:05 2002
Subject: [Numpy-discussion] Non-numeric arrays?
In-Reply-To: <20020425164228.C6821@idi.ntnu.no>
Message-ID: <JFEGLNDJEDNOMPPHDEJFGEMMDNAA.perry@stsci.edu>

[I see Todd has already answered this, the following might add
a little more detail]

> -----Original Message-----
> From: numpy-discussion-admin at lists.sourceforge.net
> [mailto:numpy-discussion-admin at lists.sourceforge.net]On Behalf Of Magnus
> Lie Hetland
> Sent: Thursday, April 25, 2002 10:42 AM
> To: Numpy-discussion
> Subject: [Numpy-discussion] Non-numeric arrays?
> 
> 
> I can't find this in the docs (although I've heard it's mentioned
> there)... Is support for non-numeric arrays (such as character arrays
> or object pointer arrays) as in Numeric planned for numarray? (Perhaps
> even supported? My version might not be the most recent...)
> 
Yes, in fact there is a character array class included with
numarray (but not documented, I believe). For the moment,
you'll have to deal with the source. We developed it for use
with our I/O library but it seemed to be of general enough
use to include with numarray.

We also plan to support arrays of Python objects. There are
various ways that this could be done and we ought to discuss
how it should be done (perhaps multiple ways). But the 
underlying machinery certainly will support it.

> And what about subclasses of numeric types?
> 
> E.g:
> 
> # numarray
> >>> class foo(int): pass
> >>> a = array(map(foo, xrange(10)))
> [...]
> TypeError: Expecting a python numeric type, got a foo
> 
> # Numeric
> >>> class foo(int): pass
> >>> a = array(map(foo, xrange(10)))
> >>> type(a[0])
> <type 'int'>
> 
> Neither behaviour seems very helpful -- I guess numarray's is
> cleaner... (Although in this case I think an object array could have
> been nice...)
> 
We haven't had much time to think about how we deal with
numeric subclasses. Certainly one would not use these
for efficiency; I can't see any simple way of making such
things go fast. But it may be possible to have such things
work with numarray ufuncs and other numeric operations
in some automatic way. I'd have to think about that. It's
not high on the priority list at the moment. (Speaking of
which, I may post in a few days.)

Thanks, Perry
> 


From hinsen at cnrs-orleans.fr  Thu Apr 25 08:35:06 2002
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Thu Apr 25 08:35:06 2002
Subject: [Numpy-discussion] Graphs in numarray?
In-Reply-To: <20020425162734.B6821@idi.ntnu.no>
References: <20020417163133.F7565@idi.ntnu.no>
	<Pine.LNX.4.33.0204231312100.12942-100000@origo.imsb.au.dk>
	<20020425162734.B6821@idi.ntnu.no>
Message-ID: <m38z7beqwm.fsf@chinon.cnrs-orleans.fr>

Magnus Lie Hetland <magnus at hetland.org> writes:

> (Any idea where the "kj" prefix comes from, by the way?)

I asked Aaron Watters about this. The answer: k and j are the initials
of his children.

Konrad.


From jasper at peak.org  Mon Apr 29 03:14:04 2002
From: jasper at peak.org (Jasper Phillips)
Date: Mon Apr 29 03:14:04 2002
Subject: [Numpy-discussion] Multiple Linear Regression?
Message-ID: <200204291013.DAA32745@spock.peak.org>

I'm helping my wife with programming for her economics thesis, which needs
to calculate a "Multiple Linear Regression" on her data.

Does anyone know of any (preferably though not necessarily free) software
that can do this? I'm working in Python, but not limited to it as I
can relatively freely access other languages.

I'm still looking for a library written in Python, but haven't had any luck.

My second thought was Matlab, but looking over the Matlab website, I couldn't
find anything like this by a name I recognize. It looks like I might be able
to construct something out of a combination of Sparse Matrices and Linear
Regression, or perhaps the stuff for overdetermined Linear Equations?

Another option may be LAPACK routines, but I'm not familiar with those.

Does anyone here have any experience with this kind of stuff? Is there a
better place to ask? I'm about ready to take a shot at writing something
myself, but I'd really rather avoid this if it's been done before.

-Jasper


From hinsen at cnrs-orleans.fr  Mon Apr 29 05:40:03 2002
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Mon Apr 29 05:40:03 2002
Subject: [Numpy-discussion] Multiple Linear Regression?
In-Reply-To: <200204291013.DAA32745@spock.peak.org>
References: <200204291013.DAA32745@spock.peak.org>
Message-ID: <m3elgy8yxg.fsf@chinon.cnrs-orleans.fr>

Jasper Phillips <jasper at peak.org> writes:

> I'm still looking for a library written in Python, but haven't had any luck.

Numerical Python has all the basic stuff, but you need to read in and
arrange the data yourself. All linear regression problems ultimately
become least-squares problems for a system of linear equations, which
can be solved using LinearAlgebra.linear_least_squares.
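To make that reduction concrete, here is a pure-Python sketch of multiple linear regression as a least-squares problem. It solves the normal equations by Gaussian elimination, a swapped-in technique chosen for brevity; for real data a library least-squares routine such as LinearAlgebra.linear_least_squares is the safer choice, since forming A^T A can lose precision:

```python
def linear_least_squares_fit(X, y):
    """Coefficients b minimizing ||A b - y||^2, where A is X with a leading
    column of ones (the intercept). Solves (A^T A) b = A^T y by Gaussian
    elimination with partial pivoting; adequate for small, well-conditioned data.
    """
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]
    b = [0.0] * p  # back substitution
    for i in reversed(range(p)):
        s = sum(M[i][j] * b[j] for j in range(i + 1, p))
        b[i] = (M[i][p] - s) / M[i][i]
    return b

# Toy data generated from y = 1 + 2*x1 + 3*x2 (no noise)
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
coef = linear_least_squares_fit(X, y)
print([round(c, 6) for c in coef])  # [1.0, 2.0, 3.0]
```

The first coefficient is the intercept; the rest correspond to the columns of X in order.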

Konrad.


From Alexandre.Fayolle at logilab.fr  Mon Apr 29 06:20:03 2002
From: Alexandre.Fayolle at logilab.fr (Alexandre)
Date: Mon Apr 29 06:20:03 2002
Subject: [Numpy-discussion] Multiple Linear Regression?
In-Reply-To: <200204291013.DAA32745@spock.peak.org>
References: <200204291013.DAA32745@spock.peak.org>
Message-ID: <20020429131937.GE30347@orion.logilab.fr>

On Mon, Apr 29, 2002 at 03:13:44AM -0700, Jasper Phillips wrote:
> I'm helping my wife with programming for her economics thesis, which needs
> to calculate a "Multiple Linear Regression" on her data.
> 
> Does anyone know of any (preferably though not necessarily free) software
> that can do this? I'm working in Python, but not limited to it as I
> can relatively freely access other languages.
> 
> I'm still looking for a library written in Python, but haven't had any luck.
> 

I'm helping my wife with her History PhD, and have to deal with similar
stuff. I found R to be a very useful environment for statistical
computations. R is a free software clone of S-plus, which is to statistics
what Matlab is to linear algebra and automation. 

Pros: 
 - programming environment, with a high level programming language
 - extensive statistical and linalg library (using C and FORTRAN code)
 - lots of third party code available, covering a very wide range of
   situations
 - Python bindings available if you don't want to learn the Scheme-like
   language
 - Tons of documentation available
 - Excellent support through the mailing lists
 - GPL'd
 - Tons of ways to import data (ranging from CSV files to ODBC queries)
 - 2 printed books available, at Springer Verlag
 - postscript, png, wmf, X outputs, with precise control of the layout
   of the graphs and figures available for a nice colourful thesis

Cons:
 - the language can be a bit weird at times (it took me some time to get
   used to '.' being used instead of '_' and vice versa in the scoping
   and variable naming), but you can use Python to script R, thanks to
   RPython
 - it's quite a big piece of code, with a rather steep learning curve
   and you need time to get inside it
 - the documentation is aimed at professional statisticians. I had to
   dig back in my statistics courses and to buy a couple of books on
   that topic for the software to become really useful. Asking newbie
   statistician questions on the r-help mailing list is off-topic
 - the Springer Verlag books are very expensive (Modern Applied
   Statistics with S-plus costs something like 70 euros), but they are
   great

So you have a powerful tool available at your fingertips, designed to do
precisely what you need. I think it's worth taking the time to look at
it carefully. The more I get to understand the topic, the more ideas I
get for new ways of exploring the data of my wife's PhD. 

 
Alexandre Fayolle
-- 
LOGILAB, Paris (France).
http://www.logilab.com   http://www.logilab.fr  http://www.logilab.org
Narval, the first software agent available as free software (GPL).


From Alexandre.Fayolle at logilab.fr  Mon Apr 29 06:28:08 2002
From: Alexandre.Fayolle at logilab.fr (Alexandre)
Date: Mon Apr 29 06:28:08 2002
Subject: [Numpy-discussion] Multiple Linear Regression?
In-Reply-To: <20020429131937.GE30347@orion.logilab.fr>
References: <200204291013.DAA32745@spock.peak.org> <20020429131937.GE30347@orion.logilab.fr>
Message-ID: <20020429132741.GF30347@orion.logilab.fr>

On Mon, Apr 29, 2002 at 03:19:37PM +0200, Alexandre wrote:

> I'm helping my wife with her History PhD, and have to deal with similar
> stuff. I found R to be a very useful environment for statistical
> computations. R is a free software clone of S-plus, which is to statistics
> what Matlab is to linear algebra and automation. 


Woops, I forgot to add a couple of URLs:

The R project website 
 http://www.r-project.org/
The Comprehensive R Archive Network (CRAN)
 http://cran.r-project.org/
Using R from Python
 http://rpy.sourceforge.net/
Using R from Python and Python from R (coding R extensions in Python)
 http://www.omegahat.org/RSPython/

Cheers,

Alexandre Fayolle
-- 
LOGILAB, Paris (France).
http://www.logilab.com   http://www.logilab.fr  http://www.logilab.org
Narval, the first software agent available as free software (GPL).


From cavallo at kip.uni-heidelberg.de  Mon Apr 29 10:10:20 2002
From: cavallo at kip.uni-heidelberg.de (cavallo at kip.uni-heidelberg.de)
Date: Mon Apr 29 10:10:20 2002
Subject: [Numpy-discussion] kdfio, 1.1.1
Message-ID: <Pine.LNX.4.33.0204291901350.12869-100000@modigliani.darktech.org>

hi,
here is the latest version of kdfio, a Khoros/Cantata kdf file importer.
Nothing special, but it seems to be working now, at least for me ;-)
You can find it at:

http://kdfio.sourceforge.net

This is my (very) small contribution to Numerical Python:
inside I included a way to modularize the code (and to write some
skeleton code semi-automatically) that could speed up writing new
code a little.
Before giving a full announcement on SourceForge I will wait a little
while, just to see whether any bugs turn up.
Feel free to use/change/do what you want with it,

thanks to all,
antonio cavallo

PS: Khoros is available at http://www.khoral.com and it is not a free
program: there is only a free student version.


From jmiller at stsci.edu  Mon Apr 29 10:14:07 2002
From: jmiller at stsci.edu (Todd Miller)
Date: Mon Apr 29 10:14:07 2002
Subject: [Numpy-discussion] ANN: Numarray-0.3.3
Message-ID: <3CCD7F49.5030809@stsci.edu>

Numarray 0.3.3
---------------------------------
Numarray is an array processing package designed to efficiently 
manipulate large multi-dimensional arrays.  Numarray is modelled after 
Numeric and features C code generated from Python template scripts, the
capacity to operate directly on arrays in files, and improved type 
promotions.

Numarray-0.3.3 features improved support for arrays of complex numbers, 
re-implementing complex types using generated code.  In addition to
being faster, the new complex ufuncs are better integrated with the 
numarray type system, so operations between numarrays and complex 
scalars now work properly.  This release also fixes a problem 
experienced by RedHat Linux users installing numarray from source.

WHERE
-----------
Numarray-0.3.3 Windows executable installers and the source code tarball
are here:

http://sourceforge.net/project/showfiles.php?group_id=1369

Numarray is hosted by Source Forge in the same project which hosts Numeric:

http://sourceforge.net/projects/numpy/

The web page for Numarray information is at:

http://stsdas.stsci.edu/numarray/index.html

Trackers for Numarray Bugs, Feature Requests, Support, and Patches are at
the Source Forge project for NumPy at:

http://sourceforge.net/tracker/?group_id=1369

REQUIREMENTS
--------------------------

numarray-0.3.3 requires Python 2.0 or greater.


AUTHORS, LICENSE
------------------------------

Numarray was written by Perry Greenfield, Rick White, Todd Miller, JC
Hsu, Paul Barrett, and Phil Hodge at the Space Telescope Science
Institute.  Thanks go to Jochen Kupper of the University of North
Carolina for his work on Numarray and for porting the Numarray manual
to TeX format.

Numarray is made available under a BSD-style License.  See
LICENSE.txt in the source distribution for details.

-- 
Todd Miller             jmiller at stsci.edu


From haase at msg.ucsf.edu  Mon Apr 29 11:18:15 2002
From: haase at msg.ucsf.edu (Sebastian Haase)
Date: Mon Apr 29 11:18:15 2002
Subject: [Numpy-discussion] unsigned short  support in NumPy
Message-ID: <auto-000000164030@msg.ucsf.edu>

Hi all,
I'm _very_ new to NumPy.
I was interested in using it for our project, where we acquire data from a 
CCD camera.

The Problem:  Each pixel in the image is a 16-bit gray value.
From what I read in the documentation, there is only 8-bit (unsigned
integer) support in numpy (or should I say numericarray).

Are there plans to add "unsigned short" (16-bit) support?
How much effort would that be?
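Until a native unsigned 16-bit type is available, one stdlib-only workaround is to decode the raw camera buffer in Python itself. The sketch below assumes two bytes per pixel, no file header, and little-endian data by default; the helper name `unpack_uint16_pixels` is hypothetical and not part of Numeric. It relies on the `array` module's `'H'` (C unsigned short) typecode, so values up to 65535 come back as ordinary non-negative Python ints that could then be stored in a wider signed integer array.

```python
import sys
from array import array


def unpack_uint16_pixels(raw, little_endian=True):
    """Decode a raw buffer of 16-bit unsigned gray values into Python ints.

    Hypothetical helper: assumes two bytes per pixel, no header, and the
    given byte order.
    """
    if len(raw) % 2:
        raise ValueError("buffer length must be a multiple of 2 bytes")
    pixels = array("H")  # 'H' = C unsigned short
    if pixels.itemsize != 2:
        raise RuntimeError("unsigned short is not 16 bits on this platform")
    pixels.frombytes(raw)  # items are read in the machine's native byte order
    # Swap bytes if the data's byte order differs from the machine's.
    if (sys.byteorder == "little") != little_endian:
        pixels.byteswap()
    return pixels.tolist()


# Three pixels: 0, 1000 and the 16-bit maximum 65535 (little-endian bytes).
raw = bytes([0x00, 0x00, 0xE8, 0x03, 0xFF, 0xFF])
print(unpack_uint16_pixels(raw))  # [0, 1000, 65535]
```

Decoding as unsigned avoids the usual pitfall of reading bright pixels (above 32767) as negative signed shorts.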

Regards,
Sebastian Haase


-- 
Sebastian Haase
University of California, San Francisco
(415)502-4316


From rick at bioinformatics.org  Mon Apr 29 11:35:26 2002
From: rick at bioinformatics.org (Rick Ree)
Date: Mon Apr 29 11:35:26 2002
Subject: [Numpy-discussion] testing Numeric.array([0])
Message-ID: <1020105268.12239.27.camel@loco.ucdavis.edu>

Should Numeric.array([0]) test false?  This seems counterintuitive, and
is not the case for the regular Python array module.

This recently caused a subtle bug for me when I wanted to find the
indices of an array that met a condition.  If only the first element met
the condition, the result was array([0]) -- a non-empty result that
evaluated false.
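Whatever an array type's truth value is defined to mean, testing the length of an index result is unambiguous. The minimal stdlib-only sketch below uses the `array` module as a stand-in for a Numeric array; the helper name `indices_where` is hypothetical. The stdlib `array` decides truth by length, so a result containing only index 0 is true there, whereas a type whose truth value depends on the element values (as reported above for `Numeric.array([0])`) would treat it as false.

```python
from array import array


def indices_where(values, predicate):
    """Return the indices i for which predicate(values[i]) is true.

    Hypothetical helper standing in for an index search on a Numeric array.
    """
    return array("l", (i for i, v in enumerate(values) if predicate(v)))


data = [7, 2, 3, 1]
hits = indices_where(data, lambda v: v > 5)  # only element 0 matches

# The stdlib array module is true whenever it is non-empty ...
print(bool(hits))  # True
# ... but testing the length works regardless of how truth is defined:
if len(hits) > 0:
    print("found", hits.tolist())  # found [0]
```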

If this is the intended behavior, can someone tell me the reason?

thanks,
Rick


From perry at stsci.edu  Mon Apr 29 13:31:14 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Mon Apr 29 13:31:14 2002
Subject: [Numpy-discussion] unsigned short  support in NumPy
In-Reply-To: <auto-000000164030@msg.ucsf.edu>
Message-ID: <JFEGLNDJEDNOMPPHDEJFKENODNAA.perry@stsci.edu>


> Sebastian Haase writes:

> 
> Hi all,
> I'm _very_ new to NumPy.
> I was interested in using it for our project, where we acquire
> data from a CCD camera.
> 
> The Problem:  Each pixel in the image is a 16 bit gray value.
> What I read in the documentation - there is only 8 bit (unsigned
> integer) support in numpy (or should I say numericarray)
> 
> Are there plans to add a "unsigned short" (16 bit) support .
> How much effort would that be.
> 
We are working on a reimplementation of Numeric that does
support unsigned ints (Unsigned Int8, Unsigned Int16 for now).
The project is not mature, but a lot of basic capability exists
now. You'll have to look it over to judge whether it is usable
for you now. The new version is called numarray
(http://stsdas.stsci.edu/numarray).

(btw, we acquire data from CCD cameras as well ;-) 

Perry


From tchur at optushome.com.au  Mon Apr 29 13:46:39 2002
From: tchur at optushome.com.au (Tim Churches)
Date: Mon Apr 29 13:46:39 2002
Subject: [Numpy-discussion] Multiple Linear Regression?
References: <200204291013.DAA32745@spock.peak.org>
Message-ID: <3CCDBB6C.8A983A5C@optushome.com.au>

Jasper Phillips wrote:
> 
> I'm helping my wife with programming for her economics thesis, which needs
> to calculate a "Multiple Linear Regression" on her data.
> 
> Does anyone know of any (preferably though not necessarily free) software
> that can do this? I'm working in Python, but not limited to it as I
> can relatively freely access other languages.

Jasper,

Use R (a free implementation of S). See http://www.r-project.org

If you are managing your data in Python and NumPy, you can "embed" R in
Python and transparently send data to it using Walter Moreira's wonderful
RPy module; see http://rpy.sf.net
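For a small data set, multiple linear regression can also be sketched directly in Python without R. The approach is to prepend a column of ones for the intercept, form the normal equations (X^T X) b = X^T y, and solve them by Gaussian elimination. This is a minimal pure-Python sketch, not RPy's or R's API, and the function name `ols_fit` is hypothetical. For thesis work, R's `lm()` also reports standard errors and diagnostics, and it uses a QR decomposition, which is numerically more stable than forming X^T X as done here.

```python
def ols_fit(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk.

    Hypothetical helper: solves the normal equations (X^T X) b = X^T y
    by Gaussian elimination with partial pivoting.
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # intercept column
    p = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = []
    for i in range(p):
        row = [sum(x[i] * x[j] for x in X) for j in range(p)]
        row.append(sum(x[i] * yv for x, yv in zip(X, y)))
        A.append(row)
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear (X^T X is singular)")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        s = A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))
        b[i] = s / A[i][i]
    return b


# Data generated exactly from y = 1 + 2*x1 + 3*x2, so the fit recovers it.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * a + 3 * c for a, c in rows]
print([round(b, 6) for b in ols_fit(rows, y)])  # [1.0, 2.0, 3.0]
```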

Tim C