[Numpy-discussion] fromstring, tostring slow?

Mark Janikas mjanikas at esri.com
Tue Feb 13 19:53:45 EST 2007


Found a typo or two in my description: #2 and #3 are nn x 1 in shape.

-----Original Message-----
From: numpy-discussion-bounces at scipy.org
[mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Mark Janikas
Sent: Tuesday, February 13, 2007 4:31 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] fromstring, tostring slow?

This is all very good info, especially the byteswap; I'll be testing it
momentarily.  As for a detailed explanation of the problem...

In essence, I am applying sparse matrix multiplication.  The matrix I am
dealing with here is n x n and quite sparse (generally only 1-20% of the
entries are nonzero).  I use it in spatial data analysis, where the matrix
W represents the spatial association between n observations.  The
operations I perform on it are generally related to the spatial lag of a
variable, i.e. Wy, where y is an n x k matrix (usually k = 1).  As k is
generally small, the y vector and the result vector are represented by
numpy arrays.  I can usually hold n x k x 2 pieces of info in memory; what
I can't hold is n**2.  So I store each row of W in a file as a record
consisting of 3 parts:

1) row, nn (# of neighbors)
2) nhs: an nn x 1 vector of integers giving the columns where row i is nonzero
3) weights: an nn x 1 vector of floats corresponding to the indices in the
previous part

The first two parts of the record are known as a GAL, or geographic
algorithm library.  Since a lot of my W matrices have distance metrics
associated with them, I added the third part; someone else might term this
an enhanced GAL.  At any rate, this allows me to perform the operation on
large datasets without running out of memory.
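
Roughly, the lag computation looks something like the sketch below.  The
exact binary layout, the dtypes, and the function name spatial_lag are just
assumptions for illustration, not the actual code:

import numpy as np

def spatial_lag(fname, y, n):
    # Accumulate Wy one record at a time, so only one row of W is ever
    # in memory.  Assumed record layout: (row, nn) as little-endian int32,
    # then nn little-endian int32 column indices, then nn little-endian
    # float64 weights.
    wy = np.zeros(n, dtype=np.float64)
    f = open(fname, "rb")
    for _ in range(n):
        row, nn = np.fromfile(f, dtype="<i4", count=2)
        nhs = np.fromfile(f, dtype="<i4", count=nn)
        w = np.fromfile(f, dtype="<f8", count=nn)
        wy[row] = np.dot(w, y[nhs])   # lag of y for observation `row`
    f.close()
    return wy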


-----Original Message-----
From: numpy-discussion-bounces at scipy.org
[mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Christopher
Barker
Sent: Tuesday, February 13, 2007 4:07 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] fromstring, tostring slow?

Mark Janikas wrote:
> I don't think I can do that because I have heterogeneous rows of
> data... i.e. the rows differ in length.

like I said, show us your whole problem...

But you don't have to write/read all the data at once with
tofile()/fromfile() anyway.  Each of your "rows" has to be in a separate
array, since numpy arrays don't support "ragged" arrays, but each row can
be written with its own tofile() call.
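
Something along these lines, for instance (just a sketch; the file name,
the header layout, and the toy data are made up):

import numpy as np

# Toy ragged data: (neighbor indices, weights) for each row -- made up.
rows = [([1, 3], [0.5, 0.5]),
        ([0, 2, 4], [0.4, 0.3, 0.3])]

out = open("W.bin", "wb")
for row, (nhs, weights) in enumerate(rows):
    np.array([row, len(nhs)], dtype="<i4").tofile(out)  # header: row index, # of neighbors
    np.asarray(nhs, dtype="<i4").tofile(out)            # column indices for this row
    np.asarray(weights, dtype="<f8").tofile(out)        # the corresponding weights
out.close()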

> Furthermore, when reading it back in, I want to read only part of the
> info at a time so I can save memory.  In this case, I only want to have
> one record in memory at once.

you can make multiple calls to fromfile(), though you'll have to know how 
long each record is.
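
For example, if each record starts with a small header giving its length
(as in the writing sketch above), you could read it back record by record
like this (again only a sketch, with an assumed layout):

import numpy as np

f = open("W.bin", "rb")
while True:
    header = np.fromfile(f, dtype="<i4", count=2)    # (row index, # of neighbors)
    if header.size < 2:                              # end of file
        break
    row, nn = header
    nhs = np.fromfile(f, dtype="<i4", count=nn)      # nn column indices
    weights = np.fromfile(f, dtype="<f8", count=nn)  # nn weights
    # ... use (row, nhs, weights) for this record, then move on to the next
f.close()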

> Another issue has arisen from taking this routine cross-platform...
> namely, if I write the file on Windows I can't read it on Solaris.  I
> assume big/little endianness is the issue here.

yup.

> I know that using the struct
> module I can pack using either one.

So can numpy.  See the "byteswap" method, and note that you can specify a 
particular endianness with a datatype when you read with fromfile():

a = N.fromfile(DataFile, dtype=N.dtype("<d"), count=20)

reads 20 little-endian doubles from DataFile, regardless of the native 
endianness of the machine you're on.
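
For completeness, a small sketch of the write/read/swap options (the file
name and data here are placeholders):

import numpy as N

a = N.arange(5, dtype=N.float64)

# Write explicitly as little-endian doubles, whatever the native order is.
a.astype("<d").tofile("data.bin")

# Read them back the same way on any platform.
b = N.fromfile("data.bin", dtype=N.dtype("<d"), count=5)

# If you did read with the wrong byte order, byteswap() gives you the
# other one -- it returns a byte-swapped copy (pass True to swap in place).
c = b.byteswap()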

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
