[SciPy-Dev] Scipy.io.savemat optimization issue.

Jamie Tsao superscript92 at yahoo.com
Wed Oct 7 01:31:47 EDT 2015


I was attempting to add an optimization that reduces the memory savemat uses, alongside my other commits that try to fix a bug in compression with zlib. The optimization uses a NumPy array's data attribute, which in Python 2.7 is a buffer. This is convenient because, when the array is Fortran contiguous (as 1D arrays generally are, unless they were created from the .real or .imag of a complex array), I can pass the data buffer straight to file.write() without using much extra memory. For instance, saving a real sparse matrix to disk will use at most 133% of the matrix's memory. If the array is not Fortran contiguous, I fall back to what the code did originally: call tostring() to get the bytes in Fortran order.
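
To make the idea concrete, here is a minimal sketch of the contiguity check described above. It is not the code from the pull request: write_fortran_bytes is a hypothetical helper name, and it assumes the file object was opened in binary mode. It writes the memoryview of the transpose, which is a C-contiguous view over the same memory whenever the array is Fortran contiguous, so no copy is made.

```python
import io
import numpy as np

def write_fortran_bytes(fileobj, arr):
    """Write arr's bytes in Fortran order to a binary file object.

    Hypothetical helper for illustration; returns the number of data bytes.
    """
    arr = np.asarray(arr)
    if arr.flags.f_contiguous:
        # arr.T is a view (no copy) and is C-contiguous here, so its
        # buffer already holds the elements of arr in Fortran order.
        fileobj.write(arr.T.data)
    else:
        # Fall back to a temporary copy laid out in Fortran order.
        fileobj.write(arr.tobytes(order="F"))
    return arr.nbytes

# Usage: both layouts produce the same bytes on disk.
f_arr = np.asfortranarray(np.arange(6, dtype=np.float64).reshape(2, 3))
c_arr = np.ascontiguousarray(f_arr)
f_buf, c_buf = io.BytesIO(), io.BytesIO()
f_count = write_fortran_bytes(f_buf, f_arr)
c_count = write_fortran_bytes(c_buf, c_arr)
```

Only the fallback branch allocates a second copy of the data, which is where the memory saving would come from.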


But then comes Python 3. Using Python 3.4, I found that np.array.data is no longer a buffer but a memoryview. Unfortunately, the memoryview does not seem to expose the underlying bytes in the same way, so file.write() did not write it correctly in my attempts. Furthermore, file.write() appeared to accept only strings, not bytes. Hence, I would have to do something like str(bytes(arr.data)) to pass the data and save it to disk, which isn't as good as calling tostring(). What should I do to get around this?
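
One thing that may be worth checking is how the file is opened. On Python 3, a file opened in text mode only accepts str, while a file opened in binary mode ("wb") accepts bytes-like objects, a memoryview included. The snippet below is a small standalone check of that behaviour, not code from the pull request; the temporary path and the variable names are made up for the example.

```python
import os
import tempfile
import numpy as np

arr = np.arange(4, dtype=np.int32)
view = arr.data  # a memoryview on Python 3

# Illustrative temporary file; any binary-mode file object should behave alike.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    written = f.write(view)  # no intermediate bytes or str copy

with open(path, "rb") as f:
    roundtrip = f.read()
```

If the file object in savemat is opened in binary mode, writing the memoryview directly like this may avoid the str(bytes(...)) detour altogether.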


My failed pull request is here, where the last commit concerns this problem: http://www.github.com/scipy/scipy/pull/5325




On a side note, is my approach to byte counting fine? I was hoping that the writer would no longer need to seek around in the file, which helps a lot with compression, but my current code counts bytes up front even when compression is off. Originally, I was afraid the running time would be twice as long (although I would argue that few people call savemat often enough for it to become a bottleneck), but it turns out the runtime is nearly the same. Is that expected?
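
As a rough illustration of counting bytes before writing, here is a simplified sketch. It assumes the MAT-file 5 layout of an 8-byte tag (type code and byte count) followed by data padded to a multiple of 8 bytes, ignores the small data element format, and hard-codes little-endian order. The names element_size and write_element are hypothetical and do not come from scipy.io.

```python
import io
import struct
import numpy as np

MI_DOUBLE = 9  # MAT-file 5 type code for float64, as I understand the format

def element_size(nbytes):
    """Bytes on disk for one element: 8-byte tag plus data padded to 8 bytes."""
    return 8 + nbytes + (-nbytes % 8)

def write_element(fileobj, arr, mdtype=MI_DOUBLE):
    """Write tag, data and padding in one forward pass, without seek()."""
    data = arr.tobytes(order="F")
    # The byte count is known before anything is written, so the tag
    # never has to be patched afterwards.
    fileobj.write(struct.pack("<II", mdtype, len(data)))
    fileobj.write(data)
    fileobj.write(b"\x00" * (-len(data) % 8))
    return element_size(len(data))

buf = io.BytesIO()
total = write_element(buf, np.arange(3, dtype=np.float64))
```

Because the output is written strictly front to back, the same bytes could be fed to a streaming compressor, which cannot seek backwards.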


Lastly, I didn't add any test cases. The only bug I fixed concerns compressing a matrix larger than 2 GB (in my case, a sparse one), and I don't want the test suite to create massive matrices and use up tons of memory.


-Jamie Tsao
