[SciPy-dev] huge speed regression in loadmat from 0.6.0 to 0.7.0

Scott David Daniels Scott.Daniels at Acm.Org
Wed Feb 11 15:03:13 EST 2009


Ryan May wrote:
> ... Well, here's a patch against gzipstreams.py that changes it to add the 
> chunks to a list and only build the string at the very end. See if it 
> helps your case.  If not, is there somewhere you can put the datafile so 
> that we can test with it?
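
For reference, a minimal sketch of the approach Ryan describes (plain Python with illustrative names, not scipy's actual gzipstreams code): collect decompressed chunks in a list and concatenate once at the end, rather than doing `self.data += chunk` on each iteration, which copies the whole accumulated string every time.

```python
def read_all(chunks):
    """Accumulate chunks in a list, then join once (linear total cost)."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)   # O(1) amortized per chunk
    return ''.join(parts)     # single pass over the total data
```
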
Well, in your patch, `self_data += ...` extends the list one character at
a time (on a list, `+=` iterates over the string), so instead of:
@@ -95,11 +100,12 @@
              data = self.fileobj.read(n_to_fetch)
              self._bytes_read += len(data)
              if data:
-                self.data += self._unzipper.decompress(data)
+                self_data += self._unzipper.decompress(data)
              if len(data) < n_to_fetch: # hit end of file
-                self.data += self._unzipper.flush()
+                self_data += self._unzipper.flush()
                  self.exhausted = True
                  break
+        self.data += ''.join(self_data)

Use:
@@ -95,11 +100,12 @@
              data = self.fileobj.read(n_to_fetch)
              self._bytes_read += len(data)
              if data:
-                self.data += self._unzipper.decompress(data)
+                self_data.append(self._unzipper.decompress(data))
              if len(data) < n_to_fetch: # hit end of file
-                self.data += self._unzipper.flush()
+                self_data.append(self._unzipper.flush())
                  self.exhausted = True
                  break
+        self.data += ''.join(self_data)
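
The difference between the two spellings can be seen in a small sketch (plain Python, not the scipy code): with a list on the left-hand side, `+=` iterates over the string and extends the list byte by byte, while `.append()` stores each chunk whole. Both join back to the same string, but appending keeps the list one element per chunk.

```python
# `+=` on a list treats the string as an iterable of characters.
buf_plus = []
buf_plus += "chunk1"            # -> ['c', 'h', 'u', 'n', 'k', '1']

# `.append()` stores the chunk as a single element.
buf_append = []
buf_append.append("chunk1")     # -> ['chunk1']

# The joined results are identical; only the intermediate list differs.
joined_plus = ''.join(buf_plus)
joined_append = ''.join(buf_append)
```
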
