Reading bz2 file into numpy array

Nobody nobody at nowhere.com
Mon Nov 22 22:39:02 EST 2010


On Mon, 22 Nov 2010 11:37:22 +0100, Peter Otten wrote:

>> is there a convenient way to read bz2 files into a numpy array?
>
> Try
>
> f = bz2.BZ2File(filename)
> data = numpy.fromstring(f.read(), numpy.float32)

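That's going to hurt if the file is large: f.read() pulls the entire
decompressed file into memory as one string, and fromstring() then copies
it into the array, so for a while you hold two full copies of the data.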

You might be better off either extracting to a temporary file, or creating
a pipe with numpy.fromfile() reading the pipe and either a thread or
subprocess decompressing the data into the pipe.

E.g.:

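import os
import threading

class Pipe(threading.Thread):
    """Copy a file-like object into an OS pipe from a background thread."""

    def __init__(self, f, blocksize=65536):
        super(Pipe, self).__init__()
        self.f = f
        self.blocksize = blocksize
        # rd is the end handed to the consumer; wr is filled by run().
        self.rd, self.wr = os.pipe()
        self.daemon = True
        self.start()

    def run(self):
        try:
            while True:
                s = self.f.read(self.blocksize)
                if not s:
                    break
                # os.write() may write fewer bytes than it was given.
                while s:
                    s = s[os.write(self.wr, s):]
        finally:
            # Always close the write end so the reader sees EOF,
            # even if reading from f raised an exception.
            os.close(self.wr)

def make_real(f):
    # Return a real file object backed by the read end of the pipe.
    return os.fdopen(Pipe(f).rd, 'rb')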

Given the number of situations where you need a "real" (OS-level) file
handle or descriptor rather than a Python "file-like object",
something like this should really be part of the standard library.
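
For the temporary-file route mentioned earlier, a minimal sketch might look
like the following. The helper name decompress_to_tempfile and the ".raw"
suffix are made up for illustration; the point is only that the data is
streamed to disk in blocks rather than held in memory all at once.

```python
import bz2
import os
import shutil
import tempfile


def decompress_to_tempfile(filename, blocksize=65536):
    """Decompress a .bz2 file into a temporary file and return its path.

    Hypothetical helper, not part of any library. The caller is
    responsible for deleting the returned file when done with it.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=".raw")
    try:
        with os.fdopen(fd, "wb") as dst, bz2.BZ2File(filename) as src:
            # copyfileobj copies in blocksize chunks, so memory use stays
            # bounded however large the decompressed data is.
            shutil.copyfileobj(src, dst, blocksize)
    except Exception:
        # Don't leave a partial file behind if decompression fails.
        os.remove(tmp_path)
        raise
    return tmp_path
```

The returned path can then be passed to numpy.fromfile(tmp_path,
dtype=numpy.float32), followed by os.remove(tmp_path) once the array has
been loaded. This costs extra disk I/O, but it gives numpy an ordinary
file on disk to read from.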



