Frankenstring

Tue Jul 12 23:54:14 EDT 2005

On Wed, 13 Jul 2005 03:49:16 +0200, Thomas Lotze <thomas at thomas-lotze.de> wrote:

>Scott David Daniels wrote:
>
>> Now if you want to do it for a file, you could do:
>> 
>>      for c in thefile.read():
>>          ....
>
>The whole point of the exercise is that seeking on a file doesn't
>influence iteration over its content. In the loop you suggest, I can
>seek() on thefile to my heart's content and will always get its content
>iterated over exactly from beginning to end. It had been read before any
>of this started, after all. Similarly, thefile.tell() will always tell me
>thefile's size or the place I last seek()'ed to instead of the position of
>the next char I will get.
>
What I suggested in my other post (untested beyond what you see, so you
may want to add to the test):

----< lotzefile.py >--------------------------------------------------
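class LotzeFile(file):
    # Character iterator whose position follows seek() and tell().
    # All I/O is delegated to the wrapped file object self.f.
    BUFSIZE = 4096
    def __init__(self, path, mode='r'):
        self.f = file(path, mode)
        # bufbase: file offset of buf[0]; pos: index of the next char in buf
        self.pos = self.bufbase = 0
        self.buf = ''
    def __iter__(self): return self
    def next(self):
        # compare indices rather than slicing, to avoid a copy per character
        if self.pos >= len(self.buf):
            # buffer exhausted: advance the base and read the next chunk
            self.bufbase += len(self.buf)
            self.pos = 0
            self.buf = self.f.read(self.BUFSIZE)
            if not self.buf:
                # end of file: close and stop iterating
                self.close()
                raise StopIteration
        byte = self.buf[self.pos]
        self.pos += 1
        return byte
    def seek(self, pos, ref=0):
        # reposition the underlying file and discard the buffered data
        self.f.seek(pos, ref)
        self.bufbase = self.f.tell()
        self.pos = 0
        self.buf = ''
    def tell(self):
        # offset of the next character that next() will return
        return self.bufbase + self.pos
    def close(self):
        self.f.close()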

def test():
    f = file('lotzedata.txt','w')
    for s in (' %3d'%i for i in xrange(1000)): f.write(s)
    f.close()

    it = iter(LotzeFile('lotzedata.txt'))

    hold4=[0,0,0,0]
    for i, c in enumerate(it):
        hold4[i%4] = c
        if i%4==3: 
            print hold4
            assert (i-3)/4 == int(''.join(hold4))
        if i == 99: break
    print it.tell()
    it.seek(52)
    for i in xrange(8): print it.next(),
    print
    it.seek(990*4)
    for c in it: print c,

if __name__ == '__main__':
    test()
----------------------------------------------------------------------

Result:

[20:53] C:\pywk\clp>py24 lotze.py
[' ', ' ', ' ', '0']
[' ', ' ', ' ', '1']
[' ', ' ', ' ', '2']
[' ', ' ', ' ', '3']
[' ', ' ', ' ', '4']
[' ', ' ', ' ', '5']
[' ', ' ', ' ', '6']
[' ', ' ', ' ', '7']
[' ', ' ', ' ', '8']
[' ', ' ', ' ', '9']
[' ', ' ', '1', '0']
[' ', ' ', '1', '1']
[' ', ' ', '1', '2']
[' ', ' ', '1', '3']
[' ', ' ', '1', '4']
[' ', ' ', '1', '5']
[' ', ' ', '1', '6']
[' ', ' ', '1', '7']
[' ', ' ', '1', '8']
[' ', ' ', '1', '9']
[' ', ' ', '2', '0']
[' ', ' ', '2', '1']
[' ', ' ', '2', '2']
[' ', ' ', '2', '3']
[' ', ' ', '2', '4']
100
    1 3     1 4
  9 9 0   9 9 1   9 9 2   9 9 3   9 9 4   9 9 5   9 9 6   9 9 7   9 9 8   9 9 9

I suspect you could get better performance if LotzeFile instances could
return iterators over buffer chunks, so that the characters come from the
built-in string iterators rather than from the custom .next method. The
buffer chunks would have to be fairly large for that to pay off, though.
Testing is the only way to find out where the break-even point is, if you
really need to know.
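
A minimal sketch of that chunk idea, untested against real files and written
for Python 3 rather than the Python 2.4 used above. The names iter_chunks and
demo are hypothetical and not part of LotzeFile, and io.StringIO stands in
for the data file. Each chunk is a plain string, so the inner loop runs on
the built-in string iterator, and the position of a character is the chunk's
base offset plus its index in the chunk:

```python
import io


def iter_chunks(f, bufsize=4096):
    """Yield (base_offset, chunk) pairs read from the seekable file f."""
    while True:
        # Ask the file for its position on every pass, so that a seek()
        # done by the caller between chunks is honoured.
        base = f.tell()
        chunk = f.read(bufsize)
        if not chunk:
            return
        yield base, chunk


def demo():
    # Same test data as above: 1000 right-aligned numbers, 4 chars each.
    f = io.StringIO(''.join(' %3d' % i for i in range(1000)))
    seen = []
    for base, chunk in iter_chunks(f, bufsize=16):
        # The inner loop uses the string's own iterator.
        for offset, ch in enumerate(chunk):
            seen.append((base + offset, ch))
        if base == 0:
            # Jump ahead after the first chunk; the next chunk starts here.
            f.seek(990 * 4)
    return seen
```

The trade-off is that a seek() only takes effect at the next chunk boundary,
and there is no single tell() that knows about the character in flight, so
whether this is acceptable depends on how often you need to seek.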

Regards,
Bengt Richter