Canonical way of dealing with null-separated lines?

Thu Feb 24 16:51:22 EST 2005

On Fri, Feb 25, 2005 at 07:56:49AM +1100, John Machin wrote:
> Try this:
> !def readweird(f, line_end='\0', bufsiz=8192): 
> !    retain = '' 
> !    while True: 
> !        instr = f.read(bufsiz)
> !        if not instr:
> !            # End of file 
> !            break 
> !        splitstr = instr.split(line_end)
> !        if splitstr[-1]:
> !            # last piece not terminated
> !            if retain:
> !                splitstr[0] = retain + splitstr[0]
> !            retain = splitstr.pop()
> !        else:
> !            if retain:
> !                splitstr[0] = retain + splitstr[0]
> !                retain = ''
> !            del splitstr[-1]
> !        for element in splitstr: 
> !            yield element 
> !    if retain:
> !        yield retain
> 

I think this is a definite improvement, especially making the buffer size and
line terminator optional arguments and handling empty files. However, I don't
think the if splitstr[-1]: ... else: ... branches are necessary: the last
element of the split is either an unterminated fragment or an empty string, and
in both cases it can simply be popped into retain. So I would probably reduce
it to this:

!def readweird(f, line_end='\0', bufsiz=8192):
!    retain = ''
!    while True:
!        instr = f.read(bufsiz)
!        if not instr:
!            # End of file
!            break
!        splitstr = instr.split(line_end)
!        if retain:
!            splitstr[0] = retain + splitstr[0]
!        retain = splitstr.pop()
!        for element in splitstr:
!            yield element
!    if retain:
!        yield retain 

Popping off the last member and then iterating over the rest of the list, as
you suggested, is much more efficient, and it looks a lot better.
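
As a quick sanity check, here is a small sketch that runs the reduced version
over the same data with several buffer sizes. It assumes Python 3 and uses
io.StringIO as a stand-in for a real file; the sample string and the buffer
sizes are made up for illustration.

```python
import io

def readweird(f, line_end='\0', bufsiz=8192):
    # Same logic as the reduced version above.
    retain = ''
    while True:
        instr = f.read(bufsiz)
        if not instr:
            # End of file
            break
        splitstr = instr.split(line_end)
        if retain:
            # Glue the leftover from the previous read onto the first piece.
            splitstr[0] = retain + splitstr[0]
        # The last piece is either an unterminated fragment or ''.
        retain = splitstr.pop()
        for element in splitstr:
            yield element
    if retain:
        yield retain

# Hypothetical test data: three terminated records (one empty) plus an
# unterminated final record.
data = 'alpha\0beta\0\0gamma'
for bufsiz in (1, 2, 3, 5, 8192):
    records = list(readweird(io.StringIO(data), bufsiz=bufsiz))
    assert records == ['alpha', 'beta', '', 'gamma'], (bufsiz, records)

# An empty file yields nothing.
assert list(readweird(io.StringIO(''))) == []
```

The small buffer sizes force records to straddle read boundaries, which is the
case the retain variable exists for. One caveat that applies to both versions:
with a line_end longer than one character, a terminator that is itself split
across two reads would not be recognized.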

Chris