Canonical way of dealing with null-separated lines?

Douglas Alan nessus at mit.edu
Sat Feb 26 18:07:39 EST 2005


I wrote:

> Okay, here's the definitive version (or so say I).  Some good doobie
> please make sure it makes its way into the standard library:

Oops, I just realized that my previously definitive version did not
handle multi-character newlines.  So here is a new definitive
version.  Oog, now my brain hurts:

def fileLineIter(inputFile, newline='\n', leaveNewline=False, readSize=8192):
   """Like the normal file iter but you can set what string indicates newline.

   The newline string can be arbitrarily long; it need not be restricted to a
   single character. You can also set the read size and control whether or not
   the newline string is left on the end of the iterated lines.  Setting
   newline to '\0' is particularly good for use with an input file created with
   something like "os.popen('find . -print0')".
   """
   isNewlineMultiChar = len(newline) > 1
   outputLineEnd = ("", newline)[leaveNewline]

   # 'partialLine' is a list of strings to be concatenated later:
   partialLine = []

   # Because read() might unfortunately split across our newline string, we
   # have to regularly check to see if the newline string appears in what we
   # previously thought was only a partial line.  We do so with this generator:
   def linesInPartialLine():
      if isNewlineMultiChar:
         linesInPartialLine = "".join(partialLine).split(newline)
         if len(linesInPartialLine) > 1:
            partialLine[:] = [linesInPartialLine.pop()]
            for line in linesInPartialLine:
               yield line + outputLineEnd

   while True:
      charsJustRead = inputFile.read(readSize)
      if not charsJustRead: break
      lines = charsJustRead.split(newline)
      if len(lines) > 1:
         # Append first, so a newline split across two reads is caught:
         partialLine.append(lines[0])
         for line in linesInPartialLine(): yield line
         lines[0] = "".join(partialLine)
         partialLine[:] = [lines.pop()]
      else:
         partialLine.append(lines.pop())
         for line in linesInPartialLine(): yield line
      for line in lines: yield line + outputLineEnd
   for line in linesInPartialLine(): yield line
   if partialLine and partialLine[-1] != '':
      yield "".join(partialLine)
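
To make the chunk-boundary problem concrete, here is a minimal sketch of the same idea in a simpler form.  It is not the function above: 'iterRecords' is a hypothetical name, it always strips the newline string, and it assumes Python 3 with io.StringIO standing in for a real file.  It keeps one string buffer instead of a list of partial strings, which is easier to follow but re-scans the buffer on every read.

```python
import io

def iterRecords(inputFile, newline='\n', readSize=8192):
    """Simplified sketch: yield records separated by 'newline', without it."""
    buffer = ''
    while True:
        chunk = inputFile.read(readSize)
        if not chunk:
            break
        buffer += chunk
        # Everything before the last separator is complete; the remainder may
        # still be the start of a record (or the first half of a separator).
        pieces = buffer.split(newline)
        buffer = pieces.pop()
        for piece in pieces:
            yield piece
    # A final record with no trailing separator is still a record.
    if buffer:
        yield buffer

# Null-separated names, as "find -print0" would produce them.
names = list(iterRecords(io.StringIO('a.txt\0b c.txt\0'), newline='\0'))

# readSize=4 forces the first read to end between '\r' and '\n'.
lines = list(iterRecords(io.StringIO('abc\r\ndef\r\n'),
                         newline='\r\n', readSize=4))
```

Because the leftover text is carried into the next read before splitting, a separator cut in half by read() is reassembled before anything is yielded.  The price is that a very long record gets re-split on every read, which is what the list of partial strings in the full version avoids.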


|>oug


