Canonical way of dealing with null-separated lines?

Tue Mar 1 19:10:10 EST 2005

Douglas Alan wrote:
> "John Machin" <sjmachin at lexicon.net> writes:
>
> >>        lines = (partialLine + charsJustRead).split(newline)
>
> > The above line is prepending a short string to what will typically be a
> > whole buffer full. There's gotta be a better way to do it.
>
> If there is, I'm all ears.  In a previous post I provided code that
> doesn't concatenate any strings together until the last possible
> moment (i.e. when yielding a value).  The problem with that code
> was that it was complicated and didn't work right in all cases.
>
> One way of solving the string concatenation issue would be to write a
> string find routine that will work on lists of strings while ignoring
> the boundaries between list elements.  (I.e., it will consider the
> list of strings to be one long string for its purposes.)  Unless it is
> written in C, however, I bet it will typically be much slower than the
> code I just provided.
>
> > Perhaps you might like to refer back to CdV's solution which was
> > prepending the residue to the first element of the split() result.
>
> The problem with that solution is that it doesn't work in all cases
> when the line-separation string is more than one character.
>
> >>        for line in lines: yield line + outputLineEnd
>
> > In the case of leaveNewline being false, you are concatenating an empty
> > string. IMHO, to quote Jon Bentley, one should "do nothing gracefully".
>
> In Python,
>
>    longString + "" is longString
>
> evaluates to True.  I don't know how you can do nothing more
> gracefully than that.

And also "" + longString is longString

The string + operator provides those graceful *external* results by
ugly special-case testing internally.

It is not graceful IMHO to concatenate a variable which you already
know refers to a null string.

Let's go back to the first point, and indeed further back to the use
cases:

(1) multi-byte separator for lines in text files: never heard of one
apart from '\r\n'; presume this is rare, so test for length of 1 and
use Chris's simplification of my effort in this case.

(2) keep newline: with the standard file reading routines, if one is
going to do anything much with the line other than write it out again,
one does buffer = buffer.rstrip('\n') anyway. In the case of a
non-standard separator, one is likely to want to write the line out
with the standard '\n'. So, specialisation for this is indicated:

! if keepNewline:
!     for line in lines: yield line + newline
! else:
!     for line in lines: yield line
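
Putting (1) and (2) together, a minimal sketch for the one-character
separator case might look like the following. The names read_lines,
keep_newline and chunk_size are made up for illustration and are not
taken from the code posted earlier in this thread. The residue is joined
to the first element of the split() result rather than to the whole
buffer, and nothing is concatenated when there is nothing to add.

```python
import io

def read_lines(stream, newline="\0", keep_newline=False, chunk_size=4096):
    """Yield records from a text stream, split on a one-character separator.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    if len(newline) != 1:
        # Multi-character separators can straddle a chunk boundary,
        # which this simple approach does not handle.
        raise ValueError("this sketch handles one-character separators only")
    residue = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pieces = chunk.split(newline)
        # Join the carried-over partial record to the first piece only,
        # instead of prepending it to the whole chunk.
        if residue:
            pieces[0] = residue + pieces[0]
        # The last piece is incomplete, or empty if the chunk ended
        # exactly on a separator; carry it into the next round.
        residue = pieces.pop()
        if keep_newline:
            for piece in pieces:
                yield piece + newline
        else:
            for piece in pieces:
                yield piece
    # Trailing data with no final separator is yielded as-is.
    if residue:
        yield residue

# A tiny chunk size forces records to span chunk boundaries.
records = list(read_lines(io.StringIO("a\0bc\0d"), chunk_size=2))
```

The small chunk_size in the usage line is only there to exercise the
residue handling; in practice one would leave it at a buffer-sized value.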

Cheers,
John