iterating over multi-line string

Sun Sep 11 13:17:25 EDT 2016

Doug OLeary wrote:

> Hey;
> 
> I have a multi-line string that's the result of reading a file filled with
> 'dirty' text.  I read the file in one swoop to make data cleanup a bit
> easier - getting rid of extraneous tabs, spaces, newlines, etc.  That
> part's done.
> 
> Now, I want to collect data in each section of the data.  Sections are
> started with a specific header and end when the next header is found.
> 
> ^1\. Upgrade to the latest version of Apache HTTPD
> ^2\. Disable insecure TLS/SSL protocol support
> ^3\. Disable SSLv2, SSLv3, and TLS 1.0. The best solution is to only have TLS 1.2 enabled
> ^4\. Disable HTTP TRACE Method for Apache
> [[snip]]
> 
> There's something like 60 lines of worthless text before that first header
> line so I thought I'd skip through them with:
> 
> x=0  # Current index
> hx=1 # human readable index
> rgs = '^' + str(hx) + r'\. ' + monster['vulns'][x]
> hdr = re.compile(rgs)
> for l in data.splitlines():
>   while not hdr.match(l):
>     next(l)
>   print(l)
> 
> which resulted in a typeerror stating that str is not an iterator.  More
> googling resulted in:
> 
> iterobj = iter(data.splitlines())
> 
> for l in iterobj:
>   while not hdr.match(l):
>     next(iterobj)
>   print(l)
> 
> I'm hoping to see that first header; however, I'm getting another error:
> 
> Traceback (most recent call last):
>   File "./testies.py", line 30, in <module>
>     next(iterobj)
> StopIteration
> 
> I'm not quite sure what that means... Does that mean I got to the end of
> data w/o finding my header?
> 
> Thanks for any hints/tips/suggestions.

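Yes, StopIteration means the iterator ran out of lines. The inner loop
never rebinds l: the value returned by next(iterobj) is thrown away, so as
soon as one line fails to match, the while loop keeps consuming lines until
nothing is left.

Even with that fixed, nesting the loops means you skip the lines not just
before the first header, but before every header. Unless your data ends
with a header, the inner loop will eventually run out of lines without
seeing another header.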
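
To make that concrete, here is a minimal sketch of the original loop with
the result of next() actually assigned, and with StopIteration handled. The
sample data and the simplified header pattern are made up for illustration;
they are not your real file or your monster['vulns'] lookup.

```python
import re

# Hypothetical stand-in for the cleaned-up file contents.
data = "junk\nmore junk\n1. Upgrade Apache\ndetail line\n2. Disable TRACE"
# Simplified header pattern; the original builds it from monster['vulns'].
hdr = re.compile(r"^1\. ")

iterobj = iter(data.splitlines())
first_header = None
try:
    l = next(iterobj)
    while not hdr.match(l):
        l = next(iterobj)  # rebind l; the original discarded this value
    first_header = l
except StopIteration:
    pass  # ran out of lines without ever seeing the header

print(first_header)
```

After the loop, iterobj is positioned on the line following the header, so
you can keep reading from it.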

Here are two clean (I think) ways to capture the lines starting with the 
first header. 

(1) Do-it-yourself, with a generator:

def rest(lines, isheader):
    lines = iter(lines)
    for line in lines:
        if isheader(line):
            yield line # the first header
            break
    yield from lines # all lines following the first header

for line in rest(data.splitlines(), hdr.match):
    print(line)

(2) Using the itertools from Python's standard library:

import itertools

def is_no_header(line):
    return hdr.match(line) is None

for line in itertools.dropwhile(is_no_header, data.splitlines()):
    print(line)
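
Since the stated goal is to collect the data in each section, the same idea
can be stretched a little further. This is only a sketch: the sections()
helper and the any_hdr pattern are hypothetical names, and the pattern
assumes that every line starting with a number, a dot and a space is a
header, which may be too loose for the real data.

```python
import itertools
import re

# Assumption: any line starting with "<number>. " is a header.
any_hdr = re.compile(r"^\d+\. ")

def sections(lines, isheader):
    """Yield (header, body_lines) pairs; lines before the first header are dropped."""
    lines = itertools.dropwhile(lambda line: not isheader(line), lines)
    header, body = None, []
    for line in lines:
        if isheader(line):
            if header is not None:
                yield header, body  # previous section is complete
            header, body = line, []
        else:
            body.append(line)
    if header is not None:
        yield header, body  # last section ends with the data

# Made-up sample reusing two of the header lines from the question.
sample = """preamble
more preamble
1. Upgrade to the latest version of Apache HTTPD
upgrade details
2. Disable insecure TLS/SSL protocol support
tls details
more tls details"""

result = list(sections(sample.splitlines(), any_hdr.match))
for header, body in result:
    print(header, body)
```

A single pass over the lines is enough here, so there is no nested loop
that could run off the end of the data.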

Both versions also work with file objects. Lines read from a file keep
their trailing newline, so you just have to tell print not to add another
one:

    print(line, end="")
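
A small sketch of the file-object case, using io.StringIO as a stand-in for
an open text file so it runs without touching the disk. The contents and
the simplified header pattern are invented for the example.

```python
import io
import itertools
import re

# Simplified header pattern, assumed for illustration.
hdr = re.compile(r"^1\. ")

# io.StringIO behaves like a text file opened for reading.
f = io.StringIO("junk\n1. Upgrade Apache\ndetail\n")

# Iterating a file object yields lines with their newline still attached.
kept = list(itertools.dropwhile(lambda line: hdr.match(line) is None, f))
for line in kept:
    print(line, end="")
```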