[Tutor] How to select particular lines from a text

Kent Johnson kent37 at tds.net
Sat Dec 4 16:09:14 CET 2004


kumar,

Here is a solution for you. The makeSections() function will iterate through blocks in the file and 
return each one in turn to the caller.

makeSections() is a generator function - the use of yield makes it one. That means that it returns 
an iterator that can be used in a for loop. Each time yield is executed it returns a new value to 
the loop. In this case, the values returned are the contents of each section.
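
As a minimal sketch of that behaviour (count_up_to is a made-up name used only for illustration, it is
not part of the solution), here is a tiny generator and the for loop that consumes it. It should run
unchanged under Python 2 and Python 3:

```python
def count_up_to(limit):
    """Hypothetical generator: yields 1, 2, ..., limit, one value per loop pass."""
    n = 1
    while n <= limit:
        yield n      # hand n to the caller and pause here
        n += 1       # execution resumes here on the next pass of the loop

values = []
for value in count_up_to(3):
    values.append(value)

print(values)
```

Nothing in the body of count_up_to() runs until the loop asks for the first value, which is why a
generator suits a large file: only one section needs to be held in memory at a time.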

The loop in makeSections() just walks through the lines of the input file. It accumulates the lines
into a list and looks for two markers: a 'Name:' line, which starts a new section, and a blank line,
which ends one. When it finds a marker it yields the current section, if there is one, and starts a
new one. Any section still open at the end of the file is yielded after the loop.

Kent

PS this question is much better asked than the last - you clearly stated what you want in a simple form.


data = '''
Name:
City:
xxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx
....................
xxxxxxxxxxxxxxxxxxxx


Name:
City:
xxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx

'''

import cStringIO    # just for test

def makeSections(f):
     ''' This is a generator function. It will return successive sections
         of f until EOF.

         Sections are every line from a 'Name:' line to the first blank line.
         Sections are returned as a list of lines with line endings stripped.
     '''

     currSection = []

     for line in f:
         line = line.strip()
         if line == 'Name:':
             # Start of a new section
             if currSection:
                 yield currSection
                 currSection = []
             currSection.append(line)

         elif not line:
             # Blank line ends a section
             if currSection:
                 yield currSection
                 currSection = []

         else:
             # Accumulate into a section
             currSection.append(line)

     # Yield the last section
     if currSection:
         yield currSection


f = cStringIO.StringIO(data)

for section in makeSections(f):
     print 'Section'
     for line in section:
         print '   ', line
     print
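
The code above is written for Python 2. On Python 3 the cStringIO module no longer exists (io.StringIO
takes its place) and print is a function. A rough sketch of the same test under those assumptions
follows; make_sections and the sample text are illustrative names and placeholder data, and the
section logic is meant to mirror makeSections() rather than replace it:

```python
import io

def make_sections(lines):
    """Yield each section as a list of stripped lines (same logic as makeSections)."""
    current = []
    for line in lines:
        line = line.strip()
        if line == 'Name:':
            # Start of a new section: flush the previous one first
            if current:
                yield current
                current = []
            current.append(line)
        elif not line:
            # Blank line ends a section
            if current:
                yield current
                current = []
        else:
            # Accumulate into the current section
            current.append(line)
    # Yield the last section if the input did not end with a blank line
    if current:
        yield current

# Placeholder data, shaped like the example in the question
sample = '''
Name:
City:
first block, line 1
first block, line 2

Name:
City:
second block, line 1
'''

sections = list(make_sections(io.StringIO(sample)))
for number, section in enumerate(sections, 1):
    print('Section', number)
    for line in section:
        print('   ', line)
    print()
```

For a real file, an open file object can be passed in place of the io.StringIO wrapper, since both
yield one line per iteration.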


kumar s wrote:
> Dear group, 
>  This is a continuation of my previous email with
> subject line "Python regular expression".  My text
> file looks like an .ini file, but it is not. It
> is a chip definition file from a Gene chip.  It is a
> huge file with over 340,000 lines.
> 
> I have a particular question, in general not
> related to that file:
> 
> Example text:
> 
> Name:
> City:
> xxxxxxxxxxxxxxxxxxxx
> xxxxxxxxxxxxxxxxxxxx
> ....................
> xxxxxxxxxxxxxxxxxxxx
> 
> 
> Name:
> City:
> xxxxxxxxxxxxxxxxxxxx
> xxxxxxxxxxxxxxxxxxxx
> 
> Characteristics of this text:
> 1. This text is divided into blocks and every block
> starts with 'Name'.  The number of lines after this
> identifier is random. 
> 
> In this particular case, the logic I can
> think of to extract some of these blocks is:
> 1. Write a reg.exp to identify the Name identifier one
> needs.
> 2. Based on this, ask the program to select all
> lines after that until it hits either a new line OR
> another Name identifier.
> 
> My question:
> 
> How can I tell my program these 2 conditions:
> 
> 1. Mark the identifier I need and select all the lines
> after that identifier until it hits a new line or
> another Name identifier. 
> 
> 
> Please enlighten me with your suggestions. 
> 
> thank you. 
> 
> kumar

