parsing text in blocks and line too

James Stroud jstroud at mbi.ucla.edu
Thu Apr 12 07:53:44 EDT 2007


A.T.Hofkamp wrote:
> On 2007-04-12, flyzone at technologist.com <flyzone at technologist.com> wrote:
>> Good morning, people :)
>> I have just started to learn this language and I have a logical
>> problem.
>> I need to write a program to parse various text files.
>> Here are two samples:
>>
>> ---------------
>> trial text bla bla bla bla error
>>       bla bla bla bla bla
>>       bla bla bla on more lines
>> trial text bla bla bla bla warning bla
>>       bla bla more bla to be grouped with warning
>>       bla bla bla on more lines
>>       could be one two or ten lines also withouth the tab beginning
>> again text
>> text can contain also blank lines
>> text no delimiters....
>> --------------
>> Apr  8 04:02:08 machine text on one line
>> Apr  8 04:02:09 machine this is an error
>> Apr  8 04:02:10 machine this is a warning
>> --------------
> 
> I would first read groups of lines that belong together, then decide on each
> group whether it is an error, warning, or whatever.
> To preserve order in a group of lines, you can use lists.
> 
> From your example you could first compute a list of lists, like
> 
> [ [ "trial text bla bla bla bla error",
>     "      bla bla bla bla bla",
>     "      bla bla bla on more lines" ],
>   [ "trial text bla bla bla bla warning bla",
>     "      bla bla more bla to be grouped with warning",
>     "      bla bla bla on more lines",
>     "      could be one two or ten lines also withouth the tab beginning" ],
>   [ "again text" ],
>   [ "text can contain also blank lines" ],
>   [ ],
>   [ "text no delimiters...." ]
> ]
> 
> Just above the "text no delimiters...." line I have added an empty line, and I
> translated that to an empty group of lines (denoted with the empty list).
> 
> By traversing the groups (i.e. over the outermost list), you can now decide for
> each group what type of output it is, and act accordingly.
> 
>> I hope someone can give me some tips.
> 
> Sure. However, in general it is appreciated if you first show your own efforts
> before asking the list for a solution.
> 
> Albert

If each group's first line is unindented and the other lines in the group
are indented, you can group the lines like this:

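# lines: the lines of the log, e.g. from open(filename).readlines()
# An unindented line starts a new block; a line beginning with a
# space or a tab continues the current block.
blocks = []
block = []
for line in lines:
    if not line.startswith((' ', '\t')):
        if block:
            blocks.append(block)
        block = []
    block.append(line)
if block:
    blocks.append(block)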
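To check the loop against the two samples from the question, here is a minimal sketch that wraps it in a function, also treating a leading tab as indentation. The name group_blocks is hypothetical, and the sample text is copied from the question on the assumption that continuation lines are indented with spaces.

```python
def group_blocks(lines):
    """Group lines: an unindented line starts a new block,
    a line starting with a space or tab continues the current block."""
    blocks = []
    block = []
    for line in lines:
        if not line.startswith((' ', '\t')):
            if block:
                blocks.append(block)
            block = []
        block.append(line)
    if block:
        blocks.append(block)
    return blocks


# First sample from the question (continuation lines indented with spaces).
sample1 = """trial text bla bla bla bla error
      bla bla bla bla bla
      bla bla bla on more lines
trial text bla bla bla bla warning bla
      bla bla more bla to be grouped with warning
      bla bla bla on more lines
      could be one two or ten lines also withouth the tab beginning
again text
text can contain also blank lines
text no delimiters....
"""

# Second sample: every line is unindented, so each becomes its own block.
sample2 = """Apr  8 04:02:08 machine text on one line
Apr  8 04:02:09 machine this is an error
Apr  8 04:02:10 machine this is a warning
"""

for sample in (sample1, sample2):
    for block in group_blocks(sample.splitlines()):
        print(len(block), block[0])
```

For the first sample this yields five blocks (three lines, four lines, then three single-line blocks); for the second, each timestamped line is a block of its own, so the same code handles both formats.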

If an unindented line does not always start a new block, don't expect this
to work; that is only what I infer from your limited sample.

You can then look for warnings, etc., in the blocks, either inside the loop
(to save memory) or afterwards in the constructed blocks list.
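Doing the check inside the loop can be sketched with a generator, so only one block is held in memory at a time. This is a minimal sketch under assumptions the question does not state: the log is read from a file-like object, and a block counts as an error or warning if the whole word "error" or "warning" appears anywhere in it, with error taking precedence. The names iter_blocks and classify are hypothetical.

```python
import io
import re

# Whole-word, case-insensitive match for the (assumed) keywords.
KEYWORD = re.compile(r"\b(error|warning)\b", re.IGNORECASE)


def iter_blocks(lines):
    """Yield blocks one at a time instead of building a list of all blocks."""
    block = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith((" ", "\t")):
            if block:
                yield block
            block = []
        block.append(line)
    if block:
        yield block


def classify(block):
    """Return 'error', 'warning', or 'other' for a block of lines."""
    found = {m.group(1).lower() for m in KEYWORD.finditer("\n".join(block))}
    if "error" in found:
        return "error"
    if "warning" in found:
        return "warning"
    return "other"


# io.StringIO stands in for an open file; a real file object iterates the same way.
log = io.StringIO(
    "trial text bla bla bla bla error\n"
    "      bla bla bla on more lines\n"
    "trial text bla bla bla bla warning bla\n"
    "      bla bla more bla to be grouped with warning\n"
    "again text\n"
)
for block in iter_blocks(log):
    print(classify(block), len(block))
```

Because iter_blocks yields each block as soon as it is complete, the same loop works on a large file without ever building the full blocks list.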

James




