parsing text in blocks and line too
James Stroud
jstroud at mbi.ucla.edu
Thu Apr 12 07:53:44 EDT 2007
A.T.Hofkamp wrote:
> On 2007-04-12, flyzone at technologist.com <flyzone at technologist.com> wrote:
>> Goodmorning people :)
>> I have just started to learn this language and i have a logical
>> problem.
>> I need to write a program to parse various file of text.
>> Here two sample:
>>
>> ---------------
>> trial text bla bla bla bla error
>> bla bla bla bla bla
>> bla bla bla on more lines
>> trial text bla bla bla bla warning bla
>> bla bla more bla to be grouped with warning
>> bla bla bla on more lines
>> could be one two or ten lines also withouth the tab beginning
>> again text
>> text can contain also blank lines
>> text no delimiters....
>> --------------
>> Apr 8 04:02:08 machine text on one line
>> Apr 8 04:02:09 machine this is an error
>> Apr 8 04:02:10 machine this is a warning
>> --------------
>
> I would first read groups of lines that belong together, then decide on each
> group whether it is an error, warning, or whatever.
> To preserve order in a group of lines, you can use lists.
>
> From your example you could first compute a list of lists, like
>
> [ [ "trial text bla bla bla bla error",
> " bla bla bla bla bla",
> " bla bla bla on more lines" ],
> [ "trial text bla bla bla bla warning bla",
> " bla bla more bla to be grouped with warning",
> " bla bla bla on more lines",
> " could be one two or ten lines also withouth the tab beginning" ],
> [ "again text" ],
> [ "text can contain also blank lines" ],
> [ ],
> [ "text no delimiters...." ]
> ]
>
> Just above the "text no delimiters...." line I have added an empty line, and I
> translated that to an empty group of lines (denoted with the empty list).
>
> By traversing the groups (ie over the outermost list), you can now decide for
> each group what type of output it is, and act accordingly.
>
>> Hope someone could give me some tips.
>
> Sure, however, in general it is appreciated if you first show your own efforts
> before asking the list for a solution.
>
> Albert
If the first line of each group has no indent and the other lines in
the group are indented, you can group the lines like this:
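
# lines: any iterable of text lines, e.g. an open file
blocks = []
block = []
for line in lines:
    # An unindented line (no leading space or tab) starts a new block.
    if not line.startswith((' ', '\t')):
        if block:
            blocks.append(block)
        block = []
    block.append(line)
# Don't forget the final block.
if block:
    blocks.append(block)
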
This only works if every unindented line starts a new block, which is
what I infer from your limited sample. If that is not the case, don't
expect it to work.
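
To make that caveat concrete, here is a small self-contained sketch. The function name group_blocks and the sample strings are made up for illustration, and I am assuming the indent can be either spaces or a tab. It wraps the loop in a function and runs it on a few lines shaped like your sample. The last line is meant as a continuation but has no indent, so it ends up in a block of its own.

```python
def group_blocks(lines):
    """Group lines into blocks; every unindented line starts a new block."""
    blocks = []
    block = []
    for line in lines:
        # Assumption: "indented" means a leading space or tab.
        if not line.startswith((' ', '\t')) and block:
            blocks.append(block)
            block = []
        block.append(line)
    # Flush the final block, if any.
    if block:
        blocks.append(block)
    return blocks


# Hypothetical input modelled on the sample in the question.
sample = [
    "trial text bla bla bla bla error",
    "\tbla bla bla bla bla",
    "trial text bla bla bla bla warning bla",
    "\tbla bla more bla to be grouped with warning",
    "continuation without the tab beginning",
]

blocks = group_blocks(sample)
for block in blocks:
    print(block)
```

If your real files contain unindented continuation lines like that last one, you would need some other rule to decide where a block starts.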
You can then look for warnings, etc., in the blocks--either in the loop
to save memory or in the constructed blocks list.
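
A rough sketch of the "in the loop" variant, in case it helps: a generator hands out one block at a time, so the full list is never built. The names iter_blocks and classify are hypothetical, and the plain substring test for "error" and "warning" is only an assumption about what counts as a match in your files.

```python
def iter_blocks(lines):
    """Yield blocks one at a time instead of building the whole list."""
    block = []
    for line in lines:
        # Assumption: "indented" means a leading space or tab.
        if not line.startswith((' ', '\t')) and block:
            yield block
            block = []
        block.append(line)
    if block:
        yield block


def classify(block):
    """Label a block by naive, case-insensitive keyword search."""
    text = ' '.join(block).lower()
    if 'error' in text:
        return 'error'
    if 'warning' in text:
        return 'warning'
    return 'other'


# Hypothetical input modelled on the sample in the question.
sample = [
    "trial text bla bla bla bla error",
    "  bla bla bla bla bla",
    "trial text bla bla bla bla warning bla",
    "  bla bla more bla to be grouped with warning",
    "again text",
]

results = [(classify(block), len(block)) for block in iter_blocks(sample)]
print(results)
```

With a real file you could pass the open file object straight to iter_blocks, since it only needs something it can loop over line by line.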
James