how to split this kind of text into sections

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri Apr 25 11:18:33 EDT 2014


On Fri, 25 Apr 2014 21:07:53 +0800, oyster wrote:

> I have a long text, which should be splitted into some sections, where
> all sections have a pattern like following with different KEY. And the
> /n/r can not be used to split
> 
> I don't know whether this can be done easily, for example by using RE
> module

[... snip example ...]

> I hope I have state myself clear.

Clear as mud.

I'm afraid I have no idea what you mean. Can you explain the rule you
use to decide whether a line is included, excluded, or part of a
section?



> [demo text starts]
> a line we do not need

How do we decide whether the line is ignored? Is it the literal text "a 
line we do not need"?

# `lines` is any iterable of lines, e.g. an open file object
for line in lines:
    if line == "a line we do not need\n":
        # ignore this line
        continue


> I am section axax
> I am section bbb, we can find that the first 2 lines of this section all
> startswith 'I am section'


Again, is this the *literal* text that you expect?

> .....(and here goes many other text)... let's continue to
>  let's continue, yeah
>  .....(and here goes many other text)...
> I am using python
> I am using perl
>  .....(and here goes many other text)...
> [demo text ends]
> 
> the above text should be splitted as a LIST with 3 items, and I also
> need to know the KEY for LIST is ['I am section', 'let's continue', 'I
> am using']:

How do you decide that they are the keys?
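
To show why I'm asking, here is one possible guess at a rule: take the
common prefix of the first two lines of a section. This is only a
sketch of my guess, not something you stated, and guess_key is a
made-up helper name.

```python
import os.path

def guess_key(first, second):
    """Guess a key as the common prefix of two consecutive lines.

    Assumption: leading/trailing whitespace is not significant.
    """
    # commonprefix compares character by character, not word by word
    prefix = os.path.commonprefix([first.strip(), second.strip()])
    return prefix.rstrip()

print(guess_key("I am section axax", "I am section bbb"))
print(guess_key("let's continue to", " let's continue, yeah"))
print(guess_key("I am using python", "I am using perl"))
```

The first two calls give 'I am section' and "let's continue", but the
third gives 'I am using p', not 'I am using'. So a naive common prefix
is not the rule you have in mind, and I can't tell what is.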


> lst=[
>  '''I am section axax
> I am section bbb, we can find that the first 2 lines of this section all
> startswith 'I am section'
> .....(and here goes many other text)...''',
> 
> '''let's continue to
>  let's continue, yeah
>  .....(and here goes many other text)...''',
> 
> 
> '''I am using python
> I am using perl
>  .....(and here goes many other text)...'''
> ]

Perhaps it would be better if you show a more realistic example.
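
In the meantime, if the keys really are known in advance, something
like the following might be a starting point. It is a rough sketch
under my own assumptions: a new section starts whenever a line begins
with a key different from the current one, leading whitespace is
ignored when matching, and anything before the first key is dropped.
split_sections is a made-up name, and the sample text is abbreviated
from your demo.

```python
def split_sections(lines, keys):
    """Group lines into (key, [lines]) pairs.

    Assumption: a section starts at the first line that begins with a
    key other than the key of the section currently being collected.
    """
    sections = []
    current_key = None
    for line in lines:
        stripped = line.lstrip()
        # First key that this line starts with, or None if no key matches
        key = next((k for k in keys if stripped.startswith(k)), None)
        if key is not None and key != current_key:
            sections.append((key, []))
            current_key = key
        if current_key is None:
            continue  # text before the first key is ignored (assumption)
        sections[-1][1].append(line)
    return sections

text = """a line we do not need
I am section axax
I am section bbb
some other text
let's continue to
 let's continue, yeah
more text
I am using python
I am using perl
yet more text
"""

keys = ["I am section", "let's continue", "I am using"]
sections = split_sections(text.splitlines(), keys)
for key, body in sections:
    print(repr(key), len(body))
```

Note that a line starting with an earlier key in the middle of a later
section would open a new section here. Whether that is what you want
depends on the rule you haven't told us yet.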



-- 
Steven D'Aprano
http://import-that.dreamwidth.org/


