delete from pattern to pattern if it contains match

Mon Apr 25 02:29:00 EDT 2016

On Friday, April 22, 2016 at 4:41:08 PM UTC+5:30, Jussi Piitulainen wrote:
> Peter Otten writes:
> 
> > harirammanohar at gmail.com wrote:
> >
> >> On Thursday, April 21, 2016 at 7:03:00 PM UTC+5:30, Jussi Piitulainen
> >> wrote:
> >>> harirammanohar at gmail.com writes:
> >>> 
> >>> > On Monday, April 18, 2016 at 12:38:03 PM UTC+5:30,
> >>> > hariram... at gmail.com wrote:
> >>> >> Hi all,
> >>> >> 
> >>> >> Can you help me with the task below?
> >>> >> 
> >>> >> file:
> >>> >> <start>
> >>> >>  guava
> >>> >> fruit
> >>> >> <end>
> >>> >> <start>
> >>> >>  mango
> >>> >> fruit
> >>> >> <end>
> >>> >> <start>
> >>> >>  orange
> >>> >> fruit
> >>> >> <end>
> >>> >> 
> >>> >> I need to delete everything from start to end in a file if it contains mango...
> >>> >> 
> >>> >> output should be:
> >>> >> 
> >>> >> <start>
> >>> >>  guava
> >>> >> fruit
> >>> >> <end>
> >>> >> <start>
> >>> >>  orange
> >>> >> fruit
> >>> >> <end>
> >>> >> 
> >>> >> Thank you
> >>> >
> >>> > Can anyone guide me? Why is XML tree parsing not working if I have
> >>> > root.tag and root.attrib as mentioned in the earlier post...
> >>> 
> >>> Assuming the real data consists of lines between a start marker and end
> >>> marker, a winning plan is to collect a group of lines, deal with it, and
> >>> move on.
> >>> 
> >>> The following code implements something close to the plan. You need to
> >>> adapt it a bit to have your own source of lines and to restore the end
> >>> marker in the output and to account for your real use case and for
> >>> differences in taste and judgment. - The plan is as described above, but
> >>> there are many ways to implement it.
> >>> 
> >>> from io import StringIO
> >>> 
> >>> text = '''\
> >>> <start>
> >>>   guava
> >>> fruit
> >>> <end>
> >>> <start>
> >>>   mango
> >>> fruit
> >>> <end>
> >>> <start>
> >>>   orange
> >>> fruit
> >>> <end>
> >>> '''
> >>> 
> >>> def records(source):
> >>>     current = []
> >>>     for line in source:
> >>>         if line.startswith('<end>'):
> >>>             yield current
> >>>             current = []
> >>>         else:
> >>>             current.append(line)
> >>> 
> >>> def hasmango(record):
> >>>     return any('mango' in it for it in record)
> >>> 
> >>> for record in records(StringIO(text)):
> >>>     hasmango(record) or print(*record)
> >> 
> >> Hi,
> >> 
> >> It is not working. This is the output I am getting:
> >> 
> >> \
> >
> > This means that the line
> >
> >>> text = '''\
> >
> > has trailing whitespace in your copy of the script.
> 
> That's a nuisance. I wish otherwise undefined escape sequences in
> strings raised an error, similar to a stray space after a line
> continuation character.
> 
> >>  <start>
> >>    guava
> >>  fruit
> >> 
> >> <start>
> >>    orange
> >>  fruit
> >
> > Jussi forgot to add the "<end>..." line to the group.
> 
> I didn't forget. I meant what I said when I said the OP needs to adapt
> the code to (among other things) restore the end marker in the output.
> If they can't be bothered to do anything at all, it's their problem.
> 
> It was already known that this is not the actual format of the data.
> 
> > To fix this change the generator to
> >
> > def records(source):
> >     current = []
> >     for line in source:
> >         current.append(line)
> >         if line.startswith('<end>'):
> >             yield current
> >             current = []
> 
> Oops, I notice that I forgot to start a new record only on encountering
> a '<start>' line. That should probably be done, unless the format is
> intended to be exactly a sequence of "<start>\n- -\n<end>\n".
> 
> >>>     hasmango(record) or print(*record)
> >
> > The
> >
> > print(*record)
> >
> > inserts spaces between record entries (i.e. at the beginning of all
> > lines except the first) and adds a trailing newline.
> 
> Yes, I forgot about the space. Sorry about that.
> 
> The final newline was intentional. Perhaps I should have added the end
> marker there instead (given my preference to not drag it together with
> the data lines), like so:
> 
>    print(*record, sep = "", end = "<end>\n")
> 
> Or so:
> 
>    print(*record, sep = "")
>    print("<end>")
> 
> Or so:
> 
>    for line in record:
>        print(line.rstrip("\n"))
>    else:
>        print("<end>")
> 
> Or:
> 
>    for line in record:
>        print(line.rstrip("\n"))
>    else:
>        if record and not record[-1].strip() == "<end>":
>            print("<end>")
> 
> But all this is beside the point that to deal with the stated problem
> one might want to obtain access to a whole record *first*, then check if
> it contains "mango" in the intended way (details missing but at least
> "mango\n" as a full line counts as an occurrence), and only *then* print
> the whole record (if it doesn't contain "mango").
> 
> I can think of two other ways - one if the data can be accessed only
> once - but they seem more complicated to me. Hm, well, if it's XML, as
> stated in another branch of this thread and contrary to the form of the
> example data in this branch, there's a third way that may be good, but
> here I'm responding to a line-oriented format.
> 
> > You can avoid this by specifying the delimiters explicitly:
> >
> > if not hasmango(record):
> >     print(*record, sep="", end="")
> >
> > Even with these changes the code still looks somewhat brittle...
> 
> That depends on the actual data format, and on what really is intended
> to trigger the filter. This approach is a complete waste of effort if
> there are no guarantees of things being there on their own lines, for
> example.
> 
> Ok, that "\ " not only looks brittle but actually is brittle. The one
> time I used that slash, I now regret doing so. Here's a fixed version.
> (Not sure of the significance of the number of spaces that start the
> first data line. They seem to have doubled along the way.)
> 
> text = '''<start>
>   guava
> fruit
> <end>
> <start>
>   mango
> fruit
> <end>
> <start>
>   orange
> fruit
> <end>
> '''
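
The fixes discussed in the quoted text can be combined into one script. This is only a minimal sketch, assuming the line-oriented format of the example (markers on their own lines): a record opens only on a '<start>' line, the '<end>' line is kept with its record, and output is written without the extra spaces that print(*record) inserts. The helper name filter_records is made up here for illustration.

```python
from io import StringIO

text = '''<start>
  guava
fruit
<end>
<start>
  mango
fruit
<end>
<start>
  orange
fruit
<end>
'''

def records(source):
    """Yield each <start>...<end> group as a list of lines, markers included."""
    current = None  # None means we are outside any record
    for line in source:
        if line.startswith('<start>'):
            current = [line]  # a record opens only on a start marker
        elif current is not None:
            current.append(line)
            if line.startswith('<end>'):
                yield current
                current = None
        # lines outside any record are silently dropped in this sketch

def hasmango(record):
    return any('mango' in line for line in record)

def filter_records(text):
    """Return the text with every record that mentions mango removed."""
    return ''.join(
        line
        for record in records(StringIO(text))
        if not hasmango(record)
        for line in record
    )

# The lines keep their own newlines, so no extra newline is added.
print(filter_records(text), end='')
```

Whether 'mango' should match as a substring anywhere in a line, as it does here, depends on the real data.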

Hi Jussi,

I see that you have written a function to meet the requirement. Can we do the same thing using an XML parser? I have failed to implement this with Python's XML parser when the file has the content below:

<!DOCTYPE web-app 
    PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN" 
    "http://java.sun.com/dtd/web-app_2_3.dtd">

<web-app>

and the whole thing works if it has the content below:
<!DOCTYPE web-app 
<web-app>

What I observe is that XML tree parsing does not work if the lines with the http reference are there between the web-app lines...
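
For comparison, here is a minimal sketch of the same filter using xml.etree.ElementTree from the standard library. It assumes the document is well-formed XML, with the DOCTYPE declaration closed as in the first snippet above, and that the records to drop are direct children of <web-app>. The <servlet> and <servlet-name> elements and their contents are made up for illustration, as is the helper name drop_matching_children; the real file will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only the DOCTYPE and <web-app> come from the post,
# the child elements are invented for this example.
XML_TEXT = '''<!DOCTYPE web-app
    PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
    "http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>
  <servlet><servlet-name>guava</servlet-name></servlet>
  <servlet><servlet-name>mango</servlet-name></servlet>
  <servlet><servlet-name>orange</servlet-name></servlet>
</web-app>
'''

def drop_matching_children(root, needle):
    """Remove each direct child of root whose subtree text contains needle."""
    # Iterate over a copy, because root is modified inside the loop.
    for child in list(root):
        if any(needle in (el.text or '') for el in child.iter()):
            root.remove(child)

root = ET.fromstring(XML_TEXT)
drop_matching_children(root, 'mango')

remaining = [el.text for el in root.iter('servlet-name')]
print(remaining)
```

Note that ElementTree does not keep the DOCTYPE declaration when the tree is written back out, so it would have to be added again by hand if the output file needs it.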