delete from pattern to pattern if it contains match

Jussi Piitulainen jussi.piitulainen at helsinki.fi
Mon Apr 25 06:24:48 EDT 2016


harirammanohar at gmail.com writes:

> On Monday, April 25, 2016 at 12:47:14 PM UTC+5:30, Jussi Piitulainen wrote:
>> harirammanohar at gmail.com writes:
>> 
>> > Hi Jussi,
>> >
>> > i have seen you have written a definition to fulfill the requirement,
>> > can we do this same thing using xml parser, as i have failed to
>> > implement the thing using xml parser of python if the file is having
>> > the content as below...
>> >
>> > <!DOCTYPE web-app 
>> >     PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN" 
>> >     "http://java.sun.com/dtd/web-app_2_3.dtd">
>> >
>> > <web-app>
>> >
>> > and entire thing works if it has as below:
>> > <!DOCTYPE web-app 
>> > <web-app>
>> >
>> > what i observe is xml tree parsing is not working if http tags are
>> > there in between web-app...
>> 
>> Do you get an error message?
>> 
>> My guess is that the parser needs the DTD but cannot access it. There
>> appears to be a DTD at that address, http://java.sun.com/... (it
>> redirects to Oracle, who bought Sun a while ago), but something might
>> prevent the parser from accessing it by default. If so, the details
>> depend on what parser you are trying to use. It may be possible to save
>> that DTD as a local file and point the parser to that.
>> 
>> Your problem is morphing rather wildly. A previous version had namespace
>> declarations but no DTD or XSD if I remember right. The initial version
>> wasn't XML at all.
>> 
>> If you post (1) an actual, minimal document, (2) the actual Python
>> commands that fail to parse it, and (3) the error message you get,
>> someone will be able to help you. The content of the document need not
>> be more than "hello, world" level. The DOCTYPE declaration and the
>> outermost tags with all their attributes and namespace declarations, if
>> any, are important.
>
> Hi Jussi,
>
> Here is an input file...sample.xml
>
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee"
>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>   xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
>                       http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd"
>   version="3.1">
>     <servlet>
>       <servlet-name>controller</servlet-name>
>       <servlet-class>com.mycompany.mypackage.ControllerServlet</servlet-class>
>       <init-param>
>         <param-name>listOrders</param-name>
>         <param-value>com.mycompany.myactions.ListOrdersAction</param-value>
>       </init-param>
>       <init-param>
>         <param-name>saveCustomer</param-name>
>         <param-value>com.mycompany.myactions.SaveCustomerAction</param-value>
>       </init-param>
>       <load-on-startup>5</load-on-startup>
>     </servlet>
>
>
>     <servlet-mapping>
>       <servlet-name>graph</servlet-name>
>       <url-pattern>/graph</url-pattern>
>     </servlet-mapping>
>
>
>     <session-config>
>       <session-timeout>30</session-timeout>
>     </session-config>
> </web-app>
>
> --------------------------------
> Here is the code:
>
> import xml.etree.ElementTree as ET
> ET.register_namespace("", "http://xmlns.jcp.org/xml/ns/javaee")
> tree = ET.parse('sample.xml')
> root = tree.getroot()
>
> for servlet in root.findall('servlet'):
>         servletname = servlet.find('servlet-name').text
>         if servletname == "controller":
>                 root.remove(servlet)
>
> tree.write('output.xml')
>
> This will work if <web-app> </web-app> doesnt have below...
>
> xmlns="http://xmlns.jcp.org/xml/ns/javaee"
>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>   xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
>                       http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd"

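It's a namespace issue. ET.register_namespace only affects how the
tree is written out; it does not change how find and findall match
element names, so your search for 'servlet' matches nothing. It's a
frustrating failure mode: no error message, no nothing :)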
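
To see what is going on, here is a small sketch. The inline document
is a cut-down stand-in for your sample.xml (my assumption, not your
real file). It prints the tag names as ElementTree stores them and
compares a bare search with a fully qualified one:

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for sample.xml (an assumption for illustration).
doc = """<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="3.1">
  <servlet><servlet-name>controller</servlet-name></servlet>
</web-app>"""

# Same registration as in the failing code; it does not affect searching.
ET.register_namespace("", "http://xmlns.jcp.org/xml/ns/javaee")
root = ET.fromstring(doc)

# Tags are stored with the namespace URI in braces in front of the name.
print(root.tag)
print(root[0].tag)

# A bare name does not match a namespaced element; the qualified name does.
unqualified = root.findall('servlet')
qualified = root.findall('{http://xmlns.jcp.org/xml/ns/javaee}servlet')
print(len(unqualified), len(qualified))
```

The bare 'servlet' search comes back empty, so the loop body never
runs and nothing is removed.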

Try defining a namespace prefix in your method calls, and using that
prefix in element names:

ns = { 'x' : "http://xmlns.jcp.org/xml/ns/javaee" }

for servlet in root.findall('x:servlet', ns):
    servletname = servlet.find('x:servlet-name', ns).text

I got this from here:
https://docs.python.org/3/library/xml.etree.elementtree.html#parsing-xml-with-namespaces
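
Put together with your removal logic, it might look like the sketch
below. I parse an inline string instead of sample.xml so that it runs
on its own; the shortened document and the names NS_URI, doc and
output are mine, not part of your code. With a real file you would
keep ET.parse and tree.write as you had them.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://xmlns.jcp.org/xml/ns/javaee"
ns = {'x': NS_URI}

# Shortened stand-in for sample.xml (an assumption for illustration).
doc = """<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="3.1">
    <servlet>
      <servlet-name>controller</servlet-name>
    </servlet>
    <servlet-mapping>
      <servlet-name>graph</servlet-name>
      <url-pattern>/graph</url-pattern>
    </servlet-mapping>
</web-app>"""

# Registering the empty prefix only influences serialization: the
# namespace is written as the default one instead of a generated prefix.
ET.register_namespace("", NS_URI)
root = ET.fromstring(doc)

# findall returns a list, so removing children while looping is safe.
for servlet in root.findall('x:servlet', ns):
    name = servlet.find('x:servlet-name', ns)
    if name is not None and name.text == "controller":
        root.remove(servlet)

output = ET.tostring(root, encoding="unicode")
print(output)
```

The check for None is there only so that a servlet element without a
servlet-name does not raise an AttributeError.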

Note that the namespace prefix itself (I chose 'x') has no meaning.
What does the job is the mapping from the prefix you use to the URI
that names the namespace.
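
A quick way to convince yourself, as a sketch on a one-servlet
document that I made up for the purpose: two different prefixes and
the spelled-out {URI}name form all find the same element.

```python
import xml.etree.ElementTree as ET

URI = "http://xmlns.jcp.org/xml/ns/javaee"

# Made-up minimal document (an assumption for illustration).
root = ET.fromstring(
    '<web-app xmlns="%s">'
    '<servlet><servlet-name>controller</servlet-name></servlet>'
    '</web-app>' % URI
)

# Two arbitrary prefixes mapped to the same URI, and no prefix at all.
with_x = root.findall('x:servlet', {'x': URI})
with_jee = root.findall('jee:servlet', {'jee': URI})
with_braces = root.findall('{%s}servlet' % URI)

# All three searches return the very same element.
print(with_x == with_jee == with_braces)
```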


