[BangPypers] Handling unicode characters in xml.dom

Wed Mar 19 10:01:08 CET 2008

The XML is:

<?xml version="1.0" encoding="UTF-8"?>
<results>
<suite success="2" failures="0" name="Test 1">
  <case status="PASSED" time="25.21 sec" >
    <system-out><![CDATA[Test Program Metadata class.
      Metadata Key 'duration;valtype=uint32;timescale=90000', value 21762000
      Metadata Key 'date;valtype=wchar*', value '20030414T044847.000Z'
      Metadata Key
'description;valtype=wchar*;iso-639-2-lang=jpn;orig-char-enc=UTF-8', value
'f(3gpÕ¡¤ë'
      Metadata Key
'title;valtype=wchar*;iso-639-2-lang=jpn;orig-char-enc=UTF-8', value
'pa_st_2000'
      Metadata Key
'author;valtype=wchar*;iso-639-2-lang=jpn;orig-char-enc=UTF-8', value
'É³â_BBBBBBBBBBBBBBBBBBBBBBBBBBB'
      Metadata Key 'track-inf;valtype=char*;index=0', value 'ES'

    ]]></system-out>
  </case>
</suite></results>
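
For anyone hitting the same error, here is a minimal sketch of one possible workaround. It assumes the parse fails because the file declares UTF-8 but actually contains bytes in some other 8-bit encoding; the thread does not confirm that this is the cause. The helper name parse_xml_bytes and the iso-8859-1 fallback are hypothetical choices made for illustration, and the code is written for Python 3, where minidom.parseString accepts bytes.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parse_xml_bytes(raw, fallback_encoding="iso-8859-1"):
    """Parse XML given as bytes (hypothetical helper, not part of minidom).

    Assumption: the XML declaration says encoding="UTF-8".
    """
    try:
        # First attempt: trust the encoding declared in the document.
        return minidom.parseString(raw)
    except ExpatError:
        # Assumption: the failure is caused by bytes that are not valid
        # UTF-8. Decode them with the fallback encoding (iso-8859-1 maps
        # every byte to a character, so this step itself never fails) and
        # re-encode as UTF-8 so the bytes match the declaration again.
        text = raw.decode(fallback_encoding)
        # If the document is broken for another reason (e.g. malformed
        # markup), this second parse raises ExpatError again.
        return minidom.parseString(text.encode("utf-8"))


if __name__ == "__main__":
    # 0xE9 on its own is not valid UTF-8, so the first parse attempt fails.
    sample = b'<?xml version="1.0" encoding="UTF-8"?><r><![CDATA[caf\xe9]]></r>'
    doc = parse_xml_bytes(sample)
    print(repr(doc.documentElement.firstChild.data))
```

The fallback only makes the file parseable. If the guessed encoding is wrong, the recovered text will still be garbled, so it is better to fix the program that writes the file when that is possible.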

Thanks,
Gurpreet

On Tue, Mar 18, 2008 at 1:22 PM, Anand Balachandran Pillai <
abpillai at gmail.com> wrote:

> Is the garbage CDATA or attribute data ?
>
> CDATA is like <elem>text</elem> and attribute
> is <elem attr="value" />
>
> Can you paste the relevant part of the XML file here or, if it is
> small enough, the complete XML file? Send it directly to me
> since the list removes attachments.
>
> --Anand
>
> On Tue, Mar 18, 2008 at 11:05 AM, Gurpreet Sachdeva
> <gurpreet.sachdeva at gmail.com> wrote:
> > <?xml version="1.0" encoding="UTF-8"?>
> >
> > Still the problem exists.
> >
> > - Gurpreet
> >
> >
> >
> > On Tue, Mar 18, 2008 at 10:44 AM, Anand Balachandran Pillai
> > <abpillai at gmail.com> wrote:
> >
> > > What is the encoding of your XML file? That is, in the
> > > declaration <?xml version="1.0" encoding="<encoding>"?>,
> > > what is <encoding>?
> > >
> > > Make sure it is an encoding like utf-8 or iso-8859-1
> > > that matches the bytes actually in the file, so the parser
> > > can decode the non-ASCII characters.
> > >
> > > --Anand
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Mar 18, 2008 at 10:38 AM, Gurpreet Sachdeva
> > > <gurpreet.sachdeva at gmail.com> wrote:
> > > > Hi,
> > > >
> > > > Any idea how to handle the Unicode characters in an XML file
> > > > while parsing it?
> > > >
> > > > This is what I am doing:
> > > >
> > > > from xml.dom import minidom
> > > >
> > > > xmlObj = minidom.parse(fileobj)
> > > >
> > > > And the script throws an error because of some special characters
> > > > ['f(3gpÕ¡¤ë'] present in the XML file. Any suggestions or pointers
> > > > would be appreciated.
> > > >
> > > > Thanks and Regards,
> > > > Gurpreet Singh
> > > > _______________________________________________
> > > >  BangPypers mailing list
> > > >  BangPypers at python.org
> > > >  http://mail.python.org/mailman/listinfo/bangpypers
> > > >
> > > >
> > >
> > >
> > >
> > > --
> > > -Anand
> > >
> >
> >
> >
> > --
> > Thanks and Regards,
> > Gurpreet Singh
> >
> >
>
>
>
> --
> -Anand
>

-- 
Thanks and Regards,
Gurpreet Singh