xml processing : too slow...

Shagshag13 shagshag13 at yahoo.fr
Thu Jul 25 10:44:09 EDT 2002


> p.Parse('<fict>%s</fict>' % line, 1)
>
> should be satisfactory for checking this kind of "sort of
> well-formedness", unless there are yet more specs as yet
> unexpressed.

that's why i had tried:
>>> anotherline = '<root>' + line + '</root>'
>>> p.Parse(anotherline, 1)
Traceback (most recent call last):
  File "<pyshell#14>", line 1, in ?
    p.Parse(anotherline, 1)
ExpatError: junk after document element: line 1, column 0

but it still doesn't work, and neither does:

>>> p.Parse('<fict>%s</fict>' % line, 1)
Traceback (most recent call last):
  File "<pyshell#185>", line 1, in ?
    p.Parse('<fict>%s</fict>' % line, 1)
ExpatError: junk after document element: line 1, column 0
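
A possible cause, assuming p is the same parser object that already parsed an earlier line: the pyexpat documentation says a parser instance can only parse a single document, so a second final Parse() call on it fails whatever the input is. The sketch below creates a fresh parser per line; the helper name is_well_formed is made up for illustration, and the '<fict>' wrapper is the one suggested above.

```python
from xml.parsers import expat

def is_well_formed(line):
    """Return True if `line`, wrapped in a dummy root element, parses cleanly.

    Hypothetical helper, not part of pyexpat.
    """
    # A pyexpat parser handles exactly one document, so build a new one each time.
    parser = expat.ParserCreate()
    try:
        # True marks this chunk as the final (here: only) piece of the document.
        parser.Parse('<fict>%s</fict>' % line, True)
    except expat.ExpatError:
        return False
    return True
```

Because each call builds its own parser, the function can be called on one line after another without the state of an earlier parse leaking into the next.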

> How would that help you diagnose e.g.
>         <bah thisis=notvalid>of course not</bah>
> as not being well formed?  This is not well formed because
> it lacks quotes around an attribute's value.  Or:
>         <bah thisis="notvalid">&either</bah>
> now THIS is not well formed because reference '&either'
> is not terminated with a semicolon.  Etc, etc.

that's right i didn't address this kind of thing... :(

> But _we_ can't know, unless you DO tell us the specs.

that was the meaning of my example: the text can be anything among words, numbers, punctuation and some symbols like $.
spaces are separators. there are no entities like &entities;, or if one shows up it should be treated as a simple word.
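
A minimal sketch of that spec, under the assumption that every '&' in a line is plain text and never starts a real entity: escape it to '&amp;' before wrapping the line, so that '&either' is read as an ordinary word. The name check_line is invented for this example.

```python
from xml.parsers import expat

def check_line(line):
    """Return the space-separated words of `line`, or None if it is not well formed.

    Hypothetical helper; assumes '&' never introduces a real entity.
    """
    # Escape every '&' so the parser sees it as literal text.
    escaped = line.replace('&', '&amp;')
    parser = expat.ParserCreate()  # one fresh parser per line
    chunks = []
    # Expat may deliver character data in several pieces; collect them all.
    parser.CharacterDataHandler = chunks.append
    try:
        parser.Parse('<fict>%s</fict>' % escaped, True)
    except expat.ExpatError:
        return None
    # Join the pieces first, then split on whitespace to get the words.
    return ''.join(chunks).split()
```

With this, a line like '<bah thisis="notvalid">&either</bah>' is accepted and gives the single word '&either', while the unquoted-attribute example is still rejected.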

thanks,

s13.






