[XML-SIG] Using PyExpat.py

Guido van Rossum guido@digicool.com
Mon, 19 Feb 2001 16:49:20 -0500


> > > xml_dom_object = reader.fromUri(filename)  # should work for either URL or file
> 
> > Let's talk about this comment.  Is it really a good idea to build URL
> > access right into the API here?
> 
> I can't find out whether this has been settled. Did you propose to
> drop the support for URLs in the API, or the one for local files?

I'd like to drop support for URLs; I don't think the typical computer
is sufficiently networked to make this work well.

> We just had a report where urllib apparently decided to use "c" as the
> protocol name; I'm not entirely sure what the exact cause was.

That's the ambiguity between local filenames and URLs.  You have to
decide whether filenames passed to APIs are in local filename space or
in URL space, and not try to guess based on what the name looks like.
On the Mac, all absolute filenames look like foo:bar or
foo:bar:bletch, so there you have even less to work with.
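
To illustrate the ambiguity, here is a minimal sketch. It assumes a current Python 3, where the URL parser lives in urllib.parse; the helper name guessed_scheme and the sample names are made up for the example. It only shows what a generic URL parser infers from strings that are really local filenames.

```python
from urllib.parse import urlsplit


def guessed_scheme(name):
    """Return the scheme a generic URL parser infers from an arbitrary string."""
    return urlsplit(name).scheme


# Illustrative names only: a Windows path, a classic Mac path,
# a plain relative filename, and a real URL.
samples = [
    r"c:\data\doc.xml",
    "foo:bar:bletch",
    "doc.xml",
    "http://example.org/doc.xml",
]

for name in samples:
    print(repr(name), "->", repr(guessed_scheme(name)))
```

The Windows path comes back with scheme "c" and the Mac-style path with scheme "foo", so the look of the string cannot tell you which namespace the caller meant.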

> > Case in point: I found this bit in saxutils.py:
> > 
> >         if os.path.isfile(sysid):
> >             basehead = os.path.split(os.path.normpath(base))[0]
> >             source.setSystemId(os.path.join(basehead, sysid))
> >             f = open(sysid, "rb")
> >         else:
> >             source.setSystemId(urlparse.urljoin(base, sysid))
> >             f = urllib.urlopen(source.getSystemId())
> > 
> > Now I don't know under which circumstances this gets triggered (the
> > context is obscure)
> 
> prepare_input_source is invoked by every parser when processing the
> argument to .parse(), so the common usage is
> 
>   p = make_parser()
>   p.setContentHandler(something)
>   p.parse(filename)
> 
> Instead of filename, you can have URLs, stream, and InputSource
> objects (the Java API only supports InputSource here).

I would suggest having separate APIs depending on the argument type,
e.g. p.parseFile(filename), p.parseURL(url),
p.parseStream(InputSource), p.parseString(text).  (And no, Java
overloading wouldn't help much here, since three out of four APIs have
string arguments.)
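
A rough sketch of what such a split could look like, written as free functions on top of the standard xml.sax module in a current Python 3. The function names follow the suggestion above but are hypothetical, not an existing PyXML API. The stream variant here takes an open binary file-like object rather than an InputSource, and ElementNames is a throwaway handler for the demonstration.

```python
import io
import xml.sax
from urllib.request import urlopen
from xml.sax.handler import ContentHandler


def parse_file(filename, handler):
    """`filename` is always a local path; it is never reinterpreted as a URL."""
    with open(filename, "rb") as stream:
        parse_stream(stream, handler)


def parse_url(url, handler):
    """`url` is always a URL; this is the only entry point that may use the network."""
    with urlopen(url) as stream:
        parse_stream(stream, handler)


def parse_stream(stream, handler):
    """`stream` is an already-open binary file-like object."""
    xml.sax.parse(stream, handler)


def parse_string(text, handler):
    """`text` is the XML document itself, not a name for it.

    Assumes the text carries no conflicting encoding declaration.
    """
    parse_stream(io.BytesIO(text.encode("utf-8")), handler)


class ElementNames(ContentHandler):
    """Hypothetical handler that records element names, for demonstration."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


handler = ElementNames()
parse_string("<doc><item/></doc>", handler)
print(handler.names)  # ['doc', 'item']
```

With this shape, a mistyped filename passed to parse_file can only fail as a missing file, and nothing but parse_url ever opens a network connection.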

> > but I'd say it's a bad idea to just try to open a URL when a string
> > isn't a local file.  Maybe *you* live in a world where the network
> > is "always on" (and I do too!), but for plenty of folks, it's rather
> > annoying to find that their modem starts dialing out each time they
> > make a typo in a filename.
> 
> But would the modem actually start dialling? Wouldn't it rather
> determine that the protocol is "file" and then report that the file is
> missing? So I think it would either report an unknown url type, or an
> ENOENT. What kind of typo did you think of?

Maybe I was thinking of another case (not involving PyXML) that was
reported to me third hand, where a filename containing a colon on
Windows (using Cygwin tools) ended up being interpreted as Unix rcp
filename syntax, and the system was doing a host lookup on the part
before the colon -- that really does make the modem dial!

> > The application knows this, but the library doesn't.  It's also fine
> > to have an alternative API that takes a URL instead of a local
> > filename -- but it's not okay to attempt to overlap the two
> > namespaces.
> 
> The application can always make sure that the right thing is processed
> by opening it itself, and then passing that to the parser.

Sure, and if a string is given, it should be assumed to be a local
filename unless the API name has "URL" in it.
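
For the case where the application opens the data itself and hands it to the parser, a minimal sketch follows. It assumes the standard xml.sax module in a current Python 3; the helper input_source_from_stream, the handler class, and the "in-memory" system id are invented for the example. Because the InputSource already carries a byte stream, the parser has nothing left to guess or open.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


def input_source_from_stream(stream, system_id=None):
    """Wrap an already-open binary stream in an InputSource.

    `system_id` is only a label for error messages and relative references;
    the parser reads from `stream` and does not open anything by name.
    """
    source = InputSource(system_id)
    source.setByteStream(stream)
    return source


class ElementNames(ContentHandler):
    """Hypothetical handler that records element names, for demonstration."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


# The application decides where the bytes come from; here, an in-memory buffer.
stream = io.BytesIO(b"<root><child/></root>")

handler = ElementNames()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)
parser.parse(input_source_from_stream(stream, "in-memory"))
print(handler.names)  # ['root', 'child']
```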

--Guido van Rossum (home page: http://www.python.org/~guido/)