xml.dom - reading from a file

Bengt Richter bokr at oz.net
Mon Nov 17 12:49:03 EST 2003


On Mon, 17 Nov 2003 09:54:37 GMT, Alex Martelli <aleax at aleax.it> wrote:

>sashan wrote:
>
>> Is the way to use DOM for an xml file as follows:
>> 1) Read the file into a string
>> 2) Call xml.dom.minidom.parseString(string)
>
>It's one way, but xml.dom.minidom.parse(f) is generally better.  f can
>be a filename OR a file object open for reading.
>
That reminds me ...

Is there a BDFL pronouncement or dev consensus on how to implement accepting
either a filename or a file object?

E.g., should one
    assert type(filename) is str 
or
    assert isinstance(filename,str)
or
    ??

and is the file object alternative

    assert isinstance(filename, file)  # too restrictive IMO
or
    assert hasattr(filename,'read') and callable(filename.read)  # what about next?
or
    ??
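For what it's worth, here is a minimal sketch of the duck-typed variant of the two checks above. It assumes a modern Python 3, where the builtin `file` type no longer exists, so the isinstance(filename, file) form is not even available. The names is_filename and is_readable are made up for illustration and are not part of any standard module.

```python
import io
import os


def is_filename(arg):
    """True if arg should be treated as a path (hypothetical helper)."""
    # isinstance accepts str subclasses, unlike `type(arg) is str`;
    # os.PathLike also admits pathlib.Path objects.
    return isinstance(arg, (str, os.PathLike))


def is_readable(arg):
    """True if arg quacks like a file open for reading (hypothetical helper)."""
    # Duck typing: any object with a callable read() attribute qualifies.
    return callable(getattr(arg, "read", None))


# Usage: an in-memory stream passes the file check, a plain string does not.
stream = io.StringIO("<doc/>")
kinds = (is_filename("data.xml"), is_readable(stream), is_readable("data.xml"))
```

Note that a bare iterator or generator fails is_readable, which is exactly the question about looking for a next method.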

I guess the generic idea is that filename-when-it-is-a-file-object will be bound to
something that produces a sequence of strings, so shouldn't an iterator/generator
be acceptable as well? (E.g., I expect generator expressions will be handy for
test inputs etc.)

So should one look for a next method?

And, given a generic source of string chunks (must they be str instances or could they
be generator chunks recursively?) is there a blessed efficient wrapper function that will
convert the str chunk stream to an object that can fake a file instance more completely
(e.g., for readline etc.)? 
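As a rough sketch of what such a wrapper might look like, assuming Python 3 and assuming every chunk is a plain str (no nested generators): ChunkReader is a hypothetical name, and the class covers only read, readline and iteration, not the rest of the file interface. No claim is made about efficiency.

```python
class ChunkReader:
    """File-like reader over an iterable of str chunks (illustrative sketch)."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buf = ""

    def _fill(self):
        """Append the next chunk to the buffer; return False when exhausted."""
        for chunk in self._chunks:
            self._buf += chunk
            return True
        return False

    def read(self, size=-1):
        if size is None or size < 0:
            # Drain the buffer and everything the source still has.
            data = self._buf + "".join(self._chunks)
            self._buf = ""
            return data
        # Pull chunks until enough characters are buffered or the source ends.
        while len(self._buf) < size and self._fill():
            pass
        data, self._buf = self._buf[:size], self._buf[size:]
        return data

    def readline(self):
        # Pull chunks until a newline is buffered or the source ends.
        while "\n" not in self._buf and self._fill():
            pass
        end = self._buf.find("\n") + 1
        if end == 0:
            # No newline left: the remainder is the final, unterminated line.
            end = len(self._buf)
        line, self._buf = self._buf[:end], self._buf[end:]
        return line

    def __iter__(self):
        # Yield lines until readline() returns the empty string.
        return iter(self.readline, "")


# Usage: chunk boundaries need not coincide with line boundaries.
reader = ChunkReader(piece for piece in ["ab", "c\nde", "f\n", "g"])
first_line = reader.readline()
```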

In fact, why not a standard function to convert this kind of either-or argument into
a file instance proxy? Then policy and behavior could be standardized, and people
wouldn't be wondering and re-inventing wheel variants.
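One possible shape for such a function, sketched under the assumption of Python 3.7 or later, where contextlib.nullcontext is available. The name opened is hypothetical; nothing like it is claimed to exist in the standard library. The point it illustrates is ownership: a file the helper opened gets closed on leaving the with block, while a file object passed in by the caller is left open.

```python
import contextlib
import io
import os


def opened(source, mode="r"):
    """Return a context manager yielding a file object (hypothetical helper)."""
    if isinstance(source, (str, os.PathLike)):
        # We open the file here, so leaving the with block closes it.
        return open(source, mode)
    if callable(getattr(source, "read", None)):
        # The caller owns this object; nullcontext leaves it open on exit.
        return contextlib.nullcontext(source)
    raise TypeError(
        "expected a filename or file object, got %s" % type(source).__name__
    )


# Usage: the same with statement serves both kinds of argument.
buf = io.StringIO("<doc/>")
with opened(buf) as f:
    text = f.read()
```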

Let's see...

 >>> vars(file).keys()
 ['softspace', 'encoding', 'xreadlines', 'readlines', 'flush', 'close', 'seek', '__init__',
  'newlines', '__setattr__', '__new__', 'readinto', 'next', 'write', 'closed', 'tell', 'mode',
  'isatty', 'truncate', 'read', '__getattribute__', '__iter__', 'readline', 'fileno',
  'writelines', 'name', '__doc__', '__delattr__', '__repr__']

Well, probably not all of that (except maybe for nice error messages). Maybe the wrapping function
should accept keyword arguments for 'strict' vs 'warn', and perhaps an optional callback as an
alternative to raising an exception?
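To make the 'strict' vs 'warn' vs callback idea concrete, here is a small sketch, again assuming Python 3. The function name check_filelike and its parameters required, strict and on_missing are invented for illustration; the choice of which methods to require is likewise just an assumption.

```python
import io
import warnings


def check_filelike(obj, required=("read", "readline"), strict=True, on_missing=None):
    """Report which required methods obj lacks (hypothetical helper).

    If on_missing is given, it is called with the list of missing names.
    Otherwise strict=True raises TypeError and strict=False issues a warning.
    The list of missing names is returned whenever no exception is raised.
    """
    missing = [name for name in required
               if not callable(getattr(obj, name, None))]
    if missing:
        message = "file-like object lacks: " + ", ".join(missing)
        if on_missing is not None:
            on_missing(missing)
        elif strict:
            raise TypeError(message)
        else:
            warnings.warn(message)
    return missing


# Usage: a real in-memory file has everything that is asked for.
problems = check_filelike(io.StringIO("x"))
```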

Oh, and what about when the arg is already a standard file instance? Should mode, for example,
be overridable when that is feasible?

In summary, the proposed goal is to make usage a no-brainer by providing a standard wrapping function
for file-or-filename args.

Regards,
Bengt Richter



