PyYaml?

Chris S. chrisks at NOSPAM.udel.edu
Sun Sep 19 19:23:18 EDT 2004


Paul Moore wrote:

> "Hostile" seems a little exaggerated. The original posting (quoted
> above) asked the question "Is there any benefit to Pickle over YAML?"
> I suppose that a reasonable answer (from me) might be "not that I
> know of", but that begs the question, as I know very little of YAML.
> 
> Maybe the original poster (or some other supporter of YAML) could
> provide some reasons to think that YAML *might* be superior to
> Pickle. Then the people who know about Pickle could respond more
> helpfully.
> 
> For example, you (Chris S) claim that YAML is "more secure, portable,
> and readable". OK, let's take these in turn:
> 
> More secure - as others have pointed out, Pickle allows pickling and
> unpickling of class instances, and class code can do what it likes in
> the constructor (I oversimplify here, as I don't know the details well
> myself). Sure, this is a security issue, but it's an inherent
> insecurity in the feature, and not limited to Pickle. If YAML
> implemented the same feature, it would have the same issues to
> resolve. Improving security by removing features isn't a clear win for
> YAML (note that I am not saying that security in exchange for reduced
> features might not be a good tradeoff in some cases - I'm addressing
> the "replace Pickle with YAML" suggestion, not a suggestion that we
> have both).

I don't quite follow your logic. If you load a serialized file, you 
should conceivably already know what classes it should and should not be 
instantiating, and be able to restrict its access accordingly. Of 
course, a file could still be altered to wreak havoc within the confines 
of the set limitations, but I'm under the impression that Pickle allows 
execution of arbitrary code, regardless of the classes being 
instantiated. Please correct me if I'm wrong.

> More portable - hmm, OK. I'm not sure where you want portability
> *between*, though. Pickle is, as far as I know, portable across
> platforms. Are you talking about portability between languages? I
> can't think where I'd want to dump a Python object for loading into
> Perl or Ruby, though. Can you offer me some real-life use cases?

I meant language and platform portability. I suppose you'd find this 
aspect attractive for the same reasons you'd use XML, which some have 
also used as a serialization format. Granted, not every language's
objects may be translatable, but many languages share common data 
primitives.
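
A rough illustration of the "common data primitives" point, assuming modern Python 3. The standard library's json module is used purely as a stand-in for any language-neutral text format; it is not YAML, and nothing here reflects a YAML library's API. The names record, Session and is_portable are hypothetical.

```python
import json

# Plain primitives: strings, numbers, booleans, null, lists and maps.
# Most languages have direct equivalents for all of these.
record = {
    "name": "example",
    "sizes": [1, 2, 3],
    "ratio": 0.5,
    "enabled": True,
    "note": None,
}

# The text form can be read back by any language with a parser for the format.
text = json.dumps(record, sort_keys=True)


class Session:
    """Hypothetical application class with no cross-language equivalent."""

    def __init__(self, user):
        self.user = user


def is_portable(value):
    """Return True if value is built only from the shared primitives."""
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False
```

Here is_portable(record) is true and is_portable(Session("chris")) is false: the primitives round-trip, but the class instance needs an agreed mapping before another language could rebuild it.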

> More readable - I'll give you this. And yes, it can be useful. I've
> been stuffed before now with Java programs whose configuration is
> stored as a serialised-to-disk object which is completely opaque to
> external tools, let alone human readers. But this is a property that
> is useful only in case of failure (if the config gets stuffed, I can
> hand-hack the dump file, or if I forget what I set parameter X to, I
> can look in the dump). If the application design *requires* the dump
> format to be readable, we've moved away from serialisation, and
> started to talk about configuration formats (which is a separate
> issue, one in which it is quite possible that YAML is strong, but
> *not* one in which it is competing with Pickle).

[snip]

> None of this is a criticism of YAML and/or its libraries themselves.
> However, it does make any suggestion that YAML be used to replace a
> key part of the Python standard library seem a little premature, at
> least.
>
> I hope this response didn't come across as hostile - I certainly
> don't intend it that way. But I do believe that it is the
> responsibility of those making the suggestion that YAML replace
> pickle to come up with decent arguments. (Or a robust, tested,
> documented patch for the Python core, of course - that avoids the
> impression that the requester is hoping that someone else will do the
> work for him :-))

Fair enough. I didn't mean to imply that the current YAML 
implementations were drop-in replacements for Pickle, only that the 
concept of YAML deserves more attention.

> I'd like to see a strong (this includes "well-documented"!! :-)) YAML
> library for Python, if only so I could try it out and find out what
> YAML *is* good for, in my environment. In theory, I like YAML - it's
> just the practicalities that elude me.
> 
> [Later]
> I just re-read some of the YAML website. It appears clear from there
> that YAML is designed as a serialisation format. But there seems to
> be a lack of justification as to *why* the design goals (section 1.1
> of the spec) are important. Also, security is *not* an explicit goal,
> and section 3.1.6 (the "Construct" process) is completely lacking in
> any discussion of the security or other implications of converting a
> YAML file to a native language object. This seems somewhat surprising
> in a specification for a serialisation format...

Well, if the concept of serialization is indeed inherently insecure, 
what could they possibly do? In order for YAML to directly address 
security, it would have to concern itself with the "meaning" of the data 
being serialized, which seems outside the scope of YAML's purpose. 
Serialization security seems generally assigned as a responsibility of 
the user, who is usually in the best position to gauge their data's
effects. The best a serialization format can do is ensure data 
reconstruction within the bounds described by the user.


