PyYaml?

Clark C. Evans cce at clarkevans.com
Mon Sep 20 17:01:12 EDT 2004


On Sun, Sep 19, 2004 at 02:53:22PM +0100, Paul Moore wrote:
| It seems to claim to be different things at different times - a
| serialization format, a config file format, a replacement for XML

At conception, I wanted a text format for invoices and other
transactional business documents that was: (a) very human readable,
(b) loaded into native data structures without requiring a DOM or a
bunch of parser-hand-holding, (c) had a simple enough information
model that a schema and transformation language would not be a
serious exercise in topology.  Brian Ingerson, one of the other
co-authors, was working on something similar to Pickle for Perl.

| At the time, I was looking for a config format, and it wasn't
| *quite* what I wanted, because some of the serialization and XML 
| aspects made it slightly clumsy as a config format.

That some people use it for configuration files is due to Brian's
influence on the more-than-one-way-to-write-it.  Also, our earlier
goals of a cross-language serialization tool got in the way of
making it a great configuration file language.  We've since had to
make some compromises in this regard.  Two other good uses for YAML
include log files and test suites.  Neither was the initial focus,
but alas, some things take on a life of their own.

| I suspect that people who want to use YAML for serialization,
| or as an XML replacement, may feel the same way. And yet, I don't get
| the feeling that YAML is being developed as a "compromise" format, so
| I am obviously missing a key design principle.

I work with business documents all the time, especially ones that
move between computer systems using different programming languages.
So, this was my primary goal; we advertise YAML as a serialization
language since this is the 'easiest category' to put ourselves in.

| As regards the existing YAML libraries for Python, when I looked I
| found that the PyYAML website claimed that it was out of date with
| respect to the latest spec. I also tried SYCK, which looks OK, but
| which I did manage to provoke a crash from without trying too hard.

Er ya.  Don't do "syck.parse", I need to remove that function from
the public interface.  The newest release of Syck is far more stable
so you may want to try it again.

| None of this is a criticism of YAML and/or its libraries themselves.
| However, it does make any suggestion that YAML be used to replace a
| key part of the Python standard library seem a little premature, at
| least.

Definitely.  YAML has at least two more years of work before it'd be 
ready for even proposing that it be considered as a core library.

| I just re-read some of the YAML website. It appears clear from there
| that YAML is designed as a serialization format. But there seems to
| be a lack of justification as to *why* the design goals (section 1.1
| of the spec) are important. Also, security is *not* an explicit goal,
| and section 3.1.6 (the "Construct" process) is completely lacking in
| any discussion of the security or other implications of converting a
| YAML file to a native language object. This seems somewhat surprising
| in a specification for a serialization format...

*nods*  I hope the discussion above helps.  I doubt that YAML would
ever be a good 'drop-in' replacement for pickle.  If in the far-distant
future someone were to propose using YAML in this way, it'd probably be
one of N 'formats' for a more pluggable pickle module.  

| More portable - hmm, OK. I'm not sure where you want portability
| *between*, though. Pickle is, as far as I know, portable across
| platforms. Are you talking about portability between languages? I
| can't think where I'd want to dump a Python object for loading into
| Perl or Ruby, though. Can you offer me some real-life use cases?

Certainly.  I work with several programmers in different shops;
we move transactional documents around, traditionally with XML,
but increasingly with YAML.  By this time next year I hope it is all
YAML.  If you are just using hash/list/scalar data types (90%
of our use cases) then YAML is a great option.  In fact, recently
we had a customer start using the Perl version of YAML with our
system and it worked.
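
To make "just hash/list/scalar data" concrete from the Python side,
here is a minimal sketch.  It is not part of PyYaml or Syck: the
helper name is_portable and the sample invoice are hypothetical, and
restricting mapping keys to strings is an assumption made here for
simplicity.  It only checks that a document stays inside the subset
that maps directly onto native structures in Perl, Ruby, or Python.

```python
# Hypothetical helper, not part of any YAML library.
SCALARS = (str, int, float, bool, type(None))


def is_portable(node):
    """Return True if node uses only dicts (string keys), lists, and scalars."""
    if isinstance(node, SCALARS):
        return True
    if isinstance(node, list):
        return all(is_portable(item) for item in node)
    if isinstance(node, dict):
        return all(
            isinstance(key, str) and is_portable(value)
            for key, value in node.items()
        )
    # Anything else (sets, tuples, class instances, ...) is language-specific.
    return False


# Illustrative invoice; the field names are made up for this example.
invoice = {
    "invoice": "34843",
    "bill-to": {"given": "Chris", "city": "Royal Oak"},
    "items": [
        {"sku": "BL394D", "quantity": 4, "price": 450.00},
        {"sku": "BL4438H", "quantity": 1, "price": 2392.00},
    ],
    "comments": None,
}

print(is_portable(invoice))
print(is_portable({"tags": {"a", "b"}}))
```

A document that passes this check needs no language-specific type
tags, which is what makes it easy to hand between shops.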

| More readable - I'll give you this. And yes, it can be useful. I've
| been stuffed before now with Java programs whose configuration is
| stored as a serialized-to-disk object which is completely opaque to
| external tools, let alone human readers. But this is a property that
| is useful only in case of failure (if the config gets stuffed, I can
| hand-hack the dump file, or if I forget what I set parameter X to, I
| can look in the dump). If the application design *requires* the dump
| format to be readable, we've moved away from serialization, and
| started to talk about configuration formats (which is a separate
| issue, one in which it is quite possible that YAML is strong, but
| *not* one in which it is competing with Pickle).

Exactly.  The older PyYaml made configuration files painful, as it
was trying to implicitly type all kinds of data (recognizing
floating-point numbers, dates, etc.).  We found this behavior to be
a bit counter-productive for config files, and hence this "implicit
typing" is now strictly optional, application-directed behavior.
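
As a rough illustration of what optional implicit typing means, here
is a minimal sketch.  It is not PyYaml's actual resolver: the function
resolve_scalar, its implicit flag, and the three patterns are
assumptions chosen for the example, and real implicit typing covers
more forms than these.

```python
import re
from datetime import date

# Simplified patterns, assumed for illustration only.
INT_RE = re.compile(r"[-+]?[0-9]+")
FLOAT_RE = re.compile(r"[-+]?[0-9]+\.[0-9]+")
DATE_RE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def resolve_scalar(text, implicit=False):
    """Return text unchanged unless the application opts in to implicit typing."""
    if not implicit:
        return text
    if INT_RE.fullmatch(text):
        return int(text)
    if FLOAT_RE.fullmatch(text):
        return float(text)
    match = DATE_RE.fullmatch(text)
    if match:
        year, month, day = map(int, match.groups())
        try:
            return date(year, month, day)
        except ValueError:
            # Looks like a date but is not a valid one; keep the string.
            return text
    return text


print(repr(resolve_scalar("1.10")))
print(repr(resolve_scalar("1.10", implicit=True)))
print(repr(resolve_scalar("2004-09-20", implicit=True)))
```

With the flag off, a config value such as a version string "1.10"
stays a string; with it on, the same text becomes the float 1.1 and
the trailing zero is lost, which is the kind of surprise that made
implicit typing awkward for config files.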

Best,

Clark


