Which non SQL Database ?

Sun Jan 23 11:12:35 EST 2011

In article <pan.2011.01.23.06.09.16 at pfln.invalid>,
 Deadly Dirk <dirk at pfln.invalid> wrote:

> The same thing applies to MongoDB which is equally fast but does allow ad 
> hoc queries and has quite a few options how to do them. It allows you to 
> do the same kind of querying as RDBMS software, with the exception of 
> joins. No joins.

Well, sort of.  You can use forEach() to get some join-like 
functionality.  You don't get the full join optimization that SQL gives 
you, but at least you get to do some processing on the server side so 
you don't have to ship 40 gazillion records over the network to pick the 
three you wanted.
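
To make "join-like" concrete, here is a rough sketch of the loop-and-look-up pattern that a forEach() callback would follow.  It is plain Python over in-memory lists, not code that runs on the Mongo server, and the collection and field names (orders, customers, customer_id, city) are made up for illustration.

```python
def for_each_join(orders, customers_by_id, wanted_city):
    """Walk one collection and look up the related document in another.

    This is the nested-lookup shape of a forEach()-style join: no join
    optimization, just one lookup per outer document.
    """
    results = []
    for order in orders:
        # Hypothetical foreign-key field; a real schema would differ.
        customer = customers_by_id.get(order["customer_id"])
        if customer is not None and customer["city"] == wanted_city:
            results.append({"order_id": order["_id"],
                            "customer": customer["name"]})
    return results


# In-memory stand-ins for two collections.
customers_by_id = {
    1: {"_id": 1, "name": "Ann", "city": "Boston"},
    2: {"_id": 2, "name": "Bob", "city": "Madrid"},
}
orders = [
    {"_id": 10, "customer_id": 1},
    {"_id": 11, "customer_id": 2},
    {"_id": 12, "customer_id": 3},  # no matching customer, so it is skipped
]
matches = for_each_join(orders, customers_by_id, "Madrid")
```

Only the matching records end up in the result, which is the part you'd want to happen before anything crosses the network.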

> It also allows map/reduce queries using JavaScript and 
> is not completely schema free.

What do you mean by "not completely schema free"?

> Databases have sub-objects called "collections" which can be indexed 
> or partitioned across several machines ("sharding"), which is an 
> excellent thing for building shared-nothing clusters. 

We've been running Mongo 1.6.x for a few months.  Based on our 
experiences, I'd say sharding is definitely not ready for prime time.  
There are two issues: stability and architecture.

First, stability.  We see mongos (the sharding proxy) crash a couple of 
times a week.  We finally got the site stabilized by rigging upstart to 
monitor and automatically restart mongos when it crashes.  Fortunately, 
mongos crashing doesn't cause any data loss (at least not that we've 
noticed).  Hopefully this is something the 10gen folks will sort out in 
the 1.8 release.

The architectural issues are more complex.  Mongo can enforce uniqueness 
on a field, but only on a non-sharded collection.  Security (i.e. password 
authentication) does not work in a sharded environment.  If I understand 
the release notes correctly, that's something which may get fixed in 
some future release.

> Scripting languages like Python are 
> very well supported and linked against MongoDB

The Python interface is very nice.  In some ways, the JS interface is 
nicer, only because you can get away with less quoting, i.e.

JS:   find({inquisition: {$ne: 'spanish'}})
Py:   find({'inquisition': {'$ne': 'spanish'}})

The PHP interface is, like everything in PHP, sucky:

PHP:  find(array('inquisition' => array('$ne' => 'spanish')))

The common thread here is that unlike SQL, you're not feeding the 
database a string which it parses, you're feeding it a data structure.  
You're stuck with whatever data structure syntax the host language 
supports.  Well, actually, that's not true.  If you wanted to, you could 
write a front end which lets you execute:

      "find where inquisition != spanish"

and have code to parse that and turn it into the required data 
structure.  The odds of anybody doing that are pretty low, however.  It 
would just feel wrong, in much the same way that SQLAlchemy's 
functional approach to building a SQL query just feels wrong to somebody 
who knows SQL.
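
For what it's worth, a front end like the one described above is not much code.  The following is a minimal sketch, assuming a made-up one-clause grammar ("find where FIELD OP VALUE"); the function name parse_query and the set of supported comparison symbols are my own choices, not anything that ships with Mongo or its drivers.

```python
import re

# Assumed mapping from comparison symbols to MongoDB query operators.
# "==" maps to None because equality needs no operator in a query document.
OPERATORS = {
    "==": None,
    "!=": "$ne",
    "<": "$lt",
    "<=": "$lte",
    ">": "$gt",
    ">=": "$gte",
}

# Two-character symbols come first so "<=" is not matched as "<".
PATTERN = re.compile(r"^find where (\w+)\s*(==|!=|<=|>=|<|>)\s*(\S+)$")


def parse_query(text):
    """Turn 'find where FIELD OP VALUE' into a query document (a dict).

    Values are kept as strings; a real front end would need typing rules.
    """
    match = PATTERN.match(text.strip())
    if match is None:
        raise ValueError("cannot parse query: %r" % text)
    field, symbol, value = match.groups()
    operator = OPERATORS[symbol]
    if operator is None:
        return {field: value}
    return {field: {operator: value}}


query = parse_query("find where inquisition != spanish")
```

The resulting dict has the same shape as the argument to find() in the Python example above, so it could be handed straight to the driver.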

> I find MongoDB well suited for what is 
> traditionally known as data warehousing.

I'll go along with that.  It's a way to build a fast (possibly 
distributed, if they get sharding to work right) network datastore with 
some basic query capability.  Compared to SQL, you end up doing a lot 
more work on the application side, and take on a lot more of the 
responsibility to enforce data integrity yourself.

> You may want to look 
> at this Youtube clip entitled "MongoDB is web scale":
> 
> http://www.youtube.com/watch?v=b2F-DItXtZs

That's the funniest thing I've seen in a long time.  The only sad part 
is that it's all true.

There are some nice things about NoSQL databases (particularly the 
schema-free part).  A while ago, we discovered that about 200 of the 
300,000 documents in one of our collections were effectively duplicates 
of other documents ("document" in mongo-speak means "record" or perhaps 
"row" in SQL-speak).  It was trivial to add "is_dup_of" fields to just 
those 200 records, and a little bit of code in our application to check 
the retrieved documents for that field and retrieve the pointed-to 
document.  In SQL, that would have meant adding another column, or 
perhaps another table.  Either way would have been far more painful than 
the fix we were able to do in mongo.
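
The application-side check amounts to something like the sketch below.  This is not our actual code; it uses a plain dict keyed by _id as a stand-in for the collection, the function name resolve_duplicate is invented for the example, and it assumes "is_dup_of" holds the _id of the document to use instead (one hop only, no chains).

```python
def resolve_duplicate(collection, doc_id):
    """Fetch a document, following its 'is_dup_of' pointer if it has one."""
    doc = collection[doc_id]
    # Most documents have no such field at all; that is the schema-free part.
    target_id = doc.get("is_dup_of")
    if target_id is not None:
        return collection[target_id]
    return doc


# Stand-in for a collection: a dict keyed by _id.
documents = {
    1: {"_id": 1, "title": "original"},
    2: {"_id": 2, "title": "copy", "is_dup_of": 1},
}
canonical = resolve_duplicate(documents, 2)
untouched = resolve_duplicate(documents, 1)
```

Only the handful of duplicate documents carry the extra field; the rest of the collection is left exactly as it was.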