[Tutor] table to dictionary and then analysis

Russel Winder russel at winder.org.uk
Thu May 17 09:39:59 CEST 2012


On Wed, 2012-05-16 at 16:03 +0100, Alan Gauld wrote:
[...]
> I agree, but in this case SQL seemed like the most likely fit of the 
> ones I knew. However:

Which raises the point that the best design of a given problem in a
given context is the one that is most comprehensible to the people
directly involved.

> >          SQL
> >          MongoDB
> 
> I know about these

I was told on good authority yesterday that MongoDB is only properly
useful in a single-store context, i.e. not a replicated cluster.  Along
with this comes news that Riak is very good and has a Python API.

> 
> >          CouchDB
> >          Cassandra
> >          Neo
> 
> These are new to me.

CouchDB is implemented in Erlang; Ubuntu One uses it, for example.
Cassandra is an Apache project, a JVM-based system. MongoDB, CouchDB
and Cassandra are "document stores". Neo4j is a graph repository, so it
has a very different architecture and performance characteristics. And
then there is Redis :-)

> 
> > etc. Python only has SQLite3 as standard but there are alternatives. I
> > have been using PyMongo quite successfully.
> 
> Python comes with several storage/access options including shelve, gdbm, 
> ldap, config files, XML, in addition to SQL.

Indeed. The problem I have had with shelve for this sort of thing is
that it is critically dependent on the pickling algorithm and so
potentially Python version dependent.
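
One way to reduce that risk, as a minimal sketch only, is to pass an
explicit pickle protocol to shelve.open rather than relying on whatever
the running interpreter defaults to. The function names, the record
layout and the choice of protocol 2 are all assumptions made for
illustration, and the code assumes a recent Python 3 where a shelf can
be used in a with statement. It does nothing about the underlying dbm
file format, which can also differ between installations.

```python
import os
import shelve
import tempfile

# Assumption: protocol 2 is chosen purely for illustration; it is
# understood by Python 2.3+ and by every Python 3 release.
PICKLE_PROTOCOL = 2


def save_records(path, records, protocol=PICKLE_PROTOCOL):
    """Write each record to a shelf, pickled with an explicit protocol."""
    with shelve.open(path, protocol=protocol) as db:
        for key, value in records.items():
            db[key] = value  # shelf keys must be strings


def load_records(path):
    """Read the whole shelf back into an ordinary dictionary."""
    with shelve.open(path, flag="r") as db:
        return dict(db)


def round_trip_demo():
    """Save and reload some made-up records in a temporary directory."""
    records = {"alice": {"age": 30}, "bob": {"age": 25}}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records")
        save_records(path, records)
        return load_records(path)
```

Calling round_trip_demo() should hand back the same dictionary that was
stored, but that only shows the round trip within one interpreter; it
is no guarantee across versions.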

[...]
> on flexibility of data format. The OPs requirements suggested intelligent 
> filtering of a fixed record format which is one of the areas where SQL 
> works well. The other side of the coin is that the data is essentially 
> single table so the relationship management aspects of SQL would not be 
> needed. So I agree we don't have enough detail
> to be 100% sure that another option would not work as well or better.

The signpost here is that the table, as it stands, is likely not in
third normal form, and that if the problem currently being solved were
actually a small part of a bigger problem, this issue would need to be
addressed.
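
To make the point concrete, here is a rough sketch using the sqlite3
module from the standard library. The tables, column names and rows are
entirely made up (they are not the OP's data): a flat table in which
the supplier's city is repeated on every row, and the same data split
in two so that the city depends only on the supplier.

```python
import sqlite3

# Hypothetical rows: (item, price, supplier, supplier_city).
DEMO_ROWS = [
    ("bolt", 0.10, "Acme", "Leeds"),
    ("nut", 0.05, "Acme", "Leeds"),
    ("gear", 4.50, "Borg", "York"),
]


def build_demo(conn):
    """Create the flat table, then a normalised pair derived from it."""
    # Flat table: supplier_city depends on supplier, not on the key
    # (item), which is the transitive dependency 3NF rules out.
    conn.execute(
        "CREATE TABLE flat (item TEXT PRIMARY KEY, price REAL, "
        "supplier TEXT, supplier_city TEXT)"
    )
    conn.executemany("INSERT INTO flat VALUES (?, ?, ?, ?)", DEMO_ROWS)
    # Normalised version: each supplier's city is stored exactly once.
    conn.execute("CREATE TABLE supplier (name TEXT PRIMARY KEY, city TEXT)")
    conn.execute(
        "CREATE TABLE item (name TEXT PRIMARY KEY, price REAL, "
        "supplier TEXT REFERENCES supplier(name))"
    )
    conn.execute(
        "INSERT INTO supplier SELECT DISTINCT supplier, supplier_city FROM flat"
    )
    conn.execute("INSERT INTO item SELECT item, price, supplier FROM flat")


def cheap_items_flat(conn, limit):
    """Single-table filter: needs no relationship management at all."""
    query = "SELECT item FROM flat WHERE price < ? ORDER BY item"
    return [row[0] for row in conn.execute(query, (limit,))]


def items_from_city(conn, city):
    """The same kind of question against the normalised tables needs a join."""
    query = (
        "SELECT item.name FROM item "
        "JOIN supplier ON item.supplier = supplier.name "
        "WHERE supplier.city = ? ORDER BY item.name"
    )
    return [row[0] for row in conn.execute(query, (city,))]
```

For the single-table filtering the OP described, the flat form is
perfectly workable; the split only starts to pay for itself once
several tables share the supplier data.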

> But most other options require learning new (often bespoke) query 
> languages and have limited user tools. All of these factors need to be 
> included too. Mongo et al tend to be better suited, in my experience, to 
> machine access applications rather than end user access.

Agreed. Unfortunately, vendor commercial issues often get in the way of
experimenting to find out where the NoSQL systems are genuinely better
than SQL ones. We will get there though.

Interestingly, or not, the "Big Data" people are rapidly realizing that
data mining and SQL are mutually incompatible. The current trend is
towards streaming whole databases through dataflow programs. But Java
rather than Python is the default language in that world.
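
The streaming idea itself is easy enough to sketch in Python with
generators. This is a toy illustration of the style only, not of any
particular Big Data framework; the stage names and the "value" field
are invented for the example.

```python
def read_rows(rows):
    """Source stage: hand rows on one at a time (stands in for a full scan)."""
    for row in rows:
        yield row


def where(predicate, stream):
    """Filter stage: pass on only the rows the predicate accepts."""
    for row in stream:
        if predicate(row):
            yield row


def project(field, stream):
    """Projection stage: pass on a single field from each row."""
    for row in stream:
        yield row[field]


def run_pipeline(rows, threshold):
    """Wire the stages together and reduce the stream to a single total."""
    stream = read_rows(rows)
    stream = where(lambda row: row["value"] > threshold, stream)
    return sum(project("value", stream))
```

Each stage sees one row at a time, so nothing requires the whole data
set to be held in memory or queried; the data simply flows through.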

> > There are various articles around the Web comparing and contrasting
> > these various models. Some of the articles are even reasonable :-)
> 
> Wikipedia is my friend :-)

:-)

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder at ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel at winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder