[Tutor] numerical simulation + SQLite

Tue Dec 1 17:48:34 CET 2009

Thanks everyone for your responses!

Alan Gauld wrote:
> You may need to be realistic in your expectations.
> A database is writing to disk which will be slower than working in memory. And a 3GB file takes a while to read/traverse, even with indexes. It depends a lot on exactly what you are doing. If its mainly writing it should not be much slower than writing to a flat file. If you are doing a lot of reading - and you have used indexes - then it should be a lot faster than a file.
> 
> But RAM - if you have enough - will always be fastest, by about 100 times.
> The problem is when you run out, you revert to using files and that's usually slower than a database...
> 
> But without details of your usage pattern and database schema and SQL code etc it is, as you say, impossible to be specific.

I'm running a stochastic simulation of Brownian motion of a number of particles, for which I'll present a simplified version here. At each time step, I determine if some particles have left the system, determine the next position of the remaining particles, and then introduce new particles into the system at defined starting points. I have two tables in my SQLite database: one for information on each particle and one for all the x y z locations for each particle.

sqlite> .schema Particles
CREATE TABLE Particles
        (part_id INTEGER PRIMARY KEY,
        origin INTEGER,
        endpoint INTEGER,
        status TEXT,
        starttime REAL,
        x REAL,
        y REAL,
        z REAL);

sqlite> .schema Locations
CREATE TABLE Locations
        (id INTEGER PRIMARY KEY AUTOINCREMENT,
        timepoint REAL,
        part_id INTEGER,
        x REAL,
        y REAL,
        z REAL);

For particles that have left the system, I create a list of part_id values whose status I'd like to update in the database and issue a command within my script (for which db=sqlite3.connect('results.db')):

db.executemany("UPDATE Particles SET status='left' WHERE part_id=?",part_id)
db.commit()

To update the position, something like:

db.executemany("UPDATE Particles SET x=?,y=?,z=? WHERE part_id=?",Particle_entries)
db.executemany("INSERT INTO Locations (timepoint,lig,x,y,z) VALUES (?,?,?,?,?)",Location_entries)
db.commit()

That's about it. Just for many particles (i.e. 1e4 to 1e5). I'm considering whether I need every location entry or if I could get away with every 10 location entries, for example.

Eike Welk wrote:
> Just in case you don't know it, maybe Pytables is the right solution 
> for you. It is a disk storage library specially for scientific 
> applications:
> http://www.pytables.org/moin

Wow, that looks pretty good. I work with a lot of numpy.array's in this simulation so I'll definitely look into that.

bob gailer wrote:
> What do you do with the results after the simulation run?
> 
> How precise do the numbers have to be?

I'm interested in the particles that have left the system (I actually have a few ways they can leave) and I'm also interested in the ensemble average of the trajectories. As far as precision is concerned, I'm working on the scale of µm and each movement is on the order of 0.1 to 10 µm.

Faisal