Fast lookup of bulky "table"

Peter J. Holzer hjp-python at hjp.at
Sun Jan 15 14:39:14 EST 2023


On 2023-01-15 10:38:22 -0500, Thomas Passin wrote:
> On 1/15/2023 6:14 AM, Peter J. Holzer wrote:
> > On 2023-01-14 23:26:27 -0500, Dino wrote:
> > > Anyway, my Flask service initializes by loading a big "table" of 100k rows
> > > and 40 columns or so (memory footprint: order of 300 Mb)
> > 
> > 300 MB is large enough that you should at least consider putting that
> > into a database (Sqlite is probably simplest. Personally I would go with
> > PostgreSQL because I'm most familiar with it and Sqlite is a bit of an
> > outlier).
> > 
> > The main reason for putting it into a database is the ability to use
> > indexes, so you don't have to scan all 100 k rows for each query.
> 
> I have an (inherited) server program that uses about 30 MB of data in a
> MySQL database. It services queries received over the network. It too had
> performance problems, to which adding indexes and smarter joins helped but
> not enough.
> 
> I changed the program so that at startup it imports much of the data into
> Python dictionaries that are structured to support the kinds of queries that
> need the help.  Response time to queries dropped dramatically.

This is to be expected: Firstly, because you don't have disk accesses
any more, secondly because you don't have network latency any more and
thirdly, because you structured the data to fit the queries.

The thing to keep in mind is that the relational database model was
invented to have a uniform and simple way to model all data, and that
RDBMSs are designed to handle all workloads (from a single tiny table to
thousands of tables with hundreds of terabytes) reasonably well. For any
given application you can always find a more efficient solution than
using an RDBMS. Sometimes it's simple (just load all the data into a
dict and serve from there), sometimes it's a major research project.
The nice thing about RDBMSs isn't that they are the optimal solution for
anything but that they are a "good enough" solution for a large class of
problems.

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp at hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/python-list/attachments/20230115/a18c896f/attachment.sig>


More information about the Python-list mailing list