Fast lookup of bulky "table"

Peter J. Holzer hjp-python at hjp.at
Sun Jan 15 06:14:24 EST 2023


On 2023-01-14 23:26:27 -0500, Dino wrote:
> Hello, I have built a PoC service in Python Flask for my work, and - now
> that the point is made - I need to make it a little more performant (to be
> honest, chances are that someone else will pick up from where I left off,
> and implement the same service from scratch in a different language (GoLang?
> .Net? Java?) but I am digressing).
> 
> Anyway, my Flask service initializes by loading a big "table" of 100k rows
> and 40 columns or so (memory footprint: order of 300 Mb)

300 MB is large enough that you should at least consider putting that
into a database (SQLite is probably simplest. Personally I would go with
PostgreSQL because I'm most familiar with it and SQLite is a bit of an
outlier).

The main reason for putting it into a database is the ability to use
indexes, so you don't have to scan all 100 k rows for each query.
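
A minimal sketch of the index idea, using the standard-library `sqlite3`
module. The table name `items`, the columns `category` and `price`, and
the generated data are all made up for illustration; the real table has
about 40 columns.

```python
import sqlite3

# In-memory database for the sketch; a real service would use a file path.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, category TEXT, price REAL)"
)

# Hypothetical data: 100k rows spread over 100 categories.
rows = [(i, f"cat{i % 100}", float(i % 500)) for i in range(100_000)]
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)

# The index lets SQLite find matching rows without scanning the whole table.
conn.execute("CREATE INDEX idx_items_category ON items (category)")
conn.commit()

matches = conn.execute(
    "SELECT id, price FROM items WHERE category = ?", ("cat42",)
).fetchall()

# EXPLAIN QUERY PLAN shows whether the index is actually used.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id, price FROM items WHERE category = ?",
    ("cat42",),
).fetchall()
```

Checking the query plan is worth the extra line: an index that the
planner does not pick up gives you no benefit.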

You may be able to do that for your Python data structures, too: Can you
set up dicts which map to subsets you need often?
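
As a rough sketch of what I mean, assuming the table is a list of dicts
(the column names `country` and `status` and the sample rows are
hypothetical):

```python
from collections import defaultdict

# Hypothetical rows; the real table has ~100k rows and ~40 columns.
table = [
    {"id": 0, "country": "AT", "status": "active"},
    {"id": 1, "country": "US", "status": "inactive"},
    {"id": 2, "country": "AT", "status": "inactive"},
    {"id": 3, "country": "US", "status": "active"},
]

# Build once at startup: column value -> list of row positions.
by_country = defaultdict(list)
for pos, row in enumerate(table):
    by_country[row["country"]].append(pos)


def rows_for_country(country):
    """Return the matching rows without scanning the whole table."""
    return [table[pos] for pos in by_country.get(country, [])]


austrian = rows_for_country("AT")
```

This only pays off for columns you filter on often, since each such dict
costs memory and has to be rebuilt when the table changes.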

There are some specialized in-memory bitmap implementations which can be
used for filtering. I've used
[Judy bitmaps](https://judy.sourceforge.net/doc/Judy1_3x.htm) in the
past (mostly in Perl).
These days [Roaring Bitmaps](https://www.roaringbitmap.org/) is probably
the most popular. I see several packages on PyPI - but I haven't used
any of them yet, so no recommendation from me.
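
To illustrate the filtering idea without relying on any of those
packages, here is a sketch that substitutes plain Python integers as
uncompressed bitmaps (bit i is set if row i matches). This is not Judy
or Roaring, which add compression and much faster iteration; the rows
and helper names are made up for illustration.

```python
# Hypothetical rows for illustration.
table = [
    {"country": "AT", "status": "active"},
    {"country": "US", "status": "inactive"},
    {"country": "AT", "status": "inactive"},
    {"country": "US", "status": "active"},
    {"country": "AT", "status": "active"},
]


def build_bitmaps(rows, column):
    """Map each distinct value of `column` to an int with bit i set for row i."""
    bitmaps = {}
    for i, row in enumerate(rows):
        bitmaps[row[column]] = bitmaps.get(row[column], 0) | (1 << i)
    return bitmaps


def set_bits(bitmap):
    """Yield the positions of the set bits, lowest first."""
    pos = 0
    while bitmap:
        if bitmap & 1:
            yield pos
        bitmap >>= 1
        pos += 1


country = build_bitmaps(table, "country")
status = build_bitmaps(table, "status")

# A single AND combines both filters across all rows at once.
hits = list(set_bits(country["AT"] & status["active"]))
```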

NumPy might also help. You will still have linear scans, but the data is
more compact and many of the searches can probably be done in C and not
in Python.
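
A small sketch of such a scan, assuming the columns are stored as
separate arrays. The column names and the random data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical columns stored as separate arrays (column-oriented layout).
price = rng.uniform(0, 500, size=n)
category = rng.integers(0, 100, size=n)

# Each comparison is still a linear scan, but it runs in compiled code.
mask = (category == 42) & (price < 100.0)

# Positions of the rows that satisfy both conditions.
matching_rows = np.flatnonzero(mask)
```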

> As you can imagine, this is not very performant in its current form, but
> performance was not the point of the PoC - at least initially.

For performance optimization it is very important to actually measure
performance, and a good profiler helps a great deal in identifying hot
spots. Unfortunately until recently Python was a bit deficient in this
area, but [Scalene](https://pypi.org/project/scalene/) looks promising.
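
As a starting point for measuring, here is a sketch using the
standard-library `cProfile` rather than Scalene (which I am not showing
here). The functions `linear_lookup` and `handle_requests` and the data
are stand-ins for the real service code.

```python
import cProfile
import io
import pstats

# Hypothetical table and workload standing in for the real service.
table = [{"id": i, "category": i % 100} for i in range(100_000)]


def linear_lookup(category):
    """Scan every row; this is the suspected hot spot."""
    return [row for row in table if row["category"] == category]


def handle_requests():
    for category in range(20):
        linear_lookup(category)


profiler = cProfile.Profile()
profiler.enable()
handle_requests()
profiler.disable()

# Collect the five most expensive entries by cumulative time.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Run it before and after each change, so you know whether the change
actually helped.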

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp at hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"