Fast lookup of bulky "table"

Weatherby,Gerard gweatherby at uchc.edu
Sun Jan 15 14:23:40 EST 2023


That’s about what I got using a Python dictionary on random data on a high memory machine.

https://github.com/Gerardwx/database_testing.git

It’s not obvious to me how to get it much faster than that.
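The linked repository holds the actual benchmark; as a rough sketch of the kind of dict-based lookup being timed (sizes, ID format, and attribute layout here are assumptions, not taken from the repo):

```python
import random
import string
import time

# Hypothetical stand-in for the linked benchmark: build a dict mapping
# IDs to records of 40 attributes, then time point lookups.
# The thread's case is a 100k-ID list; a smaller size is used here.
N_IDS = 10_000
N_ATTRS = 40

def random_id() -> str:
    # Assumed ID shape: 12 alphanumeric characters.
    return "".join(random.choices(string.ascii_uppercase + string.digits, k=12))

table = {
    random_id(): {f"attr{i}": random.random() for i in range(N_ATTRS)}
    for _ in range(N_IDS)
}

# Time 1000 random lookups; dict access is O(1) on average.
keys = random.sample(list(table), 1000)
start = time.perf_counter()
for k in keys:
    _ = table[k]
elapsed = time.perf_counter() - start
print(f"avg lookup: {elapsed / len(keys) * 1e6:.2f} microseconds")
```

On typical hardware this lands in the sub-microsecond to low-microsecond range per lookup, which matches the "6 orders of magnitude faster than the 4 s query" figure quoted below.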

From: Python-list <python-list-bounces+gweatherby=uchc.edu at python.org> on behalf of Dino <dino at no.spam.ar>
Date: Sunday, January 15, 2023 at 1:29 PM
To: python-list at python.org <python-list at python.org>
Subject: Re: Fast lookup of bulky "table"

Thank you for your answer, Lars. Just a clarification: I am already
doing rough measurements of my queries.

A fresh query without any caching: < 4s.

Cached full query: < 5 micro-s (i.e. 6 orders of magnitude faster)

Desired speed for my POC: < 10 ms

Also, I didn't want to ask a question with way too many "moving parts",
but when I talked about the "table", it's actually a 100k long list of
IDs. I can then use each ID to invoke an API that will return those 40
attributes. The API is fast, but still, I am bound to loop through the
whole thing to respond to the query, that's unless I pre-load the data
into something that allows faster access.
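That pre-loading step can be sketched as building a dict keyed by ID once, up front, so each query becomes a single lookup instead of a 100k-iteration loop. `fetch_attributes` below is a hypothetical stand-in for the real per-ID API, which the thread does not name:

```python
def fetch_attributes(record_id: str) -> dict:
    # Placeholder for the real (fast, per-ID) API call described in the thread.
    return {"id": record_id, "attr1": len(record_id)}

def preload(ids: list[str]) -> dict[str, dict]:
    # Call the API once per ID and cache every result, keyed by ID.
    return {record_id: fetch_attributes(record_id) for record_id in ids}

# Pre-load once at startup...
cache = preload(["a1", "b2", "c3"])

# ...then answer each query with an O(1) dict lookup.
print(cache["b2"])
```

The trade-off is the one Lars raises below: the full pre-load costs memory and startup time, so it only pays off if queries are frequent enough to amortize it.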

Also, as you correctly observed, "looking good with my colleagues" is a
nice-to-have feature at this point, not really an absolute requirement :)

Dino

On 1/15/2023 3:17 AM, Lars Liedtke wrote:
> Hey,
>
> before you start optimizing, I would suggest that you measure response
> times, query times, data search times and so on. In order to save
> time, you have to know where you "lose" time.
>
> Does your service really have to load the whole table at once? Yes, that
> might lead to quicker response times on requests, but databases are
> often very good at caching themselves, so the first request might be
> slower than following requests with similar parameters. Do you use a
> database, or are you reading from a file? Are you perhaps looping
> through your whole dataset on every request instead of asking for the
> specific data?
>
> Before you start introducing a cache and its added complexity, do you
> really need that cache?
>
> You are talking about saving microseconds, which sounds a bit as if you
> might be “overdoing” it. How many requests will you have in the future?
> At least in which magnitude, and how quick do they have to be? You write
> about 1-4 seconds on your laptop, but that does not really tell you
> much, because most probably the service will run on a server. I am not
> saying that you should get a server or a cloud instance to test against,
> but you should talk with your architect about that.
>
> I totally understand your impulse to appear as good as you can, but you
> have to know where you really need to debug and optimize. It will not be
> advantageous for you if you start to optimize for optimizing's sake.
> Additionally, if your service is a PoC, optimizing now might not be the
> first thing to worry about; rather, make everything as simple and
> readable as possible, and do not spend too much time just showing how it
> could work.
>
> But of course, I do not know the tasks given to you and the expectations
> you have to fulfil. All I am trying to say is to reconsider where you
> really could improve and how far you have to improve.
>
>
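Lars's point about asking the database for the specific data, rather than looping over the whole dataset per request, can be sketched with sqlite3 (illustrative only; the thread never says what storage layer is in use):

```python
import sqlite3

# Illustrative schema: an ID column with a PRIMARY KEY index and a payload.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id TEXT PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [(f"id{i}", f"payload{i}") for i in range(100_000)],
)

# Indexed point query: the PRIMARY KEY index makes this a B-tree lookup,
# not a full scan of all 100k rows.
row = conn.execute(
    "SELECT payload FROM records WHERE id = ?", ("id42",)
).fetchone()
print(row[0])  # prints "payload42"
conn.close()
```

This is the "let the database do the caching and indexing" alternative to pre-loading everything into process memory.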
--
https://mail.python.org/mailman/listinfo/python-list


More information about the Python-list mailing list