Fast lookup of bulky "table"

Peter J. Holzer hjp-python at hjp.at
Mon Jan 16 14:25:22 EST 2023


On 2023-01-16 09:12:30 +1300, dn via Python-list wrote:
> On 16/01/2023 08.36, Weatherby,Gerard wrote:
> > I think any peformance improvements would have to come from a language change or better indexing of the data.
> Expanding on @Peter's post: databases (relational or not) are best organised
> according to use.

I probably wasn't very clear here. I think one of the main advantages of
relational databases over earlier models (especially hierarchical
databases but also network databases) is that you *don't* have to do
that.

In a hierarchical database you have to decide what's higher or lower in
the hierarchy (e.g., does a department have employees, or do employees
have a department?)  and that has a dramatic effect on performance so
you must structure the database on which queries are expected to be more
common.

In a relational database employees and departments are at the same level
and you have a relationship between them. You don't need to know whether
you'll have to look up all employees of a given department or the
department of a given employee more often. Both will be similarly fast
with appropriate indexes.

(Of course you should still have an idea what the data is used for when
designing your data model. But semantics are much more important than
use cases at this stage and you don't have to redesign your entire
database just because you need a new query.)

This flexibility comes with a cost, though: A relational database is
almost always good enough, but almost never optimal for any given use.


> Postgres and MySQL (for example) enable the establishment of multiple and
> sophisticated indices/indexes, and the aptly-named "views" of data.
> 
> If the queries can be grouped according to the manner in which the data must
> be accessed, a view could be built for each. At which time, even if every
> row must be accessed, the retrieval will be made more efficient and/or the
> response better-organised.

Nitpick: A view will not be more efficient (unless it's a materialized
view which is basically just a table). To retrieve that data, the RDBMS
has to perform exactly the same operations as for the underlying query.
Views make life simpler for the programmer (or analyst) by letting them
*write* simpler queries, but under the surface nothing changes. In fact
the performance may be worse, since the perceived simplicity of the view
may cause the human to write queries which are hard to optimize.

(There is another important use case for views: Access control and
enforcement of business rules. But I fear we are straying far from the
original topic now.)

> Thus, if we have a DB of people. Many retrievals are likely to utilise an
> index on 'name'. However, if at times interest is limited to place or
> suburb, an index and view of such will speed things from O(n). Similarly, if
> a query is only to return people with a dog license.

I don't see where a view would help here - but maybe you are thinking of
a more complex database than you describe.

> Some programmers don't realise that SQL can also be used for calculations,
> eg the eponymous COUNT(), which saves (CPU-time and coding-effort) over
> post-processing in Python.

Agreed. Aggregate and window functions can do a lot of work in the
database.

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp at hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <https://mail.python.org/pipermail/python-list/attachments/20230116/11e9244d/attachment.sig>


More information about the Python-list mailing list