Multithread and locking issue

Thu May 14 19:18:27 EDT 2020

On 14May2020 22:36, Stephane Tougard <stephane at sdf.org> wrote:
>A multithreaded software written in Python is connected with a Postgres
>database. To avoid concurrent access issue with the database, it starts
>a thread who receive all SQL request via queue.put and queue.get (it
>makes only insert, so no issue with the return of the SQL request).

Just a remark that PostgreSQL ets callers work in parallel, just make 
multiple connections. But if you're coming from just one connection, yes 
serialising the SQL queries is good.

>As long as it runs with 10 threads, no issues. At 100 threads, the
>software is blocked by what I think is a locking issue.

Or a timeout or busy loop. More print statements will help you.

>I guess Python multithreading and queue are working good enough that it
>can handle 100 threads with no issue (give me wrong here),

Yep, that is just fine.

>so I guess
>the problem is in my code.
>
>The function (thread) who handles SQL requests.
>
>def execute_sql(q):
>    print("Start SQL Thread")
>    while True:
>        try:
>            data = q.get(True,5)
>        except:

2 problems here:

1: a "bare except" catches _any_ exception instead of just the one you 
want (Empty). Don't do that - catch _only_ what you expect to handle.

2: since you're catching anything you never know why an exception 
occurred - at least report it:

    except Exception as e:
        print("No data: %s" % e)
        continue

>            print("No data")
>            continue

The first thing I would do is get rid of the timeout (5) parameter. And 
since block=True is the default, also that parameter:

    data = q.get()

This never times out, so there's no need to test for a timeout.

I would also put a print() _immediately_ before _and_ after the q.get() 
so that you know it was called and that it completed, and what it got.

To indicate no more SQL, send a sentinel such as None, and test for that:

    data = q.get()
    if data is None:
        break

>        print("RECEIVED SQL ORDER")
>        print(data)
>        print("END")
>        if data == "EXIT":
>            return

Ah, here's your sentinel. I prefer None or some other special value 
rather than a magic string ("EXIT"). Partly because some other queue you 
use might be processing arbitrary strings, so a string won't do.

>        try:
>            request = data['request']
>            arg = data['arg']
>            ref.execute(request,arg)
>        except:

Another bare except. Catch specific exceptions and report the exception!

>            print("Can not execute SQL request")
>            print(data)
>
>The code to send the SQL request.
>
>                sql = dict()
>                sql['request'] = "update b2_user set credit = credit -%s where id = %s"
>                sql['arg'] = (i,username,)
>                try:
>                    q.put(sql,True,5)
>                except:

Again the bare except. But just drop the timeout and go:

    q.put(sql)

and forget the try/except.

>                    print("Can not insert data")
>
>The launch of the SQL thread (nothing fancy here).
>
>q = qu.Queue()
>t = th.Thread(target = execute_sql, args = (q,))
>t.start()

That looks superficially good. Take out the timeouts, put in more 
print()s, and see what happens.

You also need to send the end-of-SQL sentinel:

    q.put(None)

or:

    q.put("EXIT")

depending on what you decide to use.

My problem with the timeouts is that they make your code far less 
predictable. If you get rid of them then your code must complete or 
deadlock, there's no fuzzy timeouts-may-occur middle ground. Timeouts 
are also difficult to choose correctly (if "correct" is even a term 
which is meaningful), and it is often then better to not try to choose 
them at all.

Cheers,
Cameron Simpson <cs at cskk.id.au>