Context manager for database connection

dn PythonList at DancesWithMice.info
Wed Aug 23 19:28:48 EDT 2023


On 24/08/2023 06.11, dn via Python-list wrote:
> On 24/08/2023 03.41, Jason Friedman via Python-list wrote:
>> with Database() as mydb:
>>     conn = mydb.get_connection()
>>     cursor = conn.cursor()
>>     cursor.execute("update table1 set x = 1 where y = 2")
>>     cursor.close()
>>     cursor = conn.cursor()
>>     cursor.execute("update table2 set a = 1 where b = 2")
>>     cursor.close()
>>
>>
>> import types
>> from typing import Optional, Type
>> import jaydebeapi as jdbc
>> class Database:
>>      database_connection = None
>>
>>      def __init__(self, auto_commit: bool = False):
>>          self.database_connection = jdbc.connect(...)
>>          self.database_connection.jconn.setAutoCommit(auto_commit)
>>
>>      def __enter__(self) -> "Database":
>>          return self
>>
>>      def __exit__(self, exception_type: Optional[Type[BaseException]],
>>                   exception_value: Optional[BaseException],
>>                   traceback: Optional[types.TracebackType]) -> bool:
>>          if exception_type:
>>              self.database_connection.rollback()
>>          else:
>>              self.database_connection.commit()
>>          self.database_connection.close()
>>          return False  # never suppress the exception
>>
>>      def get_connection(self) -> jdbc.Connection:
>>          return self.database_connection

Using a context-manager is a good idea: it ensures clean-up whether or 
not an exception occurs. Accordingly, I (and, may I presume, most) like 
the idea when working with life-cycle type resources, eg I/O. Absolutely 
nothing wrong with the idea!


However, the scope of a c-m is the with-block. If there are a number of 
'nested operations' to be performed (which could conceivably involve 
other c-ms, loops, or similar code-structures) the code could become 
harder to read and the length of the scope unwieldy.

One ease-of-management tactic is being able to see the start and end of 
a construct on the same page/screen. Such would 'limit' the length of a 
c-m's scope.

Perhaps drawing an inappropriate parallel, but as with a try-except 
block, there seems virtue in keeping a c-m's scope short, eg promptly 
releasing resources such as a file opened for output, or a basic DBMS 
which doesn't offer multi-access.
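To illustrate keeping the scope tight, a minimal sketch - using the 
stdlib's sqlite3 so it runs anywhere, with invented table/column names - 
in which the with-block covers only the single transaction, and the 
surrounding processing stays outside it. NB sqlite3's connection-as-c-m 
manages the transaction (commit/rollback), not the connection's 
life-cycle:

```python
import sqlite3

def record_result( db_path, value ):
    # the c-m's scope is just the one transaction:
    # commit on success, rollback if an exception occurs
    with sqlite3.connect( db_path ) as conn:
        conn.execute( "UPDATE results SET x = ? WHERE y = 2", ( value, ) )
    # the with-block does NOT close the connection - do that explicitly
    conn.close()
```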


Accordingly, that is why I stopped using a c-m for database work. NB YMMV!

There were two other reasons:
1 multiple databases
2 principles (and my shining virtue (?) )


1 came across a (statistics) situation where the client was using two 
DBMS-es. They'd used one for some time, but then preferred another. For 
their own reasons, they didn't migrate old data to the new DBMS. Thus, 
when performing certain analyses, the data-collection part of the script 
might have to utilise different DB 'sources'. In at least one case, 
comparing data through time, the routine needed to access both DBMS-es.
(oh what a tangled web we weave...)

2 another situation where the script may or may not actually have needed 
to access the DB. Odd? In which case, the 'expense' of the 'entry' and 
'exit' phases would never be rewarded.

Thus, 'inspired' to realise that I had (long) been contravening SOLID's 
DIP advice/rule (the Dependency Inversion Principle).


Accordingly, these days adopt something like the following (explaining 
it 'backwards' to aid comprehension in case you (gentle reader) have not 
come-across such thinking before - remember that word, "inversion"!)

- in the mainline, prior to processing, instantiate a database object

     database = Database( credentials )

- assume the mainline calls a function which is the substance of the script:

     def just_do_it( database, other_args, ):
         while "there's work to be done":
             database.query( query, data, etc, )
             # could be SELECT or UPDATE in and amongst the 'work'

- a basic outline of query() might be:

     def query( self, sql, data, etc, ):
         cursor = self.get_cursor()
         cursor.execute( sql, data, ) # according to DB/connector, etc
         # return query results

- a query can't happen without a "cursor", so either use an existing 
cursor, or create a fresh one:

     def get_cursor( self ):
         if not self._cursor:
             connection = self.get_connection()
             self._cursor = connection.cursor()
         return self._cursor

NB assuming the DBMS has restrictions on cursors, I may have multiple 
connections with one cursor each, but in some situations it may be 
applicable to run multiple cursors through a single connection.

- a cursor can't exist without a "connection", so either ... :

     def get_connection( self ):
         if not self._connection:
             self._connection = ...  # connect to the DB here
         return self._connection

- to instantiate a DB-object in the first place, the class definition:

class Database:
     def __init__(self, credentials):
         self._credentials = credentials
         self._connection = None
         self._cursor = None

- and the one part of the exposition that's not 'backwards':

     def close(self):
         if self._cursor:
             self._cursor.close()
         if self._connection:
             self._connection.close()
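Pulling the 'backwards' pieces together, a runnable sketch of the whole 
class - substituting the stdlib's sqlite3 for a real DBMS connector, and 
treating 'credentials' as simply a database path; both are assumptions 
for illustration:

```python
import sqlite3

class Database:
    """Lazy connection and cursor: nothing touches the DBMS until query()."""
    def __init__( self, credentials ):
        self._credentials = credentials   # here: a sqlite3 path/URI
        self._connection = None
        self._cursor = None

    def get_connection( self ):
        if not self._connection:
            self._connection = sqlite3.connect( self._credentials )
        return self._connection

    def get_cursor( self ):
        if not self._cursor:
            self._cursor = self.get_connection().cursor()
        return self._cursor

    def query( self, sql, data=() ):
        cursor = self.get_cursor()
        cursor.execute( sql, data )
        return cursor.fetchall()

    def close( self ):
        if self._connection:
            self._connection.close()
            self._connection = None
            self._cursor = None
```

Note that the 'expense' of connecting is only incurred if/when the first 
query() actually happens.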


It might be a plan to have several query() methods, according to the 
application. If each is a dedicated query, that avoids the need to pass 
SQL around.
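For instance, hypothetical dedicated-query methods (table and column 
names invented, sqlite3 standing-in for the real connector) - callers 
never see SQL at all:

```python
import sqlite3

class CustomerQueries:
    """Each method wraps exactly one (invented) piece of SQL."""
    def __init__( self, path ):
        self._connection = sqlite3.connect( path )

    def customer_names( self ):
        cursor = self._connection.execute( "SELECT name FROM customer" )
        return [ row[ 0 ] for row in cursor ]

    def rename_customer( self, customer_id, name ):
        self._connection.execute(
            "UPDATE customer SET name = ? WHERE id = ?",
            ( name, customer_id ) )
        self._connection.commit()
```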

Alternatively, and because having "SELECT ..." sprinkled throughout one's 
code is a 'code smell' ("magic constants"), it's a good idea to have all 
such collected into a separate module. This would also facilitate the 
corporate situation where a DBA will provide services to the 
applications team.
(don't EVER let a DBA into your Python code - you have been warned!)
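Such a module might be nothing more than named constants, eg a 
hypothetical queries.py (the names and SQL are invented):

```python
# queries.py - the one module a DBA would be allowed to touch
SELECT_OVERDUE = "SELECT id, due_date FROM invoice WHERE due_date < ?"
MARK_PAID = "UPDATE invoice SET paid = 1 WHERE id = ?"
```

Application code then reads `database.query( queries.SELECT_OVERDUE, 
( today, ) )` - no literal SQL in sight.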


Just as with all toy-examples, much detail has been omitted. 
Specifically the OP's concern for error-checking. The above enables more 
thorough checking and more precise error-reporting; because the steps 
are discrete (very SRP = Single Responsibility Principle). That said, 
from an application point-of-view, all the DB stuff has been centralised 
and it either works or the whole thing should probably be drawn to a 
grinding-halt.

The one glaring disadvantage is in the situation where a lot of 
(expensive) processing is carried-out, and only at the end is the DB 
accessed (presumably to persist the results). It could be frustrating to 
do 'all the work' and only thereafter find out that the DBMS is 
asleep-at-the-wheel, eg that Docker container hasn't been started. Doh!


The OP had a linked-query 'commit or rollback' situation. The structure 
of a separate method for that query-pair (perhaps calling individual 
methods for each query) with attendant controls (as described 
previously) will work nicely.
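A sketch of such a method (sqlite3 again standing-in, using the OP's two 
UPDATEs): one method owns the transaction, so both statements commit 
together or neither does:

```python
import sqlite3

def update_both( connection ):
    # one method owns the transaction for the linked pair of queries
    cursor = connection.cursor()
    try:
        cursor.execute( "UPDATE table1 SET x = 1 WHERE y = 2" )
        cursor.execute( "UPDATE table2 SET a = 1 WHERE b = 2" )
    except Exception:
        connection.rollback()
        raise
    else:
        connection.commit()
    finally:
        cursor.close()
```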


Should you be foolish-enough to be required (and able) to cope with more 
than one DBMS, you can likely see that turning the Database class into 
an ABC will enable consistency between multiple concrete, DBMS-specific 
database class implementations. Thereafter, a 'factory' to choose which 
DB-class to use, and all will be roses...
(yeah right!)
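A minimal sketch of that arrangement - the ABC fixes the interface, one 
concrete class per DBMS, and a factory picks between them. Here sqlite3 
is the only concrete example, and the 'kind' argument and class names 
are invented:

```python
import sqlite3
from abc import ABC, abstractmethod

class Database( ABC ):
    """The interface every concrete DBMS wrapper must honour."""
    @abstractmethod
    def query( self, sql, data=() ):
        ...

class SqliteDatabase( Database ):
    def __init__( self, path ):
        self._connection = sqlite3.connect( path )

    def query( self, sql, data=() ):
        return self._connection.execute( sql, data ).fetchall()

def database_factory( kind, **kwargs ):
    # choose the concrete class; extend with further DBMS-es as needed
    if kind == "sqlite":
        return SqliteDatabase( **kwargs )
    raise ValueError( f"unknown DBMS: {kind}" )
```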


-- 
Regards,
=dn

