[Tutor] make a sqlite3 database from an ordinary text file

Tue Jun 21 05:07:37 EDT 2022

On 21/06/2022 07.18, Roel Schroeven wrote:
> 
> 
> Alan Gauld via Tutor schreef op 20/06/2022 om 20:01:
>> On 20/06/2022 09:45, Roel Schroeven wrote:
>>
>> >> NEVER initialised a DB/TBL this way. Easy for me to say: I learned
>> RDBMS
>> >> and SQL very early in my career, and therefore know to use the tools
>> >> from that eco-system. (the rationale in books is the exact opposite -
>> >> presuming the reader has never used DB-tools). Use the best tool
>> for the
>> >> job!
>> > I don't really agree with this. I quite often use Python to read
>> data > from some file, perhaps process it a bit, then store it in SQL.
>> Roel,
>>
>> We weren't saying not to use Python for processing SQL data.
>> It was specifically about creating the schema in Python
>> (and to a lesser degree loading it with initial data). You
>> can do it but it's usually much easier to do it directly
>> in SQL (which you have to embed in the Python code anyway!)
> 
> Ah, I misunderstood. My apologies.

There's plenty of room for that, just as there is plenty of room for
using Python, SQL-tools, or something else. (I wasn't offended)

The different SQL eco-systems, eg DB2, Oracle, MySQL, SQLite; offer very
different ranges of tools beyond the RDBMS 'engine' and command-line
client. Thus, it is all-very-well for me/us to say 'use the SQL-tools'
if we are talking about one system and for you/A.N.Other to question
such if using something less feature-full.

Then there are the points you have made. If there is quite a bit of
'data-cleaning' going-on, then such will start to exceed the capability
of the tools I/we've espoused. In which case, Python is the 'Swiss Army
Knife' to use - no debate.

There are plenty of rules and warnings about sanitising data before
INSERT-ing or UPDATE-ing. Anyone unaware of that should stop coding and
do some reading *immediately*! That someone would dare to use user-data
to formulate a schema is far to rich (risky) for my blood! I'm trying to
imagine a reasonable use-case, but...

Forced into some sort of situation like this, I'd split the task into
parts, with a manual inspection of the generated schema before I'd let
it run. (and once again, I'd input that schema through an SQL-tool - but
allow for the fact that my background enables me to achieve such
quickly. Whereas YMMV and thus you/A.N.Other might prefer to use the
Python interface)

Another consideration worth some thought is that key-value databases are
'schema-less'. The fact that in this scenario we don't know what the
schema will be until the first phase of the system 'designs it', becomes
irrelevant! The other nifty advantage to using something like MongoDB is
that its mechanisms are virtually indistinguishable from Python dicts
(etc) - so I'll often prototype with Python built-ins and only later
worry about persistence.

-- 
Regards,
=dn