[Tutor] Fwd: make a sqlite3 database from an ordinary text file

Mon Jun 20 18:18:15 EDT 2022

Roel,

I appreciate your explanations.

I wonder about one thing here. From your description, it sounds like the fully-text file being read needs to have the first line read in as text and remain as text. But in this case, all subsequent files should be sent to the database as integers or perhaps another numeric format. 

If that is correct, the generator used normally cannot return both of those easily. For this one case, you could open the darn file once to get the column names without using the generator, to get the column names as text, then reopen the file in the generator and throw away the first result as the generator will now return an integer form of the data  and the first line will be nonsense. Several variations are possible, and since a generator can keep state, it can be set to effectively skip the first line. 

The designs so far have not shown any idea of making integers, albeit a wrapper around the generator could perhaps do the conversion.

Some of this makes me appreciate all the hidden level of detail that read_csv() and similar functions deal with that may not be needed every time or may not be perfect, but can be relied on to deal with lots of detail you might not even think about.

-----Original Message-----
From: Tutor <tutor-bounces+avi.e.gross=gmail.com at python.org> On Behalf Of Roel Schroeven
Sent: Monday, June 20, 2022 4:38 AM
To: tutor at python.org
Subject: Re: [Tutor] Fwd: make a sqlite3 database from an ordinary text file

Op 20/06/2022 om 1:43 schreef avi.e.gross at gmail.com:
> I may not be looking at this right since much happens in functions not 
> seen but does cur.execute(many) expect two items of text or two items 
> of type integer or will it handle a list or tuple with arbitrary 
> numbers of contents and so on, is what makes the design decision.
cur.executemany() takes an iterable which on each iteration yields a number of items corresponding to the number of placeholders ("?") in the query. You can use any datatype that sqlite3 understands, but care should be taken to use the type you need. Other than other databases,
sqlite3 stores data as the type you give it, rather than the type specified in the database definition (in a sense sqlite3 is dynamically typed like Python, instead of statically typed like other SQL implementations).

So AFAICS the code is wrong on that point: it should convert the data to integers before handing them over to the database.

> Deeper within that function, it presumable maps your passed function 
> to code that may do,
>
> First, Second = f()
>
> And so on. Or it may just take a single result or other choices. It is 
> a bit confusing to pass a filename as someone pointed out.
This cur.executemany() call is the equivalent of:

     for time, nr_pieces in data_generator('abcd.txt'):
         cur.execute("insert into workdata values (?, ?)", (time,
nr_pieces))

But a bit shorter and with presumably less overhead.
> In what follows, either I am confused or the one asking the question is.
>
> The goal of the iterator seems to be to avoid reading all the data in 
> at once, right. Otherwise, why bother when you can do something like 
> reading in the entire file and splitting and putting the results in 
> something like a list structure, or perhaps in something from numpy.
>
> So why does the code use readline() (no 's') to read in a single line 
> and only once?
>
> As I read the somewhat confusing code, the goal is to open the file 
> the first time and read a line and throw the results away as it is not 
> clear where it is saved. Then the goal is to loop on nothing (or maybe 
> one line) and yield the results of splitting it and pause till called 
> again. But since only no or maybe one line have been received, the 
> call to iterate should just exit as the loop is done.
>
> So I think reading a single line should be moved down within the loop!
That readline() call is to skip the first line, because it contains column names instead of actual data. After that the code iterates over the file object which yields each remaining line on each iteration, which is then split and yielded for use by cur.executemany().
> And you need some kind of test to know when there are no remaining 
> lines in the file or the current line is empty and instead of yielding 
> a non-existent result, you should return what the iteration protocol 
> needs and close the file, albeit the with does that for you silently as part of that protocol.
No need, that's all taken care of automatically. "for line in obj" stops if there are no more lines, and as you say the with statement closes the file.
> But as others have noted, it is not clear other than as an exercise, 
> if this iterator is needed. Your other modules may supply something for you.
Why not use an iterator? Once you get the hang of it, using iterators is often just as easy as using lists; with the advantage that a program that uses iterators where it matters works for small and large datasets without any problem because it doesn't need large amounts of memory to store its data.

-- 
"Peace cannot be kept by force. It can only be achieved through understanding."
         -- Albert Einstein

_______________________________________________
Tutor maillist  -  Tutor at python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor