Need max values in list of tuples, based on position

DFS nospam at dfs.com
Fri Nov 11 21:20:10 EST 2022


On 11/11/2022 7:04 PM, Dennis Lee Bieber wrote:
> On Fri, 11 Nov 2022 15:03:49 -0500, DFS <nospam at dfs.com> declaimed the
> following:
> 
> 
>> Thanks for looking at it.  I'm trying to determine the maximum length of
>> each column result in a SQL query.  Normally you can use the 3rd value
>> of the cursor.description object (see the DB-API spec), but apparently
>> not with my dbms (SQLite).  The 'display_size' column is None with
>> SQLite.  So I had to resort to another way.
> 
> 	Not really a surprise. SQLite doesn't really have column widths --

As I understand it, the cursor.description doesn't look at the column 
type - it goes by the data in the cursor.


> since any column can store data of any type; affinities just drive it into
> what may be the optimal storage for the column... That is, if a column is
> "INT", SQLite will attempt to convert whatever the data is into an integer
> -- but if the data is not representable as an integer, it will be stored as
> the next best form.

Yeah, I don't know why cursor.description doesn't work with SQLite; all 
their columns are basically varchars.


> 	123		=> stored as integer
> 	"123"	=> converted and stored as integer
> 	123.0	=> probably converted to integer
> 	123.5	=> likely stored as numeric/double
> 	"one two three"	=> can't convert, store it as a string
> 
> 	We've not seen the SQL query in question, 


The query is literally any SELECT, any type of data, including SELECT *. 
  The reason it works with SELECT * is the cursor.description against 
SQLite DOES give the column names:

select * from timezone;
print(cur.description)
(
('TIMEZONE',     None, None, None, None, None, None),
('TIMEZONEDESC', None, None, None, None, None, None),
('UTC_OFFSET',   None, None, None, None, None, None)
)

(I lined up the data)


Anyway, I got it working nicely, with the help of the solution I found 
online and posted here earlier:

-----------------------------------------------------------------
x = [(11,1,1),(1,41,2),(9,3,12)]
maxvals = [0]*len(x[0])
for e in x:

     #clp example using only ints
     maxvals = [max(w,int(c)) for w,c in zip(maxvals,e)]  #clp example

     #real world - get the length of the data string, even if all numeric
     maxvals = [max(w,len(str(c))) for w,c in zip(maxvals,e)]

print(maxvals)
[11,41,12]
-----------------------------------------------------------------

Applied to real data, the iterations might look like this:

[4, 40, 9]
[4, 40, 9]
[4, 40, 9]
[4, 40, 18]
[4, 40, 18]
[4, 40, 18]
[5, 40, 18]
[5, 40, 18]
[5, 40, 18]
[5, 69, 18]
[5, 69, 18]
[5, 69, 18]

The last row contains the max width of the data in each column.

Then I compare those datawidths to the column name widths, and take the 
wider of the two, so [5,69,18] might change to [8,69,18] if the column 
label is wider than the widest bit of data in the column

convert those final widths into a print format string, and everything 
fits well: Each column is perfectly sized and it all looks pleasing to 
the eye (and no external libs like tabulate used either).

https://imgur.com/UzO3Yhp


The 'downside' is you have to fully iterate the data twice: once to get 
the widths, then again to print it.

If I get a wild hair I might create a PostgreSQL clone of my db and see 
if the cursor.description works with it.  It would also have to iterate 
the data to determine that 'display_size' value.

https://peps.python.org/pep-0249/#cursor-attributes




 > but it might suffice to use a
 > second (first?) SQL query with aggregate (untested)
 >
 > 		max(length(colname))
 >
 > for each column in the main SQL query.


Might be a pain to code dynamically.




> """
> length(X)
> 
>      For a string value X, the length(X) function returns the number of
> characters (not bytes) in X prior to the first NUL character. Since SQLite
> strings do not normally contain NUL characters, the length(X) function will
> usually return the total number of characters in the string X. For a blob
> value X, length(X) returns the number of bytes in the blob. If X is NULL
> then length(X) is NULL. If X is numeric then length(X) returns the length
> of a string representation of X.
> """
> 
> 	Note the last sentence for numerics.



Thanks for looking at it.



More information about the Python-list mailing list