What to write or search on github to get the code for what is written below:

Dennis Lee Bieber wlfraed at ix.netcom.com
Sun Jan 16 11:51:42 EST 2022


On Sat, 15 Jan 2022 02:38:34 -0800 (PST), NArshad <narshad.380 at gmail.com>
declaimed the following:

	A bit of an improvement -- actual code...

	But... still not a minimal /runnable/ example...
>
>Why does the code written below is not giving any output?
>
>    xls = ExcelFile('ABC.xlsx')

	Where is "ExcelFile" defined? Best match I could find after spending
time with Google is the pandas package. Pandas is likely overkill for this
situation -- being optimized for analysis of numerical tabular data. In
particular, it expects each column to be of one data type, not mixed data
types, in order to apply aggregate functions (min, max, mean, std.dev.,
etc.) to the columns.

>    df = xls.parse(xls.sheet_names[0], index_col=1)

	Based upon documentation, pandas.ExcelFile.parse() internally uses the
openpyxl package for files of type .xlsx -- for this application openpyxl
is probably all that is needed.

	Documentation for .parse() (actually the underlying .read_excel()
operation) indicates that the desired sheet can be specified by name OR BY
POSITION -- with the default value being "0" [first sheet]. It is somewhat
perverse to be retrieving a list sheet names from the file, only to then
pass the first name back into the function which has to match the name up
against the list to determine the position... Unless the file is using
multiple sheets for data -- in which case you need some logic to select
something other than the first sheet -- you could leave that argument off
and let it default.

	index_col=1		says to use the SECOND COLUMN of the sheet to provide
row-labels for the "dataframe" (unless documentation states otherwise, the
first item of a collection is "0", not "1")

>    x=df.to_dict()

	What do you really expect from this conversion? With no arguments it
will produce a nested dictionary of the form

		{	"column1name"	:	{	"row1name"	: r1-c1-value	,
									"row2name" : r2-c1-value		},
			"column2name"	:	{	"row1name" : r1-c2-value,
									"row2name" : r2-c2-value		}	}

>    print (x)
>
>Only the contents of the first column and the column number is required in the dictionary

	You won't get that with .to_dict() -- it uses the column names for the
major grouping, and then uses the row names (which are already garbage as
you told pandas to use the second column for row names) to select the VALUE
stored in that row for that column. Also, while Python implementation of
dictionaries has changed over the years, the theoretical basis (associative
mapping, hash table) does not define an order for the keys of the
dictionary. Unless you are certain of how a dictionary handles
insert/delete of keys, you can not assume the first key out was the first
key that went in.

	It would also be helpful to have a snippet of the spreadsheet -- say
the first five or six rows, along with the first five or six columns;
exported as a CSV file (binary attachments get stripped from this forum,
but a CSV is text and can be pasted in -- 5x5 should be small enough that
those trying to assist can then extract it from the post, clean up any line
wrappings, and import into a blank spreadsheet.


-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
	wlfraed at ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/


More information about the Python-list mailing list