xlrd number of rows in worksheet (was: Re: Pre-defining an action to take when an expected error occurs)

Sun Sep 17 04:14:41 EDT 2006

On 16/09/2006 2:55 AM, Tempo wrote:
> John Machin  thanks for all of your help, and I take responsibility for
> the way I worded my sentences in my last reply to this topic. So in an
> effort to say sorry, I want to make it clear to everybody that it seems
> as though errors in my code and use of external programs (Excel in
> particular) are making "range(sh.nrows)" have faulty results. I am
> trying to pinpoint the spot in my code or use of Excel, before
> "range(sh.nrows) is executed, that is bugged. John Machin, I am
> thrilled that the package xlrd exists at all because it simplifies a
> daunting task for a beginner programer--me. Its uses are not bound to
> beginners either. So thanks for the package and your help to this point.
> 

I'm sorry, too: I should have wrapped my post in <humour> ... </humour> 
tags> :-)

Of course it's up to you to decide the criteria for filtering out 
accidental non-data from your spreadsheet. Note that this phenomenon is 
not restricted to spreadsheets; one often sees text data files with 
blank or empty lines on the end -- one's app just has to cope with that.

Here's an example of a function that will classify a bunch of cells for you:

def usefulness_of_cells(cells):
     """Score each cell:
     as 0 if empty,
     as 1 if zero-length text,
     as 2 if text and value.isspace() is true,
     otherwise as 3.
     Return the highest score found.
     """
     score = 0
     for cell in cells:
         if cell.ctype == xlrd.XL_CELL_EMPTY:
             continue
         if cell.ctype == xlrd.XL_CELL_TEXT:
             if not cell.value:
                 if not score:
                     score = 1
                 continue
             if cell.value.isspace():
                 score = 2
                 continue
         return 3
     return score

and here's an example of using it:

def number_of_good_rows(sheet):
     """Return 1 + the index of the last row with meaningful data in it."""
     for rowx in xrange(sheet.nrows - 1, -1, -1):
         score = usefulness_of_cells(sheet.row(rowx))
         if score == 3:
             return rowx+1
     return 0

A note on using the isspace() method: ensure that you use it on 
cell.value (which is Unicode), not on an 8-bit encoding (especially if 
your locale is set to the default ("C")).

| >>> '\xA0'.isspace()
| False
| >>> u'\xA0'.isspace()
| True
| >>> import unicodedata as ucd
| >>> ucd.name(u'\xA0')
| 'NO-BREAK SPACE'

You can get these in spreadsheets when folk paste in stuff off a web 
page that uses   as padding (because HTML trims out 
leading/trailing/multiple instances of SPACE). Puzzled the heck out of 
me the first time I encountered it until I did:
     print repr(data_that_the_users_were_shrieking_about)

Here's a tip: repr() in Python and "View > Page Source" in Firefox come 
in very handy when you have "what you see is not what you've got" problems.

Anyway, I'll add something like the above functions in an examples 
directory in the next release of xlrd (which is at alpha stage right 
now). I'll also add in a Q&A section in the docs, starting with "Why 
does xlrd report more rows than I see on the screen?" -- so do let us 
know what you find down the end of your spreadsheet, in case it's a 
strange beast that hasn't been seen before.

HTH,
John