Decimating Excel files

John Machin sjmachin at lexicon.net
Mon Feb 5 20:59:32 EST 2007


On Feb 6, 12:27 pm, "mensana... at aol.com" <mensana... at aol.com> wrote:
> On Feb 5, 5:46 pm, "Gabriel Genellina" <gagsl... at yahoo.com.ar> wrote:
>
>
>
> > En Sat, 03 Feb 2007 18:52:10 -0300, mensana... at aol.com
> > <mensana... at aol.com> escribió:
>
> > > On Feb 3, 1:43 pm, gonzlobo <gonzl... at gmail.com> wrote:
> > >> We have a data acquisition program that saves its output to Excel's
> > >> .xls format. Unfortunately, the programmer was too stupid to write
> > >> files the average user can read.
>
> > >> I'd like some advice on how to go about:
> > >> 1. Reading a large Excel file and chop it into many Excel files (with
> > >> only 65535 lines per file)
>
> > > An Excel sheet only has 65535 lines. Or do you mean it has
> > > multiple sheets?
>
> > As I understand the problem, the OP has a program that generates the .xls
> > files, but it's so dumb that it writes files too large for Excel to read.
>
> My first thought was how would that be possible?
>
> But then, nothing's stopping someone from making
> a million line .csv file (which Excel thinks it "owns")
> that would be too big for Excel to open.
>
> If that's the case, then chasing COM is barking up
> the wrong tree.
>

To clear up the doubts, I'd suggest that the OP do something like this
at the Python interactive prompt:

print repr(open('nasty_file.xls', 'rb').read(512))

If that produces recognisable text, then it's a CSV file (or a
tab-separated file) masquerading as an XLS file.

OTOH, if it produces a bunch of hex starting with
"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" then it's at least an OLE2 compound
document -- could be Word or PowerPoint, though :-)
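
If eyeballing the repr gets tedious, the same check can be scripted.
What follows is only a rough sketch: the names sniff_header and
sniff_file are made up for illustration, and the "mostly printable
bytes means text" rule is a crude assumption of mine, not a reliable
CSV detector.

```python
# The 8-byte signature that starts every OLE2 compound document.
OLE2_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def sniff_header(header):
    """Classify the leading bytes of a file (hypothetical helper)."""
    if not header:
        return "empty"
    if header.startswith(OLE2_SIGNATURE):
        # Could be Excel, but also Word or PowerPoint.
        return "ole2"
    # Assumption: a header made almost entirely of printable ASCII,
    # tabs and line endings is probably CSV or tab-separated text.
    printable = sum(
        1 for b in bytearray(header)
        if b in (9, 10, 13) or 32 <= b < 127
    )
    if printable >= 0.95 * len(header):
        return "text"
    return "unknown"


def sniff_file(path):
    """Read the first 512 bytes of path and classify them."""
    with open(path, "rb") as f:
        return sniff_header(f.read(512))
```

Calling sniff_file('nasty_file.xls') would then return "ole2", "text",
"empty" or "unknown"; anything other than "ole2" means Excel's native
format is not what is in the file.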

Even better, if the OP has downloaded xlrd: presuming Windows, with
Python installed in the default location and xlrd installed using its
setup.py, run this at the command prompt:

c:\python25\python c:\python25\scripts\runxlrd.py ov nasty_file.xls

This should give an overview ("ov") of the file, showing for each
worksheet how many columns and rows are used, plus other potentially
helpful information -- or it will raise an exception; still useful
information.
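
For anyone who would rather stay at the Python prompt, roughly the
same row and column counts can be pulled out with xlrd directly. This
is a minimal sketch, not what runxlrd.py does internally; summarize
and overview are names I have invented here, and the only xlrd pieces
used are open_workbook(), Book.sheets(), and the Sheet attributes
name, nrows and ncols.

```python
def summarize(book):
    """Return (sheet name, rows used, columns used) for each sheet.

    book is expected to behave like an xlrd Book: a sheets() method
    returning objects with name, nrows and ncols attributes.
    """
    return [(sheet.name, sheet.nrows, sheet.ncols)
            for sheet in book.sheets()]


def overview(path):
    """Print one line per worksheet in the workbook at path."""
    # xlrd is third-party; it is imported here so that summarize()
    # can be used and tested without it.
    import xlrd
    book = xlrd.open_workbook(path)
    for name, nrows, ncols in summarize(book):
        print("%-20s rows=%d cols=%d" % (name, nrows, ncols))
```

overview('nasty_file.xls') will either print the per-sheet counts or
raise an exception from open_workbook(), which is useful information
in its own right.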

Cheers,
John




