newbie - merging xls files using xldt and xlwt

Mon Dec 8 08:21:20 EST 2008

Hi,

I have been provided with the program below. Which sets out to merge
singlesheet excel files into one multisheet excel workbook.

Though it manages to merge multiple files into one workbook many of the
formats are not preserved. For example, what was originally 22.92% is
converted to 0.2292...

I was hoping if somebody could please help me understand how this can be
fixed. Ideally I would like to create a program which merges files whilst
keeping all formats (if at all possible).

#######################################
"""
Merge all xls files in a given directory into one multisheet xls file.
The sheets get the orginal file name, without the extension.
File names should not exceed 29 characters
"""
import xlrd, xlwt
import glob, os.path
def merge_xls (in_dir, out_file="K:/Stats Jobs 2003/St0883 (CITB -
J33882)/3. Results/Excel/merged_output.xls"):

    xls_files   = glob.glob(in_dir + "*.xls")
    sheet_names = [os.path.basename(v)[:-4] for v in xls_files]
    sheet_excl  = [os.path.basename(v)[:-4] for v in xls_files if
len(os.path.basename(v)[:-4]) > 29]
    merged_book = xlwt.Workbook()

    if in_dir[-1:] != "/": in_dir = in_dir + "/"
    xls_files.sort()
    if xls_files:
        for k, xls_file in enumerate(xls_files):
            print "---> Processing file %s" % (xls_file)
            if len (sheet_names[k]) <= 29:
                book = xlrd.open_workbook(xls_file)
                if book.nsheets == 1:
                    ws    = merged_book.add_sheet(sheet_names[k])
                    sheet = book.sheet_by_index(0)
                    for rx in range(sheet.nrows):
                        for cx in range(sheet.ncols):
                            ws.write(rx, cx, sheet.cell_value(rx, cx))
                elif book.nsheets in range(2, 100):
                    for sheetx in range(book.nsheets):
                        sheet0n = sheet_names[k]+str(sheetx+1).zfill(2)
                        ws      = merged_book.add_sheet(sheet0n)
                        sheet   = book.sheet_by_index(sheetx)
                        for rx in range(sheet.nrows):
                            for cx in range(sheet.ncols):
                                ws.write(rx, cx, sheet.cell_value(rx, cx))
                else:
                    print "ERROR *** File %s has %s sheets (maximum is 99)"
% (xls_file, book.nsheets)
                    raise
            else:
                print "WARNING *** File name too long: <%s.xls> (maximum is
29 chars) " % (sheet_names[k])
                print "WARNING *** File <%s.xls> was skipped." %
(sheet_names[k])
        merged_book.save(out_file)
        print
        print "---> Merged xls file written to %s using the following source
files: " % (out_file)
        for k, v in enumerate(sheet_names):
            if len(v) <= 29:
                print "\t", str(k+1).zfill(3), "%s.xls" % (v)
        print
        if sheet_excl:
            print "--> The following files were skipped because the file
name exceeds 29 characters: "
            for k, v in enumerate(sheet_excl):
                print "\t", str(k+1).zfill(3), v
    else:
        print "NOTE *** No xls files in %s. Nothing to do." % (in_dir)
merge_xls(in_dir="C:/temp/test merge xls/")
#######################################

Many thanks in advance.

Jigs

2008/11/3 J Sutar <jsutar at gmail.com>

> Mr Roskam,
>
> We have interacted before on the SPSS listserve, you have helped me out
> fair few times!
>
> I need a helping hand again, do you have the final solution to this
> problem, mergin xls from a given directory?
>
> Also where do I download the relevant modules from?
>
> Cheers
> Jigs
>
> 2008/10/15 Albert-jan Roskam <fomcl at yahoo.com>
>
> Hi John,
>>
>> Thanks! Using a higher xlrd version did the trick! Regarding your other
>> remarks:
>> -yep, input files with multiple sheets don't work yet. I kinda repressed
>> that ;-) Spss outputs only single-sheet xls files, but I agree it's nicer if
>> the programs works in other cases too.
>> -and no, I don't intend to use data fields. Wouldn't it be easier to
>> convert those to string values if I ever came across them?
>>
>> Thanks again!
>> Albert-Jan
>>
>>
>> --- On Wed, 10/15/08, John Machin <sjmachin at lexicon.net> wrote:
>>
>> > From: John Machin <sjmachin at lexicon.net>
>> > Subject: Re: newbie - merging xls files using xldt and xlwt
>> > To: python-list at python.org
>> > Date: Wednesday, October 15, 2008, 3:14 PM
>>  > On Oct 15, 9:16 pm, Albert-jan Roskam
>> > <fo... at yahoo.com> wrote:
>> > > Hi,
>> > >
>> > > I wrote the program below to merge all xls files in a
>> > given directory into one multisheet xls file. It uses xlwt
>> > and xlrd. The xls files I use for input are generated by
>> > Spss. When I open and re-save the files in Excel, the
>> > program works, but when I use the xls files as they were
>> > created by Spss, the program returns an error message (see
>> > below). Some details: Python 2.4, xlwt version 0.7.0, xlrd
>> > version 0.5.2, Win NT.
>> > >
>> > > Any ideas?
>> >
>> > The version of xlrd that you are using is an antique. Go to
>> > http://pypi.python.org/pypi/xlrd and get the latest
>> > version. If that
>> > doesn't work, send me a copy of one of the files that
>> > is displaying
>> > this problem.
>> > >
>> > > Thanks in advance!
>> > >
>> > > Albert-Jan
>> > >
>> > > """
>> > > Merge all xls files in a given directory into one
>> > multisheet xls file.
>> > > The sheets get the orginal file name, without the
>> > extension.
>> > > File names should not exceed 31 characters, as this is
>> > the maximum
>> > > sheet name length
>> > > """
>> > >
>> > > import xlrd, xlwt
>> > > import glob, os.path
>> > >
>> > > def merge_xls (in_dir,
>> > out_file="d:/merged_output.xls"):
>> > >
>> > >     xls_files   = glob.glob(in_dir +
>> > "*.xls")
>> > >     sheet_names = []
>> > >     merged_book = xlwt.Workbook()
>> > >
>> > >     [sheet_names.append(os.path.basename(v)[:-4])
>> > for k, v in enumerate(xls_files)]
>> >
>> > Wah! Try this:
>> > sheet_names = [os.path.basename(v)[:-4]) for v in
>> > xls_files]
>> >
>> >
>> > >     for k, xls_file in enumerate(xls_files):
>> > >         if len (sheet_names[k]) <= 31:
>> > >             book = xlrd.open_workbook(xls_file)
>> > >             ws =
>> > merged_book.add_sheet(sheet_names[k])
>> > >             for sheetx in range(book.nsheets):
>> > >                 sheet =
>> > book.sheet_by_index(sheetx)
>> > >                 for rx in range(sheet.nrows):
>> > >                     for cx in
>> > range(sheet.ncols):
>> > >                          ws.write(rx, cx,
>> > sheet.cell_value(rx, cx))
>> >
>> > I presume that you're not too worried about any date
>> > data.
>> >
>> > If an input file has more than 1 sheet, you are creating
>> > only one
>> > sheet in the output file, and overwriting cells.
>> >
>> > >         else:
>> > >[snip]
>> > >
>> > > merge_xls(in_dir="d:/temp/")
>> > >
>> > > *** WARNING: OLE2 inconsistency: SSCS size is 0 but
>> > SSAT size is non-zero ***
>> > > put_cell 0 1
>> > >
>> > > Traceback (most recent call last):
>> > [snip]
>> > >     self._cell_types[rowx][colx] = ctype
>> > > IndexError: list assignment index out of range
>> >
>> > Yeah, symptom of an xls file with a DIMENSIONS records that
>> > lies ...
>> > xlrd has like Excel become more resilient over time :-)
>> >
>> > BTW, consider joining the python-excel group at
>> > http://groups.google.com/group/python-excel
>> >
>> > Cheers,
>> > John
>> > --
>> > http://mail.python.org/mailman/listinfo/python-list
>>
>>
>>
>> --
>> http://mail.python.org/mailman/listinfo/python-list
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-list/attachments/20081208/59a037bd/attachment-0001.html>