xlrd and cPickle.dump

John Machin sjmachin at lexicon.net
Tue Apr 1 17:53:51 EDT 2008


On Apr 2, 7:19 am, patrick.wa... at gmail.com wrote:
> > How many megabytes is "extremely large"? How many seconds does it take
> > to open it with xlrd.open_workbook?
>
> The document is 15mb ad 50,000+ rows (for test purposes I will use a
> smaller sample),

15 Mb is not large. E.g. 120 Mb is large.

> but my computer hangs (ie it takes a long time) when
> I try to do simple manipulations

What do you describe as "simple manipulations"? Please describe your
computer, including how much memory it has.

> and the documentation leads me to
> believe cPickle will be more efficient.

The doc says that *IF* you need to read the XLS file multiple times,
you can use xlrd.open_workbook() once and save a pickled copy. Then
you can c.Pickle.load() from the pickled file multiple times;
typically this is about 10 times as fast.

BUT this is just a faster way of getting an xlrd.Book object ... not
relevant to any subsequent "simple manipulations". You should not be
manipulating an xlrd.Book object; treat it as read-only.

>  If this is not true, then I
> don't have a problem (ie I just have to wait), but I still would like
> to figure out how to pickle an xlrd object anyways.
>
> > You only need one of the above imports at the best of times, and for
> > what you are attempting to do, you don't need pyExcelerator at all.
>
> Using pyExcelerator was a guess, because the traditional way didn't
> work and I thought it may be because it's an Excel file.

Don't guess. If the docs say it wants a file to write to, don't give
it None. Especially don't use a function/method call with silly side
effects.

>  Secondly, I
> import it twice because sometimes, and I don't know why, PythonWin
> does not import pyExcelerator the first time.  This has only been true
> with pyExcelerator.

This seems exceedingly unlikely to me. Try asking on the pywin32
mailing list.

>
> > > data_path = """C:\test.xls"""
>
> > It is extremely unlikely that you have a file whose basename begins with
> > a TAB ('\t') character. Please post the code that you actually ran.
>
> you're right, I had just quickly erased my documents and settings
> folder to make it smaller for an example.

I'm far too astonished by that (and your habit of putting any old
bunch of files in your root directory) to make any coherent comment.

>
>
> > Please post the minimal pyExcelerator-free script that demonstrates your
> > problem. Ensure that it includes the following line:
> >      import sys; print sys.version; print xlrd.__VERSION__
> > Also post the output and the traceback (in full).
>
> As to copy_reg.py, I downloaded Activestate Python 2.4 and that was
> it, so I have had no other version on my computer.
>
> Here's the code:
>
> import cPickle,xlrd, sys
>
> print sys.version
> print xlrd.__VERSION__
>
> data_path = """C:\\test\\test.xls"""
> pickle_path = """C:\\test\\pickle.pickle"""
>
> book = xlrd.open_workbook(data_path)
> Data_sheet = book.sheet_by_index(0)
>
> pickle_file = open(pickle_path, 'w')

Use 'wb' ...

> cPickle.dump(book, pickle_file)

... use cPickle.dump(book, pickle_file, -1) # latest protocol

*I WAS WRONG* (happens sometimes) when I said I couldn't reproduce
your problem with Python 2.4; my test/demo/timing code was using -1 by
default.

Note: as well as not complaining about contained file objects,
protocol -1 can be 10 times the speed of protocol 0 and produce a
pickle file half the size.

> pickle_file.close()
>
> Here's the output:
>
> 2.4.3 (#69, Apr 11 2006, 15:32:42) [MSC v.1310 32 bit (Intel)]
> 0.6.1
> Traceback (most recent call last):
>   File "C:\Python24\Lib\site-packages\pythonwin\pywin\framework
> \scriptutils.py", line 310, in RunScript
>     exec codeObject in __main__.__dict__
>   File "C:\text analysis\pickle_test2.py", line 13, in ?
>     cPickle.dump(book, pickle_file)
>   File "C:\Python24\lib\copy_reg.py", line 69, in _reduce_ex
>     raise TypeError, "can't pickle %s objects" % base.__name__
> TypeError: can't pickle module objects
>
> Thanks for the advice!

Here's some more: try to *understand* the "hanging" / "simple
manipulations" problem before you start implementing solutions.

Also, any good reason for sticking with Python 2.4?

HTH,
John



More information about the Python-list mailing list