extremely slow array indexing?

John Machin sjmachin at lexicon.net
Thu Nov 30 13:33:36 EST 2006


Will McGugan wrote:
> Grace Fang wrote:
>
> > Hi,
> >
> > I am writing code to sort the columns according to the sum of each
> > column. The dataset is huge (50k rows x 300k cols), so i need to read
> > line by line and do the summation to avoid the out-of-memory problem.
> > But I don't know why it runs very slow, and part of the code is as
> > follows. I suspect it's because of array index, but not sure. Can
> > anyone point out what needs to be modified to make it run fast?
> > thanks in advance!
>
> Array indexing is unlikely to be the culprit. Could it not just be
> slow, because you are processing a lot of data? With numbers those big
> I would expect to have enough time to go make a coffee, then drink it.
>
> If you think it is slower than it could be, post more code for
> optimization advice...
>
> Will McGugan

Hi Grace,
What Will McGugan said, plus:
1. Post *much* more of your code e.g. all relevant parts :-)
2. Explain "featureDict" and "componentdict1"; note that you seem to
be doing more dictionary accessing than array indexing.
3. Tell us what is "row" (not mentioned elsewhere) in the last line of
your code snippet. Should it be "currRow"? For your sake and ours,
copy/paste your code; don't re-type it.
4. Tell us what version of Python [why are you using dict.has_key??],
what platform, how much memory.
5. Tell us what "very slow" means e.g. how many rows per second.
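
For reference, here is a minimal sketch of the kind of line-by-line
column summation described in the original question, with a
rows-per-second measurement of the sort asked for in point 5. It is not
the original poster's code (which was not posted in full). The function
names, the whitespace-separated input format, and the tiny sample data
are all assumptions made for illustration.

```python
import io
import time


def column_sums(lines, sep=None):
    """Sum each column of numeric rows, reading one row at a time.

    Only the running totals (one float per column) are kept in memory.
    Returns (totals, row_count, elapsed_seconds).
    """
    totals = None
    row_count = 0
    start = time.time()
    for line in lines:
        fields = line.split(sep)  # sep=None splits on any whitespace
        if not fields:
            continue  # skip blank lines
        values = [float(f) for f in fields]
        if totals is None:
            totals = values  # first row initialises the totals
        else:
            if len(values) != len(totals):
                raise ValueError("row %d has %d columns, expected %d"
                                 % (row_count + 1, len(values), len(totals)))
            for i, v in enumerate(values):
                totals[i] += v
        row_count += 1
    elapsed = time.time() - start
    return totals or [], row_count, elapsed


def sort_columns_by_sum(totals):
    """Return column indexes ordered by ascending column sum."""
    return sorted(range(len(totals)), key=totals.__getitem__)


# Hypothetical stand-in for the real file: 2 rows x 3 columns.
sample = io.StringIO("4 1 3\n2 0 4\n")
totals, nrows, elapsed = column_sums(sample)
order = sort_columns_by_sum(totals)
print(totals, order)
if elapsed > 0:
    print("%.0f rows per second" % (nrows / elapsed))
```

On the has_key question in point 4: "key in some_dict" has been
available since Python 2.2 and is the usual spelling of
"some_dict.has_key(key)".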

HTH,
John



