Fast pythonic way to process a huge integer list

Tim Chase python.list at tim.thechases.com
Wed Jan 6 22:21:41 EST 2016


On 2016-01-06 18:36, high5storage at gmail.com wrote:
> I have a list of 163.840 integers. What is a fast & pythonic way to
> process this list in 1,280 chunks of 128 integers?

That's a modest list, far from huge -- 163,840 Python ints
only take up a few megabytes.

You have lots of options, but the following seems the most pythonic to
me:

  # I don't know how you populate your data, so
  # create some junk data:
  from random import randint
  data = [randint(0,1000) for _ in range(163840)]

  import itertools
  GROUP_SIZE = 128

  def do_something(grp, vals):
    # I don't know what you want to do with each
    # (index, value) pair.  You can print them:

    # for _, val in vals:
    #   print("%s: %s" % (grp, val))

    # or write each chunk to its own file.  Open
    # the file once per chunk, not once per value;
    # reopening in "w" mode inside the loop would
    # truncate the file on every write:
    with open("chunk%04i.txt" % grp, "w") as f:
      for _, val in vals:
        f.write(str(val))
        f.write("\n")

  # but here's the core logic:

  def key_fn(x):
    # x is a tuple of (index, value)
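    # e.g. indices 0-127 map to group 0,
    # indices 128-255 to group 1, and so on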
    return x[0] // GROUP_SIZE

  # actually iterate over the grouped data and do
  # something with it.  groupby only groups runs of
  # adjacent items with equal keys, which is fine
  # here because enumerate yields the indices in
  # order:
  for grp, vals in itertools.groupby(enumerate(data), key_fn):
    do_something(grp, vals)
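
If you don't need the running group number, plain list
slicing is another perfectly good option (a minimal
sketch, reusing the data and GROUP_SIZE names from
above):

  # take consecutive GROUP_SIZE-element slices;
  # each slice is an ordinary list of ints
  for start in range(0, len(data), GROUP_SIZE):
    chunk = data[start:start + GROUP_SIZE]
    # process the chunk here, e.g.
    # do_something(start // GROUP_SIZE, enumerate(chunk))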

-tkc