Eliminate "extra" variable

Tim Chase python.list at tim.thechases.com
Sun Dec 8 13:58:23 EST 2013


On 2013-12-07 23:14, Igor Korot wrote:
> def MyFunc(self, originalData):
>      self.dates = []
>       data = {}
>       dateStrs = []
>       for i in xrange(0, len(originalData)):
>             dateStr, freq, source = originalData[i]
>             data[str(dateStr)]  = {source: freq}
>             dateStrs.append(dateStr)
>      for i in xrange(0, len(dateStrs) - 1):
>            currDateStr = str(dateStrs[i])
>            nextDateStr = str(dateStrs[i + 1])
>            self.dates.append(currDateStr)
>            currDate = datetime.datetime.strptime(currDateStr,
> '%Y-%m-%d') nextDate = datetime.datetime.strptime(nextDateStr,
> '%Y-%m-%d') if nextDate - curDate < datetime.timedelta(days=31):
>                d = currDate + datetime.timedelta(days=1)
>                while d < nextDate:
>                    self.dates.append(d.strftime('%Y-%m-%d'))
>                    d = d + datetime.timedelta(days=1)
>      lastDateStr = dateStrs[-1]
>      self.dates.append(str(lastDateStr))
>      return data

It would help to know what you want this function to accomplish:
"MyFunc" isn't exactly descriptive.  From what I gather by reading
it, you want it to do two things:

 - append each date in the range from originalData[0] through
   originalData[-1] to self.dates every time this function is called
   (which means that multiple calls to this will grow self.dates
   every time)

 - if there's less than 31 days between N and N+1, also append all
   the dates in between (this seems weird, but okay).  Again, every
   time this function is called.

 - return a dictionary that maps dates in the input-data to the
   associated source:freq dictionary.

It's hard to tell what you intend to do with these results.  If you
just intend to iterate over them once, asking for associated data,
you could even create a generator that yields the date along with
either None or the associated data.  See below for that.
Alternatively, you can return both the dict-mapping and the date-list
from the function:

  def f(...):
    return (the_dict, the_list)

  a_dict, a_list = f(...)

That would prevent repeated mutation of self.dates

> As you can see there is many conversion going on and there's
> unneeded dateStrs which is used just to loop thru the dictionary
> and get the 2 consecutive keys.

The final snippet of code that I provided handles this pretty nicely
by zipping up the staggered lists and iterating over them while
unpacking them into sensible variable names.  Unless you have a need
to operate on the dates as string, I'd just keep them as dates
throughout the code and only turn them into strings upon output.

> But if you see a better way - please share.

I'd likely incorporate Peter's sliding_window() suggestion and do
something like the following (I commented out using a tuple for the
values, but using a tuple/namedtuple might make more sense)

    import itertools
    def sliding_window(i):
        a, b = itertools.tee(i)
        next(b)
        return itertools.izip(a, b)

    def some_descriptive_function_name(self, original_data):
        # construct these once-per-call rather than every loop
        # or even move them out to module-scope
        ONE_DAY = datetime.timedelta(days=1)
        MONTHISH = datetime.timedelta(days=31)
        for (
                (cur_dt, cur_freq, cur_source),
                (next_dt, next_freq, next_source)
                ) in sliding_window(original_data):
            info = {cur_source: cur_freq}
            # info = (cur_source, cur_freq)
            yield cur_dt, info
            if next_dt - cur_dt < MONTHISH:
                d = cur_dt + ONE_DAY
                while d < next_dt:
                    yield d, None
                    d += ONE_DAY
        info = {next_source: next_freq}
        # info = (next_source, next_freq)
        yield next_dt, info

which can then be used with

  for dt, info in self.some_descriptive_function_name(data):
    # dates should be returned in the same sequence
    # as your original logic
    if info is None:
      do_something_when_no_info(dt)
    else:
      do_something_with_dt_and_info(dt, info)

If you need to iterate over the data multiple times, you can just do

  tmp = list(self.some_descriptive_function_name(data))
  do_first(tmp)
  do_second(tmp)

-tkc





















More information about the Python-list mailing list