Eliminate "extra" variable

Sun Dec 8 09:04:13 EST 2013

Tim Chase wrote:

> On 2013-12-06 11:37, Igor Korot wrote:
>> def MyFunc(self, originalData):
>>      data = {}
>>      for i in xrange(0, len(originalData)):
>>            dateStr, freq, source = originalData[i]
>>            data[str(dateStr)]  = {source: freq}
> 
> this can be more cleanly/pythonically written as
> 
>   def my_func(self, original_data):
>     data = {}
>     for date, freq, source in original_data:
>       data[str(date)] = {source: freq}
> 
> or even just
> 
>     data = dict(
>       (str(date), {source: freq})
>       for date, freq, source in original_data
>       )

or even just

data = {str(date): {source: freq}
        for date, freq, source in original_data}

But do you really need an inner dict with a single key per date? And is it
even correct? If a date occurs twice, only the last source:freq pair is kept.
Without knowing the context, the humble

data = {}
for date, freq, source in original_data:
    source_to_freq = data.setdefault(date, {})
    if source in source_to_freq:
        raise ValueError(
            "Multiple frequencies for one source not supported")
    source_to_freq[source] = freq

appears so much more plausible...
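To make the difference concrete, here is a small sketch with made-up sample data (the dates, sources and frequencies are hypothetical) in which one date appears twice. The comprehension silently keeps only the last source:freq pair for that date. The setdefault() loop keeps both:

```python
# Hypothetical sample data: (date, freq, source) triples, one date repeated.
original_data = [
    ("2013-12-01", 3, "a"),
    ("2013-12-01", 5, "b"),
    ("2013-12-02", 7, "a"),
]

# One single-key inner dict per date: a repeated date overwrites the earlier entry.
by_comprehension = {str(date): {source: freq}
                    for date, freq, source in original_data}

# setdefault() version: every source seen for a date is collected.
by_setdefault = {}
for date, freq, source in original_data:
    source_to_freq = by_setdefault.setdefault(date, {})
    if source in source_to_freq:
        raise ValueError(
            "Multiple frequencies for one source not supported")
    source_to_freq[source] = freq

print(by_comprehension)  # {'2013-12-01': {'b': 5}, '2013-12-02': {'a': 7}}
print(by_setdefault)     # {'2013-12-01': {'a': 3, 'b': 5}, '2013-12-02': {'a': 7}}
```

Whether losing the ("a", 3) entry is a bug depends on the data, which is exactly the context we don't have.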

> You're calling it a "dateStr", which suggests that it's already a
> string, so I'm not sure why you're str()'ing it.  So I'd either just
> call it "date", or skip the str(date) bit if it's already a string.
> That said, do you even need to convert it to a string (as
> datetime.date objects can be used as keys in dictionaries)?
> 
>>     for i in xrange(0, len(dateStrs) - 1):
>>           currDateStr = str(dateStrs[i])
>>           nextDateStrs = str(dateStrs[i + 1])
>> 
>> It seems very strange that I need the dateStrs list just for the
>> purpose of looping thru the dictionary keys.
>> Can I get rid of the "dateStrs" variable?
> 
> Your code isn't actually using the data-dict at this point.  If you
> were doing something with it, it might help to know what you want to
> do.
> 
> Well, you can iterate over the original data, zipping them together:
> 
>   for (cur, _, _), (next, _, _) in zip(
>       original_data[:-1],
>       original_data[1:]
>       ):
>     do_something(cur, next)

This reminds me that I am a proponent of small dumb helper functions ;)
I find

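import itertools

def sliding_window(items):
    # Two independent iterators over the input; advance the second by one
    # so zip() pairs each item with its successor.
    a, b = itertools.tee(items)
    next(b, None)
    return zip(a, b)  # on Python 2 use itertools.izip() to stay lazy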

dates = (date for date, _freq, _source in original_data)

for from_date, to_date in sliding_window(dates):
    do_something(from_date, to_date)

much more accessible. Plus, I can apply arbitrary improvements to the 
sliding_window() implementation or switch to a library version of that 
function without fear of messing things up.
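For instance, Python 3.10 added itertools.pairwise(), which yields the same overlapping pairs. A minimal sketch, assuming you want the library version where available and the tee()-based helper otherwise. Callers don't change either way:

```python
import itertools

try:
    # Python 3.10+: the standard library provides the helper.
    sliding_window = itertools.pairwise
except AttributeError:
    # Older versions: fall back to the tee()-based implementation.
    def sliding_window(items):
        a, b = itertools.tee(items)
        next(b, None)
        return zip(a, b)

dates = ["2013-12-01", "2013-12-02", "2013-12-03"]  # hypothetical sample
print(list(sliding_window(dates)))
# [('2013-12-01', '2013-12-02'), ('2013-12-02', '2013-12-03')]
```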
Likewise, should original_data become a sequence of namedtuples, it is
straightforward to propagate this change with

dates = (item.date for item in original_data)
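A sketch of that change, assuming a hypothetical namedtuple type Item with fields date, freq and source (the type name and sample values are illustrative, not from the original code):

```python
import itertools
from collections import namedtuple

# Hypothetical record type; field order matches the (date, freq, source) tuples.
Item = namedtuple("Item", ["date", "freq", "source"])

def sliding_window(items):
    # Overlapping pairs (x0, x1), (x1, x2), ...
    a, b = itertools.tee(items)
    next(b, None)
    return zip(a, b)

original_data = [
    Item("2013-12-01", 3, "a"),
    Item("2013-12-02", 5, "b"),
    Item("2013-12-04", 7, "a"),
]

# Only this line changes when switching from plain tuples to namedtuples.
dates = (item.date for item in original_data)

for from_date, to_date in sliding_window(dates):
    print(from_date, "->", to_date)
```

Because namedtuples are still tuples, the older `for date, _freq, _source in original_data` unpacking keeps working during the transition.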

> If your purpose for the "data" dict is to merely look up stats from
> the next one, the whole batch of your original code can be replaced
> with:
> 
>   for (
>         (cur_dt, cur_freq, cur_source),
>         (next_dt, next_freq, next_source)
>         ) in zip(original_data[:-1], original_data[1:]):
>     # might need to do str(cur_dt) and str(next_dt) instead?
>     do_things_with(cur_dt, cur_freq, cur_source,
>       next_dt, next_freq, next_source)
> 
> That eliminates the dict *and* the extra variable name. :-)

Smileys are overused ;) Anyway, with namedtuples this ... would become

for cur_item, next_item in zip(original_data, original_data[1:]):
    do_things_with(cur_item, next_item)

Note that there's no need to slice the first argument, as zip() stops at the
end of its shortest argument and ignores the extra item.
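A quick check of that zip() behaviour with a short hypothetical list: zip() stops as soon as its shortest argument is exhausted, so the unsliced first argument simply contributes one unused trailing item.

```python
items = ["a", "b", "c", "d"]  # hypothetical stand-in for original_data

# Slicing both arguments, as in the quoted version...
sliced = list(zip(items[:-1], items[1:]))
# ...gives the same pairs as slicing only the second one.
unsliced = list(zip(items, items[1:]))

print(unsliced)  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
assert sliced == unsliced
```

This relies on original_data being a sequence. A generator can't be sliced, which is one more argument for the sliding_window() helper.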