Cleaning up conditionals

Peter Otten __peter__ at web.de
Sat Dec 31 21:16:00 EST 2016


Deborah Swanson wrote:

> Peter Otten wrote:
>> Deborah Swanson wrote:
>> 
>> > Here I have a real mess, in my opinion:
>> 
>> [corrected code:]
>> 
>> >         if len(l1[st]) == 0:
>> >             if len(l2[st]) > 0:
>> >                 l1[st] = l2[st]
>> >         elif len(l2[st]) == 0:
>> >             if len(l1[st]) > 0:
>> >                 l2[st] = l1[st]
>> 
>> > Anybody know or see an easier (more pythonic) way to do
>> this? I need
>> > to do it for four fields, and needless to say, that's a really long
>> > block of ugly code.
>> 
>> By "four fields", do you mean four values of st, or four
>> pairs of l1, l2, or
>> more elif-s with l3 and l4 -- or something else entirely?
>> 
>> Usually the most obvious way to avoid repetition is to write
>> a function, and
>> to make the best suggestion a bit more context is necessary.
>> 
> 
> I did write a function for this, and welcome any suggestions for
> improvement.
> 
> The context is comparing 2 adjacent rows of data (in a list of real
> estate listings sorted by their webpage titles and dates) with the
> assumption that if the webpage titles are the same, they're listings for
> the same property. This assumption is occasionally bad, but in far less
> than one per 1000 unique listings. I'd rather just hand edit the data in
> those cases so one webpage title is slightly different, than writing and
> executing all the code needed to find and handle these corner cases.
> Maybe that will be a future refinement, but right now I don't really
> need it.
> 
> Once two rows of listing data have been identified as different dates
> for the same property, there are 4 fields that will be identical for
> both rows. There can be up to 10 (or even more) listings identical
> except for the date, but typically I'm just adding a new one and want to
> copy the field data from its previous siblings, so the copying is just
> from the last listing to the new one.
> 
> Here's the function I have so far:
> 
> def comprows(l1,l2,st,ki,no):
>     ret = ''
>     labels = {st: 'st/co', ki: 'kind', no: 'notes'}
>     for v in (st,ki,no):
>         if len(l1[v]) == 0 and len(l2[v]) != 0:
>             l1[v] = l2[v]
>         elif len(l2[v]) == 0 and len(l1[v]) != 0:
>             l2[v] = l1[v]
>         elif l1[v] != l2[v]:
>             ret += (", " + labels[v] + " diff" if len(ret) > 0
>                     else labels[v] + " diff")
>     return ret
> 
> The 4th field is a special case and easily dispatched in one line of
> code before this function is called for the other 3.
> 
> l1 and l2 are the 2 adjacent rows of listing data, with st,ki,no holding
> codes for state/county, kind (of property) and notes. I want the
> checking and copying to go both ways because sometimes I'm backfilling
> old listings that I didn't pick up in my nightly copies on their given
> dates, but came across them later.
> 
> ret is returned to a field with details to look at when I save the list
> to csv and open it in Excel. The noted diffs will need to be reconciled.
> 
> I tried to use Jussi Piitulainen's suggestion to chain the conditionals,
> but just couldn't make it work for choosing list elements to assign to,
> although the approach is perfect if you're computing a value.
> 
> Hope this is enough context... ;)

The code I came up with from your description differs from the 
suggestions you have received so far. The main differences:

- Look at the whole group, not just two lines
- If there is more than one non-empty value in the group, don't change any
  value.


from collections import defaultdict

def get_title(row):
    return row[...]

def complete(group, label):
    """For every row in the group set row[label] to a non-empty value
    if there is exactly one such value.

    Returns True if values can be set consistently.
    group is supposed to be a list of dicts.

    >>> def c(g):
    ...     gg = [{"whatever": value} for value in g]
    ...     if not complete(gg, "whatever"):
    ...         print("fixme", end=" ")
    ...     return [row["whatever"] for row in gg]
    >>> c(["", "a", ""])
    ['a', 'a', 'a']
    >>> c(["", "a", "a"])
    ['a', 'a', 'a']
    >>> c(["", "a", "b"])
    fixme ['', 'a', 'b']
    >>> c(["a"])
    ['a']
    >>> c([''])
    fixme ['']
    """
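    # distinct values found under this label across the whole group
    values = {row[label] for row in group}
    # the shortest value is falsy only if it is the empty string
    has_empty = not min(values, key=len)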
    if len(values) - has_empty != 1:
        # no value or multiple values; manual intervention needed
        return False
    elif has_empty:
        for row in group:
            row[label] = max(values, key=len)
    return True

if __name__ == "__main__":
    # read rows
    rows = ...

    # group rows by title
    groups = defaultdict(list)
    for row in rows:
        groups[get_title(row)].append(row)

    LABELS = ['st/co', 'kind', 'notes']

    # add missing values
    for group in groups.values():
        for label in LABELS:
            complete(group, label)

    # write rows
    ...
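
For what it's worth, here is a rough, self-contained sketch of how the
grouping and filling might be combined with the "diff" notes you want to
review in Excel. It repeats complete() so that it runs on its own. The
names fill_and_report() and has_conflict(), the "title" and "diffs" keys,
and the sample rows are all made up for illustration; I'm also assuming
every field is a string and that "empty" means "".

```python
from collections import defaultdict


def complete(group, label):
    """Same logic as complete() above, repeated so this runs standalone."""
    values = {row[label] for row in group}
    has_empty = not min(values, key=len)
    if len(values) - has_empty != 1:
        # no value or multiple values; nothing is changed
        return False
    elif has_empty:
        for row in group:
            row[label] = max(values, key=len)
    return True


def has_conflict(group, label):
    """True if the group holds more than one distinct non-empty value.

    Hypothetical helper; assumes empty fields are the empty string.
    """
    return len({row[label] for row in group} - {""}) > 1


def fill_and_report(rows, labels, title_key="title", report_key="diffs"):
    """Group rows by title, fill gaps, and note labels that disagree."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[title_key]].append(row)

    for group in groups.values():
        for label in labels:
            complete(group, label)  # mutates the rows in place
        # str.join avoids the "is ret empty yet?" test when adding commas
        note = ", ".join(
            label + " diff" for label in labels if has_conflict(group, label)
        )
        for row in group:
            row[report_key] = note
    return rows


# Made-up sample data, not taken from the real listings
rows = [
    {"title": "Cabin", "st/co": "", "kind": "house", "notes": ""},
    {"title": "Cabin", "st/co": "WA/King", "kind": "land", "notes": ""},
    {"title": "Farm", "st/co": "OR/Lane", "kind": "", "notes": "barn"},
]
fill_and_report(rows, ["st/co", "kind", "notes"])
for row in rows:
    print(row)
```

With this sample, both "Cabin" rows end up with "WA/King" in "st/co" and
"kind diff" in "diffs", while the "Farm" row is left unchanged apart from
an empty "diffs" field. Fields that are empty in every row of a group are
not reported as a diff, which matches what your comprows() does.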




