NaN comparisons - Call For Anecdotes

Chris Angelico rosuav at gmail.com
Thu Jul 17 11:36:24 EDT 2014


On Fri, Jul 18, 2014 at 1:12 AM, Johann Hibschman <jhibschman at gmail.com> wrote:
> Well, I just spotted this thread.  An easy example is, well, pretty much
> any case where SQL NULL would be useful.  Say I have lists of borrowers,
> the amount owed, and the amount they paid so far.
>
>     nan = float("nan")
>     borrowers = ["Alice", "Bob", "Clem", "Dan"]
>     amount_owed = [100.0, nan, 200.0, 300.0]
>     amount_paid = [100.0, nan, nan, 200.0]
>     who_paid_off = [b for (b, ao, ap) in
>                           zip(borrowers, amount_owed, amount_paid)
>                       if ao == ap]
>
> I want to just get Alice from that list, not Bob.  I don't know how much
> Bob owes or how much he's paid, so I certainly don't know that he's paid
> off his loan.
>

But you also don't know that he hasn't. NaN doesn't mean "unknown", it
means "Not a Number". You need a more sophisticated system that allows
for uncertainty in your data. I would advise using either None or a
dedicated singleton (something like `unknown = object()` would work,
or you could make a custom type with a more useful repr), and probably
checking for it explicitly. It's entirely possible that you do
virtually identical (or virtually converse) checks but with different
handling of unknowns - for instance, you might have one check for "who
should be sent a loan reminder letter" in which you leave out all
unknowns, and another check for "which accounts should be flagged for
human checking" in which you keep the unknowns (and maybe ignore every
loan <100.0). You have a special business case here (the need to
record information with a "maybe" state), and you need to cope with
it, which means dedicated logic and planning and design and code.
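
As a rough sketch of what that could look like, here is the borrower
example with a dedicated singleton instead of NaN. The names UNKNOWN,
is_known, send_reminder and needs_review are made up for illustration,
and the "reminder" and "review" rules are only one guess at what the
business logic might be:

```python
# Hypothetical sentinel type for "value not recorded" (an assumption for
# this sketch, not part of any library). The custom repr makes it easier
# to spot in debugging output than a bare object().
class _Unknown:
    def __repr__(self):
        return "UNKNOWN"

UNKNOWN = _Unknown()

borrowers = ["Alice", "Bob", "Clem", "Dan"]
amount_owed = [100.0, UNKNOWN, 200.0, 300.0]
amount_paid = [100.0, UNKNOWN, UNKNOWN, 200.0]
records = list(zip(borrowers, amount_owed, amount_paid))

def is_known(*values):
    """Return True only if none of the values is the UNKNOWN sentinel."""
    return all(v is not UNKNOWN for v in values)

# Paid off: both figures must be known, and they must match.
who_paid_off = [b for b, ao, ap in records
                if is_known(ao, ap) and ao == ap]

# Reminder letters: leave out the unknowns, keep known shortfalls.
send_reminder = [b for b, ao, ap in records
                 if is_known(ao, ap) and ap < ao]

# Human checking: keep exactly the records with something unknown.
needs_review = [b for b, ao, ap in records
                if not is_known(ao, ap)]

print(who_paid_off, send_reminder, needs_review)
```

Each check states explicitly what it does with the unknowns, rather
than relying on how NaN happens to compare.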

ChrisA


