efficient way to get a sufficient set of identifying attributes

Thu Oct 19 12:05:57 EDT 2017

On 19/10/2017 16:42, Stefan Ram wrote:
> Robin Becker <robin at reportlab.com> writes:
>>                         Presumably the information in any attribute is highest
>> if the number of distinct occurrences is the same as the list length and
>> pairs of attributes are more likely to be unique, but is there some proper way
>> to go about determining what tests to use?
> 
>    When there is a list
> 
> |>>> items = [ 'b', 'b', 'c', 'd', 'c', 'b' ]
> |>>> l = len( items )
> 
>    , the length of its set can be obtained:
> 
> |>>> s = len( set( items ))
> 
>    . The entries are unique if the length of the set is the
>    length of the list
> 
> |>>> l == s
> |False
> 
>    And the ratio between the length of the set and the length
>    of the list can be used to quantify the amount of repetition.
> 
> |>>> s / l
> |0.5
.......
This makes sense for single attributes, but it ignores the possibility of
combining attributes to make the checks more discerning.
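
A minimal sketch of how the same set-length ratio might be extended to
combinations of attributes. The names distinct_ratio,
smallest_identifying_sets and Rec, and the sample records, are made up for
illustration. It assumes a non-empty list of objects whose attribute values
are hashable, and it tries combinations by brute force, so it is only
practical for a small number of attributes.

```python
from collections import namedtuple
from itertools import combinations


def distinct_ratio(records, attrs):
    """Ratio of distinct value tuples for attrs to the number of records.

    1.0 means the combination identifies every record uniquely.
    Assumes records is non-empty and the values are hashable.
    """
    keys = {tuple(getattr(r, a) for a in attrs) for r in records}
    return len(keys) / len(records)


def smallest_identifying_sets(records, attrs):
    """Return every smallest combination of attrs that is unique per record."""
    for size in range(1, len(attrs) + 1):
        found = [c for c in combinations(attrs, size)
                 if distinct_ratio(records, c) == 1.0]
        if found:
            return found
    return []  # no combination distinguishes all records


# Hypothetical sample data: no single attribute is unique on its own.
Rec = namedtuple("Rec", "colour size shape")
records = [
    Rec("red", 1, "round"),
    Rec("red", 2, "round"),
    Rec("blue", 1, "square"),
    Rec("blue", 2, "round"),
]

for attr in Rec._fields:
    print(attr, distinct_ratio(records, (attr,)))
print(smallest_identifying_sets(records, Rec._fields))
```

With this data each single attribute scores 0.5, while the pair
(colour, size) is the only two-attribute combination that reaches 1.0.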
-- 
Robin Becker