[scikit-learn] Validating L2 - Least Squares - sum of squares, During a Normalization Function

Javier López jlopez at ende.cc
Sun Oct 8 06:40:15 EDT 2017


Why would the square of a real number ever be negative?

I believe the "quirk" in Python is just operator precedence: the power is
evaluated before the unary "-" is applied, so -.6 ** 2.1 is parsed as
-(.6 ** 2.1).
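
To make the precedence point concrete, here is a small sketch that asks Python's own parser how it reads the expression and then compares the results with and without parentheses. The variable names are mine and purely illustrative; the numbers are the ones quoted below.

```python
import ast
import math

# Unary minus binds less tightly than **, so the parser treats
# "-x ** 2" as -(x ** 2): the outermost node is the unary operator.
tree = ast.parse("-x ** 2", mode="eval").body
print(type(tree).__name__)          # UnaryOp
print(type(tree.operand).__name__)  # BinOp

no_parens = -0.53806371 ** 2      # same as -(0.53806371 ** 2)
with_parens = (-0.53806371) ** 2  # squares the negative number itself
print(no_parens, with_parens)

# The fractional-power case from the quoted message behaves the same way.
real_result = -0.6 ** 2.1       # -(0.6 ** 2.1), a negative float
complex_result = (-0.6) ** 2.1  # negative base, fractional power: complex in Python 3
print(real_result, complex_result)

# The modulus of the complex result matches 0.6 ** 2.1, which is why the
# unparenthesised value looked like "the norm of the complex power".
print(math.isclose(abs(complex_result), 0.6 ** 2.1))
```

So np.square() is not taking an absolute value; it squares the negative number, and the square of a real number is never negative.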

On Sun, Oct 8, 2017 at 11:34 AM Joel Nothman <joel.nothman at gmail.com> wrote:

> (normalize(X) * normalize(X)).sum(axis=1) works fine here.
>
> But I was unaware of these quirks in Python's implementation of pow:
>
> Numpy seems to be consistent in returning nan when a negative float is
> raised to a non-integer (or equivalent float) power. By only calculating
> integer powers of negative floats, the absolute value is returned in
> squaring. I assume this follows C conventions?
>
> Python, on the other hand, seems to do strange things:
>
> Numpy:
> >>> np.array(-.6) ** 2.1
> nan
> >>> np.array(-.6+0j) ** 2.1
> (0.32532987876940411+0.10570608538524294j)
>
> Python 3.6.2 returns the norm of the complex power:
> >>> -.6 ** 2.1
> -0.3420720779420435
> >>> (-.6 + 0j) ** 2.1
> (0.3253298787694041+0.10570608538524294j)
> >>> (((-.6 + 0j) ** 2.1).real ** 2 + ((-.6 + 0j) ** 2.1).imag ** 2) ** .5
> 0.3420720779420434
>
> Very strangely, putting the LHS in parentheses performs complex power in
> Python.
>
> >>> (-.6) ** 2.1
> (0.3253298787694041+0.10570608538524294j)
>
> At https://docs.python.org/3/reference/expressions.html:
>
> Raising a negative number to a fractional power results in a complex
> <https://docs.python.org/3/library/functions.html#complex> number. (In
> earlier versions it raised a ValueError
> <https://docs.python.org/3/library/exceptions.html#ValueError>.)
>
> By "in earlier versions" it means Python 2. I don't know why this should
> only be the case where the LHS is parenthesised. Seems like a CPython bug!
>
> On 8 October 2017 at 16:08, Christopher Pfeifer <
> chrispfeifer8557 at gmail.com> wrote:
>
>> I am attempting to validate the output of an L2 normalization function:
>>
>> data_l2 = preprocessing.normalize(data, norm='l2')  # raw data is at the end of this email
>>
>> output:
>>
>> array([[ 0.57649683,  0.53806371,  0.61492995],
>>        [-0.53806371, -0.57649683, -0.61492995],
>>        [ 0.3359268 ,  0.90089461, -0.2748492 ],
>>        [ 0.6676851 , -0.39566524, -0.63059148],
>>        [-0.70710678,  0.        ,  0.70710678],
>>        [-0.63116874,  0.45083482,  0.63116874]])
>>
>>
>> Each row being a set of three features of an observation
>>
>>
>> I am under the belief that the sum of the 'squared' values of an instance (row) should be virtually equal to 1 (normalized).
>>
>>
>> *Problem - 1:*
>>
>> the np.square() function is returning the absolute value of the sum of the three features, even when the sum of the squares is clearly negative.
>>
>> np.square(-0.53806371) returns 0.28951255601896408    however, (-0.53806371**2)    returns    -0.2895125560189641
>>
>> The correct square of -0.53806371 is  -0.2895125560189641 (a negative number), even my 10 year old calculator gets it right.
>>
>> I can find nothing in the numpy documentation that indicates np.square() always returns the absolute value, instead of the correctly signed value.
>>
>> *Question:*
>>
>> Is there a way to force np.square() to return the correctly signed square value not the absolute value?
>>
>>
>> *Problem - 2:*
>>
>> For some of the observations (rows), the sum of the squared values (which should be virtually 1) is nowhere near 1.
>>
>>
>> print 0.57649683**2 + 0.53806371**2 +  0.61492995**2      # row 1
>>
>> 0.9999999944260154  (this is virtually 1)
>>
>>
>> print -0.63116874**2 + 0.45083482**2  +  0.63116874**2    # row 6
>>
>> 0.203252034924   (*this is nowhere near 1*)
>>
>>
>> sum of the 'squared' values of an instance (row) should be virtually equal to 1.
>>
>>
>> *Question:*
>>
>> Is preprocessing.normalize(data, norm='l2') messing up, or is the raw data I am feeding into the normalization routine too unrealistic? (I made it up from both positive and negative numbers.)
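
Neither, as far as I can tell: the row 6 check above hits the same precedence issue. Below is a minimal plain-Python sketch that redoes the row-wise L2 normalization on the raw data quoted in this message and checks every row. It is not scikit-learn's implementation, and l2_normalize_row is a hypothetical helper I made up for illustration.

```python
import math

# Raw data as given in the quoted message.
data = [
    [1.5, 1.4, 1.6],
    [-1.4, -1.5, -1.6],
    [2.2, 5.9, -1.8],
    [5.4, -3.2, -5.1],
    [-1.4, 0.0, 1.4],
    [-1.4, 1.0, 1.4],
]


def l2_normalize_row(row):
    """Hypothetical helper: divide a row by its Euclidean (L2) norm."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row]


normalized = [l2_normalize_row(row) for row in data]
row_sums = [sum(v * v for v in row) for row in normalized]
for i, total in enumerate(row_sums, start=1):
    print(i, total)  # each total should be 1 up to floating-point rounding

# Row 6 as typed in the quoted message: the leading "-" is applied after the
# power, so the first and third terms cancel and only 0.45083482**2 remains.
as_typed = -0.63116874**2 + 0.45083482**2 + 0.63116874**2
# Parenthesising the negative value squares the number itself.
fixed = (-0.63116874)**2 + 0.45083482**2 + 0.63116874**2
print(as_typed, fixed)
```

With the parentheses in place, row 6 sums to virtually 1 like the others, so mixing positive and negative values in the input is not a problem.
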
>>
>>
>> *Raw Data*
>>
>> array([[ 1.5,  1.4,  1.6],
>>        [-1.4, -1.5, -1.6],
>>        [ 2.2,  5.9, -1.8],
>>        [ 5.4, -3.2, -5.1],
>>        [-1.4,  0. ,  1.4],
>>        [-1.4,  1. ,  1.4]])
>>
>> Thanks: Chris
>>
>>
>> P.S.: Not a real world problem, just trying to understand the functionality of scikit-learn. Have only been working with the package for two weeks.
>>
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>>