2 sample chi-square test

Tue Dec 29 11:47:45 EST 2020

On Tue, 29 Dec 2020 02:52:15 -0800 (PST), Priya Singh wrote:
[snip]
> I have two spectra with wavelength, flux, and error on flux. I want to
> find out the variability of these two spectra based on the 2 sample
> Chi-square test.  I am using following code:
>
> def compute_chi2_var(file1,file2,zemi,vmin,vmax):
>     w1,f1,e1,c1,vel1 = get_spec_vel(dir_data+file1,zemi)
>     id1 = np.where(np.logical_and(vel1 >= vmin, vel1 < vmax))[0]   
>     w2,f2,e2,c2,vel2 = get_spec_vel(dir_data+file2,zemi)
>     id2 = np.where(np.logical_and(vel2 >= vmin, vel2 < vmax))[0]
>     f_int = interp1d(w1[id1], f1[id1]/c1[id1], kind='cubic')
>     e_int = interp1d(w1[id1], e1[id1]/c1[id1], kind='cubic')  
>     f_obs,e_obs  = f_int(w2[id2]), e_int(w2[id2]) 
>     f_exp, e_exp = f2[id2]/c2[id2], e2[id2]/c2[id2]
>     e_net = e_obs**2 + e_exp**2
>     chi_square =  np.sum( (f_obs**2 -  f_exp**2)/e_net  )
>     dof = len(f_obs) - 1
>     pval = 1 - stats.chi2.cdf( chi_square, dof)
>     print('%.10E' % pval)
>
> NN = 320
> compute_chi2_var(file7[NN],file14[NN],zemi[NN],vmin[NN],vmax[NN])
>
>
> I am running this code on many files, and I want to grab those pair of
> spectra where, the p-value of chi-squa is less than 10^(-8), for the
> change to be unlikely due to a random occurrence.
>
> Is my code right concept-wise? Because the chi-squ value is coming out
> to be very large (positive and negative), such that my p-value is
> always between 1 and 0 which I know from other's results not correct.
>
> Can anyone suggest me is the concept of 2-sample chi-squ applied by me
> is correct or not?

1. This is not really a Python question, is it?

2. Recommendation: test your chi-squared code on simpler sample data.

3. Observation: P-values *are* always between 0 and 1.

4. Observation: chi-squared values are never negative.

5. Recommendation: Learn a little about the chi-squared distribution
   (but not on a Python newsgroup).  The chi-squared distribution with
   N degrees of freedom is the distribution expected for a quantity
   that is the sum of the squares of N normally distributed random
   variables with mean 0 and standard deviation 1.  If you expect
   f_obs to equal f_exp plus some normally distributed noise with
   mean 0 and standard deviation sigma, then (f_obs-f_exp)/sigma
   should be normally distributed with mean 0 and standard deviation 1.

6. Observation: (f_obs**2 -  f_exp**2)/e_net is probably not what
   you want, since it can be negative.  You probably want something
   like (f_obs-f_exp)**2/e_net (e_net is already a sum of squared
   errors, so it should not be squared again).  But don't take my
   word for it.
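
A minimal sketch of points 2, 5 and 6, assuming both spectra are already
on a common wavelength grid and that the errors are independent and
Gaussian.  The function name chi2_two_sample, the simulated spectrum,
and the default of dof = N (rather than the len(f_obs) - 1 in the quoted
code) are illustrative assumptions, not part of the original code:

```python
import numpy as np
from scipy import stats


def chi2_two_sample(f_obs, e_obs, f_exp, e_exp, dof=None):
    """Return (chi_square, dof, p_value) for two spectra on the same grid.

    Assumes independent Gaussian errors, so the variance of the
    difference f_obs - f_exp is e_obs**2 + e_exp**2.
    """
    f_obs = np.asarray(f_obs, dtype=float)
    f_exp = np.asarray(f_exp, dtype=float)
    var_net = np.asarray(e_obs, dtype=float) ** 2 + np.asarray(e_exp, dtype=float) ** 2

    # Square the difference, not the individual fluxes: every term is >= 0.
    chi_square = np.sum((f_obs - f_exp) ** 2 / var_net)

    # Assumption: no parameters were fitted, so dof defaults to N.
    if dof is None:
        dof = f_obs.size

    # sf() is the survival function 1 - cdf, but it stays accurate
    # when the p-value is far smaller than 1e-16.
    p_value = stats.chi2.sf(chi_square, dof)
    return chi_square, dof, p_value


# Test on simple simulated data (hypothetical spectrum, not real data).
rng = np.random.default_rng(0)
n = 500
truth = 1.0 + 0.1 * np.sin(np.linspace(0.0, 10.0, n))
e1 = np.full(n, 0.05)
e2 = np.full(n, 0.08)

# Case 1: same underlying spectrum, independent noise in each epoch.
f1 = truth + rng.normal(0.0, e1)
f2 = truth + rng.normal(0.0, e2)
chi2_same, dof, p_same = chi2_two_sample(f1, e1, f2, e2)

# Case 2: the second epoch is 20% fainter everywhere.
f2_var = 0.8 * truth + rng.normal(0.0, e2)
chi2_var, _, p_var = chi2_two_sample(f1, e1, f2_var, e2)

print("no variability: chi2/dof = %.3f, p = %.3E" % (chi2_same / dof, p_same))
print("20%% flux drop:  chi2/dof = %.3f, p = %.3E" % (chi2_var / dof, p_var))
```

With no real variability, chi2/dof should come out close to 1 and the
p-value should be unremarkable; with the flux drop, chi2/dof should be
well above 1 and the p-value far below the 10^(-8) threshold.  If the
simulated case behaves and the real spectra do not, the next things to
check are the error arrays and the interpolation onto a common grid.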

Good luck.

-- 
To email me, substitute nowhere->runbox, invalid->com.