11. EXACT SAMPLING DISTRIBUTIONS (CHI-SQUARE DISTRIBUTION)
11.1 CHI-SQUARE VARIATE:
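The square of a standard normal variate is known as a chi-square variate with 1 degree of freedom.
Thus if $X \sim N(\mu, \sigma^2)$, then $Z = \dfrac{X - \mu}{\sigma} \sim N(0, 1)$
and
$$Z^2 = \left(\frac{X - \mu}{\sigma}\right)^2$$
is a chi-square variate with 1 degree of freedom.
In general, if $X_i$ $(i = 1, 2, \ldots, n)$ are n independent normal variates with means $\mu_i$ and variances $\sigma_i^2$ $(i = 1, 2, \ldots, n)$, then
$$\chi^2 = \sum_{i=1}^{n} \left(\frac{X_i - \mu_i}{\sigma_i}\right)^2$$
is a chi-square variate with n degrees of freedom.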
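The definition can be illustrated by simulation. The sketch below is only an illustration: the means, standard deviations, seed and number of draws are hypothetical choices, and it relies on the standard fact that a chi-square variate with n degrees of freedom has mean n and variance 2n.

```python
import random
import statistics

def chi_square_variate(mus, sigmas, rng):
    """Draw X_i ~ N(mu_i, sigma_i^2) and return the sum of squared standardized values."""
    total = 0.0
    for mu, sigma in zip(mus, sigmas):
        x = rng.gauss(mu, sigma)
        total += ((x - mu) / sigma) ** 2
    return total

rng = random.Random(0)            # fixed seed, hypothetical choice
mus = [1.0, -2.0, 0.5, 3.0]       # hypothetical means
sigmas = [1.0, 2.0, 0.5, 4.0]     # hypothetical standard deviations
draws = [chi_square_variate(mus, sigmas, rng) for _ in range(20000)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)
print(sample_mean, sample_var)    # expected to lie near n = 4 and 2n = 8
```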
11.2 DERIVATION OF THE CHI-SQUARE DISTRIBUTION:
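First method: method of moment generating function
If $X_i$ $(i = 1, 2, \ldots, n)$ are independent $N(\mu_i, \sigma_i^2)$, we want the distribution of
$$\chi^2 = \sum_{i=1}^{n} \left(\frac{X_i - \mu_i}{\sigma_i}\right)^2 = \sum_{i=1}^{n} U_i^2, \quad \text{where } U_i = \frac{X_i - \mu_i}{\sigma_i}.$$
Since the $X_i$'s are independent, the $U_i$'s are also independent. Therefore
$$M_{\chi^2}(t) = M_{\sum U_i^2}(t) = \prod_{i=1}^{n} M_{U_i^2}(t) = \left[M_{U_i^2}(t)\right]^n,$$
since the $U_i$'s $\sim N(0, 1)$ are identically distributed.
Now
$$M_{U_i^2}(t) = E\left[\exp(t U_i^2)\right] = \int_{-\infty}^{\infty} \exp(t u^2)\, \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right) du$$
$$= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left\{-\left(\frac{1 - 2t}{2}\right) u^2\right\} du$$
$$= \frac{1}{\sqrt{2\pi}} \cdot \sqrt{\frac{2\pi}{1 - 2t}} = (1 - 2t)^{-1/2}, \quad t < \tfrac{1}{2}.$$
Hence
$$M_{\chi^2}(t) = (1 - 2t)^{-n/2},$$
which is the moment generating function of a Gamma variate with parameters $\tfrac{1}{2}$ and $\tfrac{n}{2}$.
Hence, by the uniqueness theorem of moment generating functions,
$$\chi^2 = \sum_{i=1}^{n} \left(\frac{X_i - \mu_i}{\sigma_i}\right)^2$$
is a Gamma variate with parameters $\tfrac{1}{2}$ and $\tfrac{n}{2}$. Therefore
$$dP(\chi^2) = \frac{(1/2)^{n/2}}{\Gamma(n/2)} \exp\left(-\tfrac{1}{2}\chi^2\right) (\chi^2)^{\frac{n}{2} - 1}\, d\chi^2$$
$$= \frac{1}{2^{n/2}\, \Gamma(n/2)} \exp\left(-\tfrac{1}{2}\chi^2\right) (\chi^2)^{\frac{n}{2} - 1}\, d\chi^2, \quad 0 \le \chi^2 < \infty,$$
which is the required probability density function of the chi-square distribution with n degrees of freedom.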
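As a numerical sanity check on this density, the following sketch evaluates it and integrates it with a simple midpoint rule. The function names, the choice n = 5, the upper limit 100 and the step count are illustrative assumptions, not part of the derivation.

```python
import math

def chi2_pdf(x, n):
    """Chi-square density with n degrees of freedom; returns 0 for x <= 0."""
    if x <= 0:
        return 0.0
    log_f = (-(n / 2) * math.log(2) - math.lgamma(n / 2)
             - x / 2 + (n / 2 - 1) * math.log(x))
    return math.exp(log_f)

def midpoint_integral(g, a, b, steps=100_000):
    """Approximate the integral of g over [a, b] by the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

n = 5  # hypothetical degrees of freedom
total_probability = midpoint_integral(lambda x: chi2_pdf(x, n), 0.0, 100.0)
mean = midpoint_integral(lambda x: x * chi2_pdf(x, n), 0.0, 100.0)
print(total_probability, mean)  # expected to be close to 1 and to n
```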
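11.3 MOMENT GENERATING FUNCTION OF $\chi^2$ DISTRIBUTION:
Let $X \sim \chi^2_{(n)}$. Then
$$M_X(t) = E(e^{tX}) = \int_0^{\infty} e^{tx} f(x)\, dx$$
$$= \frac{1}{2^{n/2}\, \Gamma(n/2)} \int_0^{\infty} e^{tx}\, e^{-x/2}\, x^{\frac{n}{2} - 1}\, dx$$
$$= \frac{1}{2^{n/2}\, \Gamma(n/2)} \int_0^{\infty} \exp\left\{-\left(\frac{1 - 2t}{2}\right) x\right\} x^{\frac{n}{2} - 1}\, dx$$
$$= \frac{1}{2^{n/2}\, \Gamma(n/2)} \cdot \frac{\Gamma(n/2)}{\left[(1 - 2t)/2\right]^{n/2}}$$
$$= (1 - 2t)^{-n/2}, \quad |2t| < 1,$$
which is the required moment generating function of a $\chi^2$ variate with n degrees of freedom.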
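11.3.1 Cumulant generating function of $\chi^2$ distribution:
If $X \sim \chi^2_{(n)}$, then
$$K_X(t) = \log M_X(t) = -\frac{n}{2} \log(1 - 2t)$$
$$= \frac{n}{2}\left[2t + \frac{(2t)^2}{2} + \frac{(2t)^3}{3} + \frac{(2t)^4}{4} + \cdots\right]$$
$\kappa_1$ = coefficient of $t$ in $K(t)$ = n
$\kappa_2$ = coefficient of $t^2/2!$ in $K(t)$ = 2n
$\kappa_3$ = coefficient of $t^3/3!$ in $K(t)$ = 8n
$\kappa_4$ = coefficient of $t^4/4!$ in $K(t)$ = 48n
In general,
$\kappa_r$ = coefficient of $t^r/r!$ in $K(t)$ = $n\, 2^{r-1} (r-1)!$
Hence
Mean = $\kappa_1$ = n
Variance = $\mu_2 = \kappa_2 = 2n$
$\mu_3 = \kappa_3 = 8n$
$\mu_4 = \kappa_4 + 3\kappa_2^2 = 48n + 12n^2$
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{64n^2}{8n^3} = \frac{8}{n}$$
$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{48n + 12n^2}{4n^2} = \frac{12}{n} + 3$$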
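The cumulant formula and the moment relations above can be turned into a small calculator. This is a minimal sketch; the function names and the example value n = 10 are illustrative assumptions.

```python
import math

def chi2_cumulant(r, n):
    """r-th cumulant of a chi-square variate with n d.f.: n * 2^(r-1) * (r-1)!."""
    return n * 2 ** (r - 1) * math.factorial(r - 1)

def chi2_moment_summary(n):
    """Mean, central moments and beta coefficients derived from the cumulants."""
    k1, k2, k3, k4 = (chi2_cumulant(r, n) for r in (1, 2, 3, 4))
    mu2, mu3, mu4 = k2, k3, k4 + 3 * k2 ** 2
    return {
        "mean": k1,
        "variance": mu2,
        "mu3": mu3,
        "mu4": mu4,
        "beta1": mu3 ** 2 / mu2 ** 3,   # equals 8/n
        "beta2": mu4 / mu2 ** 2,        # equals 3 + 12/n
    }

summary = chi2_moment_summary(10)
print(summary)
```

For n = 10 this gives mean 10, variance 20, $\mu_3 = 80$, $\mu_4 = 1680$, $\beta_1 = 0.8$ and $\beta_2 = 4.2$.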
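11.3.2 Limiting form of $\chi^2$ distribution for large degrees of freedom:
If $X \sim \chi^2_{(n)}$, then $E(X) = n$ and $\operatorname{Var}(X) = 2n$.
The moment generating function of the standard $\chi^2$ variate $Z = \dfrac{X - n}{\sqrt{2n}}$ is given by
$$M_Z(t) = e^{-nt/\sqrt{2n}}\, M_X\!\left(\frac{t}{\sqrt{2n}}\right) = e^{-t\sqrt{n/2}} \left(1 - t\sqrt{\frac{2}{n}}\right)^{-n/2}$$
or
$$K_Z(t) = \log M_Z(t) = -t\sqrt{\frac{n}{2}} - \frac{n}{2} \log\left(1 - t\sqrt{\frac{2}{n}}\right)$$
$$= -t\sqrt{\frac{n}{2}} + \frac{n}{2}\left[t\sqrt{\frac{2}{n}} + \frac{t^2}{2} \cdot \frac{2}{n} + \frac{t^3}{3}\left(\frac{2}{n}\right)^{3/2} + \cdots\right]$$
$$= -t\sqrt{\frac{n}{2}} + t\sqrt{\frac{n}{2}} + \frac{t^2}{2} + O(n^{-1/2})$$
$$= \frac{t^2}{2} + O(n^{-1/2}),$$
where $O(n^{-1/2})$ denotes terms containing $n^{1/2}$ and higher powers of n in the denominator.
$$\lim_{n \to \infty} K_Z(t) = \frac{t^2}{2} \implies \lim_{n \to \infty} M_Z(t) = e^{t^2/2},$$
which is the moment generating function of a standard normal variate. Hence, by the uniqueness theorem of moment generating functions, Z is asymptotically normal. In other words, the standard $\chi^2$ variate tends to the standard normal variate as n → ∞. Thus, the $\chi^2$ distribution tends to the normal distribution for large degrees of freedom.
In practice, for n ≥ 30 the approximation of the $\chi^2$ distribution by the normal distribution is fairly good. So whenever n ≥ 30, we use the normal probability tables for testing the significance of the value of $\chi^2$. That is why, in the tables given, the significant values of $\chi^2$ have been tabulated only up to n = 30.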
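The convergence of the moment generating function can be observed numerically. The sketch below evaluates the closed form $M_Z(t) = e^{-t\sqrt{n/2}}\,(1 - t\sqrt{2/n})^{-n/2}$ of the standardized variate and compares it with $e^{t^2/2}$; the point t = 0.5 and the list of n values are arbitrary illustrative choices.

```python
import math

def standardized_chi2_mgf(t, n):
    """MGF of Z = (X - n)/sqrt(2n) for X chi-square with n d.f.; needs t < sqrt(n/2)."""
    if t >= math.sqrt(n / 2):
        raise ValueError("MGF is defined only for t < sqrt(n/2)")
    log_mgf = -t * math.sqrt(n / 2) - (n / 2) * math.log1p(-t * math.sqrt(2 / n))
    return math.exp(log_mgf)

def standard_normal_mgf(t):
    return math.exp(t * t / 2)

t = 0.5
errors = []
for n in (5, 30, 1000, 1_000_000):
    gap = abs(standardized_chi2_mgf(t, n) - standard_normal_mgf(t))
    errors.append(gap)
    print(n, gap)  # the gap shrinks as n grows
```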
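11.3.3 Characteristic function of $\chi^2$ distribution:
If $X \sim \chi^2_{(n)}$, then
$$\phi_X(t) = E(e^{itX}) = \int_0^{\infty} e^{itx} f(x)\, dx$$
$$= \frac{1}{2^{n/2}\, \Gamma(n/2)} \int_0^{\infty} \exp\left\{-\left(\frac{1 - 2it}{2}\right) x\right\} x^{\frac{n}{2} - 1}\, dx$$
$$= (1 - 2it)^{-n/2}$$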
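11.3.4 Mode and skewness of $\chi^2$ distribution:
If $X \sim \chi^2_{(n)}$, then
$$f(x) = \frac{1}{2^{n/2}\, \Gamma(n/2)}\, e^{-x/2}\, x^{\frac{n}{2} - 1}, \quad 0 \le x < \infty \qquad (1)$$
The mode of the distribution is the solution of $f'(x) = 0$ and $f''(x) < 0$.
Logarithmic differentiation with respect to x in (1) gives:
$$\frac{f'(x)}{f(x)} = -\frac{1}{2} + \left(\frac{n}{2} - 1\right)\frac{1}{x} = \frac{n - 2 - x}{2x}$$
Since $f(x) \ne 0$, $f'(x) = 0 \implies x = n - 2$.
It can be easily seen that at the point x = (n-2), $f''(x) < 0$.
Hence the mode of the chi-square distribution with n degrees of freedom (n > 2) is (n-2).
Also, Karl Pearson's coefficient of skewness is given by
$$\text{Skewness} = \frac{\text{mean} - \text{mode}}{\text{standard deviation}} = \frac{n - (n - 2)}{\sqrt{2n}} = \sqrt{\frac{2}{n}}$$
Since Pearson's coefficient of skewness is greater than zero for n ≥ 1, the $\chi^2$ distribution is positively skewed. Further, since the skewness is inversely proportional to the square root of the degrees of freedom, the distribution rapidly tends to symmetry as the degrees of freedom increase, and consequently, as n → ∞, the chi-square distribution tends to the normal distribution.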
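The mode can also be located numerically as a cross-check on the calculus. The sketch below scans the density on a fine grid and compares the maximizer with n - 2; the grid step, upper limit and function names are illustrative assumptions.

```python
import math

def chi2_pdf(x, n):
    """Chi-square density with n degrees of freedom; returns 0 for x <= 0."""
    if x <= 0:
        return 0.0
    log_f = (-(n / 2) * math.log(2) - math.lgamma(n / 2)
             - x / 2 + (n / 2 - 1) * math.log(x))
    return math.exp(log_f)

def numeric_mode(n, upper=60.0, step=0.001):
    """Grid point in (0, upper) at which the density is largest."""
    best_x, best_f = 0.0, -1.0
    for i in range(1, int(upper / step)):
        x = i * step
        f = chi2_pdf(x, n)
        if f > best_f:
            best_x, best_f = x, f
    return best_x

def pearson_skewness(n):
    """(mean - mode) / standard deviation = (n - (n - 2)) / sqrt(2n)."""
    return (n - (n - 2)) / math.sqrt(2 * n)

mode_6 = numeric_mode(6)
mode_11 = numeric_mode(11)
print(mode_6, mode_11, pearson_skewness(8))  # modes near 4 and 9
```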
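11.3.5 Additive property of $\chi^2$ variate:
The sum of independent chi-square variates is also a $\chi^2$ variate. More precisely, if $X_i$ $(i = 1, 2, \ldots, k)$ are independent $\chi^2$ variates with $n_i$ degrees of freedom respectively, then the sum $\sum_{i=1}^{k} X_i$ is also a chi-square variate with $\sum_{i=1}^{k} n_i$ degrees of freedom.
Proof:
We have
$$M_{X_i}(t) = (1 - 2t)^{-n_i/2}, \quad i = 1, 2, \ldots, k.$$
The moment generating function of the sum $\sum_{i=1}^{k} X_i$ is given by
$$M_{\sum X_i}(t) = M_{X_1}(t)\, M_{X_2}(t) \cdots M_{X_k}(t) \quad (\text{since the } X_i\text{'s are independent})$$
$$= (1 - 2t)^{-n_1/2}\, (1 - 2t)^{-n_2/2} \cdots (1 - 2t)^{-n_k/2}$$
$$= (1 - 2t)^{-(n_1 + n_2 + \cdots + n_k)/2},$$
which is the moment generating function of a $\chi^2$ variate with $(n_1 + n_2 + \cdots + n_k)$ degrees of freedom. Hence, by the uniqueness theorem of moment generating functions, $\sum_{i=1}^{k} X_i$ is a $\chi^2$ variate with $\sum_{i=1}^{k} n_i$ degrees of freedom.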
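The additive property can be checked directly on densities, without moment generating functions: the density of the sum of two independent variates is the convolution of their densities. The sketch below is a numerical check at a single point, with hypothetical degrees of freedom 3 and 4 and an arbitrary evaluation point x = 5.

```python
import math

def chi2_pdf(x, n):
    """Chi-square density with n degrees of freedom; returns 0 for x <= 0."""
    if x <= 0:
        return 0.0
    log_f = (-(n / 2) * math.log(2) - math.lgamma(n / 2)
             - x / 2 + (n / 2 - 1) * math.log(x))
    return math.exp(log_f)

def convolved_density(x, n1, n2, steps=20_000):
    """Midpoint-rule value of the convolution integral of the two densities at x."""
    h = x / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += chi2_pdf(u, n1) * chi2_pdf(x - u, n2)
    return h * total

x = 5.0
from_convolution = convolved_density(x, 3, 4)
from_formula = chi2_pdf(x, 3 + 4)
print(from_convolution, from_formula)  # the two values should agree closely
```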
11.4 CHI-SQUARE PROBABILITY CURVE:
We get from 11.3.4
$$f'(x) = \left(\frac{n - 2 - x}{2x}\right) f(x) \qquad (*)$$
Since x > 0 and f(x), being the probability density function, is always non-negative, we get from (*)
$f'(x) < 0$ if (n-2) ≤ 0,
for all values of x. Thus the $\chi^2$ probability curve for 1 and 2 degrees of freedom is monotonically decreasing. When n > 2,
$f'(x) > 0$ if x < (n-2)
$f'(x) = 0$ if x = (n-2)
$f'(x) < 0$ if x > (n-2)
This implies that for n > 2, f(x) is monotonically increasing for 0 < x < (n-2) and monotonically decreasing for (n-2) < x < ∞, while at x = n-2 it attains its maximum value.
For n ≥ 1, as x increases beyond the mode, f(x) decreases rapidly and finally tends to zero as x → ∞. Thus, for n > 1, the $\chi^2$ probability curve is positively skewed towards higher values of x. Moreover, the x-axis is an asymptote to the curve.
11.5 CONDITIONS FOR THE VALIDITY OF CHI-SQUARE TEST:
The chi-square test is an approximate test for large values of n. For the validity of the chi-square test of ‘goodness of fit’ between theory and experiment, the following conditions must be satisfied:
(i) The sample observations should be independent.
(ii) Constraints on the cell frequencies, if any, should be linear, e.g., ∑ni = ∑λi or ∑Oi = ∑Ei.
(iii) N, the total frequency, should be reasonably large, say, greater than 50.
(iv) No theoretical cell frequency should be less than 5. The chi-square distribution is essentially a continuous distribution, but it cannot maintain its character of continuity if a cell frequency is less than 5. If any theoretical cell frequency is less than 5, then for the application of the chi-square test it is pooled with the preceding or succeeding frequency so that the pooled frequency is more than 5, and the degrees of freedom are finally adjusted for those lost in pooling.
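The pooling step in condition (iv) can be sketched as follows. This is one possible greedy scheme (pool each small class with the succeeding ones, and fold any small remainder into the preceding pooled class), not the only valid one; the function name, the threshold argument and the frequencies are hypothetical.

```python
def pool_small_cells(observed, expected, minimum=5.0):
    """Pool adjacent classes until every pooled expected frequency reaches `minimum`."""
    pooled_obs, pooled_exp = [], []
    acc_obs = acc_exp = 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= minimum:          # class is large enough: emit it
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
            acc_obs = acc_exp = 0.0
    if acc_exp > 0:                     # small remainder at the end
        if pooled_exp:
            pooled_obs[-1] += acc_obs   # fold into the preceding class
            pooled_exp[-1] += acc_exp
        else:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
    return pooled_obs, pooled_exp

observed = [2, 5, 10, 22, 8, 2, 1]                  # hypothetical data
expected = [1.5, 4.0, 12.0, 20.5, 9.0, 2.0, 1.0]    # hypothetical theory
pooled_obs, pooled_exp = pool_small_cells(observed, expected)
cells_lost = len(expected) - len(pooled_exp)
print(pooled_obs, pooled_exp, cells_lost)
```

Here seven classes are reduced to four, so three degrees of freedom are lost in pooling.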
11.6 LINEAR TRANSFORMATION:
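Let us suppose that the given set of variables $X' = (x_1, x_2, \ldots, x_n)$ is transformed to a new set of variables $Y' = (y_1, y_2, \ldots, y_n)$ by means of the linear transformation:
$$y_1 = a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n$$
$$y_2 = a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n$$
$$\vdots$$
$$y_n = a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nn} x_n$$
That is, $y_i = \sum_{j=1}^{n} a_{ij} x_j$, $i = 1, 2, \ldots, n$.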
In matrix notation, this system of linear equations can be expressed symbolically as
Y = AX
where $Y = (y_1, y_2, \ldots, y_n)'$ and $X = (x_1, x_2, \ldots, x_n)'$ are column vectors and $A = (a_{ij})$ is the n × n matrix of coefficients.
From matrix theory, we know that the system has a unique solution iff |A| ≠ 0. In other words, we can express X uniquely in terms of Y if A is non-singular, and the solution is given by
$X = A^{-1} Y$
where $A^{-1}$ is the inverse of the square matrix A.
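The linear transformation defined above is said to be orthogonal if
$$Y'Y = X'X$$
$$\implies X'A'AX = X'X$$
$$\implies A'A = I_n$$
that is, A is an orthogonal matrix.
More elaborately,
$$\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} \left(\sum_{j=1}^{n} a_{ij} x_j\right)^2 = \sum_{i=1}^{n} x_i^2 \qquad (**)$$
for every set of variables $x_1, x_2, \ldots, x_n$.
If we write $\delta_{jk} = \sum_{i=1}^{n} a_{ij} a_{ik}$, then (**) implies that $\delta_{jk}$ is a Kronecker delta, so that
$$\sum_{i=1}^{n} a_{ij}^2 = 1, \quad j = 1, 2, \ldots, n$$
$$\sum_{i=1}^{n} a_{ij} a_{ik} = 0, \quad j \ne k$$
Hence it follows that A is an orthogonal matrix.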
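The two characterisations of orthogonality (A'A = I and preservation of the sum of squares) can be checked on a concrete matrix. The sketch below uses plain Python lists; the 3 × 3 matrix and the vector x are hypothetical examples chosen for illustration.

```python
import math

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def is_orthogonal(a, tol=1e-12):
    """True if A'A equals the identity matrix up to the tolerance."""
    n = len(a)
    product = matmul(transpose(a), a)
    return all(abs(product[j][k] - (1.0 if j == k else 0.0)) <= tol
               for j in range(n) for k in range(n))

def transform(a, x):
    """Compute Y = AX for a vector x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

r2, r3, r6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
A = [[1 / r3, 1 / r3, 1 / r3],
     [1 / r2, -1 / r2, 0.0],
     [1 / r6, 1 / r6, -2 / r6]]
x = [1.0, 2.0, 3.0]
y = transform(A, x)
sum_x_sq = sum(v * v for v in x)
sum_y_sq = sum(v * v for v in y)
print(is_orthogonal(A), sum_x_sq, sum_y_sq)  # sums of squares agree
```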
11.7 APPLICATIONS OF CHI-SQUARE DISTRIBUTION:
The chi-square distribution has a large number of applications in statistics, some of which are enumerated below:
(i) To test if the hypothetical value of the population variance is $\sigma^2 = \sigma_0^2$.
(ii) To test the goodness of fit.
(iii) To test the independence of attributes.
(iv) To test the homogeneity of independent estimates of the population variance.
(v) To combine various probabilities obtained from independent experiments to give a single test of significance.
(vi) To test the homogeneity of independent estimates of the population correlation coefficient.
11.7.1 Chi-square test for population variance:
Suppose we want to test if a random sample $x_i$ $(i = 1, 2, \ldots, n)$ has been drawn from a normal population with a specified variance $\sigma^2 = \sigma_0^2$.
Under the null hypothesis that the population variance is $\sigma^2 = \sigma_0^2$, the statistic
$$\chi^2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{\sigma_0^2}, \quad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i,$$
follows the chi-square distribution with (n-1) degrees of freedom.
By comparing the calculated value with the tabulated value of $\chi^2$ for (n-1) degrees of freedom at a certain level of significance, we may retain or reject the null hypothesis.
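A minimal sketch of the computation is given below. The function name, the sample and the hypothesised variance are hypothetical; the tabulated critical value still has to be looked up separately.

```python
def variance_chi2_statistic(sample, sigma0_sq):
    """Return (chi-square statistic, degrees of freedom) for H0: sigma^2 = sigma0_sq."""
    n = len(sample)
    mean = sum(sample) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in sample)
    return sum_sq_dev / sigma0_sq, n - 1

sample = [10, 12, 9, 11, 13]      # hypothetical observations
statistic, dof = variance_chi2_statistic(sample, sigma0_sq=4.0)
print(statistic, dof)             # 2.5 with 4 degrees of freedom
```

Here the sample mean is 11 and the sum of squared deviations is 10, so the statistic is 10/4 = 2.5 on 4 degrees of freedom.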
11.7.2 Chi-square test of goodness of fit:
A very powerful test for testing the significance of the discrepancy between theory and experiment was given by Prof. Karl Pearson in 1900 and is known as the “Chi-Square test of goodness of fit”. It enables us to find out whether the deviation of the experiment from theory is just by chance or is really due to the inadequacy of the theory to fit the observed data.
If $O_i$ $(i = 1, 2, \ldots, n)$ is a set of observed frequencies and $E_i$ $(i = 1, 2, \ldots, n)$ is the corresponding set of expected frequencies, then Karl Pearson's chi-square, given by
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}, \quad \text{where } \sum_{i=1}^{n} O_i = \sum_{i=1}^{n} E_i,$$
follows the chi-square distribution with (n-1) degrees of freedom.
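The statistic is straightforward to compute. In the sketch below the function name and the die-rolling frequencies are hypothetical, and the check that the totals agree reflects the linear constraint $\sum O_i = \sum E_i$.

```python
def pearson_chi2(observed, expected):
    """Return (Pearson chi-square statistic, degrees of freedom n - 1)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if abs(sum(observed) - sum(expected)) > 1e-9:
        raise ValueError("totals of observed and expected frequencies must agree")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1

# Hypothetical experiment: a die rolled 60 times, fair-die expectation 10 per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [10.0] * 6
statistic, dof = pearson_chi2(observed, expected)
print(statistic, dof)  # 1.0 with 5 degrees of freedom
```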
11.8 YATES CORRECTION:
In a 2 × 2 contingency table, the number of degrees of freedom is (2-1)(2-1) = 1. If any one of the theoretical cell frequencies is less than 5, then the use of the pooling method for the chi-square test results in chi-square with zero degrees of freedom, which is meaningless. In this case we apply a correction due to F. Yates, which is usually known as “Yates' correction for continuity”. This consists in adding 0.5 to the cell frequency which is less than 5 and then adjusting the remaining cell frequencies accordingly. The chi-square test of goodness of fit is then applied without the pooling method.
For a 2 × 2 contingency table,
| a | b |
| c | d |
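We have
$$\chi^2 = \frac{N(ad - bc)^2}{(a + b)(a + c)(b + d)(c + d)}, \quad N = a + b + c + d.$$
According to Yates' correction, as explained above, we subtract 1/2 from a and d and add 1/2 to b and c so that the marginal totals are not disturbed at all. Thus, the corrected value of $\chi^2$ is given as
$$\chi^2 = \frac{N\left[\left(a - \tfrac{1}{2}\right)\left(d - \tfrac{1}{2}\right) - \left(b + \tfrac{1}{2}\right)\left(c + \tfrac{1}{2}\right)\right]^2}{(a + b)(a + c)(b + d)(c + d)}$$
Numerator $= N\left[(ad - bc) - \tfrac{1}{2}(a + b + c + d)\right]^2 = N\left[(ad - bc) - \tfrac{N}{2}\right]^2$
$$\therefore \chi^2 = \frac{N\left[\,|ad - bc| - \tfrac{N}{2}\,\right]^2}{(a + b)(a + c)(b + d)(c + d)}$$
The adjustment as written assumes ad > bc; when ad < bc the signs of the adjustments are reversed, which is why the absolute value $|ad - bc|$ appears in the final formula.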
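The corrected and uncorrected statistics can be compared on a concrete table. In this sketch the function names and the cell frequencies are hypothetical; the last line checks that shifting the cells by 1/2 by hand reproduces the closed-form corrected value when ad > bc.

```python
def chi2_2x2(a, b, c, d):
    """Uncorrected chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (a + c) * (b + d) * (c + d))

def chi2_2x2_yates(a, b, c, d):
    """Chi-square with Yates' continuity correction: N(|ad - bc| - N/2)^2 / product of margins."""
    n = a + b + c + d
    return n * (abs(a * d - b * c) - n / 2) ** 2 / ((a + b) * (a + c) * (b + d) * (c + d))

a, b, c, d = 12, 3, 4, 11                                    # hypothetical frequencies, ad > bc
uncorrected = chi2_2x2(a, b, c, d)
corrected = chi2_2x2_yates(a, b, c, d)
shifted = chi2_2x2(a - 0.5, b + 0.5, c + 0.5, d - 0.5)       # margins unchanged
print(uncorrected, corrected, shifted)
```

For this table ad - bc = 120 and N = 30, so the corrected statistic is 30 × 105² / 50400 = 6.5625, smaller than the uncorrected value.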

11.9 BRANDT AND SNEDECOR FORMULA FOR 2 × k CONTINGENCY TABLE:
Let the observations $a_{ij}$ $(i = 1, 2;\ j = 1, 2, \ldots, k)$ be arranged in a 2 × k contingency table as follows:
| A | A1 | A2 | … | Ai | … | Ak | Total |
| B1 | a11 | a12 | … | a1i | … | a1k | m1 |
| B2 | a21 | a22 | … | a2i | … | a2k | m2 |
| Total | n1 | n2 | … | ni | … | nk | N |
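Under the hypothesis of independence of attributes, with i indexing the columns $A_1, \ldots, A_k$ as in the table, we have
$$E(a_{1i}) = \frac{m_1 n_i}{N}, \quad E(a_{2i}) = \frac{m_2 n_i}{N}, \quad i = 1, 2, \ldots, k$$
$$\chi^2 = \sum_{i=1}^{k} \left[\frac{\left(a_{1i} - m_1 n_i / N\right)^2}{m_1 n_i / N} + \frac{\left(a_{2i} - m_2 n_i / N\right)^2}{m_2 n_i / N}\right]$$
$$= \sum_{i=1}^{k} \left[\frac{(n_i p_i - n_i p)^2}{n_i p} + \frac{(n_i q_i - n_i q)^2}{n_i q}\right]$$
where
$$p_i = \frac{a_{1i}}{n_i}, \quad q_i = 1 - p_i = \frac{a_{2i}}{n_i}$$
and
$$p = \frac{m_1}{N}, \quad q = 1 - p = \frac{m_2}{N}.$$
Since $q_i - q = -(p_i - p)$,
$$\chi^2 = \sum_{i=1}^{k} n_i (p_i - p)^2 \left[\frac{1}{p} + \frac{1}{q}\right] = \frac{1}{pq} \sum_{i=1}^{k} n_i (p_i - p)^2$$
$$= \frac{1}{pq} \left[\sum_{i=1}^{k} n_i p_i^2 - 2p \sum_{i=1}^{k} n_i p_i + p^2 \sum_{i=1}^{k} n_i\right]$$
But $\sum_{i=1}^{k} n_i p_i = \sum_{i=1}^{k} a_{1i} = m_1 = Np$ and $\sum_{i=1}^{k} n_i = N$, so that
$$\chi^2 = \frac{1}{pq} \left[\sum_{i=1}^{k} n_i p_i^2 - N p^2\right] = \frac{1}{pq} \left[\sum_{i=1}^{k} a_{1i} p_i - m_1 p\right] = \frac{1}{pq} \left[\sum_{i=1}^{k} \frac{a_{1i}^2}{n_i} - \frac{m_1^2}{N}\right]$$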
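The shortcut formula $\chi^2 = \frac{1}{pq}\left[\sum a_{1i}^2/n_i - m_1^2/N\right]$ can be checked against the usual sum of (observed - expected)²/expected over all 2k cells. The sketch below uses hypothetical frequencies and function names.

```python
def brandt_snedecor_chi2(row1, row2):
    """Chi-square for a 2 x k table via (1/pq) * [sum(a1i^2 / ni) - m1^2 / N]."""
    col_totals = [a + b for a, b in zip(row1, row2)]
    m1 = sum(row1)
    total = m1 + sum(row2)
    p = m1 / total
    q = 1 - p
    bracket = sum(a * a / n for a, n in zip(row1, col_totals)) - m1 * m1 / total
    return bracket / (p * q)

def direct_chi2(row1, row2):
    """Chi-square from expected frequencies (row total * column total / N)."""
    col_totals = [a + b for a, b in zip(row1, row2)]
    total = sum(col_totals)
    statistic = 0.0
    for row in (row1, row2):
        row_total = sum(row)
        for obs, n in zip(row, col_totals):
            exp = row_total * n / total
            statistic += (obs - exp) ** 2 / exp
    return statistic

row1 = [10, 20, 30]   # hypothetical B1 frequencies
row2 = [20, 20, 20]   # hypothetical B2 frequencies
shortcut = brandt_snedecor_chi2(row1, row2)
direct = direct_chi2(row1, row2)
print(shortcut, direct)  # both equal 16/3 up to rounding
```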


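11.10 BARTLETT’S TEST FOR HOMOGENEITY OF SEVERAL INDEPENDENT ESTIMATES OF THE SAME POPULATION VARIANCE:
Let
$$s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2, \quad i = 1, 2, \ldots, k,$$
be the unbiased estimate of the population variance obtained from the ith sample $X_{ij}$ $(j = 1, 2, \ldots, n_i)$ and based on $\nu_i = (n_i - 1)$ degrees of freedom, all the k samples being independent.
Under the null hypothesis that the samples come from the same population with variance $\sigma^2$, that is, that the independent estimates $s_i^2$ $(i = 1, 2, \ldots, k)$ of $\sigma^2$ are homogeneous, Bartlett proved that the statistic
$$\chi^2 = \frac{\sum_{i=1}^{k} \nu_i \log_e\left(S^2 / s_i^2\right)}{1 + \dfrac{1}{3(k - 1)} \left[\sum_{i=1}^{k} \dfrac{1}{\nu_i} - \dfrac{1}{\nu}\right]}$$
where
$$S^2 = \frac{\sum_{i=1}^{k} \nu_i s_i^2}{\nu}, \quad \nu = \sum_{i=1}^{k} \nu_i,$$
follows (approximately) the chi-square distribution with (k-1) degrees of freedom.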
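A minimal sketch of Bartlett's statistic, assuming the form $\sum \nu_i \log_e(S^2/s_i^2)$ divided by $1 + \frac{1}{3(k-1)}\left[\sum 1/\nu_i - 1/\nu\right]$ with $S^2$ the pooled variance, is given below. The function name and the variance estimates are hypothetical.

```python
import math

def bartlett_statistic(variances, dfs):
    """Return (Bartlett's chi-square statistic, degrees of freedom k - 1)."""
    k = len(variances)
    nu = sum(dfs)
    pooled = sum(v * s2 for v, s2 in zip(dfs, variances)) / nu   # S^2
    numerator = sum(v * math.log(pooled / s2) for v, s2 in zip(dfs, variances))
    correction = 1 + (sum(1 / v for v in dfs) - 1 / nu) / (3 * (k - 1))
    return numerator / correction, k - 1

# Hypothetical unbiased variance estimates, each on 10 degrees of freedom.
statistic, dof = bartlett_statistic([2.0, 8.0], [10, 10])
equal_case, _ = bartlett_statistic([4.0, 4.0, 4.0], [9, 9, 9])
print(statistic, dof, equal_case)  # equal variances give a statistic of 0
```

When all the estimates coincide the statistic is zero, and it grows as the estimates diverge from the pooled value.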
11.11 NON-CENTRAL CHI-SQUARE DISTRIBUTION:
The chi-square distribution defined as the sum of the squares of independent standard normal variates is often referred to as the central chi-square distribution. The distribution of the sum of the squares of independent normal variates, each having unit variance but with possibly non-zero means, is known as the non-central chi-square distribution. Thus if $X_i$ $(i = 1, 2, \ldots, n)$ are independent $N(\mu_i, 1)$ random variables, then
$$\chi'^2 = \sum_{i=1}^{n} X_i^2$$
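has the non-central chi-square distribution with n degrees of freedom. Intuitively, this distribution would seem to depend upon the n parameters $\mu_1, \mu_2, \ldots, \mu_n$, but it will be seen that it depends on these parameters only through the non-centrality parameter $\lambda$, which is a function of $\sum_{i=1}^{n} \mu_i^2$ alone (conventions differ: some texts take $\lambda = \sum \mu_i^2$, others $\lambda = \tfrac{1}{2} \sum \mu_i^2$),
and we write $\chi'^2 \sim \chi'^2(n, \lambda)$.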
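That the distribution depends on the means only through $\sum \mu_i^2$ can be illustrated by simulation. Since $E(X_i^2) = 1 + \mu_i^2$, the mean of the sum of squares is $n + \sum \mu_i^2$ under either convention for $\lambda$. In the sketch below the two mean vectors, the seeds and the number of draws are hypothetical choices.

```python
import math
import random
import statistics

def noncentral_chi2_draw(mus, rng):
    """One draw of sum(X_i^2) with X_i ~ N(mu_i, 1) independent."""
    return sum(rng.gauss(mu, 1.0) ** 2 for mu in mus)

def simulated_mean(mus, draws=20000, seed=0):
    rng = random.Random(seed)
    return statistics.fmean(noncentral_chi2_draw(mus, rng) for _ in range(draws))

# Two different mean vectors with the same sum of squares (5) and n = 3.
mean_a = simulated_mean([1.0, 2.0, 0.0], seed=1)
mean_b = simulated_mean([math.sqrt(5.0), 0.0, 0.0], seed=2)
print(mean_a, mean_b)  # both expected to lie near n + sum(mu_i^2) = 8
```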
