
  SAMPLING AND LARGE SAMPLE TESTS

 

For the purpose of determining population characteristics, instead of enumerating the entire population, only the individuals in the sample are observed. The sample characteristics are then used to estimate the corresponding population characteristics approximately. For example, on examining a sample of a particular commodity we arrive at a decision to purchase or reject that commodity. The error involved in such approximation is known as sampling error; it is inherent and unavoidable in every sampling scheme. But sampling results in considerable gains, especially in time and cost, not only in making the observations themselves but also in the subsequent handling of the data.

Sampling is quite often used in our day-to-day practical life. For example, in a shop we assess the quality of sugar, wheat or any other commodity by taking a handful of it from the bag and then decide whether or not to purchase it. A housewife normally tastes the cooked products to check whether they are properly cooked and contain the right quantity of salt.

The principal types of sampling are:

(i)          Purposive sampling

(ii)         Random sampling

(iii)        Stratified sampling

10.2.1       Purposive sampling:

Purposive sampling is one in which the sample units are selected with a definite purpose in view. For example, if we want to give the picture that the standard of living has increased in the city of New Delhi, we may take individuals in the sample from rich and posh localities like Defence Colony, South Extension, Golf Links, Jor Bagh, Chanakyapuri, Greater Kailash etc. and ignore the localities where low-income group and middle-class families live. This sampling suffers from the drawbacks of favouritism and nepotism and does not give a representative sample of the population.

10.2.2       Random sampling:

Suppose we want to select r candidates out of n. We assign the numbers 1 to n, one number to each candidate, and write these numbers on n slips, which are made as homogeneous as possible in shape, size, etc. These slips are then put in a bag, thoroughly shuffled, and r slips are drawn one by one. The r candidates corresponding to the numbers on the slips drawn constitute the random sample.
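The slip-drawing (lottery) procedure above can be sketched in Python; this is an illustrative sketch, and the function name `lottery_sample` and the fixed seed are assumptions made only for reproducibility:

```python
import random

def lottery_sample(n, r, seed=None):
    """Draw a random sample of r candidates out of n by the lottery
    method: number the candidates 1..n on slips, shuffle, draw r."""
    rng = random.Random(seed)
    slips = list(range(1, n + 1))   # one numbered slip per candidate
    rng.shuffle(slips)              # thorough shuffling of the bag
    return slips[:r]                # the r slips drawn one by one

# e.g. a random sample of 5 candidates out of 50:
print(lottery_sample(n=50, r=5, seed=42))
```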

 

10.2.3       Stratified sampling:

Here the entire heterogeneous population is divided into a number of homogeneous groups, usually termed strata, which differ from one another but each of which is homogeneous within itself. Units are then sampled at random from each stratum, the number of units drawn from a stratum usually being proportional to the size of the stratum in the population. The sample, which is the aggregate of the sampled units of each of the strata, is termed a stratified sample, and the technique of drawing this sample is known as stratified sampling. Such a sample is by far the best and can safely be considered representative of the population from which it has been drawn.
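A minimal Python sketch of this stratified scheme, assuming the same sampling fraction is applied to every stratum; the strata, the 10% fraction and the helper name are illustrative choices, not from the text:

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Draw the given fraction at random from each stratum; the
    aggregate of the per-stratum draws is the stratified sample."""
    rng = random.Random(seed)
    sample = []
    for units in strata.values():
        k = max(1, round(len(units) * fraction))  # proportional allocation
        sample.extend(rng.sample(units, k))
    return sample

# illustrative strata of sizes 100, 60 and 20:
strata = {"low": list(range(100)),
          "mid": list(range(100, 160)),
          "high": list(range(160, 180))}
print(stratified_sample(strata, fraction=0.10, seed=1))
```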

 

10.3.1       Sampling distribution of a statistic:

If we draw a sample of size n from a given finite population of size N, then the total number of possible samples is:

k = C(N, n) = N! / [n!(N − n)!]

If the statistic t and the sample variance s² are computed for each of these k samples, we obtain the sampling distribution:

Sample number        t         s²
      1              t1        s1²
      2              t2        s2²
      3              t3        s3²
      .              .         .
      .              .         .
      k              tk        sk²
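The table above can be produced exhaustively for a small population; the sketch below (the population values are illustrative, not from the text) enumerates all k = C(N, n) samples and tabulates the statistic t, taken here to be the sample mean:

```python
from itertools import combinations
from math import comb

population = [2, 4, 6, 8, 10]           # a small finite population, N = 5
N, n = len(population), 2

samples = list(combinations(population, n))
assert len(samples) == comb(N, n)       # k = C(5, 2) = 10 possible samples

# Tabulate the statistic t (the sample mean) for every possible sample.
means = [sum(s) / n for s in samples]
print(f"k = {len(samples)} samples; "
      f"mean of sample means = {sum(means) / len(means)}")
```

Note that the mean of the sampling distribution of the sample mean equals the population mean, a fact derived later in the chapter.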

 

 

 

10.4 TESTS OF SIGNIFICANCE:

Since, for large n, almost all distributions (e.g., binomial, Poisson, negative binomial, hypergeometric, t, F and chi-square) can be approximated very closely by the normal probability curve, we use the normal test of significance for large samples. Some of the well-known tests of significance for studying such differences in small samples are the t-test, the F-test and Fisher's z-transformation.

 

 

10.5.1       Alternative Hypothesis:

For testing H0: μ = μ0, the alternative hypothesis could be

(i)          H1: μ ≠ μ0 (that is, μ > μ0 or μ < μ0)

(ii)         H1: μ > μ0

(iii)        H1: μ < μ0

The alternative hypothesis in (i) is known as a two-tailed alternative, and the alternatives in (ii) and (iii) are known as right-tailed and left-tailed alternatives respectively. The setting of the alternative hypothesis is very important, since it enables us to decide whether we have to use a single-tailed or a two-tailed test.

 

Type I Error: Reject H0 when it is true.

Type II Error: Accept H0 when it is wrong, that is, accept H0 when H1 is true.

If we write

P(t ∈ ω | H0) = α and P(t ∈ ω̄ | H1) = β,

where ω is the critical (rejection) region and ω̄ is the acceptance region, then

P{Reject a lot when it is good} = α

and P{Accept a lot when it is bad} = β

10.7.1       One tailed and two tailed tests:

In any test, the critical region is represented by a portion of the area under the probability curve of the sampling distribution of the test statistic.

A test of statistical hypothesis where the alternative hypothesis is one-tailed is called a single-tailed test. In the right-tailed test (H1: μ > μ0), the critical region lies entirely in the right tail of the sampling distribution of the test statistic, while for the left-tailed test (H1: μ < μ0), the critical region lies entirely in the left tail of the distribution.

A test of statistical hypothesis where the alternative hypothesis is two-tailed, such as H0: μ = μ0 against H1: μ ≠ μ0 (μ > μ0 or μ < μ0), is known as a two-tailed test; here the critical region lies in both tails of the distribution.

The value of test statistic which separates the critical region and the acceptance region is called the critical value or significant value. It depends upon:

(i)          The level of significance used, and

(ii)         The alternative hypothesis, whether it is two-tailed or single-tailed.

P(|Z|> Zα) = α ------------------(1)

That is, Zα is the value such that the total area of the critical region on both tails is α. Since the normal probability curve is a symmetrical curve, from (1) we get

P(Z > Zα) = α/2 = P(Z < −Zα),

that is, the area of each tail is α/2. Thus Zα is the value such that the area to the right of Zα is α/2 and the area to the left of −Zα is α/2.

In the case of a single-tailed alternative, the critical value Zα is determined so that the total area to the right of it is α (right-tailed test), while for a left-tailed test the total area to the left of −Zα is α.
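These critical values can be read off the standard normal distribution; the sketch below uses the Python standard library's `statistics.NormalDist` (the choice of α = 0.05 is illustrative):

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()   # standard normal: mean 0, s.d. 1

# Two-tailed: area alpha/2 in each tail, so P(|Z| > Z_alpha) = alpha.
z_two_tailed = std_normal.inv_cdf(1 - alpha / 2)
# Single-tailed (right): the whole area alpha lies to the right of Z_alpha.
z_one_tailed = std_normal.inv_cdf(1 - alpha)

print(round(z_two_tailed, 3), round(z_one_tailed, 3))   # 1.96 1.645
```

These are the familiar 5% critical values 1.96 (two-tailed) and 1.645 (single-tailed) used throughout this section.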

1.          Null hypothesis: set up the null hypothesis H0.

2.          Alternative hypothesis: set up the alternative hypothesis H1. This will enable us to decide whether we have to use a single tailed test or two tailed test.

3.          Level of significance: choose the appropriate level of significance α.

4.          Test statistic: compute the test statistic Z = [t − E(t)] / S.E.(t) under H0.

5.          Conclusion: we compare Z, the computed value of the test statistic in step 4, with the significant value Zα at the given level of significance α.

If |Z| < Zα, that is, if the calculated value of Z is less than Zα, we say it is not significant. By this we mean that the difference t − E(t) is just due to fluctuations of sampling, and the sample data do not provide sufficient evidence against the null hypothesis, which may therefore be accepted.

If X ~ N(μ, σ²), then Z = (X − μ)/σ ~ N(0, 1).

Thus from the normal probability tables, we have

P ( -1.96 ≤ Z ≤ 1.96 ) = 0.95 that is P(|Z| ≤ 1.96) = 0.95

⇒  P(|Z| > 1.96) = 1 − 0.95 = 0.05

(i)          Compute the test statistic Z under H0.

(ii)        If |Z| > 3, H0 is always rejected.

For a single-tailed test at the 5% level, the critical value is 1.645, since

P(Z > 1.645) = 0.5 − 0.45 = 0.05

 

10.9 SAMPLING OF ATTRIBUTES:

If X is the number of successes in n independent trials with constant probability P of success, then

E(X) = nP and V(X) = nPQ,

where Q = 1 − P is the probability of failure.
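From these moments, the large-sample test statistic for an observed count X is Z = (X − nP)/√(nPQ); a minimal sketch (the coin-tossing figures are illustrative):

```python
from math import sqrt

def proportion_z(x, n, P):
    """Z statistic for x observed successes in n trials under H0
    that the true probability of success is P (large-sample test)."""
    Q = 1 - P
    return (x - n * P) / sqrt(n * P * Q)  # (X - E(X)) / S.E.(X)

# e.g. 216 heads in 400 tosses of a coin assumed fair (P = 0.5):
z = proportion_z(216, 400, 0.5)
print(z)   # 1.6 -> |Z| < 1.96, not significant at the 5% level
```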

10.9.2       Test of significance for difference of proportions:

Suppose we want to compare two distinct populations with respect to the prevalence of a certain attribute, say A, among their members. Let X1, X2 be the number of persons possessing the given attribute A in random samples of sizes n1 and n2 from the two populations respectively. Then the sample proportions are given by

p1 = X1/n1 and p2 = X2/n2

Since for large samples p1 and p2 are asymptotically normally distributed, (p1 − p2) is also normally distributed. The standard variable corresponding to the difference (p1 − p2) is given by

Z = [(p1 − p2) − E(p1 − p2)] / S.E.(p1 − p2)
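A sketch of this two-proportion test; here, under H0: P1 = P2, the common P is estimated by the pooled proportion (X1 + X2)/(n1 + n2), a standard practical choice, and the sample counts used are illustrative:

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Z for the difference of two sample proportions under H0: P1 = P2,
    with the common P estimated by the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)             # pooled estimate of the common P
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# e.g. 30 of 100 in the first sample vs 20 of 100 in the second:
print(round(two_proportion_z(30, 100, 20, 100), 3))
```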

10.10       SAMPLING OF VARIABLES:

In the case of sampling of variables, each member of the population provides the value of the variable, and the aggregate of these values forms the frequency distribution of the population. From the population, a random sample of size n can be drawn by any of the sampling methods discussed before, which is the same as choosing n values of the given variable from the distribution.

Now E(x̄) = E[(1/n) Σ xi] = (1/n) Σ E(xi)

Since xi is a sample observation from the population (i = 1, 2, …, n), it can take any one of the values X1, X2, …, XN, each with equal probability 1/N. Therefore

E(xi) = (1/N)(X1 + X2 + … + XN) = μ

Hence E(x̄) = (1/n)(μ + μ + … + μ) = μ ----------------------------(1)

We have V(xi) = E[xi − E(xi)]² = E[xi − μ]² = (1/N) Σ (Xi − μ)² = σ²

In particular, since the sample observations are independent,

V(x̄) = (1/n²) Σ V(xi) = σ²/n -----------------------(2)

Similarly, for the sample variance s² = (1/n) Σ (xi − x̄)², it can be shown that

E(s²) = [(n − 1)/n] σ² -------------(4)

Since E(s²) ≠ σ², the sample variance is not an unbiased estimate of the population variance. However, if we define

S² = [1/(n − 1)] Σ (xi − x̄)² = [n/(n − 1)] s², then E(S²) = [n/(n − 1)] E(s²) = σ².

Therefore S² is an unbiased estimate of the population variance σ².
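The bias of s² can be verified exactly for a tiny population by enumerating every equally likely sample drawn with replacement; the population {1, 2, 3} and sample size n = 2 below are illustrative assumptions:

```python
from itertools import product

population = [1, 2, 3]
mu = sum(population) / len(population)                       # mu = 2
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)  # 2/3

n = 2
# All 3^2 = 9 equally likely samples of size n drawn with replacement.
samples = list(product(population, repeat=n))

def s2(sample):
    """Sample variance with divisor n (the biased form)."""
    m = sum(sample) / n
    return sum((x - m) ** 2 for x in sample) / n

# Average s^2 over every possible sample: this is E(s^2) exactly.
Es2 = sum(s2(s) for s in samples) / len(samples)
print(Es2, (n - 1) / n * sigma2)   # both equal ((n-1)/n) * sigma^2 = 1/3
```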

 

Let x1, x2, …, xn be a random sample of size n from a population with mean μ and variance σ². Then the sample mean is given by

x̄ = (1/n)(x1 + x2 + … + xn)

and its standard error is S.E.(x̄) = σ/√n.

Under the null hypothesis H0 that the sample has been drawn from a population with mean μ and variance σ², that is, that there is no significant difference between the sample mean x̄ and the population mean μ, the test statistic (for large samples) is:

Z = (x̄ − μ) / (σ/√n)
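A minimal sketch of this one-sample test (the sample figures are illustrative):

```python
from math import sqrt

def mean_z(xbar, mu, sigma, n):
    """Z = (xbar - mu) / (sigma / sqrt(n)) under H0 that the sample of
    size n comes from a population with mean mu and s.d. sigma."""
    return (xbar - mu) / (sigma / sqrt(n))

# e.g. a sample of 100 items with mean 52, population claimed to have
# mu = 50 and sigma = 10:
z = mean_z(52, 50, 10, 100)
print(z)   # 2.0 -> |Z| > 1.96, significant at the 5% level
```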

Let x̄1 and x̄2 be the means of independent random samples of sizes n1 and n2 from two populations with means μ1, μ2 and variances σ1², σ2². Then for large samples

x̄1 ~ N(μ1, σ1²/n1) and x̄2 ~ N(μ2, σ2²/n2)

Also (x̄1 − x̄2), being the difference of two independent normal variates, is also a normal variate, with

V(x̄1 − x̄2) = V(x̄1) + V(x̄2) = σ1²/n1 + σ2²/n2

The covariance term vanishes, since the sample means x̄1 and x̄2 are independent.

Thus under H0: μ1 = μ2, the test statistic becomes

Z = (x̄1 − x̄2) / √(σ1²/n1 + σ2²/n2)

for large samples.
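A minimal sketch of this two-sample statistic (the sample figures are illustrative):

```python
from math import sqrt

def two_mean_z(xbar1, n1, sigma1, xbar2, n2, sigma2):
    """Z for H0: mu1 = mu2 using the difference of two sample means:
    Z = (xbar1 - xbar2) / sqrt(sigma1^2/n1 + sigma2^2/n2)."""
    return (xbar1 - xbar2) / sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

# e.g. two samples of 400 each with means 67.5 and 68.0, s.d. 2.5 each:
print(round(two_mean_z(67.5, 400, 2.5, 68.0, 400, 2.5), 2))
```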

But in the case of large samples, the standard error of the difference of the sample standard deviations is given by

S.E.(s1 − s2) = √(σ1²/2n1 + σ2²/2n2),

so that under H0: σ1 = σ2 the test statistic is Z = (s1 − s2) / √(σ1²/2n1 + σ2²/2n2).

 

 
