Type I Error, Type II Error, and Power

Objectives

Be able to define Type I error, Type II error, and power
Be able to identify Type I error and Type II error in a given scenario
Know what factors affect error rates and power
Be able to compute Type I error, Type II error, and/or power in a given hypothesis testing scenario

Errors in Hypothesis Testing

Recall from the previous notes that a hypothesis test involves two statistical hypotheses: the null hypothesis, H₀, and the alternative hypothesis, H_a.

When we conduct a hypothesis test, the evidence in our sample leads us to one of two decisions: we either reject H₀ or fail to reject H₀. However, a hypothesis test is not foolproof: our decision might be incorrect!

A hypothesis test can result in one of two decision errors:

A Type I error occurs when we reject H₀ when H₀ is actually true. The probability of committing a Type I error is set by the significance level of the test and is usually denoted as α. The probability expression of a Type I error is:

α = P(reject H₀|H₀ true)

A Type II error occurs when we fail to reject H₀ when H₀ is actually false. The probability of committing a Type II error is usually denoted as β. The probability expression of a Type II error is:

β = P(fail to reject H₀|H₀ false)

Conversely, a hypothesis test can result in one of two correct decisions as well:

We can reject H₀ when H₀ is actually false. The probability of rejecting H₀ when H₀ is false is called the power of a test and is computed as 1 – β. The probability expression of power is:

1 – β = P(reject H₀|H₀ false)

We can fail to reject H₀ when H₀ is actually true. The probability of failing to reject H₀ when H₀ is true is computed as 1 – α, with a probability expression of:

1 – α = P(fail to reject H₀|H₀ true)

The relationships among these correct decisions and decision errors are often easiest to see in a table:

                                   Truth
Decision from hypothesis test      H₀ True                     H₀ False
Reject H₀                          Type I error (α)            Correct decision (power, 1 – β)
Fail to Reject H₀                  Correct decision (1 – α)    Type II error (β)
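
The table can also be read as a lookup from (truth, decision) to outcome. The sketch below is only an illustration of that reading; the function name classify_outcome and its two arguments are hypothetical and are not part of these notes.

```python
def classify_outcome(h0_is_true: bool, rejected_h0: bool) -> str:
    """Name the cell of the decision table for a single test result."""
    if rejected_h0:
        # Rejecting a true H0 is a Type I error; rejecting a false H0 is correct.
        return "Type I error" if h0_is_true else "Correct decision (power)"
    # Failing to reject a false H0 is a Type II error; otherwise the decision is correct.
    return "Correct decision" if h0_is_true else "Type II error"


# Walk through all four cells of the table.
for truth in (True, False):
    for decision in (True, False):
        outcome = classify_outcome(truth, decision)
        print(f"H0 true: {truth!s:<5}  rejected H0: {decision!s:<5}  ->  {outcome}")
```

In practice we never observe whether H₀ is true, so this lookup is a way to reason about scenarios, not something that can be run on real test results.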

Example

Most store-bought pregnancy tests produce a binary outcome: they indicate either that a woman is pregnant or that she is not pregnant. Consider the following hypotheses:

H₀: a woman taking the test is pregnant

H_a: a woman taking the test is not pregnant

What would be a Type I error in this context?

What would be a Type II error in this context?

Example

In Aesop’s fable “The Boy Who Cried Wolf,” a shepherd boy repeatedly runs to a nearby village to claim that a wolf is attacking his flock when in fact there is no wolf at all. Suppose the villagers were all statisticians. What type of error would they say the boy was committing when doing this? (Assume H₀: no wolf present)

What Influences α and β?

Notice that the probabilities for correct decisions and for decision errors can be defined in terms of the values of α and β. In general, there are three factors that can affect these values:

Significance level (α) – recall that α is the probability of rejecting H₀ when H₀ is true. Increasing α…

Sample size (n) – increasing n…

Effect size – the effect size is the difference between the true parameter value and the value specified in the null hypothesis.

effect size = true value – hypothesized value under H₀

Increasing effect size…
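
One way to explore how each factor moves β and power is to compute power for a single test while changing one input at a time. The sketch below assumes a right-tailed one-sample z-test with a known population standard deviation; the function name power_right_tailed and every number in it (σ = 10, n = 25, effect size 3, and so on) are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_right_tailed(alpha: float, n: int, effect_size: float, sigma: float) -> float:
    """Power of a right-tailed one-sample z-test with known sigma (assumed setup)."""
    z_crit = Z.inv_cdf(1.0 - alpha)          # critical value under H0, on the z scale
    shift = effect_size * sqrt(n) / sigma    # distance between true and null centers, in standard errors
    return 1.0 - Z.cdf(z_crit - shift)       # area right of the critical value in the "true" distribution


baseline = power_right_tailed(alpha=0.05, n=25, effect_size=3.0, sigma=10.0)
larger_alpha = power_right_tailed(alpha=0.10, n=25, effect_size=3.0, sigma=10.0)
larger_n = power_right_tailed(alpha=0.05, n=100, effect_size=3.0, sigma=10.0)
larger_effect = power_right_tailed(alpha=0.05, n=25, effect_size=6.0, sigma=10.0)

scenarios = [
    ("baseline", baseline),
    ("larger alpha", larger_alpha),
    ("larger n", larger_n),
    ("larger effect size", larger_effect),
]
for label, power in scenarios:
    print(f"{label:<20} power = {power:.3f}   beta = {1 - power:.3f}")
```

Comparing each scenario against the baseline shows the direction in which power and β move when only that one factor changes.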

Calculating Type II Error and Power

Because the Type I error probability (α) and its complement (1 – α) depend only on α, a value set by the researcher, their computations are simple. Type II error (β) and power (1 – β) require a little more work; thus, in these notes, we will focus on the computations for Type II error (β) and power.

Note: To calculate these probabilities, we need to know the true value of the parameter in the population. Since this is often not something that we actually know, most questions involving error and power calculations rely on us assuming a true value for the population parameter.

Recall from above that:

Type II error = β = P(fail to reject H₀|H₀ false)

Power = 1 – β = P(reject H₀|H₀ false).

So how can we go about calculating these values?

Recommended Steps for Calculating Power (1 – β)

Step 1: Set up H₀ and H_a based on the scenario.

Step 2: Identify the critical value for the rejection region under H₀ (you can usually find this based on α, or sometimes this value is given to you directly).

Step 3: Draw the sampling distribution based on H₀.

Step 4: Draw the sampling distribution based on the true parameter value.

Step 5: Locate (and draw) the critical value in both the H₀ distribution and the true parameter distribution.

Step 6: To compute power (1 – β):

For a “less than” (left-tailed) test, find the area to the left of the critical value in the “true” distribution.

For a “greater than” (right-tailed) test, find the area to the right of the critical value in the “true” distribution.

For a “not equal to” (two-tailed) test, find the area to the left of the left critical value in the “true” distribution and the area to the right of the right critical value in the “true” distribution, then add the two areas.

Note that if the question asks for the Type II error probability (β), you can follow the steps above to find power, and then just take 1 – power to obtain β.
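
Steps 5 and 6 can be sketched in code once the critical value is expressed on the scale of the sample statistic. The function below is a minimal illustration that assumes the "true" sampling distribution is normal; the name power_from_critical_value, its arguments, and the numbers in the usage lines (a null mean of 100, a standard error of 2.5, a true mean of 105) are hypothetical.

```python
from statistics import NormalDist


def power_from_critical_value(tail, critical, true_center, std_error):
    """Area of the rejection region under the "true" sampling distribution.

    tail      -- "left", "right", or "two"
    critical  -- one cutoff, or a (lower, upper) pair for a two-tailed test
    """
    true_dist = NormalDist(mu=true_center, sigma=std_error)
    if tail == "left":
        return true_dist.cdf(critical)                  # area to the left of the cutoff
    if tail == "right":
        return 1.0 - true_dist.cdf(critical)            # area to the right of the cutoff
    if tail == "two":
        lower, upper = critical
        return true_dist.cdf(lower) + (1.0 - true_dist.cdf(upper))  # both tails added
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Hypothetical right-tailed test: reject H0 if x-bar > 104.11, true mean 105, standard error 2.5.
power = power_from_critical_value("right", 104.11, true_center=105.0, std_error=2.5)
beta = 1.0 - power
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

The last two lines mirror the note above: once power is known, β is simply 1 – power.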

Hints and Tips:

Drawing a picture is obviously not required, but as you’ll see when we do some examples, it really helps you visualize what you’re trying to calculate so that you’re less likely to make a mistake!

The critical value of the test needs to be stated in the form of your sample statistic (either a sample mean x̅ or a sample proportion p̂). If you’re not explicitly given it in this form, you might need to “convert” the critical value from the form of a value in the distribution of your test statistic (either a z-score or a t-score) or from the form of an α-level.

If α is known, you need to re-state your decision rule (the rule used to reject or fail to reject H₀) in terms of the appropriate sample statistic. Restate it in terms of x̅ if your hypotheses are about µ; restate it in terms of p̂ if your hypotheses are about p.

If α is not known, the decision rule will be given to you in terms of the appropriate sample statistic (i.e., it is given in terms of x̅ if your hypotheses are about µ, or is given in terms of p̂ if your hypotheses are about p).
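
The "conversion" described in these hints can be sketched as follows. This is a minimal illustration that assumes a z-based test (a t-based test would need a t quantile instead); the function name critical_values and both sets of numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def critical_values(null_value, std_error, alpha, tail):
    """Convert an alpha-level into cutoff(s) on the scale of the sample statistic."""
    z = NormalDist()  # standard normal, so this sketch covers z-tests only
    if tail == "left":
        return null_value + z.inv_cdf(alpha) * std_error
    if tail == "right":
        return null_value + z.inv_cdf(1.0 - alpha) * std_error
    if tail == "two":
        half_width = z.inv_cdf(1.0 - alpha / 2.0) * std_error
        return null_value - half_width, null_value + half_width
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Hypothetical test about a proportion: H0: p = 0.40, n = 150, alpha = 0.05, two-tailed.
p0, n_p = 0.40, 150
se_p = sqrt(p0 * (1 - p0) / n_p)   # standard error of p-hat computed under H0
lower, upper = critical_values(p0, se_p, alpha=0.05, tail="two")
print(f"Reject H0 if p-hat < {lower:.4f} or p-hat > {upper:.4f}")

# Hypothetical test about a mean: H0: mu = 50, sigma = 8, n = 64, alpha = 0.01, left-tailed.
se_mean = 8 / sqrt(64)             # standard error of x-bar
cutoff = critical_values(50.0, se_mean, alpha=0.01, tail="left")
print(f"Reject H0 if x-bar < {cutoff:.3f}")
```

Note that for a proportion the standard error in the decision rule uses the hypothesized value of p, since the rule is built from the sampling distribution under H₀.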

Examples will really help with all of this! Let’s do a few.

Example

A drug company that manufactures a sleeping aid drug claims that more than 70% of the people who use its drug report an improvement in their sleep quality (compared to before they were taking the drug). Suppose a competing company wishes to test this claim by sampling 200 individuals who take the sleeping aid drug and asking them whether or not they experience an improvement in their sleep quality while on the drug.

State the null and alternative hypotheses.

If the test were to be carried out using α = 0.025, state the decision rule in terms of the sample proportion, p̂.

Suppose that in the sample of 200 individuals, 155 people claimed that they had an improvement in their sleep quality when taking the drug. Based on this, what can be said about the null hypothesis?

Using the test outlined above, what would be the probability of concluding that p = 0.70 if, in fact, p = 0.80? Find this probability by hand and using Minitab.

In Minitab…

Stat → Power and Sample Size → 1 Proportion…

Sample sizes: ____

Comparison proportions: ____

Power values: ____

Hypothesized proportion: ____

Click Options…

Alternative hypothesis: ____

Significance level: ____

What is the power of the test outlined above if, in fact, p = 0.80?
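
The by-hand parts of this example can be checked with a short script. The sketch below assumes the hypotheses H₀: p = 0.70 versus H_a: p > 0.70 (one reading of the company's "more than 70%" claim) and a normal approximation for p̂; the variable names are hypothetical, and hand calculations with rounded table values may differ slightly in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist

p0, p_true, n, alpha = 0.70, 0.80, 200, 0.025

# Decision rule: convert alpha into a cutoff for p-hat using the H0 sampling distribution.
se_null = sqrt(p0 * (1 - p0) / n)
p_hat_crit = p0 + NormalDist().inv_cdf(1 - alpha) * se_null
print(f"Reject H0 if p-hat > {p_hat_crit:.4f}")

# Observed sample: 155 of 200 report an improvement.
p_hat = 155 / n
reject_h0 = p_hat > p_hat_crit
print(f"p-hat = {p_hat:.3f}, reject H0: {reject_h0}")

# Type II error: probability p-hat falls at or below the cutoff when p is really 0.80.
se_true = sqrt(p_true * (1 - p_true) / n)
beta = NormalDist(mu=p_true, sigma=se_true).cdf(p_hat_crit)
power = 1 - beta
print(f"beta = {beta:.4f}, power = {power:.4f}")
```

Notice that the standard error changes between the two distributions: the cutoff uses p = 0.70, while β and power use the assumed true value p = 0.80.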

Example

Employees at multiple levels of a large company are stating that they are receiving, on average, fewer annual paid vacation days than the national average. Suppose the national average of annual paid vacation days in 2014 is known to be 7.5 days, with a standard deviation of 1.6 days. To assess the validity of the employees’ claims, the company randomly samples 55 employees and finds that the average number of annual vacation days for this sample is 7.22 days.

State the null and alternative hypotheses.

Find the p-value of this hypothesis test by hand and using Minitab.

In Minitab: Graph → Probability Distribution Plot → View Probability

Distribution tab
Distribution: Normal
Mean: ____
Std. Deviation: ____

Shaded Area tab
Define Shaded Area by: X Value
Select: ____
X value: ____

At α = 0.05, does the above test support the employees’ claims?

What type of error may have been made?

The employees are basing their claim on the national average of annual paid vacation days in 2014. Suppose, however, that this average has now decreased to 7.41 days in 2018. Find the power of the test outlined above by hand and using Minitab.

In Minitab…

Stat → Power and Sample Size → 1-Sample Z…

Sample sizes: ____

Differences: ____

Power values: ____

Standard Deviation: ____

Click Options…

Alternative hypothesis: ____

Significance level: ____
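
The by-hand parts of this example can be checked with a short script. The sketch below assumes the hypotheses H₀: µ = 7.5 versus H_a: µ < 7.5 (matching the employees' claim of fewer days) and a z-test with σ = 1.6 treated as known; the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar = 7.5, 1.6, 55, 7.22
alpha, mu_true = 0.05, 7.41

std_error = sigma / sqrt(n)

# p-value for the left-tailed test: area to the left of the observed x-bar under H0.
p_value = NormalDist(mu=mu0, sigma=std_error).cdf(x_bar)
reject_h0 = p_value < alpha
print(f"p-value = {p_value:.4f}, reject H0 at alpha = {alpha}: {reject_h0}")

# Decision rule on the x-bar scale, then power if the true mean is 7.41.
x_bar_crit = mu0 + NormalDist().inv_cdf(alpha) * std_error
power = NormalDist(mu=mu_true, sigma=std_error).cdf(x_bar_crit)
print(f"Reject H0 if x-bar < {x_bar_crit:.3f}")
print(f"power = {power:.4f}, beta = {1 - power:.4f}")
```

The true mean of 7.41 is close to the hypothesized 7.5 relative to the standard error, which is why the computed power comes out low.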

Example

Consider the following hypotheses about a proportion, p, in a certain population:

H₀: p = 0.50

H_a: p ≠ 0.50

Suppose the decision rule for a test of H₀ is given as:

“Reject H₀ if p̂ < 0.408 or if p̂ > 0.592.”

A sample of 80 individuals is taken to carry out the test outlined above. What is the probability of making a Type I error? Find this probability with the help of Minitab.

In Minitab: Graph → Probability Distribution Plot → View Probability

Distribution tab
Distribution: Normal
Mean: ____
Std. Deviation: ____

Shaded Area tab
Define Shaded Area by: X value
Select: ____
X value: ____
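
Because the decision rule is given directly in terms of p̂, the Type I error probability is the area of the rejection region under the H₀ sampling distribution. The sketch below assumes a normal approximation for p̂ with n = 80; the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

p0, n = 0.50, 80
lower_cut, upper_cut = 0.408, 0.592   # reject H0 outside this interval

# Sampling distribution of p-hat when H0 is true.
null_dist = NormalDist(mu=p0, sigma=sqrt(p0 * (1 - p0) / n))

# Type I error: probability of landing in either tail of the rejection region under H0.
alpha = null_dist.cdf(lower_cut) + (1 - null_dist.cdf(upper_cut))
print(f"alpha = {alpha:.4f}")
```

The two cutoffs are symmetric around 0.50, so the two tail areas are equal and α is twice either one.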

If p = 0.44, what is the probability of concluding that p = 0.5 using the test outlined above (with a sample size of n = 80)? Find this probability by hand and using Minitab.

In Minitab…

Stat → Power and Sample Size → 1 Proportion…

Sample sizes: ____

Comparison proportions: ____

Power values: ____

Hypothesized proportion: ____

Click Options…

Alternative hypothesis: ____

Significance level: ____
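
"Concluding that p = 0.5" means failing to reject H₀, so this probability is β: the area between the two cutoffs under the sampling distribution centered at the true value p = 0.44. The sketch below assumes a normal approximation for p̂ with n = 80; the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

p_true, n = 0.44, 80
lower_cut, upper_cut = 0.408, 0.592   # fail to reject H0 inside this interval

# Sampling distribution of p-hat when the true proportion is 0.44.
true_dist = NormalDist(mu=p_true, sigma=sqrt(p_true * (1 - p_true) / n))

# Type II error: probability p-hat lands between the cutoffs even though H0 is false.
beta = true_dist.cdf(upper_cut) - true_dist.cdf(lower_cut)
power = 1 - beta
print(f"beta = {beta:.4f}, power = {power:.4f}")
```

Almost all of the rejection probability here comes from the left tail, because the true value 0.44 sits much closer to the lower cutoff than to the upper one.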

Example

The American Heart Association (AHA) recommends that adults should aim to get an average of at least 150 minutes of moderate exercise per week to maintain cardiovascular health. An employer at a moderately sized company wonders if his employees are achieving this minimum. He takes a random sample of 40 employees and finds their average weekly minutes of moderate exercise to be 133 minutes. Suppose it is known that the standard deviation in the population is 57.9 minutes.

State the null and alternative hypotheses.

Suppose the test were carried out with a Type I error probability of 0.008. State the decision rule in terms of the standard normal distribution and in terms of x̅.

Find the power of the test outlined above if μ = 145, μ = 140, and μ = 130. Use Minitab to find the power values.

In Minitab…

Stat → Power and Sample Size → 1-Sample Z…

Sample sizes: ____

Differences: ____

Power values: ____

Standard Deviation: ____

Click Options…

Alternative hypothesis: ____

Significance level: ____

Click Graph…

□ Display power curve
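
The decision rule and the three power values can also be checked with a short script. The sketch below assumes the hypotheses H₀: µ = 150 versus H_a: µ < 150 (the employer is asking whether employees fall short of the minimum) and a z-test with σ = 57.9 treated as known; the variable and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 150.0, 57.9, 40, 0.008

std_error = sigma / sqrt(n)
z_crit = NormalDist().inv_cdf(alpha)       # left-tail cutoff on the standard normal scale
x_bar_crit = mu0 + z_crit * std_error      # the same cutoff restated in terms of x-bar
print(f"Reject H0 if z < {z_crit:.3f}, i.e. if x-bar < {x_bar_crit:.2f}")


def power_at(mu_true: float) -> float:
    """Area to the left of the x-bar cutoff when the true mean is mu_true."""
    return NormalDist(mu=mu_true, sigma=std_error).cdf(x_bar_crit)


powers = {mu: power_at(mu) for mu in (145.0, 140.0, 130.0)}
for mu, power in powers.items():
    print(f"true mean = {mu:.0f}: power = {power:.4f}, beta = {1 - power:.4f}")
```

Evaluating power_at over a grid of true means would trace out the same kind of power curve that the Graph option refers to: power rises as the true mean moves further below 150.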