Sampling and Probability Distribution

Decide if the sample is representative (or not) of the population for the question of interest.

1. Question: What proportion of U.S. households viewed Super Bowl XLIX?
Sample: Nielsen Media Research randomly selected and contacted 5,108 U.S. households during the game.
Population: All U.S. households

A) Representative
B) Not representative

Ans (A)
The sample is representative of the population because the households were selected at random from all U.S. households.

2. A computer network manager wants to test the reliability of some new and expensive fiber-optic Ethernet cables that the computer department just received. The computer department received 7 boxes containing 50 cables each. The manager does not have the time to test all 350 cables. The manager will choose one box at random and test 10 cables chosen randomly within that box. What is the population?

A) The one box that was chosen at random from the 7 boxes.
B) The 6 boxes not chosen
C) 350 cables
D) The 10 cables chosen

Ans: (C)
The population is all 350 cables, that is, the entire 7 boxes containing 50 cables each. The sample is the subset of the population that is actually tested: the 10 cables chosen at random.

3. A concerned mother wants to know how much time her child’s classmates spend talking on the phone. She prepares a survey that asks about call times/duration. Which of the following is the best sample? [In other words, which sample has the least bias?]

A) A random sample of 10 students in her child’s class
B) A random sample of 10 of her child’s Facebook friends
C) The last 10 friends that her child has text messaged
D) A random sample of 10 classmates that her child has called in the past month

Ans (A)

4. A school district wants to study if high school start time and high school performance are related. The district randomly samples 10 students from each grade level (freshman, sophomore, junior, and senior) to participate in this study. What type of sampling method did the school district apply?

A) Simple Random Sample
B) Stratified Random Sampling
C) Cluster Sampling
D) Multistage Sample

Answer: (B)
A stratified sample is obtained by dividing the population into mutually exclusive groups, or strata, and randomly sampling from each of these groups. The grade levels used by the school district are mutually exclusive (a senior cannot also be a junior, and likewise for the other grades), and students were randomly sampled from every grade level.

5. A flight attendant needs to survey a random sample of the 350 passengers on an international flight from Detroit, MI to Tokyo-Narita, Japan. If the flight attendant randomly selected a seat position and surveyed all passengers sitting in that seat position, what type of sampling method did the flight attendant apply?

A) Simple Random Sample
B) Stratified Random Sampling
C) Cluster Sampling
D) Multistage Sample

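Answer: (C)
Cluster sampling: In cluster sampling the entire population is divided into groups (clusters), then one or more groups are randomly chosen and all people within the chosen groups are sampled. Here each seat position is a cluster, and every passenger in the selected seat position was surveyed.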

6. Decide if the study below is an experiment or observational study.

A school district wants to study if high school start time and high school performance are related. The district sets five different start times, 7AM, 7:30AM, 8AM, 8:30AM, and 9:00AM and randomly assigns all students to one of those five times. The students' GPA are recorded at the end of one academic quarter.

A) Experiment
B) Observational Study

Answer (A)
This is an experiment because the investigators impose the treatment: they set the five start times and randomly assign students to them, then measure the response (GPA). In an observational study the investigators would only record the start times students already have, without assigning them.

7. A medical research team studied the effectiveness of herbal medicines in treating sleep disorders. The study used a sample of 1,000 patients with sleep disorders. The researchers randomly chose half of the participants to receive herbal medicine A and half of the participants to receive herbal medicine B. The study found that the majority of participants showed substantial improvement after taking the herbal medicine. Which of the following is true about this study?

A) The study lacks replication.
B) The study lacks randomization.
C) The study lacks a control group.
D) The study lacks experimental units.

Answer (C)

8. A research team studied the mortality rate of a sample of full-time employees. They wanted to find out the relationship between mortality rate and the length of time the subjects are not sitting (e.g., walking, standing) during the workday. In this study the researchers collected data concerning gender, age, and level of education. Which of the following is a confounding variable in this study?

A) Mortality rate
B) Time spent sitting during the workday
C) Gender
D) Length of time moving during the workday

Answer (C)
In statistics, a confounding variable (also confounding factor, a confound, or confounder) is an extraneous variable in a statistical model that correlates (directly or inversely) with both the dependent variable and the independent variable.

9. A research team studied how variables such as age, gender, and ethnicity related to consumers’ choices within three brands of yogurt. In this study, what type of variable is brand of yogurt chosen?

A) Nominal categorical variable
B) Ordinal categorical variable
C) Discrete quantitative variable
D) Continuous quantitative variable

Answer (A)
A nominal variable is a categorical variable that has two or more categories with no intrinsic ordering to the categories. Brand of yogurt is a label with no natural order, so it is nominal.

10. You are interested in your classmates' favorite sports out of the following: football, baseball, soccer, and basketball. Which of the following graphical representations is most appropriate for use with this data?

A) Histogram
B) Pie Chart
C) Boxplot
D) Side-By-Side Boxplot

Answer (B)
Favorite sport is a categorical variable, so a pie chart (or bar chart) showing the share of classmates choosing each sport is the most appropriate. Histograms and boxplots are meant for quantitative data.

11. Shown below is a stem-and-leaf plot for the heights of 10 students in inches. What is the median height of these students?

6| 2
6| 45567
7| 0122

A) 67 inches
B) 66 inches
C) 66.5 inches
D) 65 inches

Answer : [C] 66.5 inches.
The median is the middle value of the data arranged in increasing order. When the number of observations is even, there are two middle values and the median is their mean. Here the ordered heights are 62, 64, 65, 65, 66, 67, 70, 71, 72, 72, so the two middle heights are 66 and 67. Thus the mean of 66 and 67 (= 66.5) is the median height.
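The median calculation above can be checked with a short Python sketch. The list of heights is read off the stem-and-leaf plot (stem as tens digit, leaf as ones digit); the variable names are illustrative only.

```python
from statistics import median

# Heights read off the stem-and-leaf plot (stem = tens digit, leaf = ones digit).
heights = [62, 64, 65, 65, 66, 67, 70, 71, 72, 72]

# With an even number of values, median() averages the two middle values.
median_height = median(heights)
print(median_height)
```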

12. Which of the following describes the distribution?

Sampling and Probability Distribution image 1

A) Normally distributed
B) Bimodal
C) Skewed to the left
D) Skewed to the right

Ans (D)
This distribution is skewed to the right: the tail extends toward larger values, which pulls the mean above the mode.

13. Below are four different sets of data, each with n=5. Identify the data set with the highest standard deviation.

A) 1, 2, 3, 4, 5
B) 11.1, 12.1, 13.1, 14.1, 15.1
C) 1021.2, 1022.2, 1023.2, 1024.2, 1025.2
D) 1, 2, 3, 4, 13

Ans (D)
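Using the population standard deviation formula

σ = sqrt [ Σ(x - μ)² / N ]

Standard deviations of the series given are: (A) 1.414 (B) 1.414
(C) 1.414 (D) 4.317
Sets A, B, and C have the same spacing between values and therefore the same spread; the outlying value 13 gives set D the largest standard deviation.
Thus option (D) is correct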
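As a rough check, the four standard deviations can be computed in Python. This sketch assumes the population formula (dividing by N), which is what `statistics.pstdev` implements, and assumes the last value of set C is 1025.2. The dictionary and variable names are illustrative.

```python
from statistics import pstdev

# The four data sets from the question (last value of C assumed to be 1025.2).
datasets = {
    "A": [1, 2, 3, 4, 5],
    "B": [11.1, 12.1, 13.1, 14.1, 15.1],
    "C": [1021.2, 1022.2, 1023.2, 1024.2, 1025.2],
    "D": [1, 2, 3, 4, 13],
}

# Population standard deviation (divide by N) for each set.
spreads = {label: pstdev(values) for label, values in datasets.items()}
for label, sd in spreads.items():
    print(label, round(sd, 3))

# Label of the set with the largest spread.
largest = max(spreads, key=spreads.get)
print("Largest:", largest)
```

Using the sample formula (dividing by N - 1, `statistics.stdev`) would change the numbers but not the ranking.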

14. A financial company monitors the number of defects (incorrect amounts, incorrect account numbers, etc.) for every 1,000 money wires sent. The 5-number summary for these data is [1.4, 7.6, 12, 24, 94]. About 75% of the time, there are fewer than _____ defects per 1,000 money wires.

A) 24
B) 12
C) 20
D) 94

Ans (A)
The five-number summary lists the minimum, first quartile, median, third quartile, and maximum. About 75% of the data fall below the third quartile, so to answer the question we need the third quartile value, which is given as 24.
Thus about 75% of the time there are fewer than 24 defects per 1,000 money wires.

15. Assume that the empirical rule is a good model for the time it takes for all passengers to board an airplane. The mean time is 48 minutes with a standard deviation of 4 minutes. Which interval describes how long it takes for passengers to board the middle 95% of the time?

A) (40, 56)
B) (36, 60)
C) (44, 52)
D) (29, 37)

Ans (A)
Under the empirical rule, about 95% of values fall within 2 standard deviations of the mean: μ ± 2σ
Values are μ=48 and σ=4, so the interval is 48 ± 2(4) = 48 ± 8
Thus we get the answer (40, 56)
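The interval can be reproduced with a few lines of Python. This is only a sketch of the empirical-rule arithmetic (mean plus or minus 2 standard deviations for the middle 95%); the variable names are illustrative.

```python
# Boarding time parameters from the question.
mean_minutes = 48
sd_minutes = 4

# Empirical rule: roughly 95% of values lie within 2 standard deviations of the mean.
lower = mean_minutes - 2 * sd_minutes
upper = mean_minutes + 2 * sd_minutes
print((lower, upper))
```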

16. Americans consume an average of 30.80 pounds of tomatoes per year with a standard deviation of 6.80 pounds per year. How many pounds of tomatoes per year does an American with a z score of +1.50 consume?

A) 41.00 pounds
B) 37.60 pounds
C) 32.30 pounds
D) 30.80 pounds

Ans (A)
Using the z-score formula z = (X-μ)/σ, solved for X: X = μ + zσ
Substituting μ=30.8, σ=6.8, and z=1.5: X = 30.8 + 1.5(6.8) = 41.0
The answer is 41.00 pounds
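The same conversion between a z-score and a raw value can be sketched in Python. The helper name `value_from_z` is hypothetical and simply rearranges z = (X - μ)/σ.

```python
def value_from_z(z, mean, sd):
    """Rearrange z = (x - mean) / sd to recover the raw value x."""
    return mean + z * sd

# Tomato consumption: mean 30.80 lb, standard deviation 6.80 lb, z = +1.50.
pounds = value_from_z(1.5, 30.8, 6.8)
print(round(pounds, 2))
```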

17. Assume P(A) = .4 and P(B)=.5. If we are told that A and B are mutually exclusive, which of the following probabilities must be correct?

A) P(A and B) = 0
B) P(A or B) = 0
C) P(A and B) = .1
D) P(A or B)=.7

Ans (A)
Since A and B are mutually exclusive thus P(A∩B) is zero i.e. P(A and B) = 0

18. An employee is randomly selected from a large electronics company. Given that I = {the employee owns an iPad} and M = {the employee owns a MacBook}, what is the correct interpretation of P(I|M)?

A) The proportion of employees who own a MacBook who also own an iPad.
B) The proportion of employees who own an iPad who also own a MacBook.
C) The proportion of employees who own both an iPad and a MacBook.
D) The proportion of employees who own an iPad.

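Ans (A) P(I|M) is the probability that the employee owns an iPad given that the employee owns a MacBook, i.e. the proportion of employees who own a MacBook who also own an iPad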

19. In one large office building, 80% of employees drink coffee and 15% of employees chew gum. Given that drinking coffee is independent from chewing gum, what percent of employees both drink coffee and chew gum?

A) 12%
B) 95%
C) 65%
D) 5.3%

Ans (A)
For any two events, P(A|B) = P(A∩B) / P(B). When A and B are independent, P(A|B) = P(A), so P(A∩B) = P(A)*P(B)
With P(coffee) = 0.80 and P(gum) = 0.15, P(coffee and gum) = 0.80 * 0.15 = 0.12, or 12%

20. Assume that the grades received by last year’s STAT 200 students from two different professors' classes are shown in the following table:

                     GRADES
TEACHERS     A    B    C    D    F    TOTAL
White       25   45   20   10    0     100
Black       35   30   27    5    3     100
Total       60   75   47   15    3     200

What proportion of students from last year’s STAT 200 classes received a grade of B, given that they had Professor Black?

A) 30/100
B) 30/200
C) 175/200
D) 145/200

Ans (A)
This is a conditional probability, so we restrict attention to Professor Black's class: 30 of Black's 100 students received a B, giving P(B | Black) = 30/100.
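The conditional proportion can be sketched in Python by dividing a cell count by its row total. The assignment of the two rows of counts to Professors White and Black is an assumption based on the order in which they are listed, and the function name `conditional_proportion` is hypothetical.

```python
from fractions import Fraction

# Grade counts per professor (row assignment assumed from the listing order).
grades = {
    "White": {"A": 25, "B": 45, "C": 20, "D": 10, "F": 0},
    "Black": {"A": 35, "B": 30, "C": 27, "D": 5, "F": 3},
}

def conditional_proportion(table, teacher, grade):
    """P(grade | teacher): cell count divided by that teacher's row total."""
    row = table[teacher]
    return Fraction(row[grade], sum(row.values()))

p_b_given_black = conditional_proportion(grades, "Black", "B")
print(p_b_given_black, float(p_b_given_black))
```

Conditioning on the professor means the denominator is that professor's 100 students, not all 200.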

21. It is thought that there may be a link between race and high blood pressure in males. Suppose we collect the following data:

              High Blood Pressure
              Yes     No      Total
Male-Black     80    120       200
Male-White     50    150       200

What proportion of participants were white males with high blood pressure?

A) 50/400
B) 50/200
C) 330/400
D) 180/400

Ans (A)
Male-White with High Blood Pressure were 50, while the total number of participants was 400. Thus 50/400 proportion of participants were white males with high blood pressure.

22. The Pennsylvania Department of Transportation (PennDOT) wants to estimate the average speed of vehicles on Pennsylvania highways. To do so, PennDOT randomly samples 1,500 vehicles and records their speeds. They find that the average speed is 60 miles per hour. Which of the following best describes the measurement of the variable of interest?

A) Quantitative, continuous
B) Quantitative, discrete
C) Categorical, nominal
D) Categorical, ordinal

Ans (A)
Quantitative variables are measured on a numeric scale. Categorical variables take values that are labels, such as gender (male/female) or a simple yes/no. A categorical variable whose labels have no natural order is nominal; one whose labels follow an ordering (such as High, Medium, and Low) is ordinal.
A continuous variable can take any value on an interval of the real line: if a and b are possible values, so is every value between them. A discrete variable takes only separated values, such as counts, with nothing in between. Speed in miles per hour is a numeric measurement that can take any value in a range, so it is quantitative and continuous.

23. Which of the following is an example of a binomial random variable?

A) X = Number of times a randomly selected student went to Pattee Library last week
B) X = Number of times a “head” is flipped in 20 flips of a coin
C) X = Number of rolls of a die until a “6” is rolled
D) X = Number of lottery tickets that must be purchased before winning twice

Ans (B)
For a variable to be a binomial random variable, ALL of the following conditions must be met:
1 There are a fixed number of trials (a fixed sample size)
2 On each trial, the event of interest either occurs or does not
3 The probability of occurrence (or not) is the same on each trial
4 Trials are independent of one another

24.

x        0     1     2     3     4     5 or more
P(X=x)   0.70  0.12  0.08  0.05  0.03  0.02

Based on past experience, a professor knows that the probability distribution for X = number of students who will attend an optional course review session is given above. What is the probability of at least 3 students attending such a session?

A) .05
B) .10
C) .90
D) .95

Ans (B)
Add the probabilities P(X=3), P(X=4), and P(X=5 or more) to get the probability of 3 or more students attending the session: 0.05 + 0.03 + 0.02 = 0.10
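A small Python sketch of the same sum, assuming the table is stored as a dictionary keyed by the number of students (with the key 5 standing for "5 or more"). The names are illustrative.

```python
import math

# Probability distribution from the table; key 5 stands for "5 or more".
distribution = {0: 0.70, 1: 0.12, 2: 0.08, 3: 0.05, 4: 0.03, 5: 0.02}

# A valid probability distribution must sum to 1.
assert math.isclose(sum(distribution.values()), 1.0)

# P(X >= 3): add the probabilities of every outcome of 3 or more.
p_at_least_3 = sum(p for x, p in distribution.items() if x >= 3)
print(round(p_at_least_3, 2))
```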

25. Sampling and Probability Distribution image 3

An auto parts manufacturer has determined that the probability of one of their spark plugs being defective is 0.10. A random sample of 4 spark plugs is selected from the most recent production run. The binomial probability distribution for X = number of defective spark plugs in a random sample of 4 spark plugs is given above. From the table above, what is the probability that 2 or fewer spark plugs are defective?

A) 0.2916
B) 0.0486
C) 0.9963
D) 0.0037

Ans (C)
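Add the probabilities P(X=0), P(X=1), and P(X=2) to find the probability that 2 or fewer spark plugs are defective. For a binomial distribution with n=4 and p=0.10 these are 0.6561 + 0.2916 + 0.0486 = 0.9963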
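Since the table itself is not reproduced here, the binomial probabilities can be recomputed directly. The following is a minimal sketch using the standard binomial formula; the helper name `binomial_pmf` is hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 4, 0.10  # 4 spark plugs, each defective with probability 0.10

# Full distribution for X = 0, 1, 2, 3, 4 defective spark plugs.
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
for k, prob in enumerate(pmf):
    print(k, round(prob, 4))

# P(X <= 2) = P(X=0) + P(X=1) + P(X=2)
p_two_or_fewer = sum(pmf[:3])
print(round(p_two_or_fewer, 4))
```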

26. Suppose that X = Amount Won or Lost in a gambling game and the expected value is E(X) = -$0.71. What is the correct interpretation of this value?

A) The most likely outcome of a single play is winning 71 cents per play.
B) The most frequent value observed in a random gambling play is losing 71 cents per play
C) Over a large number of plays, the average outcome is losing 71 cents per play.
D) A player will have a loss of 71 cents every single time he or she plays this game.

Ans (C)

27. IQ scores are normally distributed with a mean equal to 100 and standard deviation equal to 15. What is the standardized score (z-score) for an IQ score of 85?

A) -1
B) 0
C) 1
D) 2

Ans (A)
Using formula of z- score : (X-μ)/σ
Putting values μ=100 σ=15 and X= 85
Then z score is equal to -1

28. Grace took a mid-term exam in her introductory statistics class and scored 86 points. Scores on the mid-term exam were normally distributed with a mean of 76 points and standard deviation of 16 points. What proportion of students scored lower than Grace?

A) .5000
B) .2660
C) .7340
D) .5320

Ans (C)
Using the z-score formula: z = (X-μ)/σ
Substituting μ=76, σ=16, and X=86
The z-score comes out to be (86-76)/16 = 0.625
The area to the left of z = 0.625 in the z-table (between the entries for 0.62 and 0.63) comes out to be 0.7340 (approx.)
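Instead of a printed table, the area to the left of Grace's score can be computed with Python's standard library. This sketch uses `statistics.NormalDist` with the mean of 76 and standard deviation of 16 stated in the question; the variable names are illustrative.

```python
from statistics import NormalDist

# Exam scores: normal with mean 76 and standard deviation 16 (from the question).
exam = NormalDist(mu=76, sigma=16)

# z-score of Grace's 86 points.
z = (86 - exam.mean) / exam.stdev

# Proportion of students scoring below 86 (area to the left of z).
p_below = exam.cdf(86)
print(round(z, 3), round(p_below, 4))
```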

29. Suppose that the scores on a mid-term exam are approximately normally distributed with a mean of 75 and standard deviation of 5. Using Standard Normal Table, find the proportion of scores between 75 and 90.

A) .9987
B) .5000
C) .4987
D) .013

Ans(C)
Using formula of z- score : (X-μ)/σ
Putting values μ=75; σ=5 and calculating the z score for X=75 and X=90
Z- score will come out to be 0 and 3, respectively.
The proportion between a z score of 0 and a z score of 3 according to the standard normal table comes out to be 0.9987- 0.5 = 0.4987
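The proportion between two scores is the difference of two cumulative areas, which can be sketched with `statistics.NormalDist`. The variable names are illustrative.

```python
from statistics import NormalDist

# Mid-term scores: normal with mean 75 and standard deviation 5.
scores = NormalDist(mu=75, sigma=5)

# P(75 < X < 90) = area to the left of 90 minus area to the left of 75.
p_between = scores.cdf(90) - scores.cdf(75)
print(round(p_between, 4))
```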

30. According to the U.S. Census, 11.7% of all Pennsylvanian males age 25 and over have never completed high school. Which of the following statements is true?

A) p = .117
B) 11.7% is a statistic
C) Based on the information, we cannot determine exactly what percentage of Pennsylvania males age 25 and over have never completed high school
D) If we were to take a sample of 1,000 Pennsylvanian males age 25 and over, we would necessarily find that 117 of them have never completed high school

Ans (A) Population proportion (p) will be 0.117

31. Parameter is to population as statistic is to

A) sample
B) estimate
C) mean
D) experimental unit

Ans (A)
A statistic is calculated for a sample

32. Which of the following is true concerning the t distribution?

A) It is the real sampling distribution for all sample proportions
B) It is an appropriate distribution to use for sample means when the population standard deviation, σ, is unknown
C) The distribution is right skewed
D) As the sample size approaches zero (0), the t-distribution approaches a normal distribution

Ans (B)
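The t distribution is used for sample means when the population standard deviation is unknown and must be estimated by the sample standard deviation; the difference from the normal distribution matters most when the sample is small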

33. A poll of 235 college students found that 80% of the respondents have never missed a class. What is the standard error of the sample proportion?

A) 0.200
B) 0.026
C) 0.160
D) 0.007

Answer (B)
The standard error of proportion is calculated as
SEp = sqrt [ p(1 - p) / n ] ,
where p is the proportion of respondents who have never missed a class.

Thus SEp = sqrt [ 0.80(1 – 0.80) / 235 ] = 0.026
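The standard error formula above translates directly into Python. The function name `se_proportion` is hypothetical.

```python
from math import sqrt

def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return sqrt(p * (1 - p) / n)

# Poll of 235 students, 80% of whom have never missed a class.
se_p = se_proportion(0.80, 235)
print(round(se_p, 3))
```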

34. According to an Internet report, 90% of Americans shop on Amazon.com. If we took a random sample of 2000 Americans, how many would we expect to shop on Amazon.com?

A) 1800
B) 1500
C) 550
D) 90

Ans (A)
Since 90% of Americans shop on Amazon.com, the expected count in a random sample of 2000 is 90% of 2000, which is 0.90 * 2000 = 1800.

35. In a random sample of 49 college students, the average number of hours of study time in a week was found to be 11.6 with a standard deviation of 3.86. What is the standard error of the mean?

A) 49
B) 3.29
C) 0.551
D) 0.079

Ans (C)
Using the formula: Standard error = σ/(√N)
Substituting σ=3.86 and N=49 gives 3.86/7 = 0.551
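A minimal Python sketch of the same calculation, with a hypothetical helper name `se_mean`:

```python
from math import sqrt

def se_mean(sd, n):
    """Standard error of the mean: standard deviation divided by sqrt(n)."""
    return sd / sqrt(n)

# 49 students, standard deviation of 3.86 hours of study time.
se_hours = se_mean(3.86, 49)
print(round(se_hours, 3))
```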

36. The cost of college textbooks seems to grow every year. Let’s assume the costs are independent and normally distributed with a mean of $1,200 and standard deviation of $500. What is the expected value of the sample mean in a random sample of n=200 college students?

A) $1,200
B) $500
C) $2.4
D) $120

Ans (A)
The expected value of the sample mean equals the population mean, regardless of the sample size, so here it is $1,200.

37. Life expectancy in U.S. is normally distributed with a mean of 78.4 years and a standard deviation of 16.51 years. What is the expected value of the sample mean in a random sample of n=50 people?

A) 78.4
B) 16.51
C) 4.75
D) 50.63

Answer (A)
The expected value of the sample mean equals the population mean, regardless of the sample size, so here it is 78.4 years.

38. Given the average download time of 5-minute video through a media company using broadband is 95 seconds with a standard deviation of 30 seconds, if the company were to take a random sample of 200 clients, what is the probability that the sample mean download time will be greater than 100 seconds? [Hint: the standard error of the mean is 2.121]

A) 0.0092
B) 0.9207
C) 0.7642
D) 0.5160

Answer : (A)
The z-score is (100 - 95)/2.121 = 2.36. Then from the standard normal table, for z = 2.36, the area under the curve between the mean (0) and the z value is 0.4909. Thus the probability of the sample mean being greater than 100 seconds is 0.50 - 0.4909 = 0.0091, which matches option (A) up to rounding of z.
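The sampling-distribution calculation can be sketched in Python by treating the sample mean as normal with standard deviation σ/√n, which is a reasonable approximation for n = 200. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 95, 30, 200  # download time parameters from the question

# Standard error of the mean (the hint gives 2.121).
se = sigma / sqrt(n)

# Approximate sampling distribution of the sample mean.
sample_mean = NormalDist(mu=mu, sigma=se)

# P(sample mean > 100) is the area to the right of 100.
p_greater = 1 - sample_mean.cdf(100)
print(round(se, 3), round(p_greater, 4))
```

Working with the unrounded z-score is why the result lands on 0.0092 rather than the table-based 0.0091.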

39. Given that the average study time per week for a college student is 17 hours with a standard deviation of 7.5 hours, if we were to take a random sample of 150 college students, what is the probability that our sample will have a mean time less than 18 hours? [Hint: the standard error of the mean is 0.612]

A) 0.9489
B) 0.5517
C) 0.6772
D) 0.1227

Answer : (A)
The z-score is (18 - 17)/0.612 = 1.63. Then from the standard normal table, for z = 1.63, the area under the curve between the mean (0) and the z value is 0.4484. Thus the probability of the sample mean being less than 18 hours is 0.50 + 0.4484 = 0.9484, which matches option (A) up to rounding of z.

40. A report claimed that during 2012-13 flu season 56.6% of children received a flu vaccine. If we were to take a random sample of 100 children, what is the probability that less than 55% of the sample got the flu shot during the 2012-13 flu season? [Hint: The standard error of the sample proportion is 0.0496]

A) 0.3735
B) 0.0110
C) 0.8962
D) 0.5438

Answer : (A)
The z-score is (0.55 - 0.566)/0.0496 = -0.323. Then from the standard normal table, for z = -0.32, the area under the curve between the mean (0) and the z value is 0.1255. Thus the probability of the sample proportion being less than 55% is 0.50 - 0.1255 = 0.3745, which matches option (A) up to rounding of z.
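The same probability can be sketched in Python using the normal approximation to the sampling distribution of a sample proportion. The variable names are illustrative, and the approximation itself is an assumption that the question's hint already relies on.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.566, 100  # reported vaccination rate and sample size

# Standard error of the sample proportion (the hint gives 0.0496).
se = sqrt(p * (1 - p) / n)

# Normal approximation to the sampling distribution of the sample proportion.
sample_prop = NormalDist(mu=p, sigma=se)

# P(sample proportion < 0.55) is the area to the left of 0.55.
p_less = sample_prop.cdf(0.55)
print(round(se, 4), round(p_less, 4))
```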