STATS 121 FINAL
alternative hypothesis : A statement about the value of a parameter that is either "less than," "greater than," or "not equal to" a hypothesized number or another parameter; the hypothesis that the researcher usually wants to prove or verify
ANOVA (Analysis of Variance) : A procedure used to test equality of three or more means.
bivariate data : two measurements are made on each unit
block : A group of experimental units sharing some common characteristic. In a randomized block design, random allocation of treatments is carried out separately within each group.
center : A summary number about which observations tend to cluster.
Measures of center : mean and median
claimed parameter value : the value of the parameter as given in the null hypothesis
completely randomized design (CRD) : an experimental design where all experimental units are assigned at random to treatments
confidence interval : An estimate of the value of a parameter in interval form with an associated level of confidence; it gives a list of plausible values for the parameter based on the value of the statistic
confidence level : The percentage of all possible samples for which the confidence intervals will contain the parameter being estimated; selected subjectively by the researcher
convenience sample : A sample type where the researcher contacts those subjects who are readily available and does not use any random selection. The results are almost always biased.
correlation coefficient : A measure of the strength of the linear relationship between two quantitative variables, symbolized with the letter r
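As a rough illustration of how r can be computed by hand, here is a minimal Python sketch. The function name `correlation` and the study-hours data are made up for this example and are not from the course material.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sums of squared deviations about the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied and exam score for five students
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
r = correlation(hours, scores)
print(round(r, 3))  # close to +1: a strong positive linear relationship
```

Values of r near +1 or -1 indicate a strong linear relationship; values near 0 indicate a weak one.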
direction of relationship : A characteristic of data in a scatterplot that is identified as either a positive or negative association
distribution : A list of all possible values of a variable together with the frequency (or probability) of each value
expected count : An estimate of how many observations should be in a cell of a two way table if there were no association between the row and column variables
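A small sketch of the usual expected-count calculation, (row total × column total) ÷ table total, may help. The function name `expected_counts` and the 2×2 table below are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under no association for a two-way table.

    `table` is a list of rows, each a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Each expected count is (row total * column total) / table total
    return [[r * c / total for c in col_totals] for r in row_totals]

# Hypothetical observed counts (rows: group A/B, columns: yes/no)
observed = [[30, 20], [20, 30]]
expected = expected_counts(observed)
print(expected)  # [[25.0, 25.0], [25.0, 25.0]]
```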
experiment : A study where treatments are deliberately imposed on the individuals in the study before data is gathered in order to observe their response to the treatment.
fail to reject Ho : The appropriate statistical conclusion in hypothesis testing when the P-value is greater than α; equivalently, conclude that "There is not enough evidence to believe Ha"
failure : Any category that is not of primary interest in a categorical data set
F test statistic : A test statistic that has an F distribution
histogram : A graphical display of a quantitative data set; data are grouped into intervals (usually of equal width) and a bar is drawn over each interval having height proportional to the frequency (or percentage) of values in the interval. Values of the variable are given on the x axis and frequencies (or percentages) are given on the y axis. Examined to determine shape, center, and spread.
influential point : An observation that substantially alters the fitted regression equation
interquartile range (IQR) : The difference between Q3 and Q1 (Q3 - Q1); the length of the box in a boxplot; contains the middle 50% of the data.
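The sketch below shows one way to get Q1, Q3, and the IQR in Python. It assumes the "inclusive" quartile method of the standard library's `statistics.quantiles`. Textbooks and calculators use slightly different quartile rules, so hand-computed answers may differ a little. The data set is made up.

```python
import statistics

# Hypothetical ordered data set
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# quantiles(..., n=4) returns the three quartiles Q1, Q2 (median), Q3
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
print(q1, q2, q3, iqr)
```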
left skewed : A density curve where the left side of the distribution extends in a long tail (mean<median)
left-tailed alternative hypothesis : An alternative hypothesis that states the parameter value is less than some number or the parameter from another treatment or population (e.g. Ha: μ<85)
marginal distribution : The distribution of only one variable in a two way table using counts found by summing over the categories of the other variable
marginal percentage : The percentage for a row (or column) total in a two-way table, found by dividing the row (or column) total by the table total
measurement : A recorded fact about an individual; may be either numerical (quantitative) or qualitative (categorical)
measurement bias : Bias introduced into survey results because of poorly worded questions, interviewer effects, measuring instrument difficulties, etc.
multi-stage sample : A type of sample from a population that has groups and sub-groups. First, some groups are randomly selected, and then some sub-groups from within the selected groups are randomly sampled. Finally, individuals are randomly selected from within the sampled sub-groups. This can be extended to sub-sub-groups, etc.
natural variation : Variation from object to object within a population.
null hypothesis : The hypothesis that the researcher assumes to be true until sample results indicate otherwise; usually the hypothesis that the researcher wants to disprove.
observational study : A study that merely observes conditions of individuals in a population and records information; the population is disturbed as little as possible. (Note: treatments are not imposed on units.)
one sample z (procedure for proportion) : An inferential procedure using the proportion from one sample to test or estimate the population proportion; the approximate distribution of the test statistic is z or standard normal.
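A minimal sketch of the test statistic for this procedure, with a right-tailed P-value. It assumes the common convention that the standard error in a test uses the claimed value p0 rather than p̂. The function name `one_sample_z` and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z(successes, n, p0):
    """z test statistic for Ho: p = p0, using the sample proportion."""
    p_hat = successes / n
    # Assumption: standard error is computed under Ho, so it uses p0
    se_null = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se_null

# Hypothetical sample: 60 successes in 100 trials, testing Ha: p > 0.5
z = one_sample_z(60, 100, 0.5)
# Right-tailed P-value: area under the standard normal curve above z
p_value = 1 - NormalDist().cdf(z)
print(round(z, 2), round(p_value, 4))  # 2.0 0.0228
```

With α = 0.05, this P-value is below α, so the conclusion for this made-up sample would be to reject Ho.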
one-sided (or one-tailed) test : A test whose alternative hypothesis looks for deviations in only one direction (< or > is in Ha)
parameter : A characteristic of a population that is usually unknown; this characteristic could be the mean (μ), median, proportion, standard deviation (σ) computed on all the data from the population. Does not have variability
placebo : An imitation treatment that resembles the real treatment in all respects except for the active ingredient
population distribution : The distribution of all the observations in a population
population mean (μ) : Mean of all the observations in a population
practical significance : A difference between the observed statistic and the claimed parameter value that is large enough to be worth reporting. To assess, look at the numerator of the test statistic and ask "Is the difference important?" If yes, then the results are practically significant. Note: Do not assess practical significance unless the results are statistically significant.
predicted y (symbolized by ŷ) : Value for y at a specified x as predicted by the regression equation; computed by plugging the value for x into the equation and solving for y.
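To make the "plug in x" step concrete, here is a short sketch that fits a least-squares line and then computes ŷ = a + bx. The least-squares formulas for a and b are standard, but the function names and the data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # estimated slope
    a = y_bar - b * x_bar    # estimated intercept
    return a, b

def predict(a, b, x):
    """Predicted y at a specified x."""
    return a + b * x

# Hypothetical data: hours studied and exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
a, b = fit_line(hours, scores)
y_hat = predict(a, b, 4)
print(a, b, y_hat)  # 46.0 6.0 70.0
```

Here the slope b = 6 means the predicted score rises by about 6 points for each additional hour studied.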
proportion : the fraction of successes in either a sample (p hat) or a population (p)
P-value : the probability of getting a test statistic as extreme as or more extreme than the value actually observed assuming Ho is true
quartile : one of the three values that divide the ordered data set into quarters
question wording bias : sample results that differ from the truth because of the wording of the question used to obtain the information
randomized block design (RBD) : An experimental design where treatments are randomly allocated within each block.
random number table : A table consisting of the digits 0 through 9 in equal proportions such that the digit in any position in the table is independent of the digits in neighboring positions (i.e., there is no pattern in the order of the digits.)
reject H0 : The appropriate statistical conclusion when the P-value is less than or equal to α; conclude that "There is enough evidence to believe Ha."
replication : having more than one individual per treatment in an experiment (Note: replication is NOT the same as reproducibility of results or repetition of an experiment)
response variable : the variable that will be measured in an experiment; also called the dependent variable
right skewed distribution : A density curve where the right side of the distribution extends in a long tail (mean > median)
sample standard deviation (s) : a measure of the variability in a sample about x̅
sample variance (s^2) : The "average" of the squared deviations of the observations in a sample about the mean, x̅, computed by dividing the sum of squared deviations by n-1 rather than n
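A short sketch of the calculation for s^2 and s, dividing by n-1. The function name `sample_variance` and the data values are made up for this example.

```python
import math

def sample_variance(xs):
    """Sample variance: sum of squared deviations about the mean over n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

# Hypothetical sample with mean 5
data = [2, 4, 4, 4, 5, 5, 7, 9]
s2 = sample_variance(data)   # 32 / 7
s = math.sqrt(s2)            # sample standard deviation
print(round(s2, 3), round(s, 3))  # 4.571 2.138
```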
selection bias : bias introduced into sample results due to how the units were selected for sampling
shape : description of the overall pattern of a histogram using terms including symmetric, right skewed, left skewed, flat (uniform), bell-shaped, etc.
slope (β=parameter symbol, b=statistic symbol) : A measure of the average rate of change in the response variable for every one unit increase in the explanatory or independent variable
spread : A summary number representing the variability of the observations. Measures include range, interquartile range, and standard deviation.
standard error of p̂ (or standard deviation of the sampling distribution of p̂) : an estimate of the standard deviation of the sampling distribution of p̂; equals √[p̂(1-p̂) ÷ n]
standard error of x̅ (or standard error of the mean) : an estimate of the standard deviation (variability) of the sampling distribution of x̅; estimates variability of all the x̅ 's about μ; equals s/√n
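Both standard error formulas, √[p̂(1-p̂) ÷ n] and s/√n, can be sketched in a few lines. The function names and the sample values below are hypothetical.

```python
import math

def se_proportion(p_hat, n):
    """Standard error of p-hat: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def se_mean(s, n):
    """Standard error of x-bar: s / sqrt(n)."""
    return s / math.sqrt(n)

# Hypothetical samples
se_p = se_proportion(0.6, 100)   # p-hat = 0.6 from n = 100
se_x = se_mean(2.0, 25)          # s = 2.0 from n = 25
print(round(se_p, 4), se_x)      # 0.049 0.4
```

Both shrink as n grows: quadrupling the sample size cuts the standard error in half.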
statistics : the study of data analysis: collecting data, organizing and summarizing data, and drawing conclusions from sample data to answer research questions in the presence of variation.
statistical inference : drawing conclusions about an unknown parameter from a statistic
strength of linear relationship : an assessment of how closely the points cluster about a straight line in a scatterplot. Very little scatter signifies a strong relationship; lots of scatter signifies a weak relationship. Measured using r, the correlation coefficient.
subject : an individual or unit in a study, usually a person.
t distribution : A distribution specified by degrees of freedom, used to model test statistics for the sample mean, differences between sample means, etc., where σ (or the σ's) is (are) unknown.
test of independence : a chi-square test on data collected from a single SRS with two categorical measurements on each individual. The null hypothesis states that there is no relationship between the two categorical variables.
treatment : The condition or conditions applied to a subject or individual in an experiment; a placebo or no treatment is often considered a treatment. The collection of treatments is the explanatory variable.
t-test : Any test of significance where the test statistic can be modeled with the t-distribution; used when σ is unknown
type I error : the error made when rejecting a true null hypothesis; believing Ha when Ho is true
type II error : failing to reject a false null hypothesis; the error of believing Ho when Ha is true
upper-tailed alternative hypothesis : another name for right-tailed alternative hypothesis
variable : any characteristic of an individual or object; it may take on any number of values either categorical or numerical.
z-score : The number of standard deviations a value or observation is from the mean; a standardized x-value.
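As a quick illustration, the standardization z = (x - mean) ÷ standard deviation looks like this in Python. The function name `z_score` and the exam numbers are made up.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Hypothetical exam: mean 70, standard deviation 10
z_high = z_score(85, 70, 10)   # 1.5 standard deviations above the mean
z_low = z_score(60, 70, 10)    # 1 standard deviation below the mean
print(z_high, z_low)           # 1.5 -1.0
```

A positive z-score means the value is above the mean; a negative one means it is below.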
α : level of significance or probability of a type I error
b : estimated (sample) slope in a regression equation
C : Level of confidence
N (μ, σ) : Normal distribution with mean, μ, and standard deviation, σ
p : proportion (or percentage) of a population
r : sample correlation coefficient
s^2 : sample variance
σ^2 : variance of a population or distribution
σ/√n : standard deviation of the sampling distribution of x̅
y hat : predicted y


