PHP4006 Advanced Statistics Assignment

Section A (Answer ALL questions)

Answer each question as TRUE or FALSE in the answer book provided.

  • The null hypothesis of the Shapiro-Wilk test can be given as “The sample comes from a normally distributed population”.
  • Outliers are influential cases.
  • SPSS has a non-parametric equivalent to a Factorial ANOVA for a completely randomized design.
  • If p = .01, you can conclude that there is a 1% chance the null hypothesis is true.
  • The odds of an event are the ratio of the probability of an event happening to the probability of the event not happening.
  • The dependent/outcome variable in a logistic regression is a binary variable.
  • In a Principal Component Analysis, the first component accounts for more of the variability in the data than the second component.
  • A MANOVA assumes that the variables in each group are normally distributed.
  • A loglinear model with two factors is equivalent to a contingency table analysis.
  • An ANCOVA allows there to be errors in the measured covariate.

Section B (Answer THREE questions)

  1. You are planning a study of the effectiveness of two therapies for treating depression. Group A will receive the current therapy and Group B will receive a new and improved therapy. During the investigation all patients will be assessed three times over a total of 12 weeks of treatment. You are trying to decide whether the new therapy is more effective than the current therapy.
      a) State the name and design of the statistical test you intend to use. (3 marks)
      b) State the null hypotheses of the test chosen in (a). (3 marks)
      c) Explain what the significance level of the chosen test is. (2 marks)
      d) State the assumptions that the chosen test requires to hold. (6 marks)
      e) Describe how you would assess whether the assumptions hold. (6 marks)
      f) State two other analyses you would consider using if the assumptions do not hold. (5 marks)
  2. Write short notes describing the type of data, the question to be answered and your understanding of how each of the following statistical techniques works:
      i. Factor Analysis (6 marks)
      ii. Path Analysis (6 marks)
      iii. Structural Equation Modelling (6 marks)

     Then describe a dataset and justify the use of ONE of the above techniques. DO NOT use a dataset used in the lectures. (7 marks)

  3. You have studied the impact of cognitive fatigue on behavioural impulsivity. You measured impulsivity using a decision-making task in which a higher score represents less impulsivity. Sixty participants were allocated to one of three conditions: a control condition and two conditions inducing cognitive fatigue (Mild or High). After the treatment, participants completed the decision-making task. The output of the one-way independent ANOVA is shown below (assumptions were met).
      a) Calculate the omega-squared effect size for the differences in impulsivity scores across conditions from the information below. (9 marks)
      b) Calculate the Pearson’s r effect size for the statistically significant contrast result using the information below. (7 marks)
      c) Explain some advantages of using effect sizes in addition to measures of statistical significance. (5 marks)
      d) Briefly summarise the main findings of the data below. (4 marks)



                                                      95% Confidence Interval for Mean
          N    Mean     Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
Control   20   100.80   8.817            1.972        96.67         104.93        79        114
Mild      20   85.05    11.009           2.462        79.90         90.20         65        100
High      20   81.10    6.601            1.476        78.01         84.19         64        96
Total     60   88.98    12.319           1.590        85.80         92.17         64        114
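The Std. Error and confidence-limit columns follow from the other columns: SE = SD / √n, and the 95% interval is Mean ± t(0.975, n − 1) × SE. A minimal Python sketch that reproduces the three group rows, assuming the two-tailed critical value t(0.975, 19) ≈ 2.093 is read from tables (the Python standard library has no t-quantile function, so the value is hard-coded as an assumption):

```python
import math

def mean_ci(mean: float, sd: float, n: int, t_crit: float) -> tuple[float, float, float]:
    """Return (standard error, lower bound, upper bound) for a group mean."""
    se = sd / math.sqrt(n)          # standard error of the mean
    half_width = t_crit * se        # margin of error for the interval
    return se, mean - half_width, mean + half_width

# Assumed critical value from t tables: two-tailed 95%, df = n - 1 = 19
T_CRIT_19 = 2.093

groups = {
    "Control": (100.80, 8.817, 20),
    "Mild": (85.05, 11.009, 20),
    "High": (81.10, 6.601, 20),
}

for name, (m, sd, n) in groups.items():
    se, lo, hi = mean_ci(m, sd, n, T_CRIT_19)
    print(f"{name}: SE = {se:.3f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

The Total row works the same way with n = 60 and the critical value for df = 59.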



                 Sum of Squares    Mean Square
Between Groups
Within Groups
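Omega-squared for a one-way independent ANOVA is ω² = (SS_B − df_B × MS_W) / (SS_T + MS_W). Because the cells of the ANOVA table above are blank, the sketch below rebuilds the sums of squares from the descriptive statistics, using SS_B = Σ n_j (x̄_j − x̄)² and SS_W = Σ (n_j − 1) s_j². This is an illustration under that assumption; values rebuilt from rounded means and SDs may differ slightly from the SPSS output.

```python
def omega_squared(groups: list[tuple[int, float, float]]) -> float:
    """Omega-squared for a one-way independent ANOVA from (n, mean, sd) per group."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total

    # Between- and within-groups sums of squares rebuilt from summary statistics
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    ss_total = ss_between + ss_within

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_within = ss_within / df_within

    return (ss_between - df_between * ms_within) / (ss_total + ms_within)

# (n, mean, sd) for Control, Mild, High, from the descriptives table
fatigue = [(20, 100.80, 8.817), (20, 85.05, 11.009), (20, 81.10, 6.601)]
print(f"omega^2 = {omega_squared(fatigue):.3f}")
```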







Contrast Coefficients

            Fatigue Condition
Contrast    Control   Mild   High
1           -2        1      1
2           0         -1     1

Contrast Tests

                                                Contrast   Value of Contrast   Std. Error   Sig. (2-tailed)
Impulsivity   Assume equal variances            1          -35.45
                                                2          -3.95
              Does not assume equal variances   1          -35.45
                                                2          -3.95
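The Value of Contrast column is Σ c_j x̄_j (for contrast 1, −2 × 100.80 + 85.05 + 81.10 = −35.45). Pearson’s r for a contrast is r = √(t² / (t² + df)). The Std. Error and t cells are blank here, so this sketch rebuilds them under the equal-variances assumption as SE = √(MS_W × Σ c_j² / n). MS_W is pooled from the rounded group SDs, which is an assumption and not a value read from the output.

```python
import math

def contrast_effect_size(coeffs, means, sds, n_per_group):
    """Return (value, se, t, r) for a planned contrast.

    Assumes equal group sizes and equal variances, so the pooled
    within-groups mean square is the simple average of the group variances.
    """
    k = len(means)
    df_within = k * (n_per_group - 1)
    ms_within = sum(sd ** 2 for sd in sds) / k
    value = sum(c * m for c, m in zip(coeffs, means))
    se = math.sqrt(ms_within * sum(c ** 2 for c in coeffs) / n_per_group)
    t = value / se
    r = math.sqrt(t ** 2 / (t ** 2 + df_within))
    return value, se, t, r

means = [100.80, 85.05, 81.10]   # Control, Mild, High
sds = [8.817, 11.009, 6.601]
contrasts = {"Contrast 1": [-2, 1, 1], "Contrast 2": [0, -1, 1]}

for label, coeffs in contrasts.items():
    value, se, t, r = contrast_effect_size(coeffs, means, sds, n_per_group=20)
    print(f"{label}: value = {value:.2f}, SE = {se:.3f}, t(57) = {t:.2f}, r = {r:.2f}")
```

The r of interest is the one for whichever contrast the Sig. column shows to be significant.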









  4. The data analysed below concern glucose control in diabetic patients. Good control is indicated by a low value of Glucose in the blood. The variables are:

     G   Glucose in the blood
     K   Knowledge of the illness
     F   Measure of attribution called fatalistic externalism
     D   Duration of the illness in months
     S   Length of schooling (0 = less than 13 years, 1 = more than 13 years)

The output below comes from a forced entry regression and a forward regression, each used to predict Glucose from Knowledge, Fatalism, Duration and Schooling.

      a) How many diabetic patients were in the study? (1 mark)
      b) Which predictor variable does the forced entry regression suggest is the best predictor of Glucose? Quote the p-value. (… marks)
      c) Which predictor variable does the forward regression suggest is the best predictor of Glucose? Quote the p-value. (… marks)
      d) Explain why the best predictor chosen by the forward regression does not have to be the same as that suggested by the forced entry regression. (4 marks)
      e) From the forced entry regression, state the model equation and predict the glucose level of a person with Duration = 141, Fatalism = 19, Knowledge = 36 and Schooling = 1. (10 marks)
      f) Using the forward regression output, what conclusions do you come to about the use of the four predictors to predict blood Glucose level? (… marks)
      g) From the definitions of the four predictor variables, why would you perform further analyses, and what would they be? (4 marks)


Forced entry regression

Variables Entered/Removed (b)

Model   Variables Entered   Variables Removed   Method
1       K, D, S, F

a. All requested variables entered.
b. Dependent Variable: G

Model Summary

Model   R   R Square   Adjusted R Square   Std. Error of the Estimate

a. Predictors: (Constant), K, D, S, F


ANOVA (b)

Model            Sum of Squares   df   Mean Square   F       Sig.
1   Regression   5557.055         4    1389.264      4.195   .004 (a)
    Residual     20863.710        63   331.170
    Total        26420.765        67

a. Predictors: (Constant), K, D, S, F
b. Dependent Variable: G
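The Model Summary cells are blank, but they can be recovered from this ANOVA table with standard identities: R² = SS_Regression / SS_Total, adjusted R² = 1 − (1 − R²)(n − 1) / df_Residual, the standard error of the estimate is √MS_Residual, and the sample size follows from the total df (n = df_Total + 1). A minimal Python sketch:

```python
import math

def summarise_fit(ss_regression, ss_residual, df_regression, df_residual):
    """Recover Model Summary statistics from a regression ANOVA table."""
    ss_total = ss_regression + ss_residual
    n = df_regression + df_residual + 1                   # total df = n - 1
    r_squared = ss_regression / ss_total
    adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual
    se_estimate = math.sqrt(ss_residual / df_residual)    # square root of MS_Residual
    return {"n": n, "R": math.sqrt(r_squared), "R2": r_squared,
            "adj_R2": adj_r_squared, "SE_est": se_estimate}

# Forced entry model (K, D, S, F) from the ANOVA table above
forced = summarise_fit(5557.055, 20863.710, 4, 63)
for key, value in forced.items():
    print(key, round(value, 3))
```

The same function applies to the forward model's ANOVA table.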


Coefficients (a)

                 Unstandardized          Standardized
Model            B          Std. Error  Beta     t        Sig.
1   (Constant)   130.385                         6.983    .000
    F            .028       .485        .008     .057     .955
    D            -.053      .028        -.225    -1.890   .063
    S            -11.283    4.850       -.284    -2.327   .023
    K            -.731      .364        -.267    -2.008   .049

a. Dependent Variable: G
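Prediction from a fitted linear regression substitutes the predictor values into the equation built from the unstandardized (B) coefficients. With the forced entry coefficients, that equation is Ĝ = 130.385 + 0.028F − 0.053D − 11.283S − 0.731K. The sketch below evaluates it. The coefficients are rounded to three decimals, so the result is approximate.

```python
# Unstandardized coefficients (B) from the forced entry Coefficients table
COEFFS = {"F": 0.028, "D": -0.053, "S": -11.283, "K": -0.731}
CONSTANT = 130.385

def predict_glucose(values: dict[str, float]) -> float:
    """Predicted G = constant + sum of B * x over the four predictors."""
    return CONSTANT + sum(b * values[name] for name, b in COEFFS.items())

person = {"D": 141, "F": 19, "K": 36, "S": 1}
print(f"Predicted glucose: {predict_glucose(person):.3f}")
```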


Forward regression

Variables Entered/Removed (a)

Model   Variables Entered   Variables Removed   Method
1       K                   .                   Forward (Criterion: Probability-of-F-to-enter <= .050)

a. Dependent Variable: G

Model Summary

Model   R   R Square   Adjusted R Square   Std. Error of the Estimate

a. Predictors: (Constant), K


ANOVA (b)

Model            Sum of Squares   df   Mean Square   F       Sig.
1   Regression   3067.774         1    3067.774      8.670   .004 (a)
    Residual     23352.990        66   353.833
    Total        26420.765        67

a. Predictors: (Constant), K
b. Dependent Variable: G


Coefficients (a)

                 Unstandardized          Standardized
Model            B          Std. Error  Beta     t        Sig.
1   (Constant)   126.438                         11.056   .000
    K            -.932      .317        -.341    -2.945   .004

a. Dependent Variable: G
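Both models have the same Total sum of squares (26420.765 on 67 df), which suggests they were fitted to the same 68 cases. Under that assumption, one way to ask whether D, S and F jointly add anything beyond K is a nested-model (partial) F-test: F = [(SS_Reg,full − SS_Reg,reduced) / (k_full − k_reduced)] / MS_Res,full. This test is not part of the output above. The sketch only shows the arithmetic, and the resulting F must still be compared against an F(3, 63) critical value or p-value.

```python
def nested_f(ss_reg_full, ss_res_full, df_res_full, k_full, ss_reg_reduced, k_reduced):
    """Partial F statistic for adding (k_full - k_reduced) predictors to a reduced model.

    Assumes both models were fitted to the same cases, so they share the
    same Total sum of squares.
    """
    df_num = k_full - k_reduced                  # number of extra predictors being tested
    ms_res_full = ss_res_full / df_res_full      # residual mean square of the full model
    f_stat = ((ss_reg_full - ss_reg_reduced) / df_num) / ms_res_full
    return f_stat, df_num, df_res_full

# Full model: K, D, S, F (forced entry); reduced model: K only (forward)
f_stat, df1, df2 = nested_f(5557.055, 20863.710, 63, 4, 3067.774, 1)
r2_change = (5557.055 - 3067.774) / 26420.765
print(f"F({df1}, {df2}) = {f_stat:.3f}, R-squared change = {r2_change:.3f}")
```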

Excluded Variables (b)

Model   Beta In     t        Sig.   Partial Correlation   Collinearity Statistics
1   F   -.023 (a)   -.173    .863
    D   -.162 (a)   -1.401   .166
    S   -.231 (a)   -1.920   .059

a. Predictors in the Model: (Constant), K
b. Dependent Variable: G
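The Partial Correlation column is blank here. For a variable not yet in the model, its partial correlation with G (controlling for K) can be recovered from the t value in this table as r = t / √(t² + df). Here df is the residual df of the model after that variable is added: 68 − 2 − 1 = 65, assuming n = 68 from the total df of 67 in the ANOVA tables. The t values are rounded, so the results in this sketch are approximate.

```python
import math

def partial_r_from_t(t: float, df_residual: int) -> float:
    """Partial correlation implied by a coefficient's t statistic."""
    return t / math.sqrt(t ** 2 + df_residual)

N = 68               # assumed: total df of 67 in the ANOVA tables, plus 1
DF_RES = N - 2 - 1   # parameters: constant, K and the candidate variable

excluded_t = {"F": -0.173, "D": -1.401, "S": -1.920}
for name, t in excluded_t.items():
    print(f"{name}: partial r = {partial_r_from_t(t, DF_RES):.3f}")
```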