R Language Assignment Question

Assignment 2

Background to question 1: This month, U of T marks the centenary of the Armistice, an event whose calamitous aftermath was described by alumna Prof Margaret MacMillan, among others.

In later work, Harvard professor Erica Chenoweth and others used statistical methods to show how nonviolence is strategically superior to violence. She has shared with our STA302 class a dataset regarding nonviolent campaigns up to 2006. After some editing by your instructor, it contains:

Predictor x, the population of a location on Earth
Response y, the peak membership in a successful or partly successful nonviolent campaign

The units for x and y are: number of persons. The data are here, and you can load them in R using read.csv("A2.csv", sep=",") or similar.
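As a minimal sketch of the loading step, the code below wraps read.csv in a small helper and prints the structure of the result. The helper name load_campaign_data is hypothetical (it is not part of the assignment), and the column names in A2.csv are not stated here, so the sketch does not assume any.

```r
# Hypothetical helper (not part of the assignment): read the file and
# fail early if it did not parse into a non-empty data frame.
load_campaign_data <- function(path = "A2.csv") {
  dat <- read.csv(path, sep = ",")   # sep = "," is already read.csv's default
  stopifnot(is.data.frame(dat), nrow(dat) > 0)
  dat
}

# Only runs if A2.csv is in the working directory.
if (file.exists("A2.csv")) {
  dat <- load_campaign_data("A2.csv")
  str(dat)              # column names and types
  print(summary(dat))   # ranges, to check the values look like counts of persons
}
```

If the file is not found, check the working directory with getwd() before changing the path.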

Apply this flowchart to A2.csv. At each stage in your work, identify which part of the flowchart you’re in (for example, using headings). The question you seek to answer is whether there may exist a relationship between the predictor and the response variable. If you suspect a relationship, state it as an expression. Hints:

If you encounter step G or L, you may ignore it and continue to J or M respectively.

If you encounter step F then you may answer the question, or simply recall from class that an alternative to deciding whether interesting points are valid is to later report results with and without them.
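The "with and without" approach mentioned in the hint for step F can be sketched as follows. The data are made up for illustration and are not the A2.csv values; the choice of which point is "interesting" (index 8) is likewise an assumption for the example.

```r
# Toy data only (made up for illustration; NOT the A2.csv values).
dat <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 20),
  y = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 15.0)
)

flagged <- 8   # row index of the point under suspicion (assumed)

fit_all     <- lm(y ~ x, data = dat)               # every observation
fit_dropped <- lm(y ~ x, data = dat[-flagged, ])   # flagged row removed

# Report both sets of coefficients side by side.
print(rbind(all = coef(fit_all), dropped = coef(fit_dropped)))
```

Reporting both fits lets the reader see how much the conclusion depends on the flagged point, without having to rule on whether the point is valid.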

Q2(a) (3 points)

You have a dataset for which SLR under the Gauss-Markov conditions is appropriate. You’re given sigma2, the variance of the model error term, and x, a vector of predictor values. Write R code to find the design matrix X and the hat matrix H from the lectures, and thence the covariance matrix of the residuals. Save your answer as an R matrix named V. Hint:

The diag command can produce an identity matrix; for example, diag(3) is the 3 × 3 identity matrix.

Q2(b) (1 point)

Use your work from Q2(a) to calculate V for sigma2=2 and x=seq(1,4).

Q3-BONUS (2 points)

Preamble: This challenge question is optional and you can achieve 20/20 on the assignment without doing it. The bonus given will generally be 0 or 2. Your two assignments can’t collectively contribute more than 10 percentage points to your final grade.

Derive an equation for the maximum number of leverage points possible in a dataset of size n. Show your workings/thoughts, using equations/sentences to describe what a dataset would look like in order to reach this maximum. Your answer will be in terms of n only; don’t include in your final expression other variables such as x values, y values, etc. Hints:

You may use floor/ceiling notation if needed: ⌊·⌋ or ⌈·⌉

Don’t assume that the data were generated according to any of our commonly used assumptions (Gauss-Markov conditions, etc).
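For orientation on what a leverage point looks like in R, here is a small sketch that computes leverages for an SLR fit and flags the large ones. The data are made up, and the 4/n cutoff is an assumption (a common rule of thumb for SLR); use whichever definition of a leverage point was given in the lectures.

```r
# Toy data (made up); the last predictor value sits far from the rest.
x <- c(1, 2, 3, 4, 5, 30)
y <- c(1.2, 1.9, 3.1, 4.2, 4.8, 6.0)

fit <- lm(y ~ x)
h <- hatvalues(fit)   # leverages: the diagonal of the hat matrix
n <- length(x)

# For SLR the leverages depend on the predictor values only.
h_manual <- 1 / n + (x - mean(x))^2 / sum((x - mean(x))^2)
stopifnot(isTRUE(all.equal(unname(h), h_manual)))

cutoff <- 4 / n                 # assumed rule of thumb for SLR
print(unname(which(h > cutoff)))   # indices of the flagged observations
```

Changing the y values leaves h unchanged, which is why the sketch checks the leverages against a formula in x alone.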