EC226 Econometrics Assignment 1

Report Length: The maximum length for the text of the assessment is EIGHT sides of A4 paper (1.5 line spacing, 12 point, Times New Roman font with a 2.5cm margin all around). Detailed results (regression output, graphs, etc.) relating to your answers to questions 1, 2, 3 and 4 should be put in an appendix. The appendix must be a MAXIMUM of TEN sides. You should refer to the appropriate page of the appendix from the main text when discussing results that appear in the appendix. Please note that both the main text and the appendix must be submitted as ONE document.

This assignment uses the 1970 British Cohort Study (BCS70), a continuing longitudinal study whose subjects are all those living in England, Scotland and Wales who were born in one particular week in April 1970. Since 1970 there have been six attempts to gather information from the whole cohort. With each successive attempt, the scope of enquiry has broadened from a strictly medical focus at birth, to encompass physical and educational development at the age of five, physical, educational and social development at the ages of ten and sixteen, and then to include economic development and other wider factors at 26, 29 and 34 years.

The data file bcs70-assess1_1516.dta contains the results from all four quarters of the survey together for the main household respondent and can be found on the EC226 website.

The documentation for this survey can be found here:


The main focus of this assignment is on the determination of gross hourly wages of individuals (using the variable hr_gro). Ensure that ALL regression analysis undertaken in Q2 and Q3 uses a constant sample size. In all cases you should be using Robust Standard Errors.
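
One way to keep the sample size constant is to fix the estimation sample once, before any model is run, using every variable that appears in any of the models to be compared. The sketch below illustrates the idea on made-up records. Only the names hr_gro, sex, edu_lev and bas_quant come from this brief; the values, the use of None for a missing value, and the helper in_common_sample are assumptions for illustration.

```python
import math

# Hypothetical toy records (not BCS70 data); None marks a missing value.
records = [
    {"hr_gro": 8.5, "sex": 1, "edu_lev": 2, "bas_quant": 31},
    {"hr_gro": None, "sex": 0, "edu_lev": 1, "bas_quant": 25},
    {"hr_gro": 12.0, "sex": 0, "edu_lev": None, "bas_quant": 40},
    {"hr_gro": 0.0, "sex": 1, "edu_lev": 3, "bas_quant": 28},
    # bas_quant == 0 is the brief's code for a missing score, so the row stays.
    {"hr_gro": 15.2, "sex": 0, "edu_lev": 4, "bas_quant": 0},
]

# Union of the variables used in ANY model that must share the sample.
needed = ["hr_gro", "sex", "edu_lev", "bas_quant"]


def in_common_sample(row, variables):
    """True if the row can be used in every model: nothing missing, wage > 0."""
    complete = all(row.get(v) is not None for v in variables)
    return complete and row["hr_gro"] > 0  # ln() needs a positive wage


# Fix the sample once, then build the dependent variable on that sample.
sample = [dict(r) for r in records if in_common_sample(r, needed)]
for r in sample:
    r["ln_hr_gro"] = math.log(r["hr_gro"])

print(len(sample), "observations in the common sample")
```

Every regression would then be run on this one list, so that differences between models reflect the specification rather than a changing set of observations.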


  1. Briefly describe the main features of the variable(s) of interest.

[15 marks]

[Hint: Consider first descriptive statistics, for example, means, medians, variances/standard deviations, plots of conditional means across various groups etc. As the relevance of certain variables might not become clear until the rest of the work has been completed, it is probably best to do a brief preliminary analysis of the data to get a feel for how the variables relate to ln(hr_gro) [natural logarithm of hr_gro], but to actually undertake and fully answer this question after the rest of the assessment has been finished.]
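
As a rough illustration of the conditional means mentioned in the hint, the sketch below groups ln(hr_gro) by a binary variable and reports the count, mean and median in each group. The wage values are invented and the 0/1 coding of sex is an assumption, not the coding used in the data file.

```python
import math
from collections import defaultdict
from statistics import mean, median

# Hypothetical (hr_gro, sex) pairs; the 0/1 coding of sex is assumed.
rows = [(8.0, 0), (10.0, 0), (12.5, 0), (9.0, 1), (16.0, 1), (20.0, 1)]

# Collect ln(hr_gro) separately for each group.
by_group = defaultdict(list)
for wage, sex in rows:
    by_group[sex].append(math.log(wage))

# Conditional descriptive statistics of the log wage within each group.
summary = {
    group: {"n": len(values), "mean": mean(values), "median": median(values)}
    for group, values in by_group.items()
}

for group in sorted(summary):
    s = summary[group]
    print(f"sex={group}: n={s['n']}, mean={s['mean']:.3f}, median={s['median']:.3f}")
```

The same grouping can be repeated for any categorical variable, such as the levels of edu_lev, to see which group differences look large before any regression is estimated.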

2(a) Estimate a regression of ln(hr_gro) on gender (sex) and a series of binary dummy variables based on the highest qualification variable edu_lev (taking the lowest qualification, less than A-levels, as the default). Comment on your results.

2(b) Take the model in 2(a) and include a measure of IQ (bas_quant, which is the score the individual obtained in a quantitative test taken at the age of 10, with missing values recorded as zero). Comment on your results.

2(c) The model estimated in 2(b) assumes that the returns to the different educational qualifications are the same for males and females. Include a series of interactions between gender and qualifications. Test the hypothesis imposed in 2(b). Comment on your results.

2(d) Estimate the model in 2(b) in which you permit all coefficients to differ by gender. Test this model against the model specified in 2(b) and also that specified in 2(c). Comment on your results.

2(e) In model 2(b) we assume that returns to an undergraduate degree are equivalent for all individuals, but signalling theory suggests that individuals are paid differently based on the classification of their degree (first class, upper second, lower second etc). Extend the model in 2(a) to allow the degree class obtained in the undergraduate degree to affect earnings (using the variable eddeg and choosing lower second class degree as the default). Is there any evidence that degree class matters? Interpret your results, including an interpretation of the coefficients on the variable for the undergraduate degree and the first class degree.

[18 marks]
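
The regressors described in Question 2 can be sketched as follows: one dummy per qualification level with the lowest level omitted as the default, a flag for a bas_quant score recorded as zero, and gender-by-qualification interactions. The level labels, the variable name female, and the helpers design_row and pct_effect are hypothetical; the actual codes of edu_lev and sex should be taken from the survey documentation. Flagging zero scores is one possible treatment of the missing values, not a required one, and it assumes no genuine score equals zero.

```python
import math

# Hypothetical labels for the levels of edu_lev; check the real coding.
EDU_LEVELS = ["below_a_level", "a_level", "ug_degree", "pg_degree"]
BASELINE = "below_a_level"  # default (omitted) category


def design_row(female, edu, bas_quant):
    """Regressors for one person: constant, gender, education dummies,
    gender x education interactions, test score and a missing-score flag."""
    row = {"const": 1.0, "female": float(female)}
    for level in EDU_LEVELS:
        if level == BASELINE:
            continue  # omitted category is absorbed by the constant
        dummy = 1.0 if edu == level else 0.0
        row[level] = dummy
        row["female_x_" + level] = float(female) * dummy
    row["bas_quant"] = float(bas_quant)
    # Assumption: a score of zero always means "missing", as the brief states.
    row["bas_quant_missing"] = 1.0 if bas_quant == 0 else 0.0
    return row


def pct_effect(coef):
    """Exact percentage wage difference implied by a dummy coefficient
    in a regression whose dependent variable is a log wage."""
    return 100.0 * (math.exp(coef) - 1.0)


example = design_row(female=1, edu="ug_degree", bas_quant=0)
print(example)
print(round(pct_effect(0.10), 2))  # a coefficient of 0.10 is about 10.52%
```

Because the dependent variable is in logs, a dummy coefficient b corresponds to a wage difference of 100(exp(b) - 1) per cent relative to the default category, which is close to 100b only when b is small.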

3(a) There is a belief among some people that there is a quantity-quality trade-off between the number of children in the family you grew up in and the attention that any individual child can expect to receive from their parents. Some people believe this has an effect on earnings. Include additional variable(s) in the model in question 2(b) to allow for sibling effects. Comment on your findings.

3(b) Consider a version of the model in 2(b) in which you only include a dummy variable for UG degree (including PG degree) from the highest qualification level variable edu_lev and estimate this model by OLS. You are worried about the endogeneity of the dummy variable for a UG degree. Estimate an OLS regression of the variable indicating whether an individual has a first degree on some appropriate instruments and discuss the validity or otherwise of the instruments. Compare the IV results from this model with the OLS results and comment on your findings.

[12 marks]
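
To see why the OLS and IV estimates in 3(b) can differ, the simulation below generates artificial data in which unobserved ability raises both the probability of holding a degree and the log wage. It is a deliberately simplified sketch: there is a single binary instrument z (hypothetical, not a variable from the data set), no other regressors, and the IV estimator is the simple ratio cov(z, y) / cov(z, d) rather than a full two-stage least squares with the controls of 2(b). All parameter values are invented.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, true_effect=0.3, seed=0):
    """Artificial data: ability drives both the degree dummy d and log wage y;
    the instrument z shifts d but does not enter y directly."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        zi = 1.0 if rng.random() < 0.5 else 0.0
        ability = rng.gauss(0, 1)  # unobserved by the econometrician
        di = 1.0 if 0.8 * zi + ability + rng.gauss(0, 1) > 0.4 else 0.0
        yi = 2.0 + true_effect * di + 0.5 * ability + rng.gauss(0, 0.3)
        z.append(zi)
        d.append(di)
        y.append(yi)
    return z, d, y


z, d, y = simulate()

ols = cov(d, y) / cov(d, d)          # slope from regressing y on d
first_stage = cov(z, d) / cov(z, z)  # slope from regressing d on z
iv = cov(z, y) / cov(z, d)           # simple IV estimate with one instrument

print(f"first stage: {first_stage:.3f}")
print(f"OLS: {ols:.3f}   IV: {iv:.3f}   (true effect 0.3)")
```

In this artificial setting OLS overstates the return to a degree because ability is omitted, while the IV estimate is centred on the true value but is noisier. The first-stage slope speaks only to the relevance of the instrument; with a single instrument the exclusion restriction cannot be tested and has to be argued.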

  4. Taking into account what you have learnt about the data and looking at all the variables available to you in the data set, formulate your statistically preferred model for ln(hr_gro). This should involve investigating the impact of other variables not used so far, and also considering different functional forms (i.e. it may not be the case that there is a simple linear relationship between ln(hr_gro) and your explanatory variables). Write a report on your modelling strategy and on the findings from your model. Describe, justify and comment on your preferred model, including a critical assessment of the results.

[55 marks]

Hint: Your own model must make both economic and statistical sense. Variables should be statistically significant and their signs explainable in terms of basic economic theory. It is important to develop a hypothesis, which you are particularly interested in testing, and then to write your answer around this central idea.

In explaining and assessing your model and results, it might be useful to refer to previous work in the area of earnings equations, which you can find with a suitable literature search. However, you should be able to develop the multiple regression model using your own understanding of wage determination.

You should use EconLit on the Library website to search for relevant literature, which will help you put your model in the context of other models in the literature. Your model in Question 4 should then inform the discussion of the data in Question 1.