SO5041 Assignment 3

1 Comparing means

Download this small Stata dataset from

Use Stata to compare the mean income of men with that of women. Interpret the results in terms of the null hypothesis that mean income is the same in the two groups.

2 Using Stata for Regression

Using the same downloaded dataset, do the following separately for men and women:

  • Create scatterplots of income (Y-axis) by hours (X-axis). Do you see an association in these scatterplots?
  • Get the correlation coefficient, and say what it means
  • Run the regression analysis and write out the full Y = a + b × X equation
  • Draw the line over the scatterplot, and interpret what it means

Having done this for both men and women, compare your results:

  • What are the predicted earnings for men and women who work 40 hours?
  • ...who work 20 hours?
  • For which sex does the regression work better? Why might this be so?
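
As a reminder of what the regression and correlation steps compute, the sketch below fits Y = a + b × X by ordinary least squares in Python rather than Stata. The hours and income values are made up purely for illustration and are not taken from the assignment dataset; the function name fit_line is also hypothetical.

```python
from math import sqrt


def fit_line(x, y):
    """Return (a, b, r): intercept, slope and Pearson correlation for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: the line passes through the two means
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    return a, b, r


# Made-up illustrative values, NOT the assignment data
hours = [10, 20, 30, 40]
income = [120, 260, 340, 480]

a, b, r = fit_line(hours, income)
print(f"income = {a:.2f} + {b:.2f} x hours (r = {r:.3f})")
print(f"predicted income at 25 hours: {a + b * 25:.2f}")
```

A prediction for any number of hours is obtained by substituting that value for X in the fitted equation, which is what the last line does.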

3 Work and TV

A sample of adults has been asked to record how much time they spend watching television in a given week. They have also been asked to record their work hours. Researchers feel that time spent working will affect time spent watching television.

  • Do you think this is a plausible idea? If so, how would the connection work and what direction do you think it would take?

To test this hypothesis, they conduct a regression analysis, with TV time in minutes per day as the dependent variable and Work time (in hours per week) as the explanatory variable. That is, TV = a + b × Work. They get the following results:

  • Constant: 37
  • Coefficient for hours work: -0.5

  • Draw this line on a graph, with TV time on the Y-axis and Work time on the X-axis
  • If someone works exactly 40 hours, what is his/her predicted time watching television?
  • If someone works 15 hours, how much television are they predicted to watch?
  • If someone works 168 hours, how much television are they predicted to watch? Is this a useful question to ask, and why?
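
To show how a fitted equation of this form is used, here is a minimal Python sketch that plugs work hours into TV = a + b × Work with the constant and coefficient given above. The function name predict_tv_simple is hypothetical, and the work hours used (0 and 60) are chosen only as two convenient points for drawing the line, not as answers to the questions.

```python
def predict_tv_simple(work_hours, constant=37.0, slope=-0.5):
    """Predicted TV time in minutes per day from TV = constant + slope * Work."""
    return constant + slope * work_hours


# Two points are enough to draw a straight line on the graph
for work in (0, 60):
    print(f"Work = {work:>2} hours/week -> TV = {predict_tv_simple(work):.1f} minutes/day")
```

The same function can be evaluated at any other number of work hours to check a hand calculation.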

4 Multiple regression

In the same project as the previous question, researchers had also asked whether the respondents had active outside interests (sport, membership of associations, etc.). They created a single summary variable, coded 0 for those with no outside interests and 1 for those with such interests, and added it to the regression analysis. These were the new results:

  • Constant: 35
  • Coefficient for hours work: -0.35
  • Coefficient for outside interest: -2.47

  1. Draw the two regression lines
  2. Interpret the regression coefficients in words
  3. Discuss the overall interpretation of this regression analysis, in comparison with the analysis without the outside-interests variable
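
One way to see why a 0/1 variable produces two regression lines is to evaluate the equation for both values of that variable. The Python sketch below does this with the results given above; the function name predict_tv_multiple and the example work hours (0 and 10) are assumptions made for illustration.

```python
def predict_tv_multiple(work_hours, outside_interest,
                        constant=35.0, b_work=-0.35, b_interest=-2.47):
    """Predicted TV minutes per day; outside_interest must be coded 0 or 1."""
    return constant + b_work * work_hours + b_interest * outside_interest


for work in (0, 10):
    without = predict_tv_multiple(work, 0)
    with_interest = predict_tv_multiple(work, 1)
    gap = with_interest - without
    print(f"Work = {work:>2}: no interests {without:.2f}, "
          f"interests {with_interest:.2f}, gap {gap:.2f}")
```

Because the 0/1 variable only adds a fixed amount to the prediction, the gap between the two groups is the same at every value of Work, so the two lines share a slope and differ only in their intercepts.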

5 Multiple regression in Stata

Go back to the data used in the first question, and fit a regression of income for men and women together, with both hours and gender as independent variables. Interpret your findings, and compare them with what you found in the first question.

Note: the gender variable should have the values 0 and 1.