Dummy Variables in Multiple Regression: STATA Tutorial


This tutorial gives a step-by-step procedure for running multiple linear regressions with dummy variables in the STATA statistical software.

Let us consider a sample regression and data analysis problem: testing for discrimination in the US. The data set has a quantitative dependent variable, wages. The wage data have been transformed by taking the natural logarithm, so the dependent variable here is LNWAGE, the natural log of the wages earned by respondents.

The data are a random sample from the May 1985 Current Population Survey conducted by the US Census Bureau. The data set contains observations on 12 variables for 534 individuals. The first variable is years of schooling (EDU). The next six entries are 0-1 dummy variables that take the value 1 if the individual resides in the South (SOUTH), is non-white and non-Hispanic (NONWH), is Hispanic (HISP), is female (FE), is married (MAR), or is a married female (MARFE). The next two variables are potential years of experience (EX), computed as age minus years of schooling minus 6, and the square of this potential experience measure (EXSQ). The next entry is a dummy variable that takes the value 1 if the individual works at a union job (UNIO). The next column is the natural logarithm of the individual's average hourly earnings in dollars (LNWAGE).
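
As a rough illustration of how the variables described above fit together, the following STATA do-file sketch inspects the data, rebuilds the two derived regressors, and runs one plausible log-wage regression. It assumes that the data set is already loaded in memory and that the variables are stored under lowercase versions of the names used above (lnwage, edu, fe and so on). The names exsq_chk and marfe_chk are hypothetical helper variables introduced only for this example, and the regression specification is an assumption rather than necessarily the model estimated later in the tutorial.

```stata
* Sketch only: assumes the data are already in memory and that the
* variables use lowercase versions of the names in the text.

* Inspect the data: 534 observations are expected.
describe
summarize lnwage edu ex exsq

* Rebuild the derived regressors under new (hypothetical) names so the
* existing columns are not overwritten, then compare them with the originals.
generate exsq_chk  = ex^2        // potential experience squared
generate marfe_chk = mar * fe    // 1 only for married females
compare exsq exsq_chk
compare marfe marfe_chk

* Frequency tables for the 0-1 dummy variables.
tabulate fe
tabulate unio

* One possible specification (an assumption, not the tutorial's stated model):
* log wages on schooling, experience and the dummy variables.
regress lnwage edu ex exsq south nonwh hisp fe mar marfe unio
```

Because the dependent variable is in logs, the coefficient on a dummy such as fe in a regression like this is read as an approximate proportional wage difference relative to the omitted group, holding the other regressors fixed. For a coefficient b, the exact percentage difference is 100 * (exp(b) - 1).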