MAE256 T2 Sample Assignment
MAE256 T2 – ASSIGNMENT
Answers
Q.1 The descriptive statistics of house sale prices are presented in Table 1. The output was generated using Excel.
Table 1 Descriptive statistics of house sale price in thousands of dollars.
Statistic | price
Mean | 804.8811
Standard Error | 4.336317
Median | 798.9556
Mode | 842.4683
Standard Deviation | 137.1264
Sample Variance | 18803.64
Kurtosis | -0.66476
Skewness | 0.090698
Range | 685.3633
Minimum | 436.527
Maximum | 1121.89
Sum | 804881.1
Count | 1000
Confidence Level (95.0%) | 8.509334
From Table 1, the average house sale price is $804.88 thousand and the median house sale price is $798.96 thousand. The spread of the sale prices can be assessed from the standard deviation, which is $137.13 thousand, or about 17% of the mean, so prices vary considerably across houses. The skewness of the data is very low at 0.09. Therefore, the distribution is close to symmetrical and roughly consistent with a normal distribution, although the negative kurtosis (-0.66) suggests it is somewhat flatter than a normal curve. The symmetry can also be seen from the mean and the median, which are nearly the same. In summary, the spread of the data is fairly high, whereas the distribution of house prices is nearly symmetrical.
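As a quick consistency check on Table 1, the standard error of the mean can be recomputed from the standard deviation and the count. The sketch below uses Python rather than Excel (an assumption made purely for illustration; the variable names are hypothetical) and relies only on the values printed in the table.

```python
import math

# Values copied from Table 1 (prices in thousands of dollars).
mean = 804.8811
std_dev = 137.1264
count = 1000
margin_95 = 8.509334  # the "Confidence Level (95.0%)" row

# Standard error of the mean = s / sqrt(n).
std_error = std_dev / math.sqrt(count)

# Coefficient of variation: the spread relative to the mean.
coef_variation = std_dev / mean

# 95% confidence interval for the mean, using the margin reported by Excel.
ci_lower = mean - margin_95
ci_upper = mean + margin_95

print(f"standard error: {std_error:.4f}")
print(f"coefficient of variation: {coef_variation:.3f}")
print(f"95% CI for the mean: [{ci_lower:.2f}, {ci_upper:.2f}]")
```

The recomputed standard error should agree with the "Standard Error" row of Table 1 up to rounding.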
Q2. The mean sale price of the houses is 804.88 and the standard deviation is 137.13 (both in thousands of dollars, from Table 1). The range of prices within one standard deviation of the mean is:
µ ± σ = [804.88 - 137.13, 804.88 + 137.13] = [667.75, 942.01]. For an approximately normal distribution, about 68% of observations lie within one standard deviation of the mean. Hence, approximately 68% of the prices are between $667.75 thousand and $942.01 thousand.
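The one-standard-deviation interval and the share of a normal distribution that falls inside it can be checked with a short sketch. It assumes the prices are approximately normal, takes the mean and standard deviation from Table 1, and uses Python's standard library; the variable names are illustrative only.

```python
from statistics import NormalDist

# Mean and standard deviation from Table 1 (thousands of dollars).
mean = 804.8811
std_dev = 137.1264

# Interval one standard deviation either side of the mean.
lower = mean - std_dev
upper = mean + std_dev

# Share of a normal distribution lying inside that interval.
# This is a property of the normal curve, not a count from the actual data.
price_dist = NormalDist(mu=mean, sigma=std_dev)
share_within_one_sd = price_dist.cdf(upper) - price_dist.cdf(lower)

print(f"interval: [{lower:.2f}, {upper:.2f}]")
print(f"share within one standard deviation: {share_within_one_sd:.4f}")
```

With the raw data available, the actual proportion of houses in the interval could be counted directly and compared with this theoretical share.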
Q.3 Figure 1 presents the scatter plot of price against size, generated using Excel. Size is on the x-axis and price is on the y-axis.
Figure 1 Scatter plot between price and size
The figure shows a positive relationship between house sale price and house size: as the size of the house in square meters increases, the sale price also tends to increase.
Figure 2 presents the scatter plot of price against proximity. Proximity is a dummy variable that indicates whether the house is located near a major business district: it takes the value 1 if the house is near a major business district and 0 otherwise. Proximity is on the x-axis and sale price is on the y-axis.
Figure 2 Scatter plot between price and proximity
The above figure shows that houses near a major business district tend to sell at higher prices than houses that are not. Therefore, proximity also affects the sale price positively. In conclusion, both size and proximity are positively related to house sale prices.
Q.4 The following linear regression model is estimated using Excel:
price = β_{0} + β_{1} size + β_{2} age + β_{3} proximity + ε
In the above model, the dependent variable is price, which is regressed on the explanatory variables size, age and proximity. Note that proximity is a dummy variable. The Excel output for this model is presented in Table 2.
Table 2 Regression Output
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.93093204
R Square | 0.866634463
Adjusted R Square | 0.86623276
Standard Error | 50.15287793
Observations | 1000
ANOVA
 | df | SS | MS | F
Regression | 3 | 16279587.45 | 5426529 | 2157.399
Residual | 996 | 2505249.92 | 2515.311 |
Total | 999 | 18784837.37 | |
 | Coefficients | Standard Error | t Stat | P-value
Intercept | 25.59282925 | 14.00464211 | 1.827453 | 0.067931
size | 3.890134007 | 0.078081854 | 49.82123 | 1E-272
age | -0.601560515 | 0.168471364 | -3.5707 | 0.000373
proximity | 195.8438204 | 3.176667127 | 61.65072 | 0
Therefore, our estimated model is:
price = 25.5928 + 3.8901 size - 0.6016 age + 195.8438 proximity
The adjusted R^{2} of the model is 0.866, which is a measure of goodness of fit; the R^{2} of 0.8666 indicates that the model explains about 87% of the variation in sale prices. The intercept, 25.59, is the predicted sale price (in thousands of dollars) when size, age and proximity are all zero, so it has little practical meaning on its own. The estimated slope coefficients of size, age and proximity are 3.8901, -0.6016 and 195.8438 respectively. The coefficient of size means that the sale price increases by $3.89 thousand when size increases by 1 square meter, keeping the other variables constant. Similarly, if the age of the house increases by 1 year, the sale price decreases by $0.60 thousand, keeping size and proximity constant. If the house is near a major business district, the proximity variable takes the value 1; therefore, houses near a business district sell for $195.84 thousand more on average than houses that are not, keeping other factors constant. The results are as expected: larger houses and houses near business districts cost more, and older houses sell for less. At the 5% level of significance, the estimates of size, age and proximity are significant, whereas the intercept is not (p-value 0.068).
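To make the interpretation of the coefficients concrete, the fitted equation can be written as a small prediction function. This is a minimal sketch using the coefficients reported in Table 2; the function name, the argument names and the example house (200 square meters, 10 years old) are hypothetical and chosen only for illustration.

```python
# Coefficients copied from Table 2 (price in thousands of dollars).
INTERCEPT = 25.59282925
B_SIZE = 3.890134007
B_AGE = -0.601560515
B_PROXIMITY = 195.8438204


def predict_price(size_m2: float, age_years: float, near_district: bool) -> float:
    """Predicted sale price (thousands of dollars) from the fitted linear model."""
    proximity = 1 if near_district else 0
    return INTERCEPT + B_SIZE * size_m2 + B_AGE * age_years + B_PROXIMITY * proximity


# Hypothetical house: 200 square meters, 10 years old.
near = predict_price(200, 10, near_district=True)
far = predict_price(200, 10, near_district=False)

print(f"near a business district: {near:.2f}")
print(f"not near a business district: {far:.2f}")
# The gap equals the proximity coefficient, holding size and age fixed.
print(f"difference: {near - far:.2f}")
```

Holding size and age fixed, the difference between the two predictions is exactly the proximity coefficient, which is what "keeping other factors constant" means in practice.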
Q.5 After adding two more variables, swimming pool and fireplace, to the previous model, the following model is obtained:
price = β_{0} + β_{1} size + β_{2} age + β_{3} proximity + β_{4} pool + β_{5} fireplace + ε
Both variables introduced into the model are dummy variables. The pool variable takes the value 1 if the house has a swimming pool and 0 otherwise. Similarly, fireplace takes the value 1 if the house has a fireplace and 0 otherwise. The regression was performed using Excel, and the output is presented in Table 3.
The R^{2} of the model before the introduction of the pool and fireplace variables was 0.8666, and the R^{2} after their introduction is 0.8686, so R^{2} has increased very little. Because R^{2} never decreases when variables are added, it is better to consider the adjusted R^{2}, which penalizes the model for the introduction of new variables. The adjusted R^{2} is 0.8662 before introducing the new variables and 0.8679 afterwards. Therefore, adding the new variables has improved the goodness of fit of the model, but only by a very modest amount.
Table 3 Regression output
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.9319731
R Square | 0.868573859
Adjusted R Square | 0.867912761
Standard Error | 49.83694429
Observations | 1000
ANOVA
 | df | SS | MS | F
Regression | 5 | 16316018.68 | 3263204 | 1313.837
Residual | 994 | 2468818.691 | 2483.721 |
Total | 999 | 18784837.37 | |
 | Coefficients | Standard Error | t Stat | P-value
Intercept | 22.46361047 | 13.94043772 | 1.611399 | 0.10741
size | 3.880094147 | 0.077978005 | 49.75883 | 4.3E-272
age | -0.627220786 | 0.1675913 | -3.74256 | 0.000193
proximity | 195.6377575 | 3.157476777 | 61.96016 | 0
pool | 14.1458518 | 3.917097047 | 3.61131 | 0.00032
fireplace | 4.546132296 | 3.174622828 | 1.432023 | 0.152452
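The adjusted R^{2} values reported in Table 2 and Table 3 can be reproduced from R^{2}, the number of observations and the number of regressors. The sketch below uses the standard formula, adjusted R^{2} = 1 - (1 - R^{2})(n - 1)/(n - k - 1); the function name is hypothetical.

```python
def adjusted_r2(r2: float, n_obs: int, n_regressors: int) -> float:
    """Adjusted R-squared: penalizes R-squared for the number of regressors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)


# R Square and Observations taken from Table 2 (3 regressors)
# and Table 3 (5 regressors).
adj_before = adjusted_r2(0.866634463, n_obs=1000, n_regressors=3)
adj_after = adjusted_r2(0.868573859, n_obs=1000, n_regressors=5)

print(f"adjusted R2 without pool and fireplace: {adj_before:.6f}")
print(f"adjusted R2 with pool and fireplace:    {adj_after:.6f}")
print(f"improvement: {adj_after - adj_before:.6f}")
```

Because the penalty grows with the number of regressors, a rise in adjusted R^{2} means the added variables improved the fit by more than the penalty cost, even though the gain here is small.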
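Q6. After transforming the price and size variables into natural logarithms, the following model is obtained:
log(price) = β_{0} + β_{1} log(size) + β_{2} age + β_{3} proximity + β_{4} pool + β_{5} fireplace + ε
The regression was performed using Excel and the output is presented in Table 4. The estimated model is the following:
log(price) = 2.1750 + 0.8472 log(size) - 0.0009 age + 0.2475 proximity + 0.0186 pool + 0.0070 fireplace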
Table 4 Regression output
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.928131596
R Square | 0.86142826
Adjusted R Square | 0.860731219
Standard Error | 0.064934719
Observations | 1000
ANOVA
 | df | SS | MS | F | Significance F
Regression | 5 | 26.05462137 | 5.210924 | 1235.836 | 0
Residual | 994 | 4.191218689 | 0.004217 | |
Total | 999 | 30.24584006 | | |
 | Coefficients | Standard Error | t Stat | P-value
Intercept | 2.175009803 | 0.090650516 | 23.99335 | 1E-100
log(size) | 0.847233288 | 0.017580409 | 48.1919 | 2.5E-262
age | -0.000873122 | 0.000218377 | -3.99823 | 6.85E-05
proximity | 0.247529504 | 0.004113783 | 60.17077 | 0
pool | 0.018570706 | 0.00510377 | 3.638625 | 0.000288
fireplace | 0.007021951 | 0.004136736 | 1.697462 | 0.089922
The slope coefficient of log(size) represents the percentage change in price due to a one percent change in size, that is, the elasticity of price with respect to size. Therefore, the sale price of houses increases by about 0.85% when the size of the house increases by one percent, keeping the other variables constant. The estimate of β_{4}, the coefficient of pool, represents the approximate proportional difference in sale price when the house has a swimming pool, keeping size, proximity, age and fireplace constant. Therefore, the sale prices of houses with a swimming pool are approximately 1.86% higher than the prices of houses without a swimming pool.
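Reading a coefficient in a log model directly as a percentage is an approximation that works well for small coefficients. The sketch below compares it with the exact conversion, 100 × (exp(b) - 1), for the dummy variables, and computes the exact price response to a 1% increase in size. Coefficients are taken from Table 4; the function names are hypothetical.

```python
import math

# Coefficients copied from Table 4 (dependent variable: log of price).
B_LOG_SIZE = 0.847233288
B_POOL = 0.018570706
B_PROXIMITY = 0.247529504


def pct_change_from_size(pct_size_increase: float) -> float:
    """Exact % change in price for a given % increase in size (log-log term)."""
    return 100 * ((1 + pct_size_increase / 100) ** B_LOG_SIZE - 1)


def dummy_effect_pct(coef: float) -> float:
    """Exact % difference in price when a dummy switches from 0 to 1."""
    return 100 * math.expm1(coef)


print(f"1% larger house -> price up {pct_change_from_size(1):.3f}%")
print(f"pool: approx {100 * B_POOL:.2f}%, exact {dummy_effect_pct(B_POOL):.2f}%")
print(f"proximity: approx {100 * B_PROXIMITY:.2f}%, "
      f"exact {dummy_effect_pct(B_PROXIMITY):.2f}%")
```

For the small pool coefficient the two readings almost coincide, whereas for the larger proximity coefficient the exact percentage difference is noticeably above the approximate one.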
Q7. Null hypothesis: fireplace does not influence house prices: H_{0}: β_{5} = 0
Alternative hypothesis: fireplace influences house prices: H_{1}: β_{5} ≠ 0, where β_{5} is the coefficient of fireplace.
It can be seen from Table 4 that the p-value of the fireplace estimate is 0.0899, which is greater than the 0.01 level of significance. Therefore, at the 1% level of significance, we cannot reject the null hypothesis. Hence, there is not enough evidence to reject the claim that fireplace does not influence house prices. However, the p-value is smaller than 0.10; therefore, at the 10% level of significance, there is enough evidence to reject the null hypothesis. Hence, at the 10% level of significance, it can be stated that fireplace does influence the sale prices of houses.
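The test decision can be retraced from the coefficient and standard error of fireplace in Table 4. The sketch below recomputes the t statistic and a two-sided p-value. As an assumption, it uses the standard normal distribution as a stand-in for the t distribution with 994 degrees of freedom, which is a close approximation at this sample size, so the p-value differs slightly from the Excel figure.

```python
from statistics import NormalDist

# Fireplace row of Table 4.
coef = 0.007021951
std_error = 0.004136736

# t statistic for H0: coefficient = 0.
t_stat = coef / std_error

# Two-sided p-value, normal approximation to the t distribution (df = 994).
p_value_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))

# Decision at the two significance levels discussed in the answer.
decisions = {}
for alpha in (0.01, 0.10):
    decisions[alpha] = "reject H0" if p_value_approx < alpha else "fail to reject H0"

print(f"t statistic: {t_stat:.4f}")
print(f"approximate p-value: {p_value_approx:.4f}")
for alpha, decision in decisions.items():
    print(f"alpha = {alpha:.2f}: {decision}")
```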
Q.8 Null hypothesis: all slope coefficients in the model are zero, i.e. none of the explanatory variables is significant: H_{0}: β_{1} = β_{2} = β_{3} = β_{4} = β_{5} = 0
Alternative hypothesis: at least one slope coefficient is different from zero.
According to the output in Table 4, the p-value of the F statistic (Significance F) is reported as 0, which is lower than the 0.05 level of significance. Hence, there is enough evidence to reject the null hypothesis. Therefore, it can be concluded that the variables are jointly significant: at least one of size, proximity, age, pool and fireplace affects the sale prices of houses.
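The F statistic in Table 4 can be reproduced in two equivalent ways, from the ANOVA sums of squares and from R^{2}. The sketch below is illustrative only: it takes all inputs from Table 4, uses hypothetical variable names, and does not compute the p-value, since that needs the F distribution.

```python
# ANOVA block of Table 4.
ss_regression = 26.05462137
ss_residual = 4.191218689
df_regression = 5    # number of slope coefficients
df_residual = 994    # n - k - 1 = 1000 - 5 - 1

# F = mean square regression / mean square residual.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_from_anova = ms_regression / ms_residual

# Equivalent form based on R-squared.
r2 = 0.86142826
f_from_r2 = (r2 / df_regression) / ((1 - r2) / df_residual)

print(f"F from ANOVA table: {f_from_anova:.3f}")
print(f"F from R-squared:   {f_from_r2:.3f}")
```

Both routes should agree with the F value reported in Table 4 up to rounding, which confirms that the overall test is simply a test of whether R^{2} is significantly greater than zero.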