MAE256 T2 – ASSIGNMENT
Answers
Q.1 The descriptive statistics of house sale prices are presented in Table 1. The output was generated using Excel.
Table 1 Descriptive statistics of house sale prices (thousands of dollars)
Statistic                  price
Mean                       804.8811
Standard Error             4.336317
Median                     798.9556
Mode                       842.4683
Standard Deviation         137.1264
Sample Variance            18803.64
Kurtosis                   0.66476
Skewness                   0.090698
Range                      685.3633
Minimum                    436.527
Maximum                    1121.89
Sum                        804881.1
Count                      1000
Confidence Level (95.0%)   8.509334
From Table 1, the average house sale price is $804.88 thousand and the median sale price is $798.96 thousand. The spread of the sale prices can be assessed from the standard deviation, which is $137.13 thousand, or about 17% of the mean (coefficient of variation 137.13 / 804.88 ≈ 0.17); the prices are therefore quite widely spread. The skewness is very low (0.09), so the distribution is approximately symmetrical, which is consistent with, although not proof of, a normal distribution. The same conclusion follows from the small gap between the mean and the median (about $5.93 thousand) relative to the standard deviation. In summary, the spread of house prices is high while their distribution is nearly symmetrical.
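As a quick check on the figures quoted above, the derived quantities in Table 1 can be recomputed from the reported mean, median, standard deviation and count. This is a minimal Python sketch that uses only the rounded values printed in Table 1, so its results agree with Excel's only to within rounding; the variable names are illustrative.

```python
import math

# Summary values copied from Table 1 (thousands of dollars)
mean = 804.8811
median = 798.9556
sd = 137.1264
n = 1000
minimum, maximum = 436.527, 1121.89

# Standard error of the mean: s / sqrt(n)
se = sd / math.sqrt(n)
# Coefficient of variation: spread relative to the mean
cv = sd / mean
# Gap between mean and median, also expressed in standard-deviation units
gap = mean - median
gap_in_sd = gap / sd
# Range and sample variance
value_range = maximum - minimum
variance = sd ** 2

print(f"SE = {se:.4f}, CV = {cv:.3f}, mean - median = {gap:.2f} ({gap_in_sd:.3f} sd)")
print(f"range = {value_range:.4f}, variance = {variance:.2f}")
```

A mean-median gap of only a few hundredths of a standard deviation is what supports the "nearly symmetrical" reading.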
Q2. The mean sale price of houses is 804.88 and the standard deviation is 137.13 (both in thousands of dollars). The range of prices within one standard deviation of the mean is:
µ ± σ = [804.88 − 137.13, 804.88 + 137.13] = [667.75, 942.01].
If prices are approximately normally distributed, about 68% (more precisely 68.27%) of observations lie within one standard deviation of the mean. Hence, roughly 68% of the prices are between $667.75 thousand and $942.01 thousand.
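The interval and the "about 68%" figure can be reproduced in a few lines. This sketch uses the rounded mean and standard deviation quoted in the text and the standard identity P(|Z| < 1) = erf(1/√2) for a standard normal variable Z; the 68% figure is a property of the normal distribution, not a count taken from the house-price data.

```python
import math

mean, sd = 804.88, 137.13  # rounded values from Table 1, thousands of dollars

# One-standard-deviation interval around the mean
lower, upper = mean - sd, mean + sd

# For a normal distribution, P(|Z| < 1) = erf(1 / sqrt(2))
share_within_one_sd = math.erf(1 / math.sqrt(2))

print(f"[{lower:.2f}, {upper:.2f}]")   # interval in thousands of dollars
print(f"{share_within_one_sd:.4f}")    # theoretical share under normality
```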
Q.3 Figure 1 presents the scatter plot of price against size, generated using Excel, with size on the x-axis and price on the y-axis.
Figure 1 Scatter plot between price and size
As the figure shows, there is a positive relationship between house sale prices and house size: as the size of a house in square meters increases, its sale price also tends to increase.
Figure 2 presents the scatter plot of price against proximity. Proximity is a dummy variable that indicates whether the house is located near a major business district: it takes the value 1 if the house is near a major business district and 0 otherwise. Proximity is shown on the x-axis and sale price on the y-axis.
Figure 2 Scatter plot between price and proximity
The figure shows that houses near a major business district (proximity = 1) tend to have higher sale prices than houses that are not (proximity = 0). Therefore, proximity also affects sale prices positively. In conclusion, both size and proximity are positively related to the sale prices of houses.
Q.4 The following linear regression model was estimated using Excel:
price = β_{0} + β_{1} size + β_{2} age + β_{3} proximity + u
In the above model, the dependent variable is the sale price, which is regressed on the explanatory variables size, age and proximity; proximity is a dummy variable. The Excel output for this model is presented in Table 2.
Table 2 Regression Output
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.93093204
R Square            0.866634463
Adjusted R Square   0.86623276
Standard Error      50.15287793
Observations        1000

ANOVA
              df     SS             MS          F
Regression    3      16279587.45    5426529     2157.399
Residual      996    2505249.92     2515.311
Total         999    18784837.37

              Coefficients    Standard Error    t Stat      P-value
Intercept     25.59282925     14.00464211       1.827453    0.067931
size          3.890134007     0.078081854       49.82123    1E-272
age           -0.601560515    0.168471364       -3.5707     0.000373
proximity     195.8438204     3.176667127       61.65072    0
Therefore, the estimated model is:
predicted price = 25.5928 + 3.8901 size − 0.6016 age + 195.8438 proximity
The adjusted R^{2} of the model is 0.866, so after allowing for the number of regressors the model explains about 86.6% of the variation in sale prices, which indicates a good fit. The intercept (25.59) is the predicted sale price when size, age and proximity are all zero; because a house of zero size does not exist, it has no useful economic interpretation on its own and mainly anchors the regression line. The estimated slope coefficients of size, age and proximity are 3.8901, −0.6016 and 195.8438 respectively. The coefficient of size means that, holding age and proximity constant, the sale price increases by about $3.89 thousand for each additional square meter. Similarly, holding size and proximity constant, each additional year of age reduces the sale price by about $0.60 thousand. Because proximity equals 1 for houses near a major business district, these houses sell on average for about $195.84 thousand more than comparable houses that are not near a business district, keeping other factors constant. These results are as expected: larger houses and houses near business districts cost more, while older houses sell for less. At the 5% level of significance, the size, age and proximity coefficients are all significant, whereas the intercept is not (p-value 0.068).
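The "keeping other variables constant" interpretation can be made concrete by evaluating the fitted Q4 equation for houses that differ in one characteristic only. This is a minimal sketch; the function name and the house characteristics (150 square meters, 20 years old) are hypothetical and chosen only for illustration. It assumes the age coefficient is negative, as the interpretation above states, since minus signs appear to have been lost when the Excel output was extracted.

```python
# Estimated coefficients from Table 2 (price in thousands of dollars).
# Assumption: the age coefficient is negative; the text reads it as a decrease.
COEF = {
    "intercept": 25.59282925,
    "size": 3.890134007,
    "age": -0.601560515,
    "proximity": 195.8438204,
}


def predict_price(size, age, proximity):
    """Hypothetical helper: fitted price (thousands of dollars) from the Q4 model."""
    return (COEF["intercept"]
            + COEF["size"] * size
            + COEF["age"] * age
            + COEF["proximity"] * proximity)


# Hypothetical houses that differ from the base house in one variable only
base = predict_price(size=150, age=20, proximity=0)
bigger = predict_price(size=151, age=20, proximity=0)
older = predict_price(size=150, age=21, proximity=0)
near_cbd = predict_price(size=150, age=20, proximity=1)

print(f"one extra square meter: {bigger - base:+.4f}")
print(f"one extra year of age:  {older - base:+.4f}")
print(f"near business district: {near_cbd - base:+.4f}")
```

Because the model is linear, each difference equals the corresponding coefficient regardless of the base house chosen.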
Q.5 After adding two more variables, swimming pool (pool) and fireplace, to the previous model, the following model is obtained:
price = β_{0} + β_{1} size + β_{2} age + β_{3} proximity + β_{4} pool + β_{5} fireplace + u
Both of the new variables are dummy variables: pool takes the value 1 if the house has a swimming pool and 0 otherwise, and fireplace takes the value 1 if the house has a fireplace and 0 otherwise. The regression was performed using Excel, and the output is presented in Table 3.
The R^{2} of the model was 0.8666 before pool and fireplace were introduced and 0.8686 after, so it increased only slightly. Because R^{2} can never decrease when regressors are added, this small rise on its own does not show that the model has improved. It is therefore better to also consider the adjusted R^{2}, which penalizes the model for each additional variable. The adjusted R^{2} is 0.8662 before and 0.8679 after the new variables are added. Since the adjusted R^{2} also rises, adding pool and fireplace improves the goodness of fit, but only by a very modest amount. In Table 3, pool is significant at the 5% level (p-value 0.0003), whereas fireplace is not (p-value 0.152).
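The R^{2} and adjusted R^{2} values compared above can be recomputed from the ANOVA sums of squares in Tables 2 and 3, using R^{2} = SS_regression / SS_total and adjusted R^{2} = 1 − (1 − R^{2})(n − 1)/(n − k − 1). The sketch below also adds a partial F test of whether the pool and fireplace coefficients are jointly zero; that test is a standard complement to comparing adjusted R^{2} but is not part of the original Excel output, and the function names are illustrative.

```python
def r_squared(ss_regression, ss_total):
    """R^2 as regression (explained) sum of squares over total sum of squares."""
    return ss_regression / ss_total


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k slope coefficients."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 1000
ss_total = 18784837.37  # Total SS, identical in Tables 2 and 3

# Q4 model (Table 2): 3 regressors
r2_small = r_squared(16279587.45, ss_total)
adj_small = adjusted_r_squared(r2_small, n, 3)

# Q5 model (Table 3): 5 regressors
r2_large = r_squared(16316018.68, ss_total)
adj_large = adjusted_r_squared(r2_large, n, 5)

# Partial F test of H0: beta_4 = beta_5 = 0, comparing residual sums of
# squares of the restricted (Table 2) and unrestricted (Table 3) models.
rss_restricted, rss_unrestricted = 2505249.92, 2468818.691
q = 2                  # number of restrictions tested
df_unrestricted = 994  # n - k - 1 for the larger model
f_stat = ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / df_unrestricted)

print(f"R^2: {r2_small:.4f} -> {r2_large:.4f}")
print(f"adjusted R^2: {adj_small:.4f} -> {adj_large:.4f}")
print(f"partial F = {f_stat:.2f}")
```

The partial F statistic is about 7.33, above the 5% critical value of F(2, 994), which is roughly 3.0, so under the usual OLS assumptions the two added variables are jointly significant at the 5% level even though their effect on fit is small.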
Table 3 Regression output
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.9319731
R Square            0.868573859
Adjusted R Square   0.867912761
Standard Error      49.83694429
Observations        1000

ANOVA
              df     SS             MS          F
Regression    5      16316018.68    3263204     1313.837
Residual      994    2468818.691    2483.721
Total         999    18784837.37

              Coefficients    Standard Error    t Stat      P-value
Intercept     22.46361047     13.94043772       1.611399    0.10741
size          3.880094147     0.077978005       49.75883    4.3E-272
age           -0.627220786    0.1675913         -3.74256    0.000193
proximity     195.6377575     3.157476777       61.96016    0
pool          14.1458518      3.917097047       3.61131     0.00032
fireplace     4.546132296     3.174622828       1.432023    0.152452
Q6. After transforming price and size into natural logarithms, the following model is obtained:
log(price) = β_{0} + β_{1} log(size) + β_{2} age + β_{3} proximity + β_{4} pool + β_{5} fireplace + u
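The regression was performed using Excel, and the output is presented in Table 4. The estimated model is:
predicted log(price) = 2.1750 + 0.8472 log(size) − 0.000873 age + 0.2475 proximity + 0.0186 pool + 0.0070 fireplace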
Table 4 Regression output
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.928131596
R Square            0.86142826
Adjusted R Square   0.860731219
Standard Error      0.064934719
Observations        1000

ANOVA
              df     SS             MS          F           Significance F
Regression    5      26.05462137    5.210924    1235.836    0
Residual      994    4.191218689    0.004217
Total         999    30.24584006

              Coefficients    Standard Error    t Stat      P-value
Intercept     2.175009803     0.090650516       23.99335    1E-100
log(size)     0.847233288     0.017580409       48.1919     2.5E-262
age           -0.000873122    0.000218377       -3.99823    6.85E-05
proximity     0.247529504     0.004113783       60.17077    0
pool          0.018570706     0.00510377        3.638625    0.000288
fireplace     0.007021951     0.004136736       1.697462    0.089922
The slope coefficient of log(size) is an elasticity: it gives the percentage change in price resulting from a one percent change in size. Therefore, holding the other variables constant, a one percent increase in the size of a house is associated with an increase in its sale price of about 0.847%, or nearly 0.85%. The coefficient of pool (β_{4} = 0.0186) gives the approximate proportional difference in sale prices between houses with and without a swimming pool, keeping size, proximity, age and fireplace constant. Therefore, houses with a swimming pool sell for roughly 1.86% more than comparable houses without one (the exact figure, exp(0.0186) − 1, is about 1.87%).
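The two interpretations above can be checked numerically. In a model where log(price) depends on log(size), scaling size by (1 + g) scales price by (1 + g)^{β_{1}}, and for a dummy variable with coefficient b the exact proportional effect is exp(b) − 1, which 100·b only approximates. This sketch uses the Table 4 estimates; the 10% size change is a hypothetical example.

```python
import math

elasticity = 0.847233288  # log(size) coefficient, Table 4
b_pool = 0.018570706      # pool coefficient, Table 4

# Small changes: a 1% rise in size raises price by about elasticity * 1 percent
approx_pct_1 = elasticity * 1.0
# Larger changes: scaling size by 1.10 scales price by 1.10 ** elasticity
exact_pct_10 = ((1.10) ** elasticity - 1) * 100

# Dummy in a log model: approximate effect 100*b, exact effect 100*(exp(b) - 1)
pool_approx = 100 * b_pool
pool_exact = 100 * (math.exp(b_pool) - 1)

print(f"1% larger house: price +{approx_pct_1:.3f}%")
print(f"10% larger house: price +{exact_pct_10:.2f}%")
print(f"pool: approx +{pool_approx:.2f}%, exact +{pool_exact:.2f}%")
```

For a coefficient as small as the pool estimate, the approximate and exact percentages differ by only about 0.02 percentage points.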
Q7. Null hypothesis: fireplace does not influence house prices, H_{0}: β_{5} = 0.
Alternative hypothesis: fireplace influences house prices, H_{1}: β_{5} ≠ 0.
Table 4 shows that the p-value of the fireplace coefficient is 0.0899 (about 0.09), which is well above the 0.01 significance level. Therefore, at the 1% level of significance we cannot reject the null hypothesis; there is not enough evidence to reject the claim that fireplace does not influence house prices. The same conclusion holds at the 5% level. However, at the 10% level of significance the p-value is smaller than 0.10, so there is enough evidence to reject the null hypothesis, and we conclude at the 10% level that fireplace does influence the sale prices of houses.
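The test decision can be reproduced from the coefficient and standard error in Table 4. Excel's p-value comes from the t distribution with 994 degrees of freedom; this standard-library sketch instead uses the standard normal approximation, p = erfc(|t|/√2), which is close at this sample size but not identical, so it gives about 0.0896 rather than 0.0899.

```python
import math

coef, se = 0.007021951, 0.004136736  # fireplace coefficient and standard error, Table 4

t_stat = coef / se
# Two-sided p-value under the standard normal approximation to t(994)
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

for alpha in (0.01, 0.05, 0.10):
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    print(f"alpha = {alpha:.2f}: {decision}")
print(f"t = {t_stat:.4f}, approx p = {p_value:.4f}")
```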
Q.8 Null hypothesis: none of the explanatory variables affects house prices, H_{0}: β_{1} = β_{2} = β_{3} = β_{4} = β_{5} = 0.
Alternative hypothesis: H_{1}: at least one of these coefficients is different from zero.
According to the output in Table 4, the p-value of the F statistic (Significance F) is reported as 0, i.e. it is effectively zero and therefore lower than the 0.05 level of significance. Hence, there is enough evidence to reject the null hypothesis, and the variables are jointly significant: at least one of size, age, proximity, pool and fireplace affects the sale prices of houses.
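The overall F statistic reported in Table 4 can be recomputed either from the ANOVA sums of squares, F = (SS_regression / k) / (SS_residual / (n − k − 1)), or equivalently from R^{2}, F = (R^{2} / k) / ((1 − R^{2}) / (n − k − 1)). This is a minimal sketch using the Table 4 values with k = 5 regressors.

```python
n, k = 1000, 5
r2 = 0.86142826                           # R Square, Table 4
ss_reg, ss_res = 26.05462137, 4.191218689  # ANOVA sums of squares, Table 4

# Overall F statistic from the ANOVA table: MS_regression / MS_residual
f_from_ss = (ss_reg / k) / (ss_res / (n - k - 1))
# Equivalent form written in terms of R^2
f_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))

print(f"F from sums of squares: {f_from_ss:.3f}")
print(f"F from R^2:             {f_from_r2:.3f}")
```

Both forms give about 1235.84, far above the 5% critical value of F(5, 994), which is only about 2.2, which is why Excel reports a Significance F of 0.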