Assessment 1 - Evaluating Household Data
Data set: This includes information about 2000 households across the following variables.
There are 15 different variables to consider for the study.
We used the random number generation method to select the sample. Random selection reduces bias, so the results are more reliable.
A random number generator (RNG) is a device that generates a sequence of numbers or symbols that cannot be reasonably predicted better than by random chance.
We used the uniform random number generation method, so each generated random number lies between 0 and 1.
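The selection step described above can be sketched as follows. The source does not say how the uniform random numbers were mapped to households, so the approach here (attach one uniform number to each household and keep the households with the smallest numbers) is an assumption, as are the names `draw_sample_ids`, `POPULATION_SIZE` and `SAMPLE_SIZE`. The sample size of 250 is taken from the totals in the later tables.

```python
import random

POPULATION_SIZE = 2000  # households in the full data set (stated in the text)
SAMPLE_SIZE = 250       # assumed from the totals of the later tables


def draw_sample_ids(population_size, sample_size, seed=None):
    """Pick distinct household IDs by ranking uniform(0, 1) random numbers."""
    rng = random.Random(seed)
    # Attach one uniform random number in [0, 1) to every household ID.
    keyed = [(rng.random(), household_id)
             for household_id in range(1, population_size + 1)]
    # Sorting by the random key shuffles the IDs; the first n form the sample.
    keyed.sort()
    return sorted(household_id for _, household_id in keyed[:sample_size])


sample_ids = draw_sample_ids(POPULATION_SIZE, SAMPLE_SIZE, seed=42)
print(len(sample_ids), sample_ids[:5])
```

Every household has the same chance of being selected, which is the property that limits selection bias.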
Next, we find the descriptive statistics of each variable in the data set and draw a boxplot of each variable.
The descriptive statistics are in the Excel file (sheet 3).
Boxplots of Alcohol, Meals, Fuel and Phone:
From the descriptive statistics we can say that:
The average annual expenditure on alcohol (in AUD) is higher than that on meals, fuel and phone.
Skewness assesses the extent to which a variable's distribution is symmetrical.
Kurtosis is a measure of whether the distribution is too peaked (a very narrow distribution with most of the responses in the center).
The skewness coefficient for all four variables is greater than 0, so each distribution is positively skewed.
For Alcohol, kurtosis = 2.86 < 3, so the distribution is platykurtic.
For Meals, Fuel and Phone, the kurtosis coefficients are 8.31, 8.43 and 43.57 respectively, all greater than 3, so all three distributions are leptokurtic.
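To make the two coefficients concrete, here is a minimal sketch that computes moment-based skewness and kurtosis in plain Python. The function name and the demo values are illustrative only and are not taken from the data set. This version uses the population moments, for which a normal distribution has kurtosis 3. Excel's KURT function instead reports excess kurtosis (normal = 0) with a small-sample correction, so figures taken from Excel may not match this formula exactly.

```python
def skewness_and_kurtosis(values):
    """Return (skewness, kurtosis) based on central moments m2, m3, m4."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance (population form)
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5   # > 0 means a longer right tail
    kurt = m4 / m2 ** 2     # compare with 3 (value for a normal distribution)
    return skew, kurt


# Hypothetical expenditures with one large value on the right.
demo = [1, 2, 2, 3, 3, 3, 4, 4, 10]
skew, kurt = skewness_and_kurtosis(demo)
print(f"skewness={skew:.3f}, kurtosis={kurt:.3f}")
```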
Interpretation of the boxplots:
In every boxplot some points lie beyond the whiskers. These points appear to be outliers.
All four variables have outliers in the sample, which suggests that the population may also contain outliers.
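Boxplots usually flag a point as an outlier when it lies more than 1.5 times the interquartile range beyond the quartiles. The source does not state which rule its plotting tool applied, so the sketch below assumes that convention. The function name and the demo expenditures are hypothetical.

```python
import statistics


def iqr_outliers(values):
    """Return the values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    # "inclusive" treats the data as the whole population of interest;
    # other quartile conventions can move the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical annual expenditures in AUD with one extreme household.
demo_spend = [120, 150, 160, 170, 180, 190, 200, 210, 950]
print(iqr_outliers(demo_spend))
```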
Here the variable of interest is Utilities.
We construct a frequency distribution of the expenditures on Utilities with 11 classes.
The classes are 0-300, 300-600, ..., 2700-3000 and More than 3000.
First, arrange the Utilities data in ascending order.
Then find the frequency of each class, which is the number of observations falling in that class.
Class 0-300 contains observations from 0 up to and including 299.
Class 300-600 contains observations from 300 up to and including 599, and so on.
In this way we complete the frequency distribution.
The frequency distribution of Utilities expenditure is:
Classes | Frequency
0-300 | 16
300-600 | 33
600-900 | 51
900-1200 | 36
1200-1500 | 38
1500-1800 | 30
1800-2100 | 20
2100-2400 | 10
2400-2700 | 5
2700-3000 | 1
More than 3000 | 10
Totals | 250
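The counting procedure can be sketched in a few lines. The function names and the demo values are hypothetical. Following the class rule given in the text, each class includes its lower limit and excludes its upper limit. Placing a value of exactly 3000 in the last class is an assumption, since the source does not say where it belongs.

```python
def class_label(value, width=300, cap=3000):
    """Return the class label for a non-negative expenditure."""
    if value >= cap:
        return f"More than {cap}"  # assumption: exactly 3000 falls here
    lower = int(value // width) * width
    return f"{lower}-{lower + width}"


def frequency_table(values, width=300, cap=3000):
    """Count how many observations fall in each of the 11 classes."""
    labels = [f"{lo}-{lo + width}" for lo in range(0, cap, width)]
    labels.append(f"More than {cap}")
    counts = dict.fromkeys(labels, 0)
    for value in values:
        counts[class_label(value, width, cap)] += 1
    return counts


# Hypothetical Utilities values, chosen to sit on the class boundaries.
demo_utilities = [0, 299, 300, 599.5, 2999, 3000, 4500]
for label, count in frequency_table(demo_utilities).items():
    print(f"{label} | {count}")
```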
To find the percentage of households who spend at most $900 on Utilities:
P(Utilities ≤ 900) = P(0-300 or 300-600 or 600-900)
= P(0-300) + P(300-600) + P(600-900)
= 16/250 + 33/250 + 51/250 = 100/250 = 0.40 = 40%
To find the percentage of households who spend between $1500 and $2700 on Utilities:
P(1500 ≤ Utilities < 2700) = P(1500-1800 or 1800-2100 or 2100-2400 or 2400-2700)
= P(1500-1800) + P(1800-2100) + P(2100-2400) + P(2400-2700)
= 30/250 + 20/250 + 10/250 + 5/250 = 65/250 = 0.26 = 26%
To find the percentage of households who spend more than $3000 on Utilities:
P(Utilities > 3000) = P(More than 3000)
= 10/250 = 0.04 = 4%
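As a cross-check, the three percentages can be recomputed directly from the class frequencies. The frequencies come from the table in the text. The helper name `share` is illustrative.

```python
# Class frequencies copied from the Utilities frequency distribution.
frequencies = {
    "0-300": 16, "300-600": 33, "600-900": 51, "900-1200": 36,
    "1200-1500": 38, "1500-1800": 30, "1800-2100": 20, "2100-2400": 10,
    "2400-2700": 5, "2700-3000": 1, "More than 3000": 10,
}
total = sum(frequencies.values())


def share(classes):
    """Percentage of households falling in the listed classes."""
    return 100 * sum(frequencies[c] for c in classes) / total


at_most_900 = share(["0-300", "300-600", "600-900"])
from_1500_to_2700 = share(["1500-1800", "1800-2100", "2100-2400", "2400-2700"])
more_than_3000 = share(["More than 3000"])
print(at_most_900, from_1500_to_2700, more_than_3000)
```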
Here the variable of interest is the household's annual after-tax income (AtaxInc).
Let X be the random variable denoting the household's annual after-tax income.
We first find the descriptive statistics for AtaxInc.
From the descriptive statistics :
X ~ N(µ= 60113.04 , σ = 41293.33)
The top 5% can be written symbolically as
P(X > x) = 5% = 0.05
1 - P(X ≤ x) = 0.05
P(X ≤ x) = 1 - 0.05
P(X ≤ x) = 0.95
Using Excel, the corresponding standard normal value is
z = 1.645
Now we can find x using the formula
x = µ + z*σ = 60113.04 + 1.645*41293.33 = $128034.5 (computed with the unrounded z = 1.644854)
Thus, a household's AtaxInc needs to be $128034.5 or higher to be in the top 5%.
So 5% of households have an after-tax income higher than $128034.5.
The bottom 5% can be written symbolically as
P(X < x) = 5% = 0.05
z = -1.645
x = µ + z*σ = 60113.04 - 1.645*41293.33 = $-7808.45
Thus, a household's AtaxInc needs to be $-7808.45 or less to be in the bottom 5%.
So 5% of households have an after-tax income lower than $-7808.45.
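Both cut-offs can be cross-checked without Excel. The sketch below uses Python's standard `statistics.NormalDist` with the mean and standard deviation quoted in the text. The variable names are illustrative, and the normal model itself is the assumption already made in the text.

```python
from statistics import NormalDist

# Normal model for AtaxInc, with the parameters quoted in the text.
income = NormalDist(mu=60113.04, sigma=41293.33)

# inv_cdf(p) returns the value x such that P(X <= x) = p.
top_5_cutoff = income.inv_cdf(0.95)     # 95th percentile
bottom_5_cutoff = income.inv_cdf(0.05)  # 5th percentile

print(f"top 5% cut-off:    {top_5_cutoff:.2f}")
print(f"bottom 5% cut-off: {bottom_5_cutoff:.2f}")
```

The two cut-offs lie the same distance above and below the mean, because the normal distribution is symmetric.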
Here the variable of interest is Ownhouse.
It takes two values, 1 and 0.
1: if a household owns a house
0: if a household doesn't own a house
This is a qualitative variable because Ownhouse records yes-or-no type data.
(ii) What would be the probability distribution of this random variable if we choose randomly (a) Only 1 household? (b) 250 households? Provide any relevant condition(s) to justify your answer.
Let X be a random variable such that X = 1 if a selected household owns a house and X = 0 otherwise.
It takes two values, 1 and 0.
Now we find the probability of each outcome.
X | f | P
0 | 73 | 0.292
1 | 177 | 0.708
Total | 250 | 1
Probability distribution of X is:
x | 0 | 1 | Total
p | 0.292 | 0.708 | 1
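(a) Only 1 household: X follows a Bernoulli distribution with P(X = 1) = 0.708 and P(X = 0) = 0.292.
(b) 250 households: the number of households that own a house follows a binomial distribution with n = 250 and p = 0.708, provided the households are selected independently and the probability of owning a house is the same for every household.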
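One common reading of part (ii) is that the ownership indicator of a single household follows a Bernoulli distribution, and the number of owners among 250 households follows a binomial distribution. Both assume independent selections with a constant ownership probability. The sketch below works under that assumption, using the sample proportion 177/250 as the ownership probability. The function names are illustrative.

```python
from math import comb

p_own = 177 / 250  # sample proportion of households that own a house


def bernoulli_pmf(x, p):
    """P(X = x) for a single household, x in {0, 1}."""
    return p if x == 1 else 1 - p


def binomial_pmf(k, n, p):
    """P(exactly k owners among n independently chosen households)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


print(bernoulli_pmf(0, p_own), bernoulli_pmf(1, p_own))
# Probability that exactly 177 of 250 households own a house.
print(binomial_pmf(177, 250, p_own))
```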
Dependent variable: y = ln(Texp)
Independent variable: x = ln(ATaxInc)
This is a simple linear regression problem.
Using Excel we get the following scatter plot.
Correlation coefficient (r) = 0.7145
The correlation coefficient has a positive sign, so there is a positive relationship between the two variables.
The scatter plot also shows a positive relationship between the natural logarithm of Texp and the natural logarithm of ATaxInc.
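The correlation between the two log-transformed variables can be computed as in the sketch below. The income and expenditure figures are made up for illustration and are not taken from the data set, so the printed r will not equal 0.7145. The function name `pearson_r` is likewise illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical after-tax incomes and total expenditures (AUD).
demo_ataxinc = [20000, 35000, 50000, 80000, 120000]
demo_texp = [18000, 30000, 38000, 55000, 70000]

ln_x = [math.log(v) for v in demo_ataxinc]  # x = ln(ATaxInc)
ln_y = [math.log(v) for v in demo_texp]     # y = ln(Texp)

r = pearson_r(ln_x, ln_y)
print(f"r = {r:.4f}")
```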
Here the variables of interest are gender and the level of education.
Gender has two levels: male and female.
Level of education (Highest degree) has five levels: primary (P), secondary (S), intermediate (I), bachelor's (B) and master's (M).
Now we complete the contingency table for the data.
Gender by Highest degree:
Gender | P | S | I | B | M | Total
Male | 25 | 34 | 23 | 23 | 33 | 138
Female | 24 | 25 | 20 | 26 | 17 | 112
Total | 49 | 59 | 43 | 49 | 50 | 250
To find P(male and I):
P(male and I) = number of households that are male with education level I / sample size = 23/250
P(male and I) = 0.092
To find P(female and B):
P(female and B) = number of households that are female with education level B / sample size = 26/250
P(female and B) = 0.1040
To find P(S and male):
P(S and male) = number of households that are S and male / sample size = 34/250 = 0.1360
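The joint probabilities above are cell counts divided by the sample size. A minimal sketch of that lookup follows, with the counts copied from the contingency table. The dictionary layout and the function name are illustrative.

```python
# Cell counts from the gender by highest-degree contingency table.
counts = {
    "Male":   {"P": 25, "S": 34, "I": 23, "B": 23, "M": 33},
    "Female": {"P": 24, "S": 25, "I": 20, "B": 26, "M": 17},
}
sample_size = sum(sum(row.values()) for row in counts.values())


def joint_probability(gender, degree):
    """P(gender and degree) = cell count / sample size."""
    return counts[gender][degree] / sample_size


print(joint_probability("Male", "I"))
print(joint_probability("Female", "B"))
print(joint_probability("Male", "S"))
```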
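Two events A and B are independent iff P(A and B) = P(A) * P(B). Here we check whether
P(female and Master's degree) = P(female) * P(Master's degree)
P(female and Master's degree) = 17/250 = 0.0680
P(female) * P(Master's degree) = 112/250 * 50/250 = 56/625 = 0.0896
0.0680 ≠ 0.0896
Therefore the events "gender of household head is female" and "having the Master's degree" are dependent events.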
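The independence check can be reproduced with exact fractions, which avoids any rounding in the comparison. The counts come from the contingency table. The variable names are illustrative. This compares sample proportions only and is not a formal significance test.

```python
from fractions import Fraction

sample_size = 250
female_total = 112       # row total for Female
master_total = 50        # column total for M
female_and_master = 17   # cell count for Female and M

p_joint = Fraction(female_and_master, sample_size)
p_product = Fraction(female_total, sample_size) * Fraction(master_total, sample_size)

# The events are independent only if the joint probability equals the product.
independent = p_joint == p_product
print(p_joint, p_product, independent)
print(float(p_joint), float(p_product))
```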