# Statistics Hypothesis Testing Assignment Help

## Sample: Hypothesis Testing Paper

One and Two (or more) Sample Hypothesis Testing Paper. Using data from one of the data sets available through the “Data Sets” link on your page, develop one business research question from which you will formulate a research hypothesis to test one population parameter and another to test two (or more) population parameters. Formulate both a numerical and a verbal hypothesis statement for each of your research issues.

Perform **hypothesis tests** using the five-step model. Describe and interpret the results of each test, both in statistical terms and in conversational English. Include appropriate descriptive statistics.

**Solution:**

**Research question: To find whether there is a significant difference between wins and salary of the baseball players.**

The League variable is coded 1 for the American League and 0 for the National League. We have separated the data set by league.

**Data set: American League**

| Salary | Salary -mil | Wins |
|---|---|---|
| 123505125.0 | 123.5 | 95.0 |
| 208306817.0 | 208.3 | 95.0 |
| 55425762.0 | 55.4 | 88.0 |
| 73914333.0 | 73.9 | 74.0 |
| 97725322.0 | 97.7 | 95.0 |
| 41502500.0 | 41.5 | 93.0 |
| 75178000.0 | 75.2 | 99.0 |
| | 45.7 | 80.0 |
| 56186000.0 | 56.2 | 83.0 |
| 29679067.0 | 29.7 | 67.0 |
| 55849000.0 | 55.8 | 79.0 |
| 69092000.0 | 69.1 | 71.0 |
| 87754334.0 | 87.8 | 69.0 |
| 36881000.0 | 36.9 | 56.0 |

## Statistics Hypothesis Testing Assignment Help Through Online Tutoring and Guided Sessions from AssignmentHelp.Net

**Claim:** There is a significant difference between wins and salary (in millions of dollars) of the baseball players in the American League.

**Hypotheses:**

**Null Hypothesis:**

**Numerical Null Hypothesis:** $H_0: \mu_1 = \mu_2$ (equivalently, $\mu_1 - \mu_2 = 0$), where $\mu_1$ is the mean Salary -mil and $\mu_2$ is the mean Wins.

**Verbal Null Hypothesis:** There is no significant difference between wins and salary of the baseball players in the American League.

**Alternative Hypothesis:**

**Numerical Alternative Hypothesis:** $H_1: \mu_1 \neq \mu_2$

**Verbal Alternative Hypothesis:** There is a significant difference between wins and salary of the baseball players in the American League.

**Level of Significance:** α = 0.05

**Decision rule:** If the p-value is greater than the level of significance, we fail to reject the null hypothesis; otherwise, we reject it.

**Test Statistic:**

Using **MegaStat** from the Microsoft Excel Add-Ins:

Add-Ins → MegaStat → Hypothesis tests → Compare two independent groups

**Hypothesis Test: Independent Groups (t-test, pooled variance)**

| | Salary -mil | Wins |
|---|---|---|
| mean | 75.479 | 81.71 |
| std. dev. | 45.930 | 13.07 |
| n | 14 | 14 |

| Statistic | Value |
|---|---|
| df | 26 |
| difference (Salary -mil - Wins) | -6.2357 |
| pooled variance | 1,140.1793 |
| pooled std. dev. | 33.7665 |
| standard error of difference | 12.7626 |
| hypothesized difference | 0 |
| t | -0.49 |
| p-value (two-tailed) | .6292 |

The test statistic is t = -0.49, with a two-tailed p-value of 0.6292.
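As a cross-check, the pooled-variance t-test above can be reproduced from the American League table with a short Python sketch. It is not MegaStat's code: it uses the rounded Salary -mil column (so the missing full Salary value in one row does not matter), and it approximates the two-tailed p-value by integrating the t density with Simpson's rule instead of calling a statistics library, so the last digits may differ slightly from the MegaStat output. The helper names are our own.

```python
import math
import statistics

# American League columns transcribed from the data table (Salary -mil is rounded to 0.1).
salary_mil = [123.5, 208.3, 55.4, 73.9, 97.7, 41.5, 75.2,
              45.7, 56.2, 29.7, 55.8, 69.1, 87.8, 36.9]
wins = [95, 95, 88, 74, 95, 93, 99, 80, 83, 67, 79, 71, 69, 56]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule for the area from 0 to |t|."""
    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3)


def pooled_t_test(x, y):
    """Two independent groups with equal variances assumed (pooled variance)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the difference
    t = (statistics.mean(x) - statistics.mean(y)) / se
    return t, df, two_tailed_p(t, df)


t, df, p = pooled_t_test(salary_mil, wins)
print(f"t = {t:.2f}, df = {df}, p = {p:.4f}")
```

The decision rule then simply compares `p` with α = 0.05.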

**Conclusion:**

Since the p-value of the test statistic (0.6292) is greater than the 0.05 level of significance, we fail to reject the null hypothesis H0 at the 5% level of significance. Hence, we conclude that there is no significant difference between wins and salary of the baseball players in the American League.

**Research question: To find whether there is a significant difference between wins and salary of the baseball players.**

**Data set: National League**

| Salary | Salary -mil | Wins |
|---|---|---|
| 86457302.0 | 86.5 | 90.0 |
| 62329166.0 | 62.3 | 77.0 |
| 76799000.0 | 76.8 | 89.0 |
| 61892583.0 | 61.9 | 73.0 |
| 101305821.0 | 101.3 | 83.0 |
| 38133000.0 | 38.1 | 67.0 |
| 83039000.0 | 83.0 | 71.0 |
| 63290833.0 | 63.3 | 82.0 |
| 48581500.0 | 48.6 | 81.0 |
| 90199500.0 | 90.2 | 75.0 |
| 92106833.0 | 92.1 | 100.0 |
| 60408834.0 | 60.4 | 83.0 |
| 95522000.0 | 95.5 | 88.0 |
| 39934833.0 | 39.9 | 81.0 |
| 87032933.0 | 87.0 | 79.0 |
| 48155000.0 | 48.2 | 67.0 |

**Claim:** There is a significant difference between wins and salary (in millions of dollars) of the baseball players in the National League.

**Hypotheses:**

**Null Hypothesis:**

**Numerical Null Hypothesis:** $H_0: \mu_1 = \mu_2$ (equivalently, $\mu_1 - \mu_2 = 0$), where $\mu_1$ is the mean Salary -mil and $\mu_2$ is the mean Wins.

**Verbal Null Hypothesis:** There is no significant difference between wins and salary of the baseball players in the National League.

**Alternative Hypothesis:**

**Numerical Alternative Hypothesis:** $H_1: \mu_1 \neq \mu_2$

**Verbal Alternative Hypothesis:** There is a significant difference between wins and salary of the baseball players in the National League.

**Level of Significance:** α = 0.05

**Decision rule:** If the p-value is greater than the level of significance, we fail to reject the null hypothesis; otherwise, we reject it.

**Test Statistic:**

Using **MegaStat** from the Microsoft Excel Add-Ins:

Add-Ins → MegaStat → Hypothesis tests → Compare two independent groups

**Hypothesis Test: Independent Groups (t-test, pooled variance)**

| | Salary -mil | Wins |
|---|---|---|
| mean | 70.949 | 80.375 |
| std. dev. | 20.669 | 8.831 |
| n | 16 | 16 |

| Statistic | Value |
|---|---|
| df | 30 |
| difference (Salary -mil - Wins) | -9.4257 |
| pooled variance | 252.5883 |
| pooled std. dev. | 15.8930 |
| standard error of difference | 5.6190 |
| hypothesized difference | 0 |
| t | -1.68 |
| p-value (two-tailed) | .1038 |

The test statistic is t = -1.68, with a two-tailed p-value of 0.1038.

**Conclusion:**

Since the p-value of the test statistic (0.1038) is greater than the 0.05 level of significance, we fail to reject the null hypothesis H0 at the 5% level of significance. Hence, we conclude that there is no significant difference between wins and salary of the baseball players in the National League.
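The National League output can also be checked from its summary statistics alone. The Python sketch below (a hypothetical helper, `pooled_t_from_summary`, not part of MegaStat) plugs the reported means, standard deviations and sample sizes into the pooled-variance formulas, which also shows where the degrees of freedom, n1 + n2 − 2 = 30, come from. Because the inputs are rounded, the results agree with the table only to about three decimal places.

```python
import math


def pooled_t_from_summary(m1, s1, n1, m2, s2, n2):
    """Pooled-variance two-sample t statistic computed from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the difference
    return {"df": df, "pooled_var": pooled_var, "se": se, "t": (m1 - m2) / se}


# National League: Salary -mil versus Wins (means, std. devs and n from the MegaStat output).
nl = pooled_t_from_summary(70.949, 20.669, 16, 80.375, 8.831, 16)
for name, value in nl.items():
    print(f"{name}: {value:.4f}")
```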

**Regression analysis:**

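The general multiple regression model is given by

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

where $y$ is the dependent variable, the $x_i$'s are the independent variables, $\beta_0$ is the actual (population) constant, $\beta_i$ is the actual coefficient associated with the $i$th independent variable, and $\varepsilon$ is the error term, which models the unsystematic error in $y$.

The above model can be written in matrix form as

$$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

where $\mathbf{y}$ is the $n \times 1$ vector of observations, $X$ is the $n \times (k+1)$ matrix of independent variables with a first column of ones for the constant, $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_k)^\top$, and $\boldsymbol{\varepsilon}$ is the $n \times 1$ vector of errors.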

The general goal of multiple regression here is to determine which independent (explanatory) variables should be included in the model.

We first test each coefficient $\beta_i$, $i = 1, 2, \ldots, k$, within the model to determine whether that individual parameter should be dropped from the model.

Next we test the goodness of fit of the model.

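**Hypothesis Tests:**

**Procedure:**

First we estimate the model as

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_k x_k$$

where $\hat{y}$ is the estimated value of $y$ and $\hat{\boldsymbol{\beta}} = (X^\top X)^{-1} X^\top \mathbf{y}$ is the least-squares estimate of $\boldsymbol{\beta}$.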
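The least-squares estimate can be illustrated with a minimal pure-Python sketch that forms the normal equations $X^\top X\,\boldsymbol{\beta} = X^\top \mathbf{y}$ and solves them by Gauss-Jordan elimination. The function name `ols` and the toy data are made up for illustration (they are not the baseball data); for larger or ill-conditioned problems, a QR-based solver is numerically safer than the normal equations.

```python
def ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y on the predictor rows.

    An intercept column of ones is prepended; the normal equations
    (X^T X) b = X^T y are solved by Gauss-Jordan elimination with partial pivoting.
    """
    X = [[1.0, *row] for row in rows]
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]  # X^T X
    c = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]            # X^T y
    M = [A[i] + [c[i]] for i in range(p)]  # augmented matrix [A | c]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


# Toy data (not from the baseball data set) generated exactly from y = 2 + 3*x1 - x2.
toy_rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 4)]
toy_y = [2 + 3 * x1 - x2 for x1, x2 in toy_rows]
print(ols(toy_rows, toy_y))  # approximately [2.0, 3.0, -1.0]
```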

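**Testing each $\beta_i$:**

To test $H_0: \beta_i = 0$ against $H_1: \beta_i \neq 0$, the test statistic is given by

$$t = \frac{\hat{\beta}_i}{SE(\hat{\beta}_i)}$$

where $SE(\hat{\beta}_i)$ is the standard error of the estimated coefficient $\hat{\beta}_i$. Under $H_0$, $t$ has a t distribution with $n - k - 1$ degrees of freedom.

**Goodness of fit test:**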

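To assess goodness of fit we compute $R^2 = SS_{\text{Regression}} / SS_{\text{Total}}$, the proportion of the total variation in $y$ explained by the model, which lies between 0 and 1. The closer $R^2$ is to 1, the better the model explains the data. Because $R^2$ never decreases when predictors are added, the adjusted $R^2 = 1 - (1 - R^2)\frac{n-1}{n-k-1}$ is also reported.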
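These goodness-of-fit quantities can be computed directly from an ANOVA table. This Python sketch (the helper name `fit_summary` is ours) takes the sums of squares MegaStat reports later in this section for the full 11-predictor model and the reduced 3-predictor model; with n = 30 it reproduces R², adjusted R², the F statistic and the standard error of the regression to the precision shown in those tables.

```python
import math


def fit_summary(ss_reg, ss_res, n, k):
    """R^2, adjusted R^2, overall F statistic and regression standard error."""
    ss_tot = ss_reg + ss_res
    r2 = ss_reg / ss_tot
    df_res = n - k - 1
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_res
    ms_res = ss_res / df_res          # residual mean square
    f_stat = (ss_reg / k) / ms_res    # regression MS divided by residual MS
    return r2, adj_r2, f_stat, math.sqrt(ms_res)


full = fit_summary(2917.2794, 486.7206, n=30, k=11)
reduced = fit_summary(2757.1594, 646.8406, n=30, k=3)
for label, stats in (("full", full), ("reduced", reduced)):
    print(label, "R2=%.3f adjR2=%.3f F=%.2f SE=%.3f" % stats)
```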

**Dependent variable:** X7 - Wins

**Independent variables:**

- X2 - League
- X3 - Built
- X4 - Size
- X5 - Surface
- X6 - Salary -mil
- X8 - Attendance
- X9 - Batting
- X10 - ERA
- X11 - HR
- X12 - Error
- X13 - SB

Using **MegaStat** from the Microsoft Excel Add-Ins:

Add-Ins → MegaStat → Correlation/Regression → Regression analysis

**Regression Analysis**

| Statistic | Value |
|---|---|
| R² | 0.857 |
| Adjusted R² | 0.770 |
| R | 0.926 |
| Std. Error | 5.200 |
| n | 30 |
| k | 11 |
| Dep. Var. | Wins |

**ANOVA table**

| Source | SS | df | MS | F | p-value |
|---|---|---|---|---|---|
| Regression | 2,917.2794 | 11 | 265.2072 | 9.81 | 1.64E-05 |
| Residual | 486.7206 | 18 | 27.0400 | | |
| Total | 3,404.0000 | 29 | | | |

**Regression output**

| Variables | Coefficients | Std. error | t (df=18) | p-value | 95% lower | 95% upper |
|---|---|---|---|---|---|---|
| Intercept | 74.6634 | 133.9145 | 0.558 | .5840 | -206.6805 | 356.0073 |
| League | -1.2494 | 2.3275 | -0.537 | .5980 | -6.1392 | 3.6404 |
| Built | -0.0274 | 0.0558 | -0.491 | .6291 | -0.1447 | 0.0899 |
| Size | -0.00000401 | 0.00020556 | -0.019 | .9847 | -0.00043588 | 0.00042787 |
| Surface | 0.5761 | 4.3135 | 0.134 | .8952 | -8.4863 | 9.6384 |
| Salary -mil | 0.0411 | 0.0667 | 0.615 | .5462 | -0.0992 | 0.1813 |
| Attendance | -0.00000085 | 0.00000317 | -0.267 | .7923 | -0.00000750 | 0.00000581 |
| Batting | 447.7443 | 200.5131 | 2.233 | .0385 | 26.4819 | 869.0067 |
| ERA | -13.6362 | 2.4171 | -5.642 | 2.37E-05 | -18.7143 | -8.5581 |
| HR | 0.0930 | 0.0338 | 2.755 | .0130 | 0.0221 | 0.1639 |
| Error | -0.1601 | 0.1246 | -1.285 | .2151 | -0.4218 | 0.1017 |
| SB | 0.0152 | 0.0361 | 0.422 | .6777 | -0.0605 | 0.0910 |
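Each t value and confidence interval in the regression output follows from the coefficient and its standard error: t = b / SE(b), and the 95% interval is b ± t* · SE(b), where t* is the two-tailed 5% critical value of the t distribution with 18 degrees of freedom. The Python sketch below hard-codes t* ≈ 2.100922 from standard t tables (an assumption, since MegaStat does not print it) and checks three of the coefficients; small differences in the last digit, for example for HR, come from the rounding of the printed coefficients.

```python
T_CRIT_18 = 2.100922  # two-tailed 5% critical value of t with 18 df (standard table value)

# (coefficient, standard error) pairs copied from the full-model regression output.
coefficients = {
    "Batting": (447.7443, 200.5131),
    "ERA": (-13.6362, 2.4171),
    "HR": (0.0930, 0.0338),
}


def coef_inference(b, se, t_crit=T_CRIT_18):
    """Return the t statistic b / SE and the interval b +/- t_crit * SE."""
    return b / se, (b - t_crit * se, b + t_crit * se)


for name, (b, se) in coefficients.items():
    t, (lower, upper) = coef_inference(b, se)
    print(f"{name:8s} t = {t:7.3f}   95% CI = ({lower:.4f}, {upper:.4f})")
```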

**The regression equation is Wins = 74.6634 - 1.2494 League - 0.0274 Built - 0.00000401 Size + 0.5761 Surface + 0.0411 (Salary -mil) - 0.00000085 Attendance + 447.7443 Batting - 13.6362 ERA + 0.0930 HR - 0.1601 Error + 0.0152 SB**

The adjusted R² (0.770) is high, so the model fits the data well overall. However, the p-values for X2, X3, X4, X5, X6, X12 and X13 are greater than 0.05, so these coefficients are not individually significant. A high R² combined with many insignificant individual coefficients suggests multicollinearity among the predictors. We therefore drop these variables and regress X7 on X9, X10 and X11.

**Regression Analysis: X7 versus X9, X10, X11**

**Dependent variable:** X7 - Wins

**Independent variables:**

- X9 - Batting
- X10 - ERA
- X11 - HR

Using **MegaStat** from the Microsoft Excel Add-Ins:

Add-Ins → MegaStat → Correlation/Regression → Regression analysis

**Regression Analysis**

| Statistic | Value |
|---|---|
| R² | 0.810 |
| Adjusted R² | 0.788 |
| R | 0.900 |
| Std. Error | 4.988 |
| n | 30 |
| k | 3 |
| Dep. Var. | Wins |

**ANOVA table**

| Source | SS | df | MS | F | p-value |
|---|---|---|---|---|---|
| Regression | 2,757.1594 | 3 | 919.0531 | 36.94 | 1.60E-09 |
| Residual | 646.8406 | 26 | 24.8785 | | |
| Total | 3,404.0000 | 29 | | | |

**Regression output**

| Variables | Coefficients | Std. error | t (df=26) | p-value | 95% lower | 95% upper |
|---|---|---|---|---|---|---|
| Intercept | 1.8499 | 35.0214 | 0.053 | .9583 | -70.1376 | 73.8374 |
| Batting | 492.4490 | 140.3025 | 3.510 | .0017 | 204.0532 | 780.8449 |
| ERA | -15.9575 | 1.6753 | -9.525 | 5.78E-10 | -19.4011 | -12.5139 |
| HR | 0.1035 | 0.0289 | 3.582 | .0014 | 0.0441 | 0.1628 |

**The regression equation is Wins = 1.8499 + 492.4490 Batting - 15.9575 ERA + 0.1035 HR**
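The fitted equation can be wrapped in a small prediction function. In this Python sketch, `predict_wins` is a hypothetical helper that only restates the coefficients above. A least-squares fit with an intercept passes through the point of means, so feeding in the sample means of Batting, ERA and HR from the descriptive statistics table should return roughly the mean number of wins, 81. The ERA coefficient also implies that lowering ERA by 0.5, with Batting and HR held fixed, raises predicted wins by about 8.

```python
def predict_wins(batting, era, hr):
    """Predicted wins from the reduced model (coefficients from the MegaStat output)."""
    return 1.8499 + 492.4490 * batting - 15.9575 * era + 0.1035 * hr


# Sample means of Batting, ERA and HR from the descriptive statistics table.
at_means = predict_wins(0.26443, 4.2847, 167.23)
# Same inputs, but with ERA lowered by 0.5.
better_pitching = predict_wins(0.26443, 4.2847 - 0.5, 167.23)
print(f"{at_means:.2f} {better_pitching - at_means:.2f}")
```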

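Here all three slope coefficients (Batting, ERA and HR) have p-values below 0.05, so each is significant at the 5% level of significance; only the intercept (p = .9583) is not. After dropping the other variables, R² falls only slightly, from 0.857 to 0.810, while adjusted R² rises from 0.770 to 0.788, so the simpler model fits about as well and is a good model.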
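The claim that dropping the eight predictors has little effect can be made precise with a partial (nested-model) F-test, a standard check that is not part of the MegaStat workflow used in this paper. With the residual sums of squares of the two models, $F = \frac{(SSE_{\text{reduced}} - SSE_{\text{full}})/q}{SSE_{\text{full}}/(n - k_{\text{full}} - 1)}$, where q = 8 is the number of dropped predictors. The Python sketch below computes F and approximates its upper-tail p-value by Simpson integration of the F density; the helper names are ours.

```python
import math


def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom."""
    if x <= 0:
        return 0.0  # adequate for d1 > 2, where the density vanishes at 0
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_num = 0.5 * (d1 * math.log(d1 * x) + d2 * math.log(d2)
                     - (d1 + d2) * math.log(d1 * x + d2))
    return math.exp(log_num - math.log(x) - log_beta)


def f_upper_tail(f, d1, d2, steps=2000):
    """P(F >= f), using Simpson's rule for the area from 0 to f."""
    h = f / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(f, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1.0 - total * h / 3


def partial_f_test(sse_reduced, sse_full, k_reduced, k_full, n):
    """Nested-model F-test for dropping k_full - k_reduced predictors."""
    q = k_full - k_reduced
    df_full = n - k_full - 1
    f = ((sse_reduced - sse_full) / q) / (sse_full / df_full)
    return f, f_upper_tail(f, q, df_full)


# Residual SS from the two ANOVA tables (reduced: 3 predictors, full: 11 predictors).
f, p = partial_f_test(646.8406, 486.7206, k_reduced=3, k_full=11, n=30)
print(f"F = {f:.3f} on (8, 18) df, p = {p:.3f}")
```

Here F ≈ 0.74, which is below 1, so the eight dropped predictors do not jointly improve the fit significantly at the 5% level.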

**Correlation:**

**Research question: To find whether salary has a relationship with attendance of the baseball players.**

As before, the League variable is coded 1 for the American League and 0 for the National League, and the data set is separated by league.

**Data set: American League**

| Salary -mil | Attendance |
|---|---|
| 123.5 | 2,847,798 |
| 208.3 | 4,090,440 |
| 55.4 | 2,108,818 |
| 73.9 | 2,623,904 |
| 97.7 | 3,404,636 |
| 41.5 | 2,014,220 |
| 75.2 | 2,342,804 |
| 45.7 | 2,014,995 |
| 56.2 | 2,034,243 |
| 29.7 | 1,141,915 |
| 55.8 | 2,525,259 |
| 69.1 | 2,024,505 |
| 87.8 | 2,724,859 |
| 36.9 | 1,371,181 |

Using **MegaStat** from the Microsoft Excel Add-Ins:

Add-Ins → MegaStat → Correlation/Regression → Correlation Matrix

**Correlation Matrix**

| | Salary -mil | Attendance |
|---|---|---|
| Salary -mil | 1.000 | |
| Attendance | .895 | 1.000 |

Sample size: 14

The correlation coefficient between salary and attendance is 0.895, indicating a strong positive correlation between the variables.

**Null Hypothesis:**

H0: ρ=0

H0: “no linear relationship” between the variables.

**Alternative Hypothesis:**

H1: ρ≠0

H1: “linear relationship” between the variables.

**Level of significance:**

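α = 0.05

**Critical value:**

At the 5% level of significance (two-tailed), the critical value of the t distribution with ν = 14 − 2 = 12 degrees of freedom is 2.178813.

**Test statistic:**

Under H0,

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

has a t distribution with ν = n − 2 degrees of freedom. With r = 0.895 and n = 14,

$$t = \frac{0.895\sqrt{12}}{\sqrt{1-0.895^2}} \approx 6.95.$$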

**Conclusion:**

Since the test statistic (t ≈ 6.95) is greater than the critical value (2.178813), we reject the null hypothesis at the 5% level of significance. Hence we conclude that there is a significant linear relationship between salary and attendance.
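The American League correlation and its t statistic can be reproduced in a few lines of Python. This is a sketch using the transcribed Salary -mil and Attendance columns, with hypothetical helper names `pearson_r` and `r_to_t`: it computes the sample (Pearson) correlation from its definition and then applies t = r√(n − 2)/√(1 − r²). Because Salary -mil is rounded, r matches MegaStat's .895 only to about three decimals.

```python
import math
import statistics

# American League columns transcribed from the correlation data table.
salary_mil = [123.5, 208.3, 55.4, 73.9, 97.7, 41.5, 75.2,
              45.7, 56.2, 29.7, 55.8, 69.1, 87.8, 36.9]
attendance = [2847798, 4090440, 2108818, 2623904, 3404636, 2014220, 2342804,
              2014995, 2034243, 1141915, 2525259, 2024505, 2724859, 1371181]


def pearson_r(x, y):
    """Sample correlation: S_xy / sqrt(S_xx * S_yy)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


r = pearson_r(salary_mil, attendance)
t = r_to_t(r, len(salary_mil))
print(f"r = {r:.3f}, t = {t:.2f}")
```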

**Data set: National League**

| Salary -mil | Attendance |
|---|---|
| 86.5 | 2,520,904 |
| 62.3 | 2,059,327 |
| 76.8 | 2,805,060 |
| 61.9 | 1,923,254 |
| 101.3 | 2,827,549 |
| 38.1 | 1,817,245 |
| 83 | 3,603,680 |
| 63.3 | 2,869,787 |
| 48.6 | 2,730,352 |
| 90.2 | 3,181,020 |
| 92.1 | 3,542,271 |
| 60.4 | 1,852,608 |
| 95.5 | 2,665,304 |
| 39.9 | 2,211,323 |
| 87 | 3,100,092 |
| 48.2 | 1,914,385 |

Using **MegaStat** from the Microsoft Excel Add-Ins:

Add-Ins → MegaStat → Correlation/Regression → Correlation Matrix

**Correlation Matrix**

| | Salary -mil | Attendance |
|---|---|---|
| Salary -mil | 1.000 | |
| Attendance | .693 | 1.000 |

Sample size: 16

The correlation coefficient between salary and attendance is 0.693, indicating a moderately strong positive correlation between the variables.

**Null Hypothesis:**

H0: ρ=0

H0: “no linear relationship” between the variables.

**Alternative Hypothesis:**

H1: ρ≠0

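H1: “linear relationship” between the variables.

**Level of significance:**

α = 0.05

**Critical value:**

At the 5% level of significance (two-tailed), the critical value of the t distribution with ν = 16 − 2 = 14 degrees of freedom is 2.144787.

**Test statistic:**

Under H0,

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

has a t distribution with ν = n − 2 degrees of freedom. With r = 0.693 and n = 16,

$$t = \frac{0.693\sqrt{14}}{\sqrt{1-0.693^2}} \approx 3.60.$$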
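The critical values used in the correlation tests (2.178813 for 12 degrees of freedom and 2.144787 for 14) are two-tailed 5% points of the t distribution. As a sketch, they can be recovered without a statistics library by bisection on the t distribution function, approximated here with Simpson's rule; the helper names are hypothetical. The same snippet evaluates the National League test statistic from r = 0.693 and n = 16.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def area_0_to(x, df, steps=2000):
    """P(0 <= T <= x) for x >= 0, by Simpson's rule (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3


def t_critical(alpha, df):
    """Two-tailed critical value c with P(|T| > c) = alpha, found by bisection."""
    target = (1 - alpha) / 2  # required area between 0 and c
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if area_0_to(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def r_to_t(r, n):
    """t statistic for H0: rho = 0 from a sample correlation r and sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


t_nl = r_to_t(0.693, 16)
crit_nl = t_critical(0.05, 14)
print(f"t = {t_nl:.2f}, critical value = {crit_nl:.6f}, reject H0: {abs(t_nl) > crit_nl}")
```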

**Conclusion:**

Since the test statistic (t ≈ 3.60) is greater than the critical value (2.144787), we reject the null hypothesis at the 5% level of significance. Hence we conclude that there is a significant linear relationship between salary and attendance.

**Descriptive Statistics:**

Using **MegaStat** from the Microsoft Excel Add-Ins:

Add-Ins → MegaStat → Descriptive Statistics

| | Salary -mil | Wins | Attendance | Batting | ERA | HR | Error | SB |
|---|---|---|---|---|---|---|---|---|
| count | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 |
| mean | 73.064 | 81.000 | 2,496,457.93 | 0.26443 | 4.2847 | 167.23 | 102.00 | 85.50 |
| sample variance | 1,171.965 | 117.379 | 452,766,738,769.44 | 0.00005 | 0.3206 | 1,225.29 | 130.34 | 1,075.43 |
| sample standard deviation | 34.234 | 10.834 | 672,879.44 | 0.00728 | 0.5662 | 35.00 | 11.42 | 32.79 |
| minimum | 29.679067 | 56 | 1141915 | 0.252 | 3.49 | 117 | 86 | 31 |
| maximum | 208.30682 | 100 | 4090440 | 0.281 | 5.49 | 260 | 125 | 161 |
| range | 178.62775 | 44 | 2948525 | 0.029 | 2 | 143 | 39 | 130 |
| 1st quartile | 50.293 | 73.250 | 2,017,372.50 | 0.25900 | 3.7875 | 136.75 | 92.50 | 65.25 |
| median | 66.191 | 81.000 | 2,523,081.50 | 0.26400 | 4.2000 | 164.00 | 102.50 | 76.00 |
| 3rd quartile | 87.574 | 88.750 | 2,842,735.75 | 0.27000 | 4.5500 | 190.50 | 108.75 | 101.25 |
| interquartile range | 37.281 | 15.500 | 825,363.25 | 0.01100 | 0.7625 | 53.75 | 16.25 | 36.00 |
| mode | #N/A | 95.000 | #N/A | 0.27000 | 3.6100 | 130.00 | 106.00 | 45.00 |

The table above gives the descriptive statistics for the whole data set (all 30 observations, both leagues combined).
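Most of the Wins column can be reproduced with Python's standard `statistics` module from the 30 win totals in the two league tables. This is a sketch, not MegaStat's procedure: `statistics.quantiles(..., method="inclusive")` interpolates linearly between order statistics, which happens to give the same quartiles as the table for this column; other quartile conventions can give slightly different values.

```python
import statistics

# Wins for all 30 observations, transcribed from the American and National League tables.
wins = [95, 95, 88, 74, 95, 93, 99, 80, 83, 67, 79, 71, 69, 56,
        90, 77, 89, 73, 83, 67, 71, 82, 81, 75, 100, 83, 88, 81, 79, 67]

q1, median, q3 = statistics.quantiles(wins, n=4, method="inclusive")
summary = {
    "count": len(wins),
    "mean": statistics.mean(wins),
    "sample variance": statistics.variance(wins),
    "sample standard deviation": statistics.stdev(wins),
    "minimum": min(wins),
    "maximum": max(wins),
    "range": max(wins) - min(wins),
    "1st quartile": q1,
    "median": median,
    "3rd quartile": q3,
    "interquartile range": q3 - q1,
}
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```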

**Inference for our research:**

- From the comparison of two independent groups, there is no significant difference between wins and salary of the baseball players in the American League; the same conclusion holds for the National League.
- From the regression analysis, the regression equation predicting wins is **Wins = 1.8499 + 492.4490 Batting - 15.9575 ERA + 0.1035 HR**.
- From the correlation analysis, there is a significant positive relationship between salary and attendance of the baseball players in the American League.
- From the correlation analysis, there is a significant positive relationship between salary and attendance of the baseball players in the National League.