Statistics Hypothesis Testing

Sample: Hypothesis Testing Paper

One and Two (or more) Sample Hypothesis Testing Paper. Using data from one of the data sets available through the “Data Sets” link on your page, develop one business research question from which you will formulate a research hypothesis to test one population parameter and another to test two (or more) population parameters. Formulate both a numerical and a verbal hypothesis statement regarding each of your research issues.

Perform hypothesis tests using the five-step model. Describe and interpret the results of each test, both in statistical terms and in conversational English. Include appropriate descriptive statistics.

Solution:
Research question: Is there a significant difference between wins and salary of the baseball players?

The League variable identifies the two leagues:
1 if American League and
0 if National League
The data set is separated by league as follows.
Data set
American League:

Salary         Salary-mil    Wins
123505125      123.5         95
208306817      208.3         95
55425762       55.4          88
73914333       73.9          74
97725322       97.7          95
41502500       41.5          93
75178000       75.2          99
45719500       45.7          80
56186000       56.2          83
29679067       29.7          67
55849000       55.8          79
69092000       69.1          71
87754334       87.8          69
36881000       36.9          56


Claim: There is a significant difference between wins and salary-mil of the baseball players in the American League.
Hypotheses:
Null Hypothesis:
Numerical Null Hypothesis:
$H_0: \mu_1 - \mu_2 = 0$, where $\mu_1$ is the mean of salary-mil and $\mu_2$ is the mean of wins.
Verbal Null Hypothesis:
There is no significant difference between wins and salary-mil of the baseball players in the American League.
Alternative Hypothesis:
Numerical Alternative Hypothesis:
$H_1: \mu_1 - \mu_2 \neq 0$
Verbal Alternative Hypothesis:
There is a significant difference between wins and salary-mil of the baseball players in the American League.
Level of Significance:
 α = 0.05
Decision rule:
If the p-value is greater than the given level of significance, we fail to reject the null hypothesis. Otherwise we reject the null hypothesis.
Test Statistic:
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ is the pooled variance and t has $n_1 + n_2 - 2$ degrees of freedom.


Using MegaStat in Microsoft Excel Add-Ins:
Add-Ins → MegaStat → Hypothesis Tests → Compare Two Independent Groups

Hypothesis Test: Independent Groups (t-test, pooled variance)

Salary-mil    Wins
75.479        81.71     mean
45.930        13.07     std. dev.
14            14        n

26            df
-6.2357       difference (Salary-mil - Wins)
1,140.1793    pooled variance
33.7665       pooled std. dev.
12.7626       standard error of difference
0             hypothesized difference

-0.49         t
.6292         p-value (two-tailed)

The test statistic value is -0.49.

The p value for the test statistic is 0.6292.


Conclusion:
Since the p-value of the test statistic (0.6292) is greater than the 0.05 level of significance, we fail to reject the null hypothesis H0 at the 5% level of significance. Hence, we conclude that there is no significant difference between wins and salary-mil of the baseball players in the American League.
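
The MegaStat figures can be cross-checked by hand. The following is a minimal Python sketch, assuming the 14 American League rows listed in the data table (Salary-mil is rounded to one decimal there, so the last digits may differ slightly from MegaStat). The helper names t_pdf, two_tailed_p and pooled_t_test are illustrative and not part of MegaStat, and the p-value is approximated by numerically integrating the t density rather than taken from a statistics library.

```python
import math
import statistics

# American League values copied from the data table.
salary_mil = [123.5, 208.3, 55.4, 73.9, 97.7, 41.5, 75.2,
              45.7, 56.2, 29.7, 55.8, 69.1, 87.8, 36.9]
wins = [95, 95, 88, 74, 95, 93, 99, 80, 83, 67, 79, 71, 69, 56]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=20000):
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * area


def pooled_t_test(x, y):
    """Two-sample t-test with pooled variance; returns (t, df, p)."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * statistics.variance(x)
           + (n2 - 1) * statistics.variance(y)) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (statistics.mean(x) - statistics.mean(y)) / se
    return t, df, two_tailed_p(t, df)


t_stat, df, p_value = pooled_t_test(salary_mil, wins)
print(f"t = {t_stat:.2f}, df = {df}, p = {p_value:.4f}")
```

The printed values should land close to the reported t = -0.49 and p = 0.6292; since p exceeds 0.05, the decision is again to fail to reject H0.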
Research question: To find whether there is a significant difference between wins and salary of the baseball players.

Data set
National League:


Salary         Salary-mil    Wins
86457302       86.5          90
62329166       62.3          77
76799000       76.8          89
61892583       61.9          73
101305821      101.3         83
38133000       38.1          67
83039000       83.0          71
63290833       63.3          82
48581500       48.6          81
90199500       90.2          75
92106833       92.1          100
60408834       60.4          83
95522000       95.5          88
39934833       39.9          81
87032933       87.0          79
48155000       48.2          67

Claim: There is a significant difference between wins and salary-mil of the baseball players in the National League.
Hypotheses:
Null Hypothesis:
Numerical Null Hypothesis:
$H_0: \mu_1 - \mu_2 = 0$, where $\mu_1$ is the mean of salary-mil and $\mu_2$ is the mean of wins.
Verbal Null Hypothesis:
There is no significant difference between wins and salary-mil of the baseball players in the National League.
Alternative Hypothesis:
Numerical Alternative Hypothesis:
$H_1: \mu_1 - \mu_2 \neq 0$
Verbal Alternative Hypothesis:
There is a significant difference between wins and salary-mil of the baseball players in the National League.
Level of Significance:
 α = 0.05
Decision rule:
If the p-value is greater than the given level of significance, we fail to reject the null hypothesis. Otherwise we reject the null hypothesis.
Test Statistic:
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ is the pooled variance and t has $n_1 + n_2 - 2$ degrees of freedom.

Using MegaStat in Microsoft Excel Add-Ins:
Add-Ins → MegaStat → Hypothesis Tests → Compare Two Independent Groups

Hypothesis Test: Independent Groups (t-test, pooled variance)

Salary-mil    Wins
70.949        80.375    mean
20.669        8.831     std. dev.
16            16        n

30            df
-9.4257       difference (Salary-mil - Wins)
252.5883      pooled variance
15.8930       pooled std. dev.
5.6190        standard error of difference
0             hypothesized difference

-1.68         t
.1038         p-value (two-tailed)


The test statistic value is -1.68.
The p value for the test statistic is 0.1038.

Conclusion:
Since the p-value of the test statistic (0.1038) is greater than the 0.05 level of significance, we fail to reject the null hypothesis H0 at the 5% level of significance. Hence, we conclude that there is no significant difference between wins and salary-mil of the baseball players in the National League.
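
As a hand check, the pooled-variance t statistic can be evaluated directly from the reported National League summary statistics. Because the standard deviations are rounded to three decimals, the intermediate values agree with the MegaStat output only approximately.

```latex
\begin{aligned}
s_p^2 &= \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}
       = \frac{15(20.669)^2 + 15(8.831)^2}{30} \approx 252.6 \\
SE    &= \sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}
       = \sqrt{252.6\left(\frac{1}{16}+\frac{1}{16}\right)} \approx 5.619 \\
t     &= \frac{(\bar{x}_1-\bar{x}_2) - 0}{SE}
       = \frac{70.949 - 80.375}{5.619} \approx -1.68
\end{aligned}
```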

Regression analysis:
The general multiple regression model is given by
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$
where y is the dependent variable,
          the $x_i$'s are the independent variables,
          $\beta_0$ is the actual constant,
          $\beta_i$ is the actual coefficient associated with the ith independent variable,
          $\varepsilon$ is the error term, which models the unsystematic error of y.
The above model can be written in matrix form as
                              $Y = X\beta + \varepsilon$
The general goal of multiple regression is to determine which independent (explanatory) variables should be included in the model.
We first test each coefficient $\beta_i$, where i = 1, 2, ..., k, within the model, in order to determine whether that individual parameter should be dropped from the model.
Next we test the goodness of fit of the model.

Hypothesis Tests:
     $H_0: \beta_i = 0$
     $H_1: \beta_i \neq 0$
Procedure:
 First we estimate the model as
     $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$
 where $b_i$ is the estimated value of $\beta_i$ and i = 0, 1, ..., k.

For Testing Each $\beta_i$:
The test statistic is given by
$t = \dfrac{b_i}{s_{b_i}}$
where $s_{b_i}$ is the standard error of the estimated coefficient $b_i$. Under H0 this statistic has a t distribution with n - k - 1 degrees of freedom.
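
To make the coefficient test concrete, here is a small Python sketch that applies t = b / s_b to the coefficient and standard-error values reported in this paper's three-predictor regression output (Batting, ERA, HR). The function name coefficient_test is illustrative, and the critical value 2.0555 (two-tailed, 5%, 26 degrees of freedom) is an assumption taken from a standard t table rather than a figure quoted in the paper.

```python
# (coefficient, standard error) pairs copied from the three-predictor
# regression output reported in this paper.
estimates = {
    "Batting": (492.4490, 140.3025),
    "ERA": (-15.9575, 1.6753),
    "HR": (0.1035, 0.0289),
}

# Assumption: two-tailed 5% critical value of t with
# n - k - 1 = 30 - 3 - 1 = 26 degrees of freedom, from a standard t table.
T_CRIT = 2.0555


def coefficient_test(b, se, t_crit=T_CRIT):
    """Return the t statistic and the 95% confidence interval for one coefficient."""
    t = b / se
    margin = t_crit * se
    return t, (b - margin, b + margin)


results = {name: coefficient_test(b, se) for name, (b, se) in estimates.items()}
for name, (t, (lower, upper)) in results.items():
    verdict = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"{name}: t = {t:.3f}, 95% CI = ({lower:.4f}, {upper:.4f}) -> {verdict}")
```

Because the inputs are rounded to four decimals, the recomputed t values and interval limits can differ from the MegaStat output in the last digit.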

Goodness of fit test:           
To assess the goodness of fit we generally compute R², which lies between 0 and 1. The closer R² is to 1, the better the model explains the data.

Dependent variable:
X7- Wins
Independent variables:
X2- League
X3- Built
X4- Size
X5- Surface
X6- Salary- mil
X8- Attendance
X9- Batting
X10- ERA
X11- HR
X12- Error
X13- SB

Using MegaStat in Microsoft Excel Add-Ins:
Add-Ins → MegaStat → Correlation/Regression → Regression Analysis


Regression Analysis

R²             0.857
Adjusted R²    0.770     n            30
R              0.926     k            11
Std. Error     5.200     Dep. Var.    Wins

ANOVA table

Source        SS            df    MS          F       p-value
Regression    2,917.2794    11    265.2072    9.81    1.64E-05
Residual        486.7206    18     27.0400
Total         3,404.0000    29


Regression output (with 95% confidence interval)

variables     coefficients   std. error   t (df=18)   p-value    95% lower     95% upper
Intercept     74.6634        133.9145      0.558      .5840      -206.6805     356.0073
League        -1.2494        2.3275       -0.537      .5980      -6.1392       3.6404
Built         -0.0274        0.0558       -0.491      .6291      -0.1447       0.0899
Size          -0.00000401    0.00020556   -0.019      .9847      -0.00043588   0.00042787
Surface       0.5761         4.3135        0.134      .8952      -8.4863       9.6384
Salary-mil    0.0411         0.0667        0.615      .5462      -0.0992       0.1813
Attendance    -0.00000085    0.00000317   -0.267      .7923      -0.00000750   0.00000581
Batting       447.7443       200.5131      2.233      .0385      26.4819       869.0067
ERA           -13.6362       2.4171       -5.642      2.37E-05   -18.7143      -8.5581
HR            0.0930         0.0338        2.755      .0130      0.0221        0.1639
Error         -0.1601        0.1246       -1.285      .2151      -0.4218       0.1017
SB            0.0152         0.0361        0.422      .6777      -0.0605       0.0910


The regression equation is
Wins = 74.6634 - 1.2494 League - 0.0274 Built - 0.00000401 Size + 0.5761 Surface + 0.0411 Salary-mil - 0.00000085 Attendance + 447.7443 Batting - 13.6362 ERA + 0.0930 HR - 0.1601 Error + 0.0152 SB


The adjusted R² value (0.770) is high, so the model fits the data well. However, the p-values for x2, x3, x4, x5, x6, x8, x12 and x13 are greater than 0.05, so these coefficients are not significant at the 5% level. A high R² combined with many insignificant coefficients may indicate a multicollinearity problem. We therefore drop these variables and regress x7 on x9, x10 and x11.

Regression Analysis: x7 versus x9, x10, x11

Dependent variable:
X7- Wins
Independent variables:
X9- Batting
X10- ERA
X11- HR
Using MegaStat in Microsoft Excel Add-Ins:
Add-Ins → MegaStat → Correlation/Regression → Regression Analysis

Regression Analysis

R²             0.810
Adjusted R²    0.788     n            30
R              0.900     k            3
Std. Error     4.988     Dep. Var.    Wins

ANOVA table

Source        SS            df    MS          F        p-value
Regression    2,757.1594     3    919.0531    36.94    1.60E-09
Residual        646.8406    26     24.8785
Total         3,404.0000    29


Regression output (with 95% confidence interval)

variables    coefficients   std. error   t (df=26)   p-value    95% lower   95% upper
Intercept    1.8499         35.0214       0.053      .9583      -70.1376    73.8374
Batting      492.4490       140.3025      3.510      .0017      204.0532    780.8449
ERA          -15.9575       1.6753       -9.525      5.78E-10   -19.4011    -12.5139
HR           0.1035         0.0289        3.582      .0014      0.0441      0.1628


The regression equation is
Wins = 1.8499 + 492.4490 Batting - 15.9575 ERA + 0.1035 HR


Here all the p-values of the slope coefficients are less than 0.05, i.e. the coefficients $\beta_i$ are significant at the 5% level of significance. R² is only slightly reduced after dropping the variables (from 0.857 to 0.810), while the adjusted R² rises from 0.770 to 0.788, so the reduced model is good.
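
The fit statistics quoted for both regressions follow directly from the sums of squares in the two ANOVA tables. The sketch below recomputes them in Python; the function name fit_statistics and the dictionary keys are illustrative choices, not MegaStat names.

```python
import math


def fit_statistics(ss_regression, ss_residual, n, k):
    """Recompute R^2, adjusted R^2, F and the standard error from ANOVA sums of squares."""
    ss_total = ss_regression + ss_residual
    r2 = ss_regression / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    ms_regression = ss_regression / k
    ms_residual = ss_residual / (n - k - 1)
    return {
        "r2": r2,
        "adj_r2": adj_r2,
        "f": ms_regression / ms_residual,
        "std_error": math.sqrt(ms_residual),
    }


# Sums of squares copied from the two ANOVA tables (n = 30 teams).
full_model = fit_statistics(2917.2794, 486.7206, n=30, k=11)
reduced_model = fit_statistics(2757.1594, 646.8406, n=30, k=3)

for label, stats in (("11 predictors", full_model), ("3 predictors", reduced_model)):
    print(f"{label}: R2 = {stats['r2']:.3f}, adjusted R2 = {stats['adj_r2']:.3f}, "
          f"F = {stats['f']:.2f}, std. error = {stats['std_error']:.3f}")
```

Adjusted R² penalises each extra predictor through the factor (n - 1) / (n - k - 1), which is why it can rise when uninformative variables are dropped even though R² itself falls.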

Correlation:
Research question: Does salary have a relationship with attendance for the baseball players?
The League variable identifies the two leagues:
1 if American League and
0 if National League
The data set is separated by league as follows.
Data set
American League:


Salary-mil    Attendance
123.5         2,847,798
208.3         4,090,440
55.4          2,108,818
73.9          2,623,904
97.7          3,404,636
41.5          2,014,220
75.2          2,342,804
45.7          2,014,995
56.2          2,034,243
29.7          1,141,915
55.8          2,525,259
69.1          2,024,505
87.8          2,724,859
36.9          1,371,181


Using MegaStat in Microsoft Excel Add-Ins:
Add-Ins → MegaStat → Correlation/Regression → Correlation Matrix


Correlation Matrix

              Salary-mil    Attendance
Salary-mil    1.000
Attendance     .895         1.000

14    sample size


The correlation coefficient between salary-mil and attendance is 0.895. There is a strong positive correlation between the variables.

Null Hypothesis:
H0: ρ=0
H0: “no linear relationship” between the variables.
Alternative Hypothesis:
H1: ρ≠0
H1:“ linear relationship” between the variables.

Level of significance:
α = 0.05
Critical value:
At the 5% level of significance, the two-tailed critical value of the t distribution with ν = 14 - 2 = 12 degrees of freedom is 2.178813.
Test statistic:
Under $H_0$,
  $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ has a t distribution with ν = n - 2 degrees of freedom.
  r = 0.895 and n = 14
  $t = \dfrac{0.895\sqrt{14-2}}{\sqrt{1-(0.895)^2}} = \dfrac{0.895 \times 3.4641}{0.4461} \approx 6.95$

Conclusion:
Since the test statistic value is greater than the critical value, we reject the null hypothesis at the 5% level of significance. Hence we conclude that a linear relationship exists between the variables salary-mil and attendance.
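
The correlation test can also be reproduced from the raw American League figures. The following Python sketch assumes the 14 Salary-mil and Attendance pairs from the data table; pearson_r and correlation_t are illustrative helper names, and the critical value is the one quoted in the text for 12 degrees of freedom.

```python
import math

# American League values copied from the data table.
salary_mil = [123.5, 208.3, 55.4, 73.9, 97.7, 41.5, 75.2,
              45.7, 56.2, 29.7, 55.8, 69.1, 87.8, 36.9]
attendance = [2847798, 4090440, 2108818, 2623904, 3404636, 2014220, 2342804,
              2014995, 2034243, 1141915, 2525259, 2024505, 2724859, 1371181]


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


r = pearson_r(salary_mil, attendance)
t_stat = correlation_t(r, len(salary_mil))
T_CRIT = 2.178813  # critical value quoted in the text for 12 degrees of freedom
reject_h0 = abs(t_stat) > T_CRIT
print(f"r = {r:.3f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

The computed r should be close to the reported 0.895, and the resulting t statistic is well above the critical value, which matches the decision to reject H0.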

Data set
National League:


Salary-mil    Attendance
86.5          2,520,904
62.3          2,059,327
76.8          2,805,060
61.9          1,923,254
101.3         2,827,549
38.1          1,817,245
83.0          3,603,680
63.3          2,869,787
48.6          2,730,352
90.2          3,181,020
92.1          3,542,271
60.4          1,852,608
95.5          2,665,304
39.9          2,211,323
87.0          3,100,092
48.2          1,914,385


Using MegaStat in Microsoft Excel Add-Ins:
Add-Ins → MegaStat → Correlation/Regression → Correlation Matrix

Correlation Matrix

              Salary-mil    Attendance
Salary-mil    1.000
Attendance     .693         1.000

16    sample size


The correlation coefficient between salary-mil and attendance is 0.693. There is a moderately strong positive correlation between the variables.
Null Hypothesis:
H0: ρ=0
H0: “no linear relationship” between the variables.
Alternative Hypothesis:
H1: ρ≠0
H1:“ linear relationship” between the variables.
Level of significance:
α = 0.05
Critical value:
At the 5% level of significance, the two-tailed critical value of the t distribution with ν = 16 - 2 = 14 degrees of freedom is 2.144787.
Test statistic:
Under $H_0$,
   $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ has a t distribution with ν = n - 2 degrees of freedom.
   r = 0.693 and n = 16
   $t = \dfrac{0.693\sqrt{16-2}}{\sqrt{1-(0.693)^2}} = \dfrac{0.693 \times 3.7417}{0.7209} \approx 3.60$

Conclusion:
Since the test statistic value is greater than the critical value, we reject the null hypothesis at the 5% level of significance. Hence we conclude that a linear relationship exists between the variables salary-mil and attendance.

Descriptive Statistics:
Using MegaStat in Microsoft Excel Add-Ins:
Add-Ins → MegaStat → Descriptive Statistics

                             Salary-mil   Wins      Attendance           Batting   ERA      HR         Error    SB
count                        30           30        30                   30        30       30         30       30
mean                         73.064       81.000    2,496,457.93         0.26443   4.2847   167.23     102.00   85.50
sample variance              1,171.965    117.379   452,766,738,769.44   0.00005   0.3206   1,225.29   130.34   1,075.43
sample standard deviation    34.234       10.834    672,879.44           0.00728   0.5662   35.00      11.42    32.79
minimum                      29.679067    56        1141915              0.252     3.49     117        86       31
maximum                      208.30682    100       4090440              0.281     5.49     260        125      161
range                        178.62775    44        2948525              0.029     2        143        39       130

1st quartile                 50.293       73.250    2,017,372.50         0.25900   3.7875   136.75     92.50    65.25
median                       66.191       81.000    2,523,081.50         0.26400   4.2000   164.00     102.50   76.00
3rd quartile                 87.574       88.750    2,842,735.75         0.27000   4.5500   190.50     108.75   101.25
interquartile range          37.281       15.500    825,363.25           0.01100   0.7625   53.75      16.25    36.00
mode                         #N/A         95.000    #N/A                 0.27000   3.6100   130.00     106.00   45.00


The descriptive statistics for all 30 teams are given in the above table.
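
The Wins column of the table can be reproduced with Python's standard statistics module. This is a minimal sketch, assuming the 30 win totals listed in the two league data tables; the "inclusive" quantile method is chosen here because, for these data, it interpolates between order statistics in a way that reproduces the quartiles shown in the table.

```python
import statistics

# Wins for all 30 teams, copied from the American and National League tables.
wins = [95, 95, 88, 74, 95, 93, 99, 80, 83, 67, 79, 71, 69, 56,
        90, 77, 89, 73, 83, 67, 71, 82, 81, 75, 100, 83, 88, 81, 79, 67]

summary = {
    "count": len(wins),
    "mean": statistics.mean(wins),
    "sample variance": statistics.variance(wins),
    "sample standard deviation": statistics.stdev(wins),
    "minimum": min(wins),
    "maximum": max(wins),
    "range": max(wins) - min(wins),
}

# Quartiles by linear interpolation between order statistics.
q1, median, q3 = statistics.quantiles(wins, n=4, method="inclusive")
summary.update({
    "1st quartile": q1,
    "median": median,
    "3rd quartile": q3,
    "interquartile range": q3 - q1,
})

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```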

Inference for our research:

  • From the comparison of two independent groups, there is no significant difference between wins and salary-mil of the baseball players in either the American League (p = 0.6292) or the National League (p = 0.1038).
  • From the regression analysis, the regression equation for predicting wins is
Wins = 1.8499 + 492.4490 Batting - 15.9575 ERA + 0.1035 HR
  • From the correlation analysis, a significant linear relationship exists between salary-mil and attendance in the American League (r = 0.895).
  • From the correlation analysis, a significant linear relationship exists between salary-mil and attendance in the National League (r = 0.693).
