Statistics help

One and Two (or more) Sample Hypothesis Testing Paper. Using data from one of the data sets available through the “Data Sets” link on your   page, develop one business research question from which you will formulate one research hypothesis to test a single population parameter and another to test two (or more) population parameters. Formulate both a numerical and a verbal hypothesis statement for each research issue.

Perform the hypothesis tests using the five-step model. Describe and interpret the results of each test, both in statistical terms and in conversational English. Include appropriate descriptive statistics.

Solution:

Research question: To find whether there is a significant difference between wins and salary of the baseball players.

There are two leagues denoted as

1 if American League and

0 if National League

We have separated the data set by league.

Data set

American League:

Salary          Salary -mil    Wins
123505125.0     123.5          95.0
208306817.0     208.3          95.0
55425762.0      55.4           88.0
73914333.0      73.9           74.0
97725322.0      97.7           95.0
41502500.0      41.5           93.0
75178000.0      75.2           99.0
45719500.0      45.7           80.0
56186000.0      56.2           83.0
29679067.0      29.7           67.0
55849000.0      55.8           79.0
69092000.0      69.1           71.0
87754334.0      87.8           69.0
36881000.0      36.9           56.0

 

 

Claim: There is a significant difference between wins and salary (in millions) of the baseball players in the American League.

Hypotheses:

Null Hypothesis:

Numerical Null Hypothesis:

 H0: μ1 = μ2, where μ1 is the mean salary (in millions) and μ2 is the mean number of wins

Verbal Null Hypothesis:

 There is no significant difference between wins and salary (in millions) of the baseball players in the American League.

Alternative Hypothesis:

Numerical Alternative Hypothesis:

 H1: μ1 ≠ μ2

Verbal Alternative Hypothesis:

 There is a significant difference between wins and salary (in millions) of the baseball players in the American League.

Level of Significance:

 α = 0.05

Decision rule:

If the p-value is greater than the level of significance, we fail to reject the null hypothesis; otherwise, we reject it.

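Test Statistic:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

where $d_0 = 0$ is the hypothesized difference, $s_p^2$ is the pooled variance, and the statistic has $n_1 + n_2 - 2$ degrees of freedom.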

Using MegaStat in Microsoft Excel Add-Ins:

Add-Ins → MegaStat → Hypothesis Tests → Compare Two Independent Groups

Hypothesis Test: Independent Groups (t-test, pooled variance)

Salary -mil      Wins
     75.479     81.71    mean
     45.930     13.07    std. dev.
         14        14    n

         26    df
    -6.2357    difference (Salary -mil - Wins)
 1,140.1793    pooled variance
    33.7665    pooled std. dev.
    12.7626    standard error of difference
          0    hypothesized difference

      -0.49    t
      .6292    p-value (two-tailed)

 

The test statistic value is -0.49.

The p value for the test statistic is 0.6292.

 

Conclusion:

Since the p-value (0.6292) is greater than the 0.05 level of significance, we fail to reject the null hypothesis H0 at the 5% level of significance. Hence, we conclude that there is no significant difference between wins and salary (in millions) of the baseball players in the American League.
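The pooled-variance calculation above can be checked outside Excel. The following is a minimal sketch using only the Python standard library, with the American League figures copied from the data table. The function names (`pooled_t`, `t_pdf`, `two_tailed_p`) are illustrative, and the p-value is obtained by numerically integrating the t density with Simpson's rule, which is an assumption of this sketch and not necessarily how MegaStat computes it.

```python
import math
import statistics

# American League figures copied from the data table above.
salary_mil = [123.5, 208.3, 55.4, 73.9, 97.7, 41.5, 75.2,
              45.7, 56.2, 29.7, 55.8, 69.1, 87.8, 36.9]
wins = [95, 95, 88, 74, 95, 93, 99, 80, 83, 67, 79, 71, 69, 56]


def pooled_t(x, y, hypothesized_diff=0.0):
    """Return (t, df, standard error) for a pooled-variance t-test."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(x) - statistics.mean(y) - hypothesized_diff) / se
    return t, df, se


def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + u * u / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    inside = 2 * (h / 3) * total  # probability mass in [-|t|, |t|]
    return 1 - inside


t_stat, df, se = pooled_t(salary_mil, wins)
p_value = two_tailed_p(t_stat, df)
print(f"t = {t_stat:.2f}, df = {df}, p = {p_value:.4f}")
print("reject H0" if p_value <= 0.05 else "fail to reject H0")
```

The printed t, df and p-value should agree with the MegaStat output up to rounding, since the salaries here are the rounded Salary -mil values.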

 

 

 

Research question: To find whether there is a significant difference between wins and salary of the baseball players.

 

Data set

National League:

Salary          Salary -mil    Wins
86457302.0      86.5           90.0
62329166.0      62.3           77.0
76799000.0      76.8           89.0
61892583.0      61.9           73.0
101305821.0     101.3          83.0
38133000.0      38.1           67.0
83039000.0      83.0           71.0
63290833.0      63.3           82.0
48581500.0      48.6           81.0
90199500.0      90.2           75.0
92106833.0      92.1           100.0
60408834.0      60.4           83.0
95522000.0      95.5           88.0
39934833.0      39.9           81.0
87032933.0      87.0           79.0
48155000.0      48.2           67.0

 

Claim: There is a significant difference between wins and salary (in millions) of the baseball players in the National League.

Hypotheses:

Null Hypothesis:

Numerical Null Hypothesis:

 H0: μ1 = μ2, where μ1 is the mean salary (in millions) and μ2 is the mean number of wins

Verbal Null Hypothesis:

 There is no significant difference between wins and salary (in millions) of the baseball players in the National League.

Alternative Hypothesis:

Numerical Alternative Hypothesis:

 H1: μ1 ≠ μ2

Verbal Alternative Hypothesis:

 There is a significant difference between wins and salary (in millions) of the baseball players in the National League.

Level of Significance:

 α = 0.05

Decision rule:

If the p-value is greater than the level of significance, we fail to reject the null hypothesis; otherwise, we reject it.

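Test Statistic:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

where $d_0 = 0$ is the hypothesized difference, $s_p^2$ is the pooled variance, and the statistic has $n_1 + n_2 - 2$ degrees of freedom.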

Using MegaStat in Microsoft Excel Add-Ins:

Add-Ins → MegaStat → Hypothesis Tests → Compare Two Independent Groups

Hypothesis Test: Independent Groups (t-test, pooled variance)

Salary -mil      Wins
     70.949    80.375    mean
     20.669     8.831    std. dev.
         16        16    n

         30    df
    -9.4257    difference (Salary -mil - Wins)
   252.5883    pooled variance
    15.8930    pooled std. dev.
     5.6190    standard error of difference
          0    hypothesized difference

      -1.68    t
      .1038    p-value (two-tailed)

 

The test statistic value is -1.68.

The p value for the test statistic is 0.1038.

 

Conclusion:

Since the p-value (0.1038) is greater than the 0.05 level of significance, we fail to reject the null hypothesis H0 at the 5% level of significance. Hence, we conclude that there is no significant difference between wins and salary (in millions) of the baseball players in the National League.
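When only the summary rows of the output are at hand, the intermediate quantities can be rebuilt from the means, standard deviations and sample sizes. The sketch below does this for the National League figures; the helper name `pooled_from_summary` is hypothetical, and small differences from the MegaStat values are expected because the inputs are already rounded.

```python
import math


def pooled_from_summary(mean1, sd1, n1, mean2, sd2, n2, hypothesized_diff=0.0):
    """Pooled-variance t-test quantities from summary statistics only."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return {
        "df": df,
        "difference": mean1 - mean2,
        "pooled_var": pooled_var,
        "pooled_sd": math.sqrt(pooled_var),
        "se": se,
        "t": (mean1 - mean2 - hypothesized_diff) / se,
    }


# National League summary rows: Salary -mil first, Wins second.
result = pooled_from_summary(70.949, 20.669, 16, 80.375, 8.831, 16)
for name, value in result.items():
    print(f"{name:>12}: {value:.4f}")
```

Each printed line corresponds to one row of the MegaStat table, which makes it easy to see where a transcription error would show up.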

 

Regression analysis:

The general multiple regression model is given by

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

where $y$ is the dependent variable,

          the $x_i$'s are the independent variables,

          $\beta_0$ is the actual constant,

          $\beta_i$ is the actual coefficient associated with the $i$th independent variable,

          $\varepsilon$ is the error term, which models the unsystematic error of $y$.

The above model can be written in matrix form as

$$Y = X\beta + \varepsilon$$

The general goal of multiple regression here is to determine which independent (explanatory) variables should be included in the model.

We first test each coefficient $\beta_i$, where $i = 1, 2, \ldots, k$, within the model, in order to determine whether that individual parameter should be dropped from the model.

Next we test the goodness of fit of the model.

Hypothesis Tests:

$$H_0: \beta_i = 0 \qquad H_1: \beta_i \neq 0$$

Procedure:

 First we estimate the model as

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k$$

 where $\hat{\beta}_i$ is the estimated value of $\beta_i$.

 For testing each $\beta_i$:

The test statistic is given by

$$t = \frac{\hat{\beta}_i}{s_{\hat{\beta}_i}}$$

where $s_{\hat{\beta}_i}$ is the standard error of the estimated coefficient $\hat{\beta}_i$. Under $H_0$ the statistic follows a t distribution with $n - k - 1$ degrees of freedom.

Goodness of fit test:

To assess goodness of fit we generally compute R², which lies between 0 and 1. The closer R² is to 1, the larger the share of the variation in the dependent variable that the model explains.

 

 

 

 

 

 

 

Dependent variable:

X7- Wins

Independent variables:

X2- League

X3- Built

X4- Size

X5- Surface

X6- Salary- mil

X8- Attendance

X9- Batting

X10- ERA

X11- HR

X12- Error

X13- SB

Using MegaStat in Microsoft Excel Add-Ins:

Add-Ins → MegaStat → Correlation/Regression → Regression Analysis

Regression Analysis

R²             0.857
Adjusted R²    0.770    n            30
R              0.926    k            11
Std. Error     5.200    Dep. Var.    Wins

ANOVA table

Source        SS            df    MS          F       p-value
Regression    2,917.2794    11    265.2072    9.81    1.64E-05
Residual        486.7206    18     27.0400
Total         3,404.0000    29

Regression output (95% confidence interval in the last two columns)

variables      coefficients    std. error    t (df=18)    p-value     95% lower      95% upper
Intercept      74.6634         133.9145       0.558       .5840       -206.6805      356.0073
League         -1.2494         2.3275        -0.537       .5980       -6.1392        3.6404
Built          -0.0274         0.0558        -0.491       .6291       -0.1447        0.0899
Size           -0.00000401     0.00020556    -0.019       .9847       -0.00043588    0.00042787
Surface        0.5761          4.3135         0.134       .8952       -8.4863        9.6384
Salary -mil    0.0411          0.0667         0.615       .5462       -0.0992        0.1813
Attendance     -0.00000085     0.00000317    -0.267       .7923       -0.00000750    0.00000581
Batting        447.7443        200.5131       2.233       .0385       26.4819        869.0067
ERA            -13.6362        2.4171        -5.642       2.37E-05    -18.7143       -8.5581
HR             0.0930          0.0338         2.755       .0130       0.0221         0.1639
Error          -0.1601         0.1246        -1.285       .2151       -0.4218        0.1017
SB             0.0152          0.0361         0.422       .6777       -0.0605        0.0910

 

The regression equation is

Wins = 74.6634 - 1.2494 League - 0.0274 Built - 0.00000401 Size + 0.5761 Surface + 0.0411 Salary -mil - 0.00000085 Attendance + 447.7443 Batting -13.6362 ERA + 0.0930 HR -0.1601 Error + 0.0152 SB

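The adjusted R² value (0.770) is high, so the model fits the data well overall. However, the p-values for x2, x3, x4, x5, x6, x8, x12 and x13 are greater than 0.05, so these coefficients are not significant at the 5% level. A significant overall F-test combined with many insignificant coefficients may also point to multicollinearity among the predictors. We therefore drop these variables and regress x7 on x9, x10 and x11.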
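The screening step described above (test each coefficient, keep those significant at the 5% level) can be sketched in a few lines of Python. The coefficients, standard errors and p-values are copied from the regression output; the dictionary layout and variable names are illustrative choices, not part of MegaStat.

```python
ALPHA = 0.05

# variable: (coefficient, std. error, p-value), copied from the regression output
full_model = {
    "League": (-1.2494, 2.3275, 0.5980),
    "Built": (-0.0274, 0.0558, 0.6291),
    "Size": (-0.00000401, 0.00020556, 0.9847),
    "Surface": (0.5761, 4.3135, 0.8952),
    "Salary -mil": (0.0411, 0.0667, 0.5462),
    "Attendance": (-0.00000085, 0.00000317, 0.7923),
    "Batting": (447.7443, 200.5131, 0.0385),
    "ERA": (-13.6362, 2.4171, 2.37e-05),
    "HR": (0.0930, 0.0338, 0.0130),
    "Error": (-0.1601, 0.1246, 0.2151),
    "SB": (0.0152, 0.0361, 0.6777),
}

# Each t statistic is the coefficient divided by its standard error.
t_ratios = {name: coef / se for name, (coef, se, _) in full_model.items()}

# Keep a variable only if its coefficient is significant at the 5% level.
kept = [name for name, (_, _, p) in full_model.items() if p < ALPHA]
dropped = [name for name in full_model if name not in kept]

for name in full_model:
    status = "keep" if name in kept else "drop"
    print(f"{name:<12} t = {t_ratios[name]:7.3f}  {status}")
```

Dropping every insignificant variable in one pass mirrors what the solution does; a more cautious analyst might remove variables one at a time and refit, since p-values change as predictors leave the model.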

 

Regression Analysis: x7 versus x9, x10, x11

 

Dependent variable:

X7- Wins

Independent variables:

X9- Batting

X10- ERA

X11- HR

Using MegaStat in Microsoft Excel Add-Ins:

Add-Ins → MegaStat → Correlation/Regression → Regression Analysis

Regression Analysis

R²             0.810
Adjusted R²    0.788    n            30
R              0.900    k            3
Std. Error     4.988    Dep. Var.    Wins

ANOVA table

Source        SS            df    MS          F        p-value
Regression    2,757.1594     3    919.0531    36.94    1.60E-09
Residual        646.8406    26     24.8785
Total         3,404.0000    29

Regression output (95% confidence interval in the last two columns)

variables    coefficients    std. error    t (df=26)    p-value     95% lower    95% upper
Intercept    1.8499          35.0214        0.053       .9583       -70.1376     73.8374
Batting      492.4490        140.3025       3.510       .0017       204.0532     780.8449
ERA          -15.9575        1.6753        -9.525       5.78E-10    -19.4011     -12.5139
HR           0.1035          0.0289         3.582       .0014       0.0441       0.1628

 

The regression equation is

Wins = 1.8499 + 492.4490 Batting -15.9575 ERA + 0.1035 HR

Here all the p-values of the slope coefficients are less than 0.05, i.e. Batting, ERA and HR are significant at the 5% level of significance. After dropping the other variables, R² falls only slightly (from 0.857 to 0.810) while adjusted R² rises (from 0.770 to 0.788), so the reduced model is a good fit.
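The fit statistics of the two models can be recomputed from the sums of squares in their ANOVA tables. This is a small sketch under the usual definitions (R² = SSR/SST, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), F = MSR/MSE); the function name `fit_summary` is hypothetical.

```python
import math


def fit_summary(ss_regression, ss_residual, n, k):
    """R², adjusted R², F and standard error from ANOVA sums of squares."""
    ss_total = ss_regression + ss_residual
    r2 = ss_regression / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    ms_regression = ss_regression / k
    ms_residual = ss_residual / (n - k - 1)
    return {
        "r2": r2,
        "adj_r2": adj_r2,
        "f": ms_regression / ms_residual,
        "std_error": math.sqrt(ms_residual),
    }


# Sums of squares copied from the two ANOVA tables.
full = fit_summary(2917.2794, 486.7206, n=30, k=11)
reduced = fit_summary(2757.1594, 646.8406, n=30, k=3)

for label, model in (("full (k=11)", full), ("reduced (k=3)", reduced)):
    print(f"{label:<14} R2 = {model['r2']:.3f}  adj R2 = {model['adj_r2']:.3f}  "
          f"F = {model['f']:.2f}  s = {model['std_error']:.3f}")
```

Adjusted R² penalizes each extra predictor, which is why it can rise for the three-variable model even though plain R² falls.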

 

 

Correlation:

Research question: To find whether salary has a relationship with attendance for the baseball players.

There are two leagues denoted as

1 if American League and

0 if National League

We have separated the data set by league.

Data set

American League:

Salary -mil    Attendance
123.5          2,847,798
208.3          4,090,440
55.4           2,108,818
73.9           2,623,904
97.7           3,404,636
41.5           2,014,220
75.2           2,342,804
45.7           2,014,995
56.2           2,034,243
29.7           1,141,915
55.8           2,525,259
69.1           2,024,505
87.8           2,724,859
36.9           1,371,181

 

Using MegaStat in Microsoft Excel Add-Ins:

Add-Ins → MegaStat → Correlation/Regression → Correlation Matrix

Correlation Matrix

               Salary -mil    Attendance
Salary -mil          1.000
Attendance            .895         1.000

14    sample size

The correlation coefficient between salary (in millions) and attendance is 0.895, so there is a strong positive correlation between the variables.

 

Null Hypothesis:

H0: ρ=0

H0: “no linear relationship” between the variables.

Alternative Hypothesis:

H1: ρ≠0

H1:“ linear relationship” between the variables.

Level of significance:

α = 0.05

Critical value:

At the 5% level of significance, the two-tailed critical value of the t distribution with v = 14 - 2 = 12 degrees of freedom is 2.178813.

Test statistic:

Under H0,

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

has a t distribution with v = n-2 degrees of freedom.

  r = 0.895 and n = 14

$$t = \frac{0.895\sqrt{12}}{\sqrt{1 - 0.895^2}} \approx 6.95$$

Conclusion:

Since the test statistic value (6.95) is greater than the critical value (2.178813), we reject the null hypothesis at the 5% level of significance. Hence we conclude that there is a linear relationship between the variables salary (in millions) and attendance.
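The correlation test for the American League can be reproduced from the raw table. The sketch below computes Pearson's r directly from its definition, converts it to a t statistic with n − 2 degrees of freedom, and compares it with the critical value quoted in the text; the function names are illustrative, and the salaries are the rounded Salary -mil values.

```python
import math

# American League figures copied from the data table above.
salary_mil = [123.5, 208.3, 55.4, 73.9, 97.7, 41.5, 75.2,
              45.7, 56.2, 29.7, 55.8, 69.1, 87.8, 36.9]
attendance = [2847798, 4090440, 2108818, 2623904, 3404636, 2014220, 2342804,
              2014995, 2034243, 1141915, 2525259, 2024505, 2724859, 1371181]


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


T_CRITICAL = 2.178813  # two-tailed 5% critical value for 12 df, from the text

n = len(salary_mil)
r = pearson_r(salary_mil, attendance)
t_stat = correlation_t(r, n)
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"r = {r:.3f}, t = {t_stat:.2f}, df = {n - 2}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

Swapping in the National League lists and the 14-df critical value (2.144787) would repeat the same check for the second table.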

 

Data set

National League:

Salary -mil    Attendance
86.5           2,520,904
62.3           2,059,327
76.8           2,805,060
61.9           1,923,254
101.3          2,827,549
38.1           1,817,245
83.0           3,603,680
63.3           2,869,787
48.6           2,730,352
90.2           3,181,020
92.1           3,542,271
60.4           1,852,608
95.5           2,665,304
39.9           2,211,323
87.0           3,100,092
48.2           1,914,385

 

Using MegaStat in Microsoft Excel Add-Ins:

Add-Ins → MegaStat → Correlation/Regression → Correlation Matrix

Correlation Matrix

               Salary -mil    Attendance
Salary -mil          1.000
Attendance            .693         1.000

16    sample size

The correlation coefficient between salary (in millions) and attendance is 0.693, so there is a moderately strong positive correlation between the variables.

Null Hypothesis:

H0: ρ=0

H0: “no linear relationship” between the variables.

Alternative Hypothesis:

H1: ρ≠0

H1:“ linear relationship” between the variables.

Level of significance:

α = 0.05

Critical value:

At the 5% level of significance, the two-tailed critical value of the t distribution with v = 16 - 2 = 14 degrees of freedom is 2.144787.

Test statistic:

Under H0,

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

has a t distribution with v = n-2 degrees of freedom.

   r = 0.693 and n = 16

$$t = \frac{0.693\sqrt{14}}{\sqrt{1 - 0.693^2}} \approx 3.60$$

Conclusion:

Since the test statistic value (3.60) is greater than the critical value (2.144787), we reject the null hypothesis at the 5% level of significance. Hence we conclude that there is a linear relationship between the variables salary (in millions) and attendance.

Descriptive Statistics:

Using MegaStat in Microsoft Excel Add-Ins:

Add-Ins → MegaStat → Descriptive Statistics

                           Salary -mil      Wins            Attendance   Batting      ERA        HR    Error        SB
count                               30        30                    30        30       30        30       30        30
mean                            73.064    81.000          2,496,457.93   0.26443   4.2847    167.23   102.00     85.50
sample variance              1,171.965   117.379    452,766,738,769.44   0.00005   0.3206  1,225.29   130.34  1,075.43
sample standard deviation       34.234    10.834            672,879.44   0.00728   0.5662     35.00    11.42     32.79
minimum                      29.679067        56               1141915     0.252     3.49       117       86        31
maximum                      208.30682       100               4090440     0.281     5.49       260      125       161
range                        178.62775        44               2948525     0.029        2       143       39       130

1st quartile                    50.293    73.250          2,017,372.50   0.25900   3.7875    136.75    92.50     65.25
median                          66.191    81.000          2,523,081.50   0.26400   4.2000    164.00   102.50     76.00
3rd quartile                    87.574    88.750          2,842,735.75   0.27000   4.5500    190.50   108.75    101.25
interquartile range             37.281    15.500            825,363.25   0.01100   0.7625     53.75    16.25     36.00
mode                              #N/A    95.000                  #N/A   0.27000   3.6100    130.00   106.00     45.00

 

The descriptive statistics for all 30 teams are given in the above table.
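As an illustration of how one column of the table is obtained, the sketch below recomputes the Wins statistics from the two league tables using Python's `statistics` module. The quartiles use the inclusive interpolation method, which is an assumption about what MegaStat does; it is chosen here because it reproduces the reported values for Wins.

```python
import statistics

# Wins copied from the American League and National League data tables.
al_wins = [95, 95, 88, 74, 95, 93, 99, 80, 83, 67, 79, 71, 69, 56]
nl_wins = [90, 77, 89, 73, 83, 67, 71, 82, 81, 75, 100, 83, 88, 81, 79, 67]
wins = al_wins + nl_wins

# Inclusive quartiles interpolate between order statistics of the sample itself.
q1, median, q3 = statistics.quantiles(wins, n=4, method="inclusive")

summary = {
    "count": len(wins),
    "mean": statistics.mean(wins),
    "sample variance": statistics.variance(wins),
    "sample standard deviation": statistics.stdev(wins),
    "minimum": min(wins),
    "maximum": max(wins),
    "range": max(wins) - min(wins),
    "1st quartile": q1,
    "median": median,
    "3rd quartile": q3,
    "interquartile range": q3 - q1,
}

for name, value in summary.items():
    print(f"{name:<26} {value:10.3f}")
```

The same dictionary could be built for any other column once its 30 values are typed in; only Wins is shown because both league tables list it in full.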

 

 

Inference for our research:

 

  • Comparing two independent groups, we found no significant difference between wins and salary (in millions) of the baseball players in the American League.
  • Comparing two independent groups, we found no significant difference between wins and salary (in millions) of the baseball players in the National League.
  • From the regression analysis, the regression equation predicting wins is

Wins = 1.8499 + 492.4490 Batting -15.9575 ERA + 0.1035 HR

  • From the correlation analysis, there is a significant linear relationship between salary (in millions) and attendance in the American League.
  • From the correlation analysis, there is a significant linear relationship between salary (in millions) and attendance in the National League.

 

 
