Instructions:
Grading: The grand total is 60 marks. Each of the 11 parts is worth 5 marks, and the appendix/presentation of results is worth 5 marks. A general marking scheme for each part is given below:
Per Question Part
Presentation and Appendix
1. [30 marks] Consider the baseball dataset describing the population of baseball players in the data file baseball.csv. Set the seed of your randomization to be the last 4 digits of your student number.
The R package 'sampling', which includes the functions strata and getdata, is useful for this question. The following R code shows how to install and load the package.
install.packages("sampling")
# load the sampling package, to use the functions strata and getdata
library(sampling)
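As a rough illustration of how strata and getdata fit together, the sketch below draws a stratified simple random sample from a small made-up population. The data frame `pop`, its columns (`league`, `salary`, `hits`), the stratum sample sizes, and the seed 1234 are all assumptions for illustration; the actual columns of baseball.csv are not shown in this handout.

```r
library(sampling)

set.seed(1234)  # assumption: stand-in for the last 4 digits of a student number

# Hypothetical toy population (NOT the real baseball.csv layout)
pop <- data.frame(
  league = rep(c("AL", "NL"), times = c(12, 8)),
  salary = round(runif(20, 100, 900)),
  hits   = rpois(20, 80)
)

# strata() expects the data sorted by the stratification variable,
# with `size` listed in the order the strata appear in the data
pop <- pop[order(pop$league), ]
rownames(pop) <- NULL

s <- strata(pop, stratanames = "league", size = c(3, 2), method = "srswor")

# getdata() extracts the sampled rows and attaches the sampling information
samp <- getdata(pop, s)
print(samp)
```

Sorting before calling strata matters because the `size` vector is matched to strata by their order of appearance, not by name.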
2. [15 marks] Use the population data set hh18.csv, with N = 251 pairs of measurements of height, x, and handspan, y, from our class, mainly to compare regression and ratio estimation for estimating the mean handspan μ_y, using information from a sample of size n = 10. Set the seed of your randomization to be the last 4 digits of your student number.
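The two estimators being compared can be sketched in base R as follows. This is a minimal sketch on simulated data, not the class data: the vectors `x` and `y`, the simulation parameters, and the seed 1234 are assumptions standing in for hh18.csv, whose column names are not given here. The population mean of x is treated as known, which both estimators require.

```r
set.seed(1234)  # assumption: stand-in for the last 4 digits of a student number

# Hypothetical stand-in for hh18.csv: N = 251 simulated (height, handspan) pairs
N <- 251
x <- rnorm(N, mean = 170, sd = 9)        # height, simulated
y <- 3 + 0.1 * x + rnorm(N, sd = 1)      # handspan, simulated
mu_x <- mean(x)                          # population mean of x, assumed known

# Simple random sample without replacement of size n = 10
n <- 10
idx <- sample(N, n)
xs <- x[idx]
ys <- y[idx]

# Ratio estimator: (ybar / xbar) * mu_x
r_hat <- mean(ys) / mean(xs)
mu_ratio <- r_hat * mu_x

# Regression estimator: ybar + b * (mu_x - xbar), with b the least-squares slope
b <- cov(xs, ys) / var(xs)
mu_reg <- mean(ys) + b * (mu_x - mean(xs))

print(c(srs = mean(ys), ratio = mu_ratio, regression = mu_reg, truth = mean(y)))
```

The ratio estimator implicitly assumes a line through the origin, while the regression estimator allows an intercept, so the two can differ noticeably when the intercept is far from zero.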
3. [10 marks] A market research firm constructed a sampling plan to estimate the weekly sales of brand A cereal in a certain geographic area. The firm decided to sample cities within the area and then to sample supermarkets within cities. The number of boxes of brand A cereal sold in a specified week is the measurement of interest. Five cities are sampled from the 20 in the area. Using the data given in the accompanying table, answer the following:
City | Number of supermarkets | Supermarkets sampled | ȳ_i | s_i^2
1 | 45 | 9 | 102 | 20
2 | 36 | 7 | 90 | 16
3 | 20 | 4 | 76 | 22
4 | 18 | 4 | 94 | 26
5 | 28 | 6 | 120 | 12
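The specific parts of this question are not listed here, so the following is only a sketch of one standard two-stage cluster calculation on the table above: the ratio estimate of average sales per supermarket, the corresponding unbiased estimate of total sales, and a textbook-style variance estimate for the ratio estimator. The variable names are illustrative, and estimating the average cluster size from the sampled cities is an assumption made because the total number of supermarkets in the area is not given.

```r
# Data from the table: n = 5 cities sampled out of N = 20
N_cities <- 20
M    <- c(45, 36, 20, 18, 28)    # supermarkets in each sampled city
m    <- c(9, 7, 4, 4, 6)         # supermarkets sampled in each city
ybar <- c(102, 90, 76, 94, 120)  # sample mean sales per sampled supermarket
s2   <- c(20, 16, 22, 26, 12)    # within-city sample variances
n    <- length(M)

# Ratio estimate of mean sales per supermarket
mu_hat <- sum(M * ybar) / sum(M)

# Unbiased estimate of total sales over all 20 cities
tau_hat <- N_cities / n * sum(M * ybar)

# Variance estimate for mu_hat (between-city plus within-city components),
# with the average cluster size estimated from the sample (assumption)
Mbar <- mean(M)
s2_r <- sum((M * ybar - mu_hat * M)^2) / (n - 1)
v_between <- (1 - n / N_cities) * s2_r / (n * Mbar^2)
v_within  <- sum(M^2 * (1 - m / M) * s2 / m) / (n * N_cities * Mbar^2)
v_hat <- v_between + v_within

print(c(mean = mu_hat, total = tau_hat, se_mean = sqrt(v_hat)))
```

With these figures the between-city component dominates the variance, which is typical when only a few clusters are sampled.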