CSE 310 Data Structures and Algorithms

Project#1

This project has two major goals:

The first is to implement various sorting and selection algorithms both in serial and in parallel.
The second is to run experiments comparing the performance of certain algorithms, to collect and plotdata, and to interpret the results.

All dynamic memory allocation must be done yourself, i.e., using either malloc() and free(), or new() and delete(). You may not use any external libraries to implement any part of this project.

See §2 for full project requirements. A script will be used to check the correctness of your program. Therefore, absolutely no changes to these project requirements are permitted.

You should use a version control system as you develop your solution to this project, e.g., Dropbox or GitHub. Your code repository should be private to prevent anyone from plagiarizing your work.

1 Super Bowl 50

The 50^th Super Bowl will be played in 2016. A mountain of statistics is gathered for each game, player category, player position, and team. In this project you will write a serial program that executes a sequence of commands that operate on game statistics over a number of years.

All input data is to be read from standard input (stdin). Of course, you may (and should) redirect stdin from a file. No file operations are allowed.

You may assume the format of the input data is correct.

The format of the input data is as follows:

The number of years y (integer) of team game statistics in this data set.
For each of y years, the following is provided:
- The year for which statistics are provided (integer).
- For each of 32 teams, numerous statistics are provided in a tab separated list. See the definition of struct team stats in §1.1.

Upon reading the value of y, you must dynamically allocate an array of struct annual stats of size y, and then initialize the array with the data following. Once the data structure is initialized, a sequence of commands is to be processed. The input continues with:

The number of commands c (integer).
c commands following the format in §1.2.

Each of the c commands must be processed as described in §1.2.

1.1 Format of Team Data

For each team, the structure team stats contains numerous fields associated with the given team for the given year. The structure annual stats holds team statistics for all teams in a given year. A file named defns.cc that contains these structure definitions is provided for you.

The data for this project is drawn from the team statistics page of the National Football League (NFL).

See http://www.nfl.com/stats/team.

{`
#define NO_TEAMS 32 // Number of NFL teams
#define TEAM_NAME_LEN 25 // Maximum team name string length #define TOP_LEN 6 // Maximum time of possession string length
struct team_stats{ char team_name[ TEAM_NAME_LEN ]; // Name of NFL team int games; // Number of games played in the season float pts_per_game; // Points per game int total_points; // Total points int scrimmage_plays; // Scrimmage plays float yds_per_game; // Yards per game float yds_per_play; // Yards per play float first_per_game; // First downs per game int third_md; // Third down conversions int third_att; // Third down attempts int third_pct; // Third down percentage int fourth_md; // Fourth down conversions int fourth_att; // Fourth down attempts int fourth_pct; // Fourth down percentage int penalties; // Number of penalties int pen_yds; // Penalty yards char top_per_game[ TOP_LEN ]; // Time of possession per game int fum; // Number of fumbles int lost; // Fumbles lost int to; // Turnover ratio
};
struct annual_stats{ int year;
struct team_stats teams[ NO_TEAMS ];
};
`}

1.2 Commands

There are eight (8) commands to implement. The syntax of valid commands is:

bsort year field order, use the Bubblesort algorithm to sort team data for the given year on the given field in the given order. The output is a list of team name, and field name, with appropriate headers on the list.
- If the year given is not one of the years this is part of the input data, the error message: Error: no such year. should be output.
- The field may be any field of struct team stats. This structure contains fields of character strings, integers, and floating point numbers. Hence the Bubblesort algorithm should be able to sort any of these types.
- The order may be either incr for increasing order, and decr for decreasing order. If there are multiple fields with the same order, then they should be sorted in alphabetic order by team name.
qsort year field order, use the Quicksort algorithm to sort team data for the given year on the given field in the given order. Use the leftmost element as the pivot. See the command bsort for a description of the year, field, and order, as well as the format of the output.
bfind year field item, first uses Bubblesort to sort the team data for the given year on the given field in increasing order. Then, a selection is made based on item. Valid items for selection include:
- max, prints the maximum value for the given field in the given year, along with the team name(s) achieving the maximum.
- min, prints the minimum value for the given field in the given year, along with the team name(s) achieving the minimum.
- average, prints the average value for the given field in the given year over all teams. Only an integer or floating point field will be given for this item.
- median, prints the median value for the given field in the given year over all teams. Here, “median” refers to the lower median, i.e., the element at position b(n+1)/2c, where n is the total number of elements.
qfind year field item, first uses Quicksort to sort the team data for the given year on the given field in increasing order, using the leftmost element as the pivot. See the command bfind for a description of the year, field, and item, as well as the format of the output.
bsort range start-year end-year field order, use the Bubblesort algorithm to sort team data for the range of years starting with start-year and ending with end-year, on the given field in the given order. There are two differences between this command and bsort for a single year.
- The sort is to be performed on data for all years in the given range. You may assume that start-year < end-year and that all years exist in the input data provided.
- Add the year as part of the output for clarity.
qsort range start-year end-year field order, use the Quicksort algorithm to sort team data for the range of years starting with start-year and ending with end-year, on the given field in the given order. See the command bsort on a range of years for a description of the start-year, end-year, field, and order, as well as the format of the output.
pmax year field, uses a parallel divide-and-conquer algorithm implemented using two threads in OpenMP to determine the maximum value for the given field in the given year, along with the team name(s) achieving the maximum. Hint: The presentation in the file Parallel-Intro.pptx will be reviewed in class; a copy of this presentation is also provided to you. It illustrates a parallel divideand-conquer approach to sum values stored in an array. A similar idea may be used to compute the maximum element.
pmin year field, uses a parallel divide-and-conquer algorithm implemented using two threads in OpenMP to determine the minimum value for the given field in the given year, along with the team name(s) achieving the minimum. (See also the Hint for the pmax )

Consider the following valid example command sequence. Comments are not part of the input, and are only included here for clarification.

8 // A total of 8 commands follow bsort 2014 penalties decr // Bubblesort 2014 data on penalties in decreasing order qsort 2015 pts_per_game decr // Quicksort 2015 data on points per game in decreasing order bfind 2015 yds_per_game max // Return maximum yards per game after Bubblesorting 2015 data bfind 2014 fum average // Return average of number of fumbles after Bubblesorting 2014 data qfind 2015 top_per_game min // Return minimum time of possession per game after Quicksorting 2015 data bsort range 2014 2015 penalties incr // Bubblesort 2014-2015 data on penalties in increasing order pmax 2015 yds_per_play // Use parallel algorithm to return maximum yards per play in 2015 pmin 2014 fourth_pct // Use parallel algorithm to return minimum fourth down percentage in 2014

2 Program Requirements for Project #1

Write a C/C++ program that implements all of the commands described in §1.2 on data in the format described in §1. You must use OpenMP to implement the commands pmax and pmin in parallel using a divide-and-conquer approach with two threads; all other commands are to be implemented in serial.
Design experiments that exercise your program to answer the questions in §3. A brief report with figures and data to support your answers is expected.
Provide a Makefile that compiles your program into an executable named p1. This executable must be able to run commands read from standard input directly, or from a script file redirected from standard input (this should require no change to your program).

Sample input files that adhere to the format described in §1 will be provided on Blackboard; use them to test the correctness of your programs.

3 Experimentation

Plot the run time of your Bubblesort algorithm and your Quicksort algorithm on arrays of integers ofsize n = {`10¹,10²,10³,10⁴,10⁵,10⁶`} (or larger if you wish). For this you will need to write a separate program that simply calls the Bubblesort and Quicksort algorithms you have implemented to meet the program requirements of Project #1, i.e., use your solution to the bsort and qsort The clock function is one function you may use to obtain the processor time used by your algorithm. A program to generate random integer data will be provided.
- Do you observe a cross-over point? That is, can you recommend when you should use one algorithm over the other?
Plot the run time of maximum finding on arrays of integers of size n = {`10¹,10²,10³,10⁴,10⁵,10⁶`} (or larger if you wish) using three techniques: Bubblesort the array in increasing order and then select the maximum, Quicksort the array in increasing order and then select the maximum, and finally use your OpenMP parallel divide-and-conquer maximum finding algorithm. Specifically, you should be able to use your solutions to bfind and qfind on item=max, and pmax, respectively, for this purpose. Use random integer data for this purpose.
- Do you observe any cross-over points? That is, is there a size at which the parallel algorithm overtakes either serial algorithm? Is there a point the recursive Quicksort algorithm overtakes the iterative Bubblesort?
- Can you conclude anything about when to use the iterative, recursive divide-and-conquer, or parallel divide-and-conquer algorithms?