CS 432 Parallel Computing HW 6

CS 432/632/732 Parallel Computing

Objectives:

To design and implement a hybrid message-passing and multithreaded version of the "Game of Life" program using the Message Passing Interface (MPI) and OpenMP (or Pthreads).

Problem Statement:

The objectives of this homework are:

  1. Design and implement a hybrid message-passing and multithreaded version of the Game of Life program, using MPI for message passing and OpenMP (or Pthreads) for multithreading. Use non-blocking point-to-point communication to exchange data between neighboring processes. You may use collective communication operations to distribute the data initially and collect it at the end (alternatively, you may design your program to work completely in parallel, without requiring the master process to distribute and collect data). The number of threads per process must be passed to the program as a command-line argument, in addition to the matrix size and the number of iterations.
  2. Test the program for functionality and correctness (the output from the hybrid version must be identical to that of the sequential version).
  3. Measure the performance of the program and optimize the program to improve performance. Use MPI_Wtime to measure the time taken.
  4. Determine the speedup and efficiency of the hybrid version of the program.
  5. Compare the performance of the hybrid version with that of the corresponding message-passing (MPI) and multithreaded (Pthreads/OpenMP) versions and include the findings in the report.
  6. Graduate Students Only: Read the article by Jonathan Dursi entitled “HPC is dying, and MPI is killing it”, available at https://www.dursi.ca/post/hpc-is-dying-and-mpi-is-killing-it.html, answer the following questions, and include your answers in your report:
    1. What is the central theme of this article?
    2. Is HPC dying? Do you have any evidence to support this claim? If so, how is MPI contributing to the demise of HPC?
    3. If you agree with the author’s assertions, explain what evidence the author provides to make his case. Do you have additional arguments to support the author’s claims?
    4. If you do not agree with the author’s assertions, explain where the author fails to make a convincing case to support his argument. Do you have additional arguments to support your case?
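
The per-thread computation described in objective 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the required design: the function name `life_step_strip`, the ghost-row and ghost-column layout, and the fixed dead border are all hypothetical choices, and the boundary handling should match whatever your sequential and Homework-5 versions use.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of the assignment): advance one column strip
 * of this process's block by one generation.
 *
 * Assumed layout: `cur` and `next` hold (local_rows + 2) x (n + 2) cells in
 * row-major order. Rows 0 and local_rows + 1 are ghost rows that the caller
 * has already filled with the neighboring processes' boundary rows; columns
 * 0 and n + 1 are ghost columns that stay dead (fixed-border assumption).
 *
 * The strip covers global columns [col_lo, col_hi), counted from 0.
 */
void life_step_strip(const unsigned char *cur, unsigned char *next,
                     int local_rows, int n, int col_lo, int col_hi)
{
    const size_t w = (size_t)n + 2;   /* row stride including ghost columns */

    for (int i = 1; i <= local_rows; i++) {
        const unsigned char *up   = cur + (size_t)(i - 1) * w;
        const unsigned char *mid  = cur + (size_t)i * w;
        const unsigned char *down = cur + (size_t)(i + 1) * w;

        /* Shift by one to skip the left ghost column. */
        for (int j = col_lo + 1; j <= col_hi; j++) {
            int alive = up[j - 1]   + up[j]   + up[j + 1]
                      + mid[j - 1]            + mid[j + 1]
                      + down[j - 1] + down[j] + down[j + 1];

            /* Standard rules: birth on 3 neighbors, survival on 2 or 3. */
            next[(size_t)i * w + j] =
                (unsigned char)(alive == 3 || (alive == 2 && mid[j]));
        }
    }
}
```

In a hybrid program, each thread would call this function on its own strip once the non-blocking exchange of boundary rows (for example, `MPI_Irecv` and `MPI_Isend` followed by `MPI_Waitall`) has completed, and all threads would synchronize before `cur` and `next` are swapped for the next iteration.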

Guidelines and Hints:

  1. Modify the message-passing version of the program (developed in Homework-5) to include multithreading, so that the computation in each process is distributed among Q threads. Note that the message-passing version distributed the N x N array among P processes by assigning (N/P)*N elements to each process. In this homework, divide each process's subarray among Q threads such that each thread computes (N/P)*(N/Q) elements. This is equivalent to a two-dimensional data distribution, with the rows distributed among processes and the data within each set of rows distributed among multiple threads, as shown in Figure 1.

Figure 1. Illustration of data distribution among P processes and Q threads.
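
The index arithmetic behind this distribution can be sketched as follows. The names `range_t` and `block_range` are hypothetical, and spreading the remainder over the first blocks is only one possible choice for the case where N is not divisible by P or Q (the handout's (N/P)*(N/Q) formula assumes it is).

```c
/* Half-open index range [lo, hi). Hypothetical type for illustration. */
typedef struct {
    int lo;
    int hi;
} range_t;

/*
 * Hypothetical helper: split `n` items into `parts` contiguous blocks and
 * return block number `idx` (0-based). When n is not divisible by parts,
 * the first n % parts blocks receive one extra item each.
 */
range_t block_range(int n, int parts, int idx)
{
    int base  = n / parts;   /* minimum block size */
    int extra = n % parts;   /* number of blocks that get one more item */
    range_t r;

    r.lo = idx * base + (idx < extra ? idx : extra);
    r.hi = r.lo + base + (idx < extra ? 1 : 0);
    return r;
}
```

A process with rank `rank` would then own rows `block_range(N, P, rank)`, and thread `tid` within that process would compute columns `block_range(N, Q, tid)` of those rows, giving each thread roughly (N/P)*(N/Q) elements.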

  1. Review Chapter 3, "Distributed Memory Programming with MPI", in the textbook, download the source code from the textbook website, and compile and test the programs on a Linux system (you can use the Vulcan machines in CIS for testing). You can also review the MPI tutorial at https://computing.llnl.gov/tutorials/mpi/.
  2. For testing purposes, use the same seed for the random number generator in both the sequential and parallel versions of the program, write the output matrix to a file, and compare the output files.
  3. Use the Vulcan machines in CIS for development and testing only.
  4. Make sure you submit jobs to the compute nodes and DO NOT run any jobs on the login node/head node on the ASA cluster. Also make sure you submit your jobs ONLY to the “class” queue.
  5. Make sure you comment out any print statements you might have to print the board when you execute with larger problem sizes. Also execute the program three times and use the average time taken.
  6. For details about the MPI functions use the MPI standard or MPI man pages.
  7. Execute your program on the cluster dmc.asc.edu for the following combinations of processes and threads and complete the table below with the time taken. Use a matrix size of 5000x5000 and a maximum of 5000 iterations for all the test cases. Compute the speedup and efficiency and plot them separately.

Case #    # Processes    # Threads    Time Taken
1         1              16
2         2              8
3         4              4
4         8              2
5         16             1

Note that Case #1 would be equivalent to the multithreaded version and Case #5 would be equivalent to the message-passing version implemented in previous homework.
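
The quantities requested for the table can be computed as sketched below. The function names are hypothetical, and two definitions are assumptions rather than requirements stated in this handout: speedup is taken relative to the sequential run time, and efficiency divides the speedup by P*Q on the assumption that each thread runs on its own core.

```c
/* Average of three timed runs, as asked for in the guidelines. */
double mean3(double t1, double t2, double t3)
{
    return (t1 + t2 + t3) / 3.0;
}

/* Speedup S = T_seq / T_par (assumed baseline: the sequential version). */
double speedup(double t_seq, double t_par)
{
    return t_seq / t_par;
}

/*
 * Efficiency E = S / (P * Q), where P is the number of processes and Q the
 * number of threads per process (assumes one core per thread).
 */
double efficiency(double t_seq, double t_par, int procs, int threads)
{
    return speedup(t_seq, t_par) / (double)(procs * threads);
}
```

For every case in the table, P*Q is 16, so the efficiency values are directly comparable across the five configurations.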

  1. Check in the final version of your program to the CIS git server, and make sure to share your git repository with the TAs and the instructor.
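
For the same-seed correctness check described in the guidelines above, one possible way to make every process start from exactly the board the sequential program would generate is sketched here. The name `init_rows` and the use of `srand`/`rand` with `rand() % 2` are assumptions; your sequential version may initialize cells differently, in which case the same generator calls should be mirrored.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: fill `rows` with global rows [row_lo, row_hi) of the
 * n x n board that a sequential program would produce from `seed`.
 *
 * Every caller walks the random sequence from the start and discards the
 * values that belong to rows above row_lo, so the stored cells match the
 * sequential board regardless of how rows are divided among processes.
 */
void init_rows(unsigned char *rows, int n, int row_lo, int row_hi,
               unsigned int seed)
{
    srand(seed);
    for (int i = 0; i < row_hi; i++) {
        for (int j = 0; j < n; j++) {
            unsigned char cell = (unsigned char)(rand() % 2);
            if (i >= row_lo)
                rows[(size_t)(i - row_lo) * n + j] = cell;
        }
    }
}
```

This costs each process up to N*N calls to `rand`, which is acceptable for testing. Generating the board on one process and distributing it with a collective operation is an alternative that the objectives also allow.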

Program Documentation and Testing:

  1. Use appropriate class names and variable names.
  2. Include meaningful comments to indicate various operations performed by the program.
  3. Programs must include the following header information within comments:

/*
Name:
BlazerId:
Homework #:
*/

Report:

Follow the guidelines provided in Canvas to write the report. Submit the report as a Word or PDF file. Include the URL of your git repository in the report, and make sure that you have shared the repository with the TAs and the instructor. If you use specific compiler flags, include them in your report as well as in a README file, and check the README file in to the git repository.

Submission:

Upload the source files and report (.doc or .pdf file) to Canvas in the assignment submission section for this homework. You can create a zip file with all the source files and lab report and upload the zip file to Canvas. There is NO need to turn in any printed copies in class.

Grading Rubrics:

The following grading policy will be used for grading programming assignments:

Program Design and Implementation: 50% (program with no compiler errors or logical errors with the required functionality)

Program Testing and Performance Analysis: 30% (includes selecting appropriate test cases, performing the tests, tabulating/plotting/graphing the test results, and analyzing performance)

Report: 10% (documentation of problem requirements, program design, implementation, instructions for compiling, performance analysis, and test cases)

Source Code Formatting: 10% (indentation, variable names, comments, etc.)

The rubric for grading the report is as follows:

Correct use of English grammar and spelling comprises a baseline requirement for writing. (20%)

Clear exposition of the ideas central to the report (e.g., performance evaluation and analysis) is accomplished. (20%)

Organizational structure at the high-level, mid-level, paragraph level, and sentence level is reviewed for logic, clarity, uniform continuity, and flow. (20%)

The content fulfills the requirements for the technical writing exercise. (10%)

Word and language usage is consistent with a scientific report (formality of word choice, person, attention to audience). (10%)

Appropriate credit to others (references style and content) is required. (10%)

Formatting of the report is appropriate to enable the communication to be effective and professional. (10%)
