mini Lab 3 - Xeon Phi and OpenMP

Due on 11/01/2016

Overview

In this lab, you will parallelize the sequential Graph500 benchmark with OpenMP so that it runs on Xeon Phi. The goal of this lab is to try different optimizations and beat the execution time of the sequential version. OpenMP implementations of Graph500 (not optimized for KNL nodes) are easily available, but as always, you learn more if you do it yourself.

You can find many useful tutorials on how to use OpenMP (OpenMP, LLNL, and TAMU). Read the Graph500.org file in the folder provided to you to learn more about the benchmark and data set. Look at the generated log files for detailed timings of your kernels. Note that “Overall_Time” includes both the construction time and the BFS time.

Instructions

Download the starter package from Canvas, copy it to the Stampede machine, and untar starter_lab3.tgz. You will run the sequential version on both Xeon and Xeon-Phi, and the parallel version on Xeon-Phi.
Before starting the lab, please read about the Stampede KNL Cluster (KNL, KNL Training) carefully. An important point to know:
- The KNL cluster has a single dedicated login node. Access it by executing ssh username@login-knl1.stampede.tacc.utexas.edu
In the starter package, we provide build and execution scripts:
- To build, go to the starter directory and run ./build.sh
- To execute the sequential version, run ./checker.pl seq
- To execute the OpenMP version, run ./checker.pl omp
You will have to rerun build.sh when moving from Xeon to Xeon-Phi nodes.
Remember that the login nodes are not Xeon/Xeon-Phi nodes.
- To run your code on Xeon, log in to a regular login node and submit your job to one of the Xeon queues (normal, gpu, gpudev, etc.).
- To run your code on Xeon-Phi, log in to the KNL login node and submit your job to one of the KNL queues (development, normal, etc.). Refer to the TACC page for details on KNL queues and configurations.
You will mainly modify the make_bfs() and create_graph_from_edgelist() functions in omp_csr/omp_csr.c (in the starter package, this file is a copy of the sequential version), but you are welcome to modify other parts of the source code if necessary.
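
As a starting point for thinking about those two functions, here is a minimal sketch of one common approach: a parallel CSR build followed by a level-synchronous BFS. It is not the starter code and not a tuned solution. The names build_csr and bfs_csr, their signatures, and the xoff/adj layout are assumptions made for illustration; the real make_bfs() and create_graph_from_edgelist() have their own interfaces. The sketch also skips self-loop and duplicate-edge handling, and it relies on the GCC/ICC __sync builtins for the atomic operations.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical CSR layout: neighbors of v are adj[xoff[v]] .. adj[xoff[v+1]-1].
 * Builds an undirected graph from ne edges (src[e], dst[e]).
 * xoff must hold nv + 1 entries and adj must hold 2 * ne entries.
 * Returns 0 on success, -1 if scratch memory cannot be allocated. */
static int build_csr(int64_t nv, int64_t ne, const int64_t *src,
                     const int64_t *dst, int64_t *xoff, int64_t *adj)
{
    assert(nv > 0);
    int64_t *fill = calloc((size_t)nv, sizeof *fill);
    if (!fill) return -1;
    for (int64_t v = 0; v <= nv; v++) xoff[v] = 0;

    /* Pass 1: count degrees; atomics guard counters shared across threads. */
    #pragma omp parallel for
    for (int64_t e = 0; e < ne; e++) {
        #pragma omp atomic
        xoff[src[e] + 1]++;
        #pragma omp atomic
        xoff[dst[e] + 1]++;
    }

    /* Sequential prefix sum turns per-vertex counts into row offsets. */
    for (int64_t v = 0; v < nv; v++) xoff[v + 1] += xoff[v];

    /* Pass 2: scatter endpoints; fetch-and-add reserves a unique slot. */
    #pragma omp parallel for
    for (int64_t e = 0; e < ne; e++) {
        int64_t a = src[e], b = dst[e];
        adj[xoff[a] + __sync_fetch_and_add(&fill[a], 1)] = b;
        adj[xoff[b] + __sync_fetch_and_add(&fill[b], 1)] = a;
    }
    free(fill);
    return 0;
}

/* Level-synchronous top-down BFS. parent must hold nv entries; unreached
 * vertices keep parent -1 and the root is its own parent.
 * Returns 0 on success, -1 if the frontier buffers cannot be allocated. */
static int bfs_csr(int64_t nv, const int64_t *xoff, const int64_t *adj,
                   int64_t root, int64_t *parent)
{
    assert(root >= 0 && root < nv);
    int64_t *cur = malloc((size_t)nv * sizeof *cur);
    int64_t *next = malloc((size_t)nv * sizeof *next);
    if (!cur || !next) { free(cur); free(next); return -1; }

    for (int64_t v = 0; v < nv; v++) parent[v] = -1;
    parent[root] = root;
    cur[0] = root;
    int64_t ncur = 1;

    while (ncur > 0) {
        int64_t nnext = 0;  /* shared size of the next frontier */
        /* Dynamic scheduling helps because vertex degrees vary widely. */
        #pragma omp parallel for schedule(dynamic, 64)
        for (int64_t i = 0; i < ncur; i++) {
            int64_t u = cur[i];
            for (int64_t k = xoff[u]; k < xoff[u + 1]; k++) {
                int64_t w = adj[k];
                /* Only the thread that wins the CAS enqueues w. */
                if (__sync_bool_compare_and_swap(&parent[w], (int64_t)-1, u))
                    next[__sync_fetch_and_add(&nnext, 1)] = w;
            }
        }
        int64_t *tmp = cur; cur = next; next = tmp;
        ncur = nnext;
    }
    free(cur);
    free(next);
    return 0;
}
```

Because several threads can reach the same vertex in one level, the parent recorded for that vertex may differ between runs; any of them is a valid BFS parent. The shared counters (fill, nnext) are likely to become contention points at scale, so per-thread buffers are one optimization worth trying.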

Submission Guide

Write a report that includes the following:
- Include both partners' names and UT EIDs at the top of your write-up.
- Replicate the score table generated for your solution.
  - Additionally, report the best Overall_Time for each of the four inputs ("10", "16", "20", "26") on Xeon-seq, Xeon-Phi-seq, and Xeon-Phi-parallel.
- Briefly describe how you arrived at your final solution. What other approaches did you try along the way? What was wrong with them?
- How much memory/computation bandwidth does your solution consume? How does it compare to the theoretical maximum bandwidth of Xeon Phi? Is your solution memory bound or compute bound?
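
One rough way to approach the bandwidth question is a back-of-the-envelope estimate: count the bytes the BFS must touch, divide by the measured BFS time, and compare with the peak figure you look up for the node. The sketch below assumes a simple traffic model (8-byte adjacency entries, one 8-byte parent access per adjacency entry, two 8-byte offsets per vertex) and ignores caching and frontier traffic. The function names and the model are assumptions for illustration, and the peak bandwidth is a parameter you must supply from the TACC or Intel documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed traffic model: each directed adjacency entry costs one 8-byte
 * read of adj plus one 8-byte access to parent, and each vertex costs
 * two 8-byte reads of the offset array. Caches are ignored. */
static double bfs_bytes_estimate(int64_t nv, int64_t directed_edges)
{
    return 16.0 * (double)directed_edges + 16.0 * (double)nv;
}

/* Achieved bandwidth in GB/s (1 GB = 1e9 bytes). */
static double achieved_gbps(double bytes_moved, double seconds)
{
    assert(seconds > 0.0);
    return bytes_moved / seconds / 1e9;
}

/* Fraction of the peak bandwidth; peak_gbps comes from vendor docs. */
static double fraction_of_peak(double achieved, double peak_gbps)
{
    assert(peak_gbps > 0.0);
    return achieved / peak_gbps;
}
```

If the fraction is close to 1, the kernel is likely memory bound; a small fraction suggests that latency, synchronization, or load imbalance is limiting performance rather than raw bandwidth.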
Compress your starter directory into a file named lab3.tgz.
Submit the compressed file on Canvas and the report on Gradescope.