Before each class, submit answers to these questions on Gradescope. I expect a few bullet points for each answer as preparation for discussion. I don’t want to see long paragraphs!

| Lecture | Date | Topic (notes) | Reading | Comments |
| --- | --- | --- | --- | --- |
| 1 | 8/30 | Introduction and Policies | | |
| 2 | 9/4 | GPP binary compatibility | J. Dehnert et al., “The Transmeta Code Morphing™ Software: using speculation, recovery, and adaptive retranslation to address real-life challenges,” CGO’03 | |
| 3 | 9/6 | DNN inference | S. Han et al., “EIE: Efficient Inference Engine on Compressed Deep Neural Network” (EZProxy), ISCA’16 | |
| 4 | 9/11 | Reliability and Power | D. Ernst et al., “Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation”, MICRO-36, 2003. A related white paper (I suggest you read it), and a Short Explanation on Metastability | |
| 5 | 9/13 | NVM Technology | C. Xu et al., “Overcoming the challenges of crossbar resistive memory architectures” (EZProxy), HPCA, 2015 | |
| 6 | 9/20 | Slack | Catching up on discussions | |
| 7 | 9/18 | Memory management (OS) | Y. Kwon et al., “Coordinated and Efficient Huge Page Management with Ingens”, OSDI 2016 | |
| 8 + 9 | 9/25 | Capabilities and security | J. Woodruff et al., “The CHERI Capability Model: Revisiting RISC in an Age of Risk”, ISCA 2014 (EZProxy) | Background |
| 10 | 10/2 | Side channels | C. Hunger et al., “Understanding Contention-Based Channels and Using Them for Defense”, HPCA 2015 (EZProxy) | |
| 11 | 10/4 | Fine-grained threads | D. Culler et al., “Fine-Grain Parallelism with Minimal Hardware Support: A Compiler-Controlled Threaded Abstract Machine,” ASPLOS 1991 (EZProxy) | |
| 12 | 10/9 | HW Active Messages | M. Noakes et al., “The J-Machine Multicomputer: An Architectural Evaluation”, ISCA 20, 1993. PLEASE ALSO READ THIS OVERVIEW WITH PICTURES: W. J. Dally et al., “The J-Machine: A Retrospective”, 1998 (EZProxy). | |
| 13 | 10/11 | Slack | Discussion continues, but read: K. Fatahalian and M. Houston, “GPUs: a Closer Look”, ACM Queue 6(2), 2008 (EZProxy) | |
| 14 | 10/16 | GPUs I | W. Fung and T. Aamodt, “Thread Block Compaction for Efficient SIMT Control Flow”, HPCA, 2011 (EZProxy). ALSO HIGHLY RECOMMENDED: V. Narasiman et al., “Improving GPU performance via large warps and two-level warp scheduling”, MICRO 2011 (EZProxy) | |
| 15 | 10/18 | GPUs II | D. Merrill et al., “Scalable GPU Graph Traversal”, PPoPP 2012 (EZProxy) | |
| 16 | 10/23 | Exam 1 | | |
| 17 | 10/25 | Projects | Project breakouts | |
| 18 | 10/30 | GPU→Dataflow | D. Voitsechov and Y. Etsion, “Single-graph multiple flows: energy efficient design alternative for GPGPUs”, ISCA 2014 (EZProxy) | |
| 19 | 11/1 | Continued | | |
| 20 | 11/6 | Datacenters 1 | H. Zhu and M. Erez, “Dirigent: Enforcing QoS for latency-critical tasks on shared multicore systems”, ASPLOS 2016 (EZProxy) | |
| 21 | 11/8 | QoS (2) | Y. Zhou and D. Wentzlaff, “MITTS: memory inter-arrival time traffic shaping”, ISCA 2016 (EZProxy) | |
| 22 | 11/13 | Compression | E. Choukse et al., “Compresso: Pragmatic Main Memory Compression”, MICRO 2018 | |
| 23 | 11/14 | HPC | A. Magni et al., “A large-scale cross-architecture evaluation of thread-coarsening”, SC13, 2013 (EZProxy) | |
| 24 | 11/20 | Hardware DSL | D. Koeplinger et al., “Spatial: a language and compiler for application accelerators”, PLDI 2018 (EZProxy) | |
| 25 | 11/27 | Spectre/Meltdown | C. Canella et al., “A Systematic Evaluation of Transient Execution Attacks and Defenses”, arXiv Preprint 2018. | |
| 26 | 11/29 | Exam 2 | Take-home exam | |
| 27 | 12/4 | Transactional Memory | L. Hammond et al., “Transactional Memory Coherence and Consistency”, ISCA 2004 (EZProxy) | |