Prospective students, please read this if you are interested in joining my group.

Some of our software releases are linked through https://lph.ece.utexas.edu/public.

System Resiliency, Reliability, and Dependability

As the number of devices per computational node grows and as computer systems rely increasingly on multiple nodes for high performance, reliability, especially soft-error tolerance, will become critical for single processors and consumer computer systems as well. In addition, energy and power efficiency is the primary concern in modern designs. Our approach is to research techniques that provide resiliency where it is most efficient rather than where it is simplest to do so. In many cases we rely on cooperative mechanisms that span multiple system layers, including both hardware and software, to achieve efficiency and resiliency together. We are also developing resilience abstractions and mechanism implementations through Containment Domains.
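Containment Domains are built around preserving a domain's inputs, executing its body, detecting errors, and recovering locally, escalating to the enclosing domain only when local recovery fails. The following is a minimal Python sketch of that cycle under stated assumptions. It is not the Containment Domains runtime API: the `ContainmentDomain` class, its `run` method, the `detect` callback, and the retry limit are hypothetical, state is a plain dictionary, and escalation is modeled as an exception for the parent to handle.

```python
import copy
from typing import Callable, Dict


class ContainmentDomain:
    """Hypothetical, minimal containment-domain-style scope (not the CD runtime API)."""

    def __init__(self, max_retries: int = 3) -> None:
        self.max_retries = max_retries

    def run(self, state: Dict, body: Callable[[Dict], None],
            detect: Callable[[Dict], bool]) -> Dict:
        # Preserve: snapshot the inputs this domain is allowed to modify.
        preserved = copy.deepcopy(state)
        for _ in range(self.max_retries + 1):
            body(state)          # Execute the domain's computation in place.
            if detect(state):    # Detect: True means no error was observed.
                return state
            # Recover locally: restore the preserved inputs, then re-execute.
            state.clear()
            state.update(copy.deepcopy(preserved))
        # Error not contained here; escalate to the enclosing (parent) domain.
        raise RuntimeError("containment failed; escalate to parent domain")


if __name__ == "__main__":
    attempts = {"count": 0}

    def body(s: Dict) -> None:
        attempts["count"] += 1
        s["y"] = s["x"] * 2
        if attempts["count"] == 1:
            s["y"] ^= 1  # Simulated soft error (bit flip) on the first attempt only.

    def detect(s: Dict) -> bool:
        return s["y"] == s["x"] * 2  # Hypothetical cheap check on the result.

    result = ContainmentDomain().run({"x": 21}, body, detect)
    print(result["y"], attempts["count"])  # 42 2
```

Keeping preservation and re-execution inside the domain means an error is repaired where it occurred, rather than by rolling back the whole system.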

  • Jungrae Kim, Michael Sullivan, Seong-Lyong Gong, and Mattan Erez. Frugal ECC: Efficient and Versatile Memory Error Protection through Fine-Grained Compression. In the Proceedings of SC15. Austin, TX, November, 2015, pages 12:1–12. (PDF) (BibTeX)
  • Jungrae Kim, Michael Sullivan, and Mattan Erez. Bamboo ECC: Strong, Safe, and Flexible Codes for Reliable Computer Memory. In the Proceedings of HPCA. Burlingame, CA, February, 2015, pages 101–112. (PDF) (BibTeX)
  • Dong Wan Kim, and Mattan Erez. Balancing Reliability, Cost, and Performance Tradeoffs with FreeFault. In the Proceedings of HPCA. Burlingame, CA, February, 2015, pages 439–450. (PDF) (BibTeX)
  • Marc Snir, Robert W Wisniewski, Jacob A Abraham, Sarita V Adve, Saurabh Bagchi, Pavan Balaji, Jim Belak, Pradip Bose, Franck Cappello, Bill Carlson, Andrew A Chien, Paul Coteus, Nathan A DeBardeleben, Pedro C Diniz, Christian Engelmann, Mattan Erez, Saverio Fazzari, Al Geist, Rinku Gupta, Fred Johnson, Sriram Krishnamoorthy, Sven Leyffer, Dean Liberty, Subhasish Mitra, Todd Munson, Rob Schreiber, Jon Stearley, and Eric Van Hensbergen. Addressing Failures in Exascale Computing. International Journal of High Performance Computing Applications, 28(2):129–173, May, 2014. (BibTeX)
  • Jinsuk Chung, Ikhwan Lee, Michael Sullivan, Jee Ho Ryoo, Dong Wan Kim, Doe Hyun Yoon, Larry Kaplan, and Mattan Erez. Containment Domains: A Scalable, Efficient, and Flexible Resilience Scheme for Exascale Systems. In the Proceedings of SC12. Salt Lake City, UT, November, 2012, pages 58:1–11. (PDF) (SLIDES) (BibTeX)
  • Evgeni Krimer, Patrick Chiang, and Mattan Erez. Lane Decoupling for Improving the Timing-Error Resiliency of Wide-SIMD Architectures. In the proceedings of ISCA. Portland, OR, June, 2012, pages 237–248. (PDF) (SLIDES) (BibTeX)
  • Robert Pawlowski, Evgeni Krimer, Joseph Crop, Jacob Postman, Nariman Moezzi-Madani, Mattan Erez, and Patrick Chiang. A 530mV 10-Lane SIMD Processor With Variation Resiliency in 45nm SOI. In the proceedings of ISSCC. San Francisco, CA, February, 2012, pages 492–494. (BibTeX)
  • Doe Hyun Yoon, Naveen Muralimanohar, Jichuan Chang, Parthasarathy Ranganathan, Norman P. Jouppi, and Mattan Erez. FREE-p: Protecting Non-Volatile Memory against both Hard and Soft Errors. In the proceedings of HPCA. San Antonio, TX, February, 2011, pages 466–477. (PDF) (SLIDES) (BibTeX)
  • Michael Sullivan, Doe Hyun Yoon, and Mattan Erez. Containment Domains: A Full-System Approach to Computational Resiliency. Technical report TR-LPH-2011-001, LPH Group, Department of Electrical and Computer Engineering, The University of Texas at Austin, January, 2011. (PDF) (BibTeX)
  • Mehmet Basoglu, Michael Orshansky, and Mattan Erez. NBTI-Aware DVFS: a New Approach To Saving Energy And Increasing Processor Lifetime. In the proceedings of ISLPED. Austin, TX, August, 2010, pages 253–258. (PDF) (BibTeX)
  • Doe Hyun Yoon, and Mattan Erez. Virtualized and Flexible ECC for Main Memory. In the proceedings of ASPLOS. Pittsburgh, PA, March, 2010, pages 397–408. (PDF) (SLIDES) (BibTeX)
  • Doe Hyun Yoon, and Mattan Erez. Flexible Cache Error Protection using an ECC FIFO. In the proceedings of SC09. Portland, OR, November, 2009, pages 49:1–12. (PDF) (BibTeX)
  • Doe Hyun Yoon, and Mattan Erez. Memory Mapped ECC: Low-Cost Error Protection for Last Level Caches. In the proceedings of ISCA. Austin, TX, June, 2009, pages 116–127. (PDF) (SLIDES) (BibTeX)
  • Mattan Erez, Nuwan Jayasena, Timothy J. Knight, and William J. Dally. Fault Tolerance Techniques for the Merrimac Streaming Supercomputer. In the proceedings of SC05. Seattle, WA, November, 2005, pages 29:1–11. (PDF) (BibTeX)

Programming, Runtime, and Compilation

As VLSI processor technology matures, parallelism, locality, and bandwidth conservation become increasingly critical. However, current programming models and compilers do not explicitly address these issues, which leads to reduced performance and low programmer productivity. My first attempt at tackling these issues was as a member of the Brook stream language development team. We designed a language for scientific computing that exposed parallelism and locality to the programmer, and worked on a sophisticated optimizing compiler targeting Merrimac. The language's focus eventually shifted toward programmable graphics processors, and it was released to the public domain as BrookGPU. I am currently taking part in the development of the Sequoia programming model and software system, which builds on our experience with stream programming and Brook. Sequoia lets the programmer explicitly reason about and express locality and parallelism at multiple levels of the memory hierarchy. The result is high-performance applications that can easily be ported to a variety of traditional and emerging architectures.
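Expressing locality and parallelism at multiple levels can be illustrated with a toy recursive decomposition in the spirit of Sequoia's inner and leaf tasks. This is a Python sketch, not Sequoia syntax: `hierarchical_sum` and the per-level capacities are hypothetical, and the independent subtasks are evaluated one after another here even though they could run in parallel.

```python
from typing import List, Sequence


def hierarchical_sum(data: Sequence[float], capacities: List[int], level: int = 0) -> float:
    """Sum `data` by splitting it into working sets sized for each memory level.

    `capacities[i]` is the (hypothetical) number of elements a task running at
    memory level i may hold; index 0 is the outermost level.
    """
    if len(data) > capacities[level]:
        raise ValueError(f"working set of {len(data)} does not fit level {level}")
    if level == len(capacities) - 1:
        # Leaf task: the data fits the innermost level, so compute directly.
        return sum(data)
    # Inner task: partition the data into chunks that fit the next level down.
    child = capacities[level + 1]
    # Each chunk is an independent subtask; here they run sequentially.
    return sum(hierarchical_sum(data[i:i + child], capacities, level + 1)
               for i in range(0, len(data), child))


if __name__ == "__main__":
    # Hypothetical three-level machine: main memory, on-chip store, registers.
    print(hierarchical_sum(list(range(10_000)), [1_000_000, 4096, 256]))  # 49995000
```

Because the decomposition is written against abstract capacities rather than one machine, porting the sketch to a different hierarchy only means changing the `capacities` list.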

  • Jinsuk Chung, Ikhwan Lee, Michael Sullivan, Jee Ho Ryoo, Dong Wan Kim, Doe Hyun Yoon, Larry Kaplan, and Mattan Erez. Containment Domains: A Scalable, Efficient, and Flexible Resilience Scheme for Exascale Systems. In the Proceedings of SC12. Salt Lake City, UT, November, 2012, pages 58:1–11. (PDF) (SLIDES) (BibTeX)
  • Timothy Knight, Ji Young Park, Manman Ren, Mike Houston, Mattan Erez, Kayvon Fatahalian, Alex Aiken, William Dally, and Pat Hanrahan. Compilation for Explicitly Managed Memory Hierarchies. In the proceedings of PPoPP. San Jose, CA, March, 2007, pages 226–236. (PDF) (BibTeX)
  • Kayvon Fatahalian, Timothy J. Knight, Mike Houston, Mattan Erez, Daniel Reiter Horn, Larkhoon Leem, Ji Young Park, Manman Ren, Alex Aiken, William J. Dally, and Pat Hanrahan. Sequoia: programming the memory hierarchy. In the proceedings of SC06. Tampa, FL, November, 2006. ACM, pages 83:1–13. (PDF) (BibTeX)

Memory System Design

The memory system is perhaps the most important component of modern architectures, because bandwidth is a severely limited resource. We are working on multiple aspects of memory system design, including emerging memory technologies, error tolerance and resiliency, memory organization, and parallel memory systems.

  • Jungrae Kim, Michael Sullivan, Seong-Lyong Gong, and Mattan Erez. Frugal ECC: Efficient and Versatile Memory Error Protection through Fine-Grained Compression. In the Proceedings of SC15. Austin, TX, November, 2015, pages 12:1–12. (PDF) (BibTeX)
  • Jungrae Kim, Michael Sullivan, and Mattan Erez. Bamboo ECC: Strong, Safe, and Flexible Codes for Reliable Computer Memory. In the Proceedings of HPCA. Burlingame, CA, February, 2015, pages 101–112. (PDF) (BibTeX)
  • Dong Wan Kim, and Mattan Erez. Balancing Reliability, Cost, and Performance Tradeoffs with FreeFault. In the Proceedings of HPCA. Burlingame, CA, February, 2015, pages 439–450. (PDF) (BibTeX)
  • Tianhao Zheng, Jaeyoung Park, Michael Orshansky, and Mattan Erez. Variable-Energy Write STT-RAM Architecture with Bit-Wise Write-Completion Monitoring. In the Proceedings of ISLPED. Beijing, China, September, 2013, pages 229–234. (PDF) (BibTeX)
  • Minsoo Rhu, Michael Sullivan, Jingwen Leng, and Mattan Erez. A Locality-Aware Memory Hierarchy for Energy-Efficient GPU Architectures. In the Proceedings of MICRO. Davis, CA, December, 2013, pages 86–98. (PDF) (BibTeX)
  • Doe Hyun Yoon, Min Kyu Jeong, Michael Sullivan, and Mattan Erez. Towards Proportional Memory Systems. Intel Technology Journal, 17:118–139, 2012. (URL) (BibTeX)
  • Doe Hyun Yoon, Min Kyu Jeong, Michael B. Sullivan, and Mattan Erez. The Dynamic Granularity Memory System. In the proceedings of ISCA. Portland, OR, June, 2012, pages 548–559. (PDF) (BibTeX)
  • Min Kyu Jeong, Chander Sudanthi, Nigel Paver, and Mattan Erez. A QoS-Aware Memory Controller for Dynamically Balancing GPU and CPU Bandwidth Use in an MPSoC. In the Proceedings of DAC. San Francisco, CA, June, 2012, pages 855–860. (PDF) (BibTeX)
  • Min Kyu Jeong, Doe Hyun Yoon, Dam Sunwoo, Michael Sullivan, Ikhwan Lee, and Mattan Erez. Balancing DRAM Locality and Parallelism in Shared Memory CMP Systems. In the proceedings of HPCA. New Orleans, LA, February, 2012, pages 1–12. (PDF) (BibTeX)
  • Doe Hyun Yoon, Min Kyu Jeong, and Mattan Erez. Adaptive Granularity Memory Systems: A Tradeoff between Storage Efficiency and Throughput. In the proceedings of ISCA. San Jose, CA, June, 2011, pages 295–306. (PDF) (BibTeX)
  • Doe Hyun Yoon, Naveen Muralimanohar, Jichuan Chang, Parthasarathy Ranganathan, Norman P. Jouppi, and Mattan Erez. FREE-p: Protecting Non-Volatile Memory against both Hard and Soft Errors. In the proceedings of HPCA. San Antonio, TX, February, 2011, pages 466–477. (PDF) (SLIDES) (BibTeX)
  • Doe Hyun Yoon, and Mattan Erez. Virtualized and Flexible ECC for Main Memory. In the proceedings of ASPLOS. Pittsburgh, PA, March, 2010, pages 397–408. (PDF) (SLIDES) (BibTeX)
  • Jung Ho Ahn, Mattan Erez, and William J. Dally. The design space of data-parallel memory systems. In the proceedings of SC06. Tampa, FL, November, 2006. ACM, pages 80:1–12. (PDF) (BibTeX)

Scalar Processor Architecture and Compilation + Virtual Memory Alternatives

Most current scalar processors still rely on an ISA that was designed in the late 1970s and does not reflect many of the changes in modern architectures. Therefore, I am exploring ISA designs that expose abstract, scalable, forward- and backward-compatible representations of modern internal microarchitectures to the compiler. The hope is that better communication mechanisms will allow the hardware and compiler to cooperate in achieving high performance, rather than the compiler tricking the hardware to achieve its goals or the hardware dynamically rediscovering information that is readily available to the compiler. One example of a cooperative ISA mechanism that I developed deals with register allocation: the Spills, Fills, and Kills technique lets hardware rely on liveness information communicated by the compiler to improve performance and reduce energy consumption.
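The kind of liveness information a compiler could communicate to hardware can be sketched as a backward pass over a straight-line block that marks the last read of each register value. This is an illustrative Python sketch, not the encoding from the Spills, Fills, and Kills report: the tuple instruction format, the `kill_hints` function, and attaching a list of dead registers to each instruction are all assumptions.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical instruction format: (opcode, destination register or None, source registers).
Instr = Tuple[str, Optional[str], Tuple[str, ...]]


def kill_hints(block: List[Instr], live_out: Set[str]) -> List[List[str]]:
    """For each instruction, list the source registers whose values are dead afterwards."""
    live = set(live_out)  # Registers read later (before any redefinition).
    hints: List[List[str]] = [[] for _ in block]
    for idx in range(len(block) - 1, -1, -1):
        _, dest, srcs = block[idx]
        live_after = live
        for reg in srcs:
            # Last read of this value: not needed later and not overwritten here
            # (if reg == dest, the register simply receives a new value).
            if reg not in live_after and reg != dest and reg not in hints[idx]:
                hints[idx].append(reg)
        # Standard backward liveness: live_before = (live_after - {dest}) | srcs.
        live = (live_after - {dest}) | set(srcs)
    return hints


if __name__ == "__main__":
    block: List[Instr] = [
        ("load", "r1", ()),
        ("load", "r2", ()),
        ("add", "r3", ("r1", "r2")),
        ("mul", "r4", ("r3", "r1")),
        ("add", "r4", ("r4", "r2")),
    ]
    for instr, dead in zip(block, kill_hints(block, live_out={"r4"})):
        print(instr[0], instr[1], "kills:", dead)
```

In this sketch, a register listed as killed no longer holds a useful value, so hardware that received such a hint could, for example, avoid saving it to memory; how the real mechanism uses the hint is described in the report itself.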

  • Mattan Erez, Brian Towles, and William J. Dally. Spills, Fills, and Kills - An Architecture for Reducing Register-Memory Traffic. Technical report Concurrent VLSI Architecture (TR-23), Stanford University, July, 2000. (PDF) (BibTeX)

Microarchitecture

High-performance single-threaded execution will remain critical in the future, even as processors turn more and more to data-parallel execution units. Despite this increasing use of parallelism, most applications contain significant portions of control code that are difficult to parallelize, and certain key algorithms simply have no known parallel formulation. In addition to my research on scalar architecture and compilation, I have targeted several aspects of optimizing single-thread execution at the microarchitecture level. These include the novel eXtended Block Cache for efficient and effective instruction supply, techniques for better dynamic instruction scheduling, and various predictive techniques for both performance and hardware efficiency. My research also includes exploring how existing and new microarchitecture features can be exposed abstractly to the compiler and programmer. For example, I am currently working on extending a traditional general-purpose processor core with stream-architecture mechanisms to form a hybrid processor that can efficiently execute both control-intensive and compute-intensive code. One of our goals is to reuse existing microarchitectural components whenever possible.

  • Dong Wan Kim, and Mattan Erez. Balancing Reliability, Cost, and Performance Tradeoffs with FreeFault. In the Proceedings of HPCA. Burlingame, CA, February, 2015, pages 439–450. (PDF) (BibTeX)
  • Doe Hyun Yoon, Min Kyu Jeong, Michael B. Sullivan, and Mattan Erez. The Dynamic Granularity Memory System. In the proceedings of ISCA. Portland, OR, June, 2012, pages 548–559. (PDF) (BibTeX)
  • Doe Hyun Yoon, and Mattan Erez. Flexible Cache Error Protection using an ECC FIFO. In the proceedings of SC09. Portland, OR, November, 2009, pages 49:1–12. (PDF) (BibTeX)
  • Doe Hyun Yoon, and Mattan Erez. Memory Mapped ECC: Low-Cost Error Protection for Last Level Caches. In the proceedings of ISCA. Austin, TX, June, 2009, pages 116–127. (PDF) (SLIDES) (BibTeX)
  • Jayanth Gummaraju, Mattan Erez, Joel Coburn, Mendel Rosenblum, and William J. Dally. Architectural Support for the Stream Execution Model on General-Purpose Processors. In the proceedings of PACT. Brasov, Romania, September, 2007, pages 3–12. (URL) (PDF) (BibTeX)
  • Jung Ho Ahn, Mattan Erez, and William J. Dally. The design space of data-parallel memory systems. In the proceedings of SC06. Tampa, FL, November, 2006. ACM, pages 80:1–12. (PDF) (BibTeX)
  • Mattan Erez, Brian Towles, and William J. Dally. Spills, Fills, and Kills - An Architecture for Reducing Register-Memory Traffic. Technical report Concurrent VLSI Architecture (TR-23), Stanford University, July, 2000. (PDF) (BibTeX)
  • Stephan Jourdan, Lihu Rappoport, Yoav Almog, Mattan Erez, Adi Yoaz, and Ronny Ronen. eXtended Block Cache. In the proceedings of HPCA. Toulouse, France, January, 2000, pages 61–70. (PDF) (BibTeX)
  • Adi Yoaz, Mattan Erez, Ronny Ronen, and Stephan Jourdan. Speculation Techniques for Improving Load Related Instruction Scheduling. In the proceedings of ISCA. Atlanta, GA, May, 1999, pages 42–53. (URL) (PDF) (BibTeX)
  • Adi Yoaz, Mattan Erez, and Ronny Ronen. US Patent #6,697,932: System and Method for Early Resolution of Low Confidence Branches and Safe Data Cache Accesses. February, 2004. (BibTeX)
  • Adi Yoaz, Gregory Pribush, Freddy Gabbay, Mattan Erez, and Ronny Ronen. US Patent #6,757,816: Fast Branch Misprediction Recovery Method and System. June, 2004. (BibTeX)
  • Adi Yoaz, Ronny Ronen, Lihu Rappoport, Mattan Erez, Stephan Jourdan, and Robert Valentine. US Patent #6,694,421: Cache Memory Bank Access Prediction. February, 2004. (BibTeX)
  • Adi Yoaz, Ronny Ronen, Lihu Rappoport, Mattan Erez, Stephan Jourdan, and Robert Valentine. US Patent #6,880,063: Memory Cache Bank Prediction. April, 2005. (BibTeX)

Massively Parallel Processor Architecture and Microarchitecture (GPUs and Streaming)

Recently, GPUs and other throughput-oriented architectures have heralded a new age of performance and efficiency. I am interested in how to architect and program such processors, including when multiple heterogeneous cores are tightly integrated. We have been working on these problems in recent years, and I also worked on stream processor architectures before the advent of programmable GPUs.

For example, the Merrimac streaming supercomputer uses a stream architecture and advanced interconnection networks to deliver an order of magnitude more performance per unit cost than cluster-based scientific computers built from the same technology. Organizing the computation into streams and exploiting the resulting locality with a register hierarchy enables a stream architecture to reduce the memory bandwidth required by representative applications by an order of magnitude or more. Hence, a processing node with a fixed (expensive) bandwidth can support an order of magnitude more (inexpensive) arithmetic units. This in turn allows a given level of performance to be achieved with fewer nodes (for example, a 1-PFLOPS machine with just 8,192 nodes), resulting in greater reliability and simpler system management. Merrimac is designed as a streaming scientific computer that can scale from a $20K, 2-TFLOPS workstation to a $20M, 2-PFLOPS supercomputer. As lead architect, my research involves all aspects of the system, from the hardware architecture through the compiler and programming language to the applications and algorithms.
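The bandwidth argument and the node-count arithmetic in the paragraph above can be checked with a first-order model. In this Python sketch, the kernel-chain traffic model is a simplifying assumption (each kernel reads one word and writes one word per element), and the function names are hypothetical; only the system-level figures (1 PFLOPS on 8,192 nodes, $20K for 2 TFLOPS, $20M for 2 PFLOPS) come from the text.

```python
def memory_words(num_elements: int, num_kernels: int, keep_intermediates_on_chip: bool) -> int:
    """Off-chip words moved by a chain of kernels, each reading and writing one word per element."""
    if keep_intermediates_on_chip:
        # Stream style: only the chain's input and final output touch memory;
        # intermediate streams stay in the on-chip register hierarchy.
        return 2 * num_elements
    # Every kernel reads its input from memory and writes its output back.
    return 2 * num_elements * num_kernels


def flops_per_node(system_flops: float, nodes: int) -> float:
    """Sustained FLOPS each node must contribute to reach the system total."""
    return system_flops / nodes


def dollars_per_gflops(cost_dollars: float, system_flops: float) -> float:
    """System cost divided by performance in GFLOPS."""
    return cost_dollars / (system_flops / 1e9)


if __name__ == "__main__":
    n, k = 1_000_000, 10
    print(memory_words(n, k, False) // memory_words(n, k, True))  # 10
    print(round(flops_per_node(1e15, 8192) / 1e9, 1))  # 122.1 (GFLOPS per node)
    print(dollars_per_gflops(20e3, 2e12), dollars_per_gflops(20e6, 2e15))  # 10.0 10.0
```

Under these assumptions, a ten-kernel chain needs one tenth of the off-chip traffic, which is the sense in which a fixed-bandwidth node can feed roughly ten times more arithmetic units; the quoted cost figures also work out to the same cost per GFLOPS at both ends of the scaling range.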

  • Dong Li, Minsoo Rhu, Daniel R. Johnson, Mike O’Connor, Mattan Erez, Doug Burger, Donald S. Fussell, and Stephen W. Keckler. Priority-Based Cache Allocation in Throughput Processors. In the Proceedings of HPCA. Burlingame, CA, February, 2015, pages 89–100. (PDF) (BibTeX)
  • Minsoo Rhu, and Mattan Erez. The Dual-Path Execution Model for Efficient GPU Control Flow. In the Proceedings of HPCA. Shenzhen, China, February, 2013, pages 591–602. (PDF) (BibTeX)
  • Minsoo Rhu, and Mattan Erez. Maximizing SIMD Resource Utilization in GPGPUs with SIMD Lane Permutation. In the Proceedings of ISCA. Tel Aviv, Israel, June, 2013, pages 356–367. (PDF) (BibTeX)
  • Minsoo Rhu, Michael Sullivan, Jingwen Leng, and Mattan Erez. A Locality-Aware Memory Hierarchy for Energy-Efficient GPU Architectures. In the Proceedings of MICRO. Davis, CA, December, 2013, pages 86–98. (PDF) (BibTeX)
  • Minsoo Rhu, and Mattan Erez. CAPRI: Prediction of Compaction-Adequacy for Handling Control-Divergence in GPGPU Architectures. In the proceedings of ISCA. Portland, OR, June, 2012, pages 61–71. (PDF) (BibTeX)
  • Mattan Erez, and William J. Dally. Stream Processors. In Multicore Processors and Systems, pages 231–270. Springer, 2009. (URL) (BibTeX)
  • Jayanth Gummaraju, Mattan Erez, Joel Coburn, Mendel Rosenblum, and William J. Dally. Architectural Support for the Stream Execution Model on General-Purpose Processors. In the proceedings of PACT. Brasov, Romania, September, 2007, pages 3–12. (URL) (PDF) (BibTeX)
  • Mattan Erez, Jung Ho Ahn, Jayanth Gummaraju, Mendel Rosenblum, and William J. Dally. Executing Irregular Scientific Applications on Stream Architectures. In the proceedings of ICS. Seattle, WA, June, 2007, pages 93–104. (PDF) (BibTeX)
  • Jung Ho Ahn, William J. Dally, and Mattan Erez. Tradeoff between Data-, Instruction-, and Thread-level Parallelism in Stream Processors. In the proceedings of ICS. Seattle, WA, June, 2007, pages 126–137. (PDF) (BibTeX)
  • Jung Ho Ahn, Mattan Erez, and William J. Dally. The design space of data-parallel memory systems. In the proceedings of SC06. Tampa, FL, November, 2006. ACM, pages 80:1–12. (PDF) (BibTeX)
  • Mattan Erez. Merrimac — High-Performance, Highly-Efficient Scientific Computing with Streams. PhD thesis, Stanford University, November, 2006. (PDF) (BibTeX)
  • Mattan Erez, Nuwan Jayasena, Timothy J. Knight, and William J. Dally. Fault Tolerance Techniques for the Merrimac Streaming Supercomputer. In the proceedings of SC05. Seattle, WA, November, 2005, pages 29:1–11. (PDF) (BibTeX)
  • Jung Ho Ahn, Mattan Erez, and William J. Dally. Scatter-Add in Data Parallel Architectures. In the proceedings of HPCA. San Francisco, CA, February, 2005, pages 132–142. (PDF) (BibTeX)
  • Mattan Erez, Jung Ho Ahn, Ankit Garg, William J. Dally, and Eric Darve. Analysis and Performance Results of a Molecular Modeling Application on Merrimac. In the proceedings of SC04. Pittsburgh, PA, November, 2004, pages 42:1–10. (PDF) (BibTeX)
  • Nuwan Jayasena, Mattan Erez, Jung Ho Ahn, and William J. Dally. Stream Register Files with Indexed Access. In the proceedings of HPCA. Madrid, Spain, February, 2004, pages 60–72. (PDF) (BibTeX)
  • William J. Dally, Patrick Hanrahan, Mattan Erez, Timothy J. Knight, Francois Labonte, Jung-Ho Ahn, Nuwan Jayasena, Ujval J. Kapasi, Abhishek Das, Jayanth Gummaraju, and Ian Buck. Merrimac: Supercomputing with Streams. In the proceedings of SC03. Phoenix, AZ, November, 2003, pages 35:1–8. (PDF) (BibTeX)

Ultra Low Power Processors and Variation Tolerance

Not currently very active.

From an energy perspective, the most efficient operating point is a supply voltage close to the transistor threshold voltage. Unfortunately, in this near-threshold regime the effects of process variation are greatly amplified and the operating frequency is low. We are developing architectural and circuit techniques that tolerate these variations and exploit parallelism to overcome these limitations.
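The energy argument can be made concrete with the standard first-order model of energy per operation: dynamic switching energy scales with the square of the supply voltage, while leakage energy grows as each operation takes longer. The 1.0 V nominal supply below is an assumed reference point, not a figure from this page; 0.53 V corresponds to the 530 mV processor listed in this section.

```latex
\begin{aligned}
E_{\text{op}} &\approx E_{\text{dyn}} + E_{\text{leak}}
  = C_{\text{eff}}\, V_{dd}^{2} + V_{dd}\, I_{\text{leak}}\, t_{\text{op}}(V_{dd}) \\
\frac{E_{\text{dyn}}(0.53\,\text{V})}{E_{\text{dyn}}(1.0\,\text{V})}
  &= \left(\frac{0.53\,\text{V}}{1.0\,\text{V}}\right)^{2} \approx 0.28
\end{aligned}
```

Under this model, switching energy per operation drops by roughly 3.6x, but $t_{\text{op}}$ grows sharply as $V_{dd}$ approaches the threshold voltage. That growth is the low operating frequency mentioned above, and it makes the leakage term limit the benefit of further voltage reduction.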

  • Evgeni Krimer, Patrick Chiang, and Mattan Erez. Lane Decoupling for Improving the Timing-Error Resiliency of Wide-SIMD Architectures. In the proceedings of ISCA. Portland, OR, June, 2012, pages 237–248. (PDF) (SLIDES) (BibTeX)
  • Robert Pawlowski, Evgeni Krimer, Joseph Crop, Jacob Postman, Nariman Moezzi-Madani, Mattan Erez, and Patrick Chiang. A 530mV 10-Lane SIMD Processor With Variation Resiliency in 45nm SOI. In the proceedings of ISSCC. San Francisco, CA, February, 2012, pages 492–494. (BibTeX)
  • Evgeni Krimer, Robert Pawlowski, Mattan Erez, and Patrick Chiang. Synctium: a Near-Threshold Stream Processor for Energy-Constrained Parallel Applications. IEEE Computer Architecture Letters, 9(1):21–24, January, 2010. (PDF) (BibTeX)

Network on Chip

Not currently active in NoC research.

As the number of processing elements and cores on a chip continues to rise, the interconnection network grows in importance. We are developing novel NoC architectures, analytical models for predicting network performance and resource utilization, and a programming model, together with optimization algorithms, that enables the programmer to account for the network abstractly.
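As a simple example of analytical network modeling, the textbook zero-load latency of a 2D mesh can be estimated from the average hop count plus the packet serialization time. This Python sketch is a generic first-order model, not the packet-level static timing analysis from the papers below. The function names, the uniform-random traffic assumption, and all parameter values are hypothetical.

```python
from itertools import product


def average_hops_mesh(k: int) -> float:
    """Mean hop count in a k x k mesh with dimension-order routing under uniform
    random traffic (every source-destination pair equally likely, self-pairs included)."""
    nodes = list(product(range(k), repeat=2))
    # Dimension-order routing is minimal, so hops equal the Manhattan distance.
    total = sum(abs(sx - dx) + abs(sy - dy) for sx, sy in nodes for dx, dy in nodes)
    return total / len(nodes) ** 2


def zero_load_latency(k: int, router_cycles: float, link_cycles: float,
                      packet_bits: int, channel_bits_per_cycle: int) -> float:
    """Cycles for one packet in an otherwise idle network: per-hop delay times
    average hops, plus the time to serialize the packet onto a channel."""
    hops = average_hops_mesh(k)
    serialization = packet_bits / channel_bits_per_cycle
    return hops * (router_cycles + link_cycles) + serialization


if __name__ == "__main__":
    print(average_hops_mesh(4))                 # 2.5
    print(zero_load_latency(4, 2, 1, 128, 32))  # 11.5
```

A zero-load model like this ignores contention, so it bounds latency from below; predicting behavior under load or with QoS guarantees requires richer models such as the ones in the listed publications.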

  • Evgeni Krimer, Isaac Keslassy, Avinoam Kolodny, Isask’har Walter, and Mattan Erez. Static timing analysis for modeling QoS in networks on chip. Journal of Parallel and Distributed Computing, 71(5):687–699, May, 2011. (PDF) (BibTeX)
  • Tushar Krishna, Amit Kumar, Li-Shiuan Peh, Jacob Postman, Patrick P. Chiang, and Mattan Erez. Express Virtual Channels with Capacitively Driven Global Links. IEEE Micro, 29:48–61, August, 2009. (BibTeX)
  • Evgeni Krimer, Isaac Keslassy, Avinoam Kolodny, Isask’har Walter, and Mattan Erez. Packet-Level Static Timing Analysis for NoCs. Technical report CCIT #737, Department of Electrical Engineering, Technion, July, 2009. (BibTeX)
  • Tushar Krishna, Amit Kumar, Patrick Chiang, Mattan Erez, and Li-Shiuan Peh. NoC with Near-Ideal Express Virtual Channels Using Global-Line Communication. In the proceedings of High-Performance Interconnects (HotI-16). Stanford, CA, August, 2008, pages 11–20. (PDF) (BibTeX)

Current students

List of current students who are working with me on research:

Former students

List of former group members: