Lecture 1 (8/26/2009) — Introduction

Lecture 1 Slides

Lecture 1 Discussion


What is Computer Architecture?

(A personal view)

“Form follows function” is a famous principle in architecture (buildings, not computers), most commonly attributed to Louis Sullivan. It very much holds true for computer architecture as well. Our job as architects is to understand the required functionality, as defined by a user, and develop the right forms out of our building materials to serve that function. Our building materials are defined by technology and are currently, for the most part, VLSI CMOS, mass storage, and interconnect. In essence, we connect physical devices to the user.

It is important to always keep in mind that the systems we are working on have a user at one end and physics at the other. One of the layers connecting the two is the processor and memory system, but this is only a part of what we need to understand and modify. Other layers in the system include:

Algorithm
Algorithms are formal representations of user requirements.
Programming system and compiler
This includes languages, compilers, tools, … The programming system helps translate algorithms so that they can be run on hardware.
Operating and hardware systems
The operating system provides common basic services to the programming system. It can in some ways be considered as part of the programming system, but since its services are shared by all programs it is also an integral part of the hardware system. The hardware system is the entire platform, which includes interconnect, networking, mass storage, …
Processor and memory
The mainstay of computer architecture.
Circuits
Just as algorithms are a formal representation of users, we can think of circuits as an abstraction of the physics.

In today’s complex systems it is crucial to work across layers to achieve our goals. While architects don’t typically develop algorithms they do need to understand algorithm properties and to design systems with that in mind. It is not uncommon to combine architecture with compilers, operating systems, programming languages, and circuits.

Five “Major” Challenges Facing Architects Today

Again, a personal view — more challenges do exist.

Performance

History shows that we can never have enough performance. This holds true for every market segment, from mobile to supercomputers (the slides give a few specific examples). Performance is measured in operations per second (FLOPS for floating-point operations per second and IPS for instructions per second, as in MIPS). Another important measure of performance is bandwidth.
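As a back-of-the-envelope illustration, peak throughput is just the product of unit count, clock rate, and operations per cycle. The numbers below are hypothetical, chosen only to make the arithmetic concrete, not to describe any real CPU:

```python
# Hypothetical machine: 4 cores, 3 GHz clock, 8 FLOPs per cycle per core
# (e.g., a wide SIMD/FMA unit). All numbers are illustrative.
cores = 4
clock_hz = 3.0e9
flops_per_cycle = 8

peak_flops = cores * clock_hz * flops_per_cycle
print(f"Peak: {peak_flops / 1e9:.0f} GFLOPS")  # Peak: 96 GFLOPS
```

Sustaining anywhere near this peak is, of course, the hard part — it depends on feeding those units, which is where bandwidth comes in.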

Efficiency

Efficiency refers to cost, energy, and power. Keeping costs down has always been important. Energy and power are becoming a key limitation not just in the mobile segment, but also in traditional desktops, datacenters, and supercomputers. In fact, power considerations are the main limit on improving the performance of traditional CPUs, and they dominate the cooling and maintenance of supercomputers.

Designability

The sheer number of devices that can be designed into a processor today (now measured in billions) significantly complicates the design process. Major CPU design teams include hundreds of people, and verification costs as much, and consumes as much time, as the design itself. CAD and verification tools have simply not kept up with the increase in VLSI complexity.

Programmability

It is important to remember that all challenges must be met while maintaining programmability. Programmability is clearly a requirement for desktops, datacenters, supercomputers, and laptops, but it is also crucial to the success of many embedded systems. Some form of programmability helps with product longevity, keeping up with evolving standards, fixing bugs, and improving time to market.

Reliability

As device size continues to scale down, reliability issues are becoming increasingly important.

Parallelism

What is parallelism?

  • Parallelism is everywhere — we can’t really build physical systems that do not have parallelism:
    • multiple transistors
    • signals
    • bit manipulation
    • circuits
  • More traditional parallelism in computer architecture:
    • multiple cores
    • multiple processors
    • pipelining
    • superscalar/VLIW
    • vectors/SIMD
    • disk/CPU/GPU/…
  • Concurrency in the OS does not imply parallelism.
    • Can have concurrent processes in the OS running on a single processor.
    • These multiple processes are active at the same time from the OS perspective, but are not executing at the same time from the HW perspective.
    • From the logical perspective (programmer), there is little difference between the two.
      • Many of the problems with concurrent systems are shared with parallel systems.
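The concurrency-without-parallelism case can be sketched with a toy round-robin scheduler: two tasks are both active (concurrent) from the scheduler's view, but only one "instruction" executes per time step, so nothing runs in parallel. This is an illustrative simulation, not real OS code:

```python
# Toy illustration: two tasks time-sliced on ONE processor.
def make_task(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"   # one "instruction" per resume

runnable = [make_task("A", 3), make_task("B", 3)]
trace = []
while runnable:
    task = runnable.pop(0)         # round-robin: pick the next active task
    try:
        trace.append(next(task))   # execute exactly one step
        runnable.append(task)      # task is still active; requeue it
    except StopIteration:
        pass                       # task finished

print(trace)  # ['A:0', 'B:0', 'A:1', 'B:1', 'A:2', 'B:2']
```

The interleaved trace is what the programmer observes either way — which is why most correctness problems (races, ordering) are shared between concurrent and truly parallel systems.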

How does parallelism help with the challenges?

Performance

Since today’s chips can have hundreds or thousands of FPUs/ALUs (floating-point units / arithmetic and logical units), parallelism must be used to take full advantage of the compute resources.

Parallelism also helps in sustaining high bandwidth, both on and off chip, through wide interconnect paths, pipelining, and efficient scheduling and communication resource sharing.

Efficiency

CMOS circuits have a non-linear relation between power and the speed at which they operate, with power increasing super-linearly with speed. Therefore, if we want to achieve a given level of performance, we can do so with lower power and energy by having parallel units that operate more slowly than a fast serial unit. Of course, this assumes an efficient parallel algorithm exists and that it is possible to slow down the clock and reduce voltage.
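A rough sketch of this argument uses the standard dynamic-power model P ≈ C·V²·f, together with the idealized assumption that supply voltage can scale down linearly with frequency:

```python
# Dynamic CMOS power: P ≈ C * V^2 * f (capacitance, voltage, frequency).
# Idealized assumption: supply voltage scales linearly with clock frequency.
def dynamic_power(c, v, f):
    return c * v**2 * f

C, V, F = 1.0, 1.0, 1.0   # normalized units

serial = dynamic_power(C, V, F)                # one fast unit
parallel = 2 * dynamic_power(C, V / 2, F / 2)  # two half-speed units

# Aggregate throughput is the same (2 * F/2 = F), but power drops 4x.
print(serial, parallel)  # 1.0 0.25
```

In practice voltage cannot scale all the way down with frequency, so the real savings are smaller — but the super-linear relationship is what makes "slower but wider" a win.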

Designability

Modular design with many instances of a unit operating in parallel (e.g., multi-core).

Programmability

Not much help here.

Reliability

Redundancy through parallelism.

Locality

What is locality?

  • Locality in architecture often means one of:
    • ‘Spatial locality’ — if a memory location is accessed, locations close to it in the memory address space are likely to be accessed within a certain period of time.
    • ‘Temporal locality’ — a memory location is likely to be accessed multiple times within a certain period.
    • ‘Producer-consumer locality’ — a limited form of temporal locality, where a location is written and then read soon (and usually not read again).
  • In this class we will use locality in two ways:
    • Temporal locality as above.
    • ‘Physical locality’ — actual physical structures that are close to one another in space. For example, an ALU and a directly connected register file are local to one another, as are two operands in the same register file.
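A toy model makes the payoff of spatial locality concrete: with a small LRU cache of multi-word lines, a sequential scan of an 8×8 row-major array misses once per line, while a column-order walk of the same data misses on every access. The line size and cache capacity below are arbitrary illustrative choices:

```python
from collections import OrderedDict

LINE = 8         # words per cache line (illustrative assumption)
CACHE_LINES = 4  # tiny fully-associative LRU cache (illustrative)

def misses(addresses):
    """Count cache misses for an address stream under LRU replacement."""
    cache = OrderedDict()
    miss = 0
    for a in addresses:
        line = a // LINE
        if line in cache:
            cache.move_to_end(line)        # hit: mark most recently used
        else:
            miss += 1
            cache[line] = True
            if len(cache) > CACHE_LINES:
                cache.popitem(last=False)  # evict least recently used
    return miss

# An 8x8 array stored row-major: element (r, c) lives at address r*LINE + c.
row_walk = [r * LINE + c for r in range(LINE) for c in range(LINE)]  # sequential
col_walk = [r * LINE + c for c in range(LINE) for r in range(LINE)]  # strided

print(misses(row_walk), misses(col_walk))  # 8 64
```

The column walk cycles through all eight lines before returning to any of them, so the four-line cache has always evicted the line it needs next — the same data, an 8× difference in misses, purely from access order.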

How does locality help with the challenges?

Performance

Locality improves bandwidth, which in turn improves arithmetic performance as more functional units can be supported on chip. Because of the properties of interconnect (wires) in modern VLSI, bandwidth drops roughly linearly as the distance of communication grows. Therefore, keeping distances short and exploiting locality benefits performance.

Efficiency

While bandwidth drops with distance, power consumption increases at least linearly with distance. Thus, exploiting locality and reducing the number of times long wires are traversed can significantly reduce power.

Programmability

Not much help here.

Designability

Locality is another important part of modular design. Keeping all structures local to one another eases process shrinks.

Reliability

As above, maintaining local connections reduces the chance that the wires will fail.

Hierarchy

What is hierarchy?

  • Design principle that lets us tackle complex systems.
    • Divide and conquer approach.
  • Hierarchy is also used for the cache hierarchy, where each level sits “higher” and is bigger than the one below it.
  • The hierarchy principle we will be taking advantage of is the former one of encapsulation; however, we will often mention the storage hierarchy as well.
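For the storage hierarchy, the standard way to quantify the benefit of stacked levels is average memory access time (AMAT), computed recursively down the hierarchy. The latencies and miss rates below are made-up round numbers for illustration:

```python
# AMAT = L1_hit + L1_miss_rate * (L2_hit + L2_miss_rate * mem_latency)
# All numbers are hypothetical round figures, in clock cycles.
l1_hit, l1_miss_rate = 1, 0.125
l2_hit, l2_miss_rate = 12, 0.25
mem_latency = 80

amat = l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency)
print(amat)  # 1 + 0.125 * (12 + 0.25 * 80) = 5.0 cycles
```

Even though main memory is 80 cycles away, the hierarchy delivers an average of 5 cycles — provided the access stream has the locality the caches exploit.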

How does hierarchy help?

Performance

Hierarchy enables asynchronous and parallel subsystems, which can improve performance. Hierarchy can also improve locality.

Efficiency

Basically same as above.

Programmability

Hierarchy encapsulates complex subsystems and can present a simple and effective interface that can be used by the programming system or programmer.

Designability

Helps handle the complexity of multi-billion transistor chips.

Reliability

Encapsulate and protect subsystems — fault isolation and repair.
