Lecture 2 (1/26/2015) — Locality Mechanisms


Review of what a computer looks like

The relevant components of a computer are:

  • ALU
  • Register File
  • Buses and forwarding paths
  • Load Store Unit (LSU)
  • Hierarchy of memories: to a programmer the memory hierarchy is invisible. Its purpose is to give the programmer the illusion of a single large, low-latency store.

Locality through Reservation Station, Bypass Network, Caches and Registers

While it is clear that registers and caches help in keeping things “local”, why do the Reservation Station and the Bypass Network matter? Both try to hand a computed value directly to the instructions that consume it, instead of writing it back to the register file and reading it out again. The Reservation Station also holds recent instructions, so it does not have to go back to the instruction cache to fetch an instruction again when that instruction is ready to go.

The Advantages of being “Local”

Locality improves latency. The widening processor-memory latency gap means that the cost of accessing memory can range from a few cycles to hundreds of cycles, whereas registers can generally be accessed in a single cycle. In general, as we’ll see in a later lecture, smaller means lower latency because of wire length and the types of devices that can be used; large devices become affordable when only a “few” are needed.
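To make the latency argument concrete, here is a minimal sketch of the expected cost of one access through a hierarchy. The function name, the hit rates and the cycle counts are all hypothetical values chosen for illustration, not measurements of any machine; the model also assumes that every access reaching a level pays that level's full latency before moving on.

```python
def average_access_cycles(levels, memory_cycles):
    """Expected cycles per access.

    levels: list of (hit_rate, latency_cycles), nearest level first.
    memory_cycles: latency of the final level, which always hits.
    All numbers passed in are assumptions made for illustration.
    """
    expected = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for hit_rate, latency in levels:
        expected += reach * latency   # everyone who gets here pays the lookup
        reach *= (1.0 - hit_rate)     # only the misses continue outward
    return expected + reach * memory_cycles


# Hypothetical hierarchy: a 4-cycle level with 90% hits, a 12-cycle level
# with 80% hits, and a 200-cycle memory behind them.
with_hierarchy = average_access_cycles([(0.9, 4), (0.8, 12)], 200)
without_hierarchy = average_access_cycles([], 200)
print(with_hierarchy, without_hierarchy)
```

Under these assumed numbers the average cost is close to that of the nearest level rather than that of memory, which is the "illusion" the hierarchy is meant to provide.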

Locality improves energy efficiency. Moving data around (especially across the chip, from off chip, and to or from DRAM) is far more expensive in terms of energy than even a floating-point computation; worse, the gap grows as technology scales. So it makes sense to compute a result and consume it locally.

Locality improves bandwidth. Locality helps bandwidth in two main ways. First, accessing data locally rather than through a global structure lowers contention on the global structure, so effective bandwidth can increase. Second, a local interconnect can almost always provide higher bandwidth than a more global one. Adding ports is also easier and more cost-effective for small structures. This last point is particularly true when using logic to create storage (i.e., registers/latches) rather than SRAMs or other memory cells.

Metrics for locality

Arithmetic intensity is the number of useful operations performed per byte (or word) brought into or written out of a particular locality level. It is a metric for bandwidth that is proportional to program progress rather than to time.
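As a worked illustration of this definition, the sketch below computes arithmetic intensity for two textbook kernels. The helper names are made up for this example, and the traffic counts assume the ideal case in which each operand crosses the level boundary exactly once (4-byte words by default); real traffic is usually higher.

```python
def arithmetic_intensity(useful_ops, bytes_moved):
    """Useful operations per byte crossing a given locality level."""
    return useful_ops / bytes_moved


def saxpy_intensity(n, word_bytes=4):
    # y[i] = a * x[i] + y[i]: one multiply and one add per element;
    # reads x[i] and y[i], writes y[i] -> 3 words moved per element.
    return arithmetic_intensity(2 * n, 3 * n * word_bytes)


def matmul_intensity(n, word_bytes=4):
    # C = A * B for n x n matrices: 2 * n^3 operations; in the ideal case
    # A and B are read once and C is written once -> 3 * n^2 words moved.
    return arithmetic_intensity(2 * n**3, 3 * n * n * word_bytes)


print(saxpy_intensity(1000))   # constant: does not grow with n
print(matmul_intensity(12))    # grows linearly with n
print(matmul_intensity(1200))
```

The vector kernel has a fixed intensity no matter how large the problem is, while the matrix kernel's intensity grows with n under the ideal-reuse assumption, so it can make progress on far less bandwidth per operation if the locality level is large enough to capture that reuse.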

Cache (or any structure) hits/misses measure locality in terms of latency improvements, but they are not the same as a bandwidth measure, because some traffic (prefetchers, writebacks, …) is not captured by hit/miss counts.
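A minimal sketch of how hit/miss counts relate to traffic, using a toy direct-mapped cache. The cache geometry (4 lines of 16 bytes) and both address traces are hypothetical, and the model deliberately ignores prefetchers and writebacks, so the byte count it reports covers only demand fills.

```python
def simulate_direct_mapped(addresses, num_lines=4, line_bytes=16):
    """Return (hits, misses, bytes_fetched) for a trace of byte addresses."""
    resident = [None] * num_lines      # block number currently held by each line
    hits = misses = 0
    for addr in addresses:
        block = addr // line_bytes     # memory block containing this byte
        index = block % num_lines      # direct-mapped: one candidate line per block
        if resident[index] == block:
            hits += 1
        else:
            misses += 1
            resident[index] = block    # fill the line, evicting the old block
    return hits, misses, misses * line_bytes


# Both traces touch sixteen 4-byte words (64 useful bytes).
sequential = [4 * i for i in range(16)]    # neighbouring words share a line
strided = [64 * i for i in range(16)]      # every access maps to line 0

print(simulate_direct_mapped(sequential))
print(simulate_direct_mapped(strided))
```

Both traces consume the same 64 useful bytes, but the sequential one fetches 64 bytes in 4 misses while the strided one fetches 256 bytes in 16 misses. Even in this simplified model the miss count and the bytes moved are separate quantities; prefetch and writeback traffic would widen the difference further.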