#### FREE-p: Protecting Non-Volatile Memory against both Hard and Soft Errors

Doe Hyun Yoon<sup>†</sup> Naveen Muralimanohar<sup>‡</sup> Jichuan Chang<sup>‡</sup> Parthasarathy Ranganathan<sup>‡</sup> Norman P. Jouppi<sup>‡</sup> Mattan Erez<sup>†</sup>





<sup>†</sup>Electrical and Computer Engineering, The University of Texas at Austin <sup>‡</sup>Intelligent Infrastructure Lab, Hewlett-Packard Labs

## Challenge in Emerging NVRAM

- Finite write endurance
  - PCRAM cells wear out after 10<sup>8</sup> writes on average
  - Process variation exacerbates the problem
- Prior solutions
  - Custom wear-out tolerance mechanisms
  - Coarse-grained remapping
  - Dynamically Replicated Memory (DRM)
  - Error Correcting Pointer (ECP)
  - Stuck-At-Failure Error Recovery (SAFER)





# Limitations of Prior Solutions

- Only HARD cell errors
  - Can't detect/correct SOFT errors
    - Resistance drift in PCRAM
  - Can't detect/correct errors on periphery, wires, and packaging
- Error detection/correction logic within NVRAM devices
  - Memory industry favors SIMPLE and CHEAP devices
  - Error detection/correction in DRAM
    - DRAM chips only store data
    - Additional DRAM chips for storing redundant information
    - Error detection/correction is done at the memory controller
  - Better follow the same design strategy in NVRAM systems







- FREE-p: <u>Fine-grained Remapping with ECC and Embedded-pointer</u>
- Multi-tiered ECC
- Fine-grained remapping
- Detection/correction at the memory controller









- 6EC-7ED BCH code
  - 6-bit error correcting 7-bit error detecting
  - 61 bits for 64B data (less than 12.5% overhead)
- Tolerates up to 4 wear-out bit failures plus 2 soft bit errors per block
- Multi-tiered decoding
  - Quick-, Slow-, and Mem-ECC
    - Extend Hi-ECC [Wilkerson+ ISCA'10]
  - Fast-path decoding suffices for most accesses during the early lifetime
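
The 61-bit figure can be checked with a short calculation. This is a minimal sketch, assuming the standard bound that a binary BCH code over GF(2^m) correcting t errors needs at most m·t check bits, plus one overall parity bit to extend detection to t+1 errors; the function name `bch_check_bits` is hypothetical.

```python
def bch_check_bits(data_bits, t):
    """Return (m, check_bits) for a t-error-correcting binary BCH code
    extended with one overall parity bit (detects t+1 errors).

    Assumption: the code uses the upper bound of m*t check bits.
    """
    m = 1
    # Smallest field GF(2^m) whose code length 2^m - 1 holds data + check bits.
    while (1 << m) - 1 < data_bits + m * t:
        m += 1
    return m, m * t + 1


DATA_BITS = 64 * 8          # one 64B block
HARD_BUDGET, SOFT_BUDGET = 4, 2

m, check_bits = bch_check_bits(DATA_BITS, t=HARD_BUDGET + SOFT_BUDGET)
overhead = check_bits / DATA_BITS
print(f"m={m}, check bits={check_bits}, overhead={overhead:.1%}")
```

With 512 data bits and t = 6, the sketch yields m = 10 and 61 check bits, which is below the 64 bits (12.5%) available per 64B block.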





# Dealing with Intolerable Failures

- Eventually, some blocks become faulty
  - More than 4 wear-out failures per block
- Coarse-grained remapping (prior solutions)
  - Leverage virtual-to-physical mapping
    - Mapping unit: 4kB or larger
  - A single block with an intolerable failure maps out the whole page
- Fine-grained remapping
  - Disable only the faulty block
    - Mapping unit: 64B
  - Effectively handles both random and concentrated errors
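
The difference between the two mapping units can be illustrated with a small capacity calculation. This is a sketch under the slide's stated sizes (4kB pages, 64B blocks); the function `capacity_lost` and the example fault lists are hypothetical.

```python
PAGE_BYTES = 4 * 1024
BLOCK_BYTES = 64
BLOCKS_PER_PAGE = PAGE_BYTES // BLOCK_BYTES  # 64 blocks per page


def capacity_lost(faulty_blocks, fine_grained):
    """Bytes disabled for a set of faulty block indices.

    Coarse-grained: every page containing a faulty block is mapped out.
    Fine-grained: only the faulty blocks themselves are disabled.
    """
    faulty = set(faulty_blocks)
    if fine_grained:
        return len(faulty) * BLOCK_BYTES
    pages = {b // BLOCKS_PER_PAGE for b in faulty}
    return len(pages) * PAGE_BYTES


random_faults = [3, 70, 200, 500]   # four faulty blocks in four different pages
concentrated_faults = [0, 1, 2, 3]  # four faulty blocks in the same page

for name, faults in [("random", random_faults), ("concentrated", concentrated_faults)]:
    coarse = capacity_lost(faults, fine_grained=False)
    fine = capacity_lost(faults, fine_grained=True)
    print(f"{name}: coarse={coarse} B, fine={fine} B")
```

Coarse-grained remapping loses far more capacity when faults are scattered across pages, while fine-grained remapping loses the same 64B per faulty block in both cases.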





# Fine-grained Remapping (FR) with Embedded-pointer

- Embed a 64-bit pointer within a faulty block
  - There are still-functional bits in a faulty block
  - Use 7-Modular Redundancy to tolerate the failures
- 1-bit D/P flag per 64B block
  - Identifies whether a block holds data or a remap pointer
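
The embedded pointer with 7-modular redundancy can be sketched as follows. This is an illustrative model only: the layout (seven back-to-back 64-bit copies, 448 of the 512 bits in a block) and the helper names `encode_pointer`, `decode_pointer`, and `apply_stuck_at` are assumptions, not the layout used in the actual design.

```python
COPIES = 7
PTR_BITS = 64
PTR_MASK = (1 << PTR_BITS) - 1


def encode_pointer(ptr):
    """Replicate a 64-bit pointer 7 times into one 448-bit integer."""
    assert 0 <= ptr <= PTR_MASK
    word = 0
    for i in range(COPIES):
        word |= ptr << (i * PTR_BITS)
    return word


def decode_pointer(word):
    """Recover the pointer by a per-bit majority vote over the 7 copies."""
    copies = [(word >> (i * PTR_BITS)) & PTR_MASK for i in range(COPIES)]
    ptr = 0
    for bit in range(PTR_BITS):
        ones = sum((c >> bit) & 1 for c in copies)
        if ones > COPIES // 2:
            ptr |= 1 << bit
    return ptr


def apply_stuck_at(word, stuck):
    """Model worn-out cells: `stuck` maps bit position -> stuck value (0 or 1)."""
    for pos, val in stuck.items():
        if val:
            word |= 1 << pos
        else:
            word &= ~(1 << pos)
    return word


ptr = 0x0123456789ABCDEF
# Bit 0 of the pointer is 1; three of its seven copies are stuck at 0.
stored = apply_stuck_at(encode_pointer(ptr), {0: 0, 64: 0, 128: 0, 200: 0})
print(hex(decode_pointer(stored)))
```

A per-bit majority over seven copies recovers the pointer as long as no more than three copies of any single bit position are corrupted.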



# How to Mitigate This Penalty?

- Remap pointer cache
  - Cache remap pointers
  - Avoid reading the remap pointer from NVRAM on a cache hit
- Hash-based index cache
  - Pre-defined hash functions for remapping
    - Compute, not cache, remap pointer
  - H-idx: which hash function is used for remapping?
    - 0: not remapped
    - 1 or 2: remapped using one of the hash functions
    - 3: hash collision
      - All candidate locations are already used for other blocks
      - Need to read the remap pointer
    - 2 bits per 64B block
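
The H-idx encoding above can be sketched as a small lookup routine. The hash functions, the spare-region size `NUM_SPARE_BLOCKS`, and the helper names are all hypothetical and chosen only for illustration; the slides do not specify them.

```python
NUM_SPARE_BLOCKS = 1024  # hypothetical size of the remap (spare) region


def hash1(block_addr):
    # Hypothetical multiplicative hash; not taken from the source.
    return (block_addr * 2654435761) % NUM_SPARE_BLOCKS


def hash2(block_addr):
    # Second hypothetical hash giving an alternative candidate slot.
    return ((block_addr ^ (block_addr >> 7)) * 40503 + 1) % NUM_SPARE_BLOCKS


HASHES = {1: hash1, 2: hash2}


def locate(block_addr, h_idx):
    """Return (spare_slot, needs_pointer_read) for a 2-bit H-idx value.

    0: not remapped; 1 or 2: slot is computed; 3: collision, so the
    embedded remap pointer must be read from NVRAM.
    """
    if h_idx == 0:
        return None, False
    if h_idx in HASHES:
        return HASHES[h_idx](block_addr), False
    return None, True


def assign_h_idx(block_addr, used_slots):
    """Pick the first hash whose candidate slot is free, else report 3."""
    for idx, h in HASHES.items():
        slot = h(block_addr)
        if slot not in used_slots:
            used_slots.add(slot)
            return idx
    return 3


used = set()
idx = assign_h_idx(5, used)
print(idx, locate(5, idx))
```

Only the collision case (H-idx 3) pays the extra NVRAM read; the other cases resolve the remapped location by computation alone.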



Non-Volatile Memories Workshop 2011



## Hash-Based Index Cache



# Memory System Organization with FREE-p








### **Evaluation**





# Capacity vs. Lifetime



Assumptions: 16GB capacity, perfect wear-leveling, a 12.8GB/s channel, and constant write traffic at one third of the peak rate
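
These assumptions give a rough upper bound on lifetime before any cell reaches the mean endurance. The sketch below is a back-of-the-envelope calculation, not the evaluation methodology: it takes the 10<sup>8</sup> mean write endurance quoted earlier, ignores process variation and error tolerance, and uses a hypothetical function name.

```python
CAPACITY_BYTES = 16e9          # 16GB
CHANNEL_BYTES_PER_S = 12.8e9   # 12.8GB/s peak
WRITE_FRACTION = 1 / 3         # write traffic as a share of peak
ENDURANCE_WRITES = 1e8         # mean PCRAM cell endurance
SECONDS_PER_YEAR = 365 * 24 * 3600


def ideal_lifetime_years(capacity, channel, write_fraction, endurance):
    """Years until every cell has been written `endurance` times,
    assuming perfectly even wear and a constant write rate."""
    write_rate = channel * write_fraction          # bytes written per second
    seconds_per_overwrite = capacity / write_rate  # time to write the whole memory once
    return endurance * seconds_per_overwrite / SECONDS_PER_YEAR


years = ideal_lifetime_years(
    CAPACITY_BYTES, CHANNEL_BYTES_PER_S, WRITE_FRACTION, ENDURANCE_WRITES
)
print(f"{years:.1f} years")
```

Under these idealized assumptions the bound is roughly 12 years; process variation shortens it in practice because the weakest cells fail well before the mean.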



- In-order core with PCRAM
- Performance depends on NVRAM wear-out status

















- FREE-p combines multi-tiered ECC and FR
  - Protect against both Hard AND Soft errors
  - 11.5% longer lifetime than ECP6
  - Less than 2% performance degradation, even at 7 years
  - 12.5% storage overhead (same as current DRAM protection)
- Everything is implemented at the memory controller
  - End-to-end protection
    - Hard and soft errors in the cell array
    - Errors on the periphery, wires, packaging, ...
  - System designers determine protection level
    - Can be extended to chipkill-correct
  - Simple and cheap (commodity) NVRAM devices





