690 uint error_name_mask=0,
691 uint error_loc_mask=0,
721 uint_t num_tasks_in_color,
737 uint error_name_mask=0,
738 uint error_loc_mask=0,
769 uint_t num_tasks_in_color,
785 uint error_name_mask=0,
786 uint error_loc_mask=0,
874 bool update_preservations,
941 uint_t preserve_mask=kCopy,
942 const char* my_name=0,
946 const char* ref_name=0,
949 uint64_t ref_offset=0,
1004 uint_t preserve_mask=kCopy,
1005 const char* my_name=0,
1009 const char* ref_name=0,
1012 uint64_t ref_offset=0,
1052 const SysErrT* error_to_report=0
1075 const SysErrT* error_to_report=0
1098 const SysErrT* error_to_report=0
1133 uint system_loc_mask,
1160 uint error_loc_mask,
1331 bool owner_writes=true
1444 uint error_location_mask
1472 uint error_location_mask,
1474 std::vector<SysErrT> errors
1477 uint error_name_mask,
1511 uint error_name_mask,
1513 uint error_location_mask,
1515 std::vector<SysErrT> errors
Same as processor?
Definition: cd.h:355
std::vector< uint64_t > get_pa_starts()
Starting physical addresses.
float GetErrorProbability(SysErrT error_type, uint error_num)
Ask the CD framework to estimate error/fault rate.
char * syndrome_
Value of syndrome.
Definition: cd.h:418
CDErrT
Type for specifying error return codes from an API call – signifies some failure of the API call its...
Definition: cd.h:470
A type to uniquely name a CD in the tree.
Definition: cd.h:170
First execution.
Definition: cd.h:206
virtual bool InternalCanRecover(uint error_name_mask, uint error_location_mask)
Method to test if this CD can recover from an error/location mask.
char[] get_data()
Data value read (erroneous)
virtual CDErrT Regenerate(void *data_ptr, uint64_t len)=0
Pure virtual interface function for regenerating data as restoration type.
bool Test(void)
Non-blocking call to test whether the event completed.
cd_internal::CDEvent event_
Definition: cd.h:1419
virtual void InternalEscalate(uint error_name_mask, uint error_location_mask, std::vector< SysErrT > errors)
Escalate error/failure to parent.
A cabinet.
Definition: cd.h:357
std::vector< uint64_t > get_lengths()
Lengths of affected regions.
SysErrInfo error_info_
Error-specific extra information.
Definition: cd.h:455
bool destroy_cd_object_hint_
Definition: cd.h:1375
CDErrT SetPGASUsage(void *data_ptr, uint64_t len, PGASUsageT region_type=kShared)
Declare how a region of memory behaves within this CD (for Relaxed CDs)
CDModeT
Type for specifying whether a CD is strict or relaxed.
Definition: cd.h:189
PGASUsageT
Different types of PGAS memory behavior for relaxed CDs.
Definition: cd.h:240
Interface to degraded memory error information.
Definition: cd.h:426
A strict CD.
Definition: cd.h:189
Some channel loss.
Definition: cd.h:309
PreserveUseT
Type to indicate whether preserved data is from read-only or potentially read/write application data...
Definition: cd.h:215
CDErrT CDProfileStartPhase(bool collective=true, char *phase_name=0)
Notify the CD Profiler that the application is entering a different execution phase.
A relaxed CD.
Definition: cd.h:190
Data to be preserved is read-only within this CD.
Definition: cd.h:218
CDErrT RegisterRecovery(uint error_name_mask, uint error_loc_mask, RecoverObject *recover_object=0)
Register that this CD can recover from certain errors/failures.
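The mask parameters of RegisterRecovery() are plain uint bit sets. The following is a minimal sketch of how such masks could be built and tested, assuming one bit per error name or location. The kExample... constants and the CoversError() helper are hypothetical and are not taken from cd.h; the real values are those of SysErrNameT and SysErrLocT.

```cpp
#include <cassert>

// Hypothetical bit assignments, for illustration only. The real
// SysErrNameT / SysErrLocT values are defined in cd.h.
enum ExampleErrName : unsigned {
  kExampleSoftMem     = 1u << 0,
  kExampleDegradedMem = 1u << 1,
  kExampleChannelLoss = 1u << 2
};

enum ExampleErrLoc : unsigned {
  kExampleCore   = 1u << 0,
  kExampleProc   = 1u << 1,
  kExampleModule = 1u << 2
};

// Hypothetical helper: returns true when every bit set in the reported
// error is also set in the registered mask, for both name and location.
inline bool CoversError(unsigned registered_names, unsigned registered_locs,
                        unsigned error_names, unsigned error_locs) {
  return (error_names & ~registered_names) == 0 &&
         (error_locs & ~registered_locs) == 0;
}
```

Under these assumptions, a CD registered with kExampleSoftMem | kExampleChannelLoss at kExampleCore | kExampleProc would cover a soft memory error on a core, but not a degraded memory error.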
Interface to soft memory error information.
Definition: cd.h:403
functionality
Definition: cd.h:312
Recovery method that can be inherited and specialized by user.
Definition: cd.h:1499
uint number
Unique ID within level.
Definition: cd.h:172
(control/reachability failure).
Definition: cd.h:314
CDHandle * GetCurrentCD()
Accessor function to current active CD.
CDErrT RegisterDetection(uint system_name_mask, uint system_loc_mask)
Declare that this CD can detect certain errors/failures by user-defined detectors.
char[] get_syndrome()
Value of syndrome.
virtual void InternalReexecute()
Reexecute-style default recovery.
uint64_t length_
Length of affected access.
Definition: cd.h:415
Definitely shared for actual communication.
Definition: cd.h:241
Entirely private to this CD.
Definition: cd.h:246
CDErrT Wait(void)
Blocking call waiting on the event to complete.
communication during this CD.
Definition: cd.h:244
CDHandle * GetRootCD()
Accessor function to root CD of the application.
CDErrT Destroy(bool collective=false)
Destroys a CD.
char * data_
Data value read (erroneous)
Definition: cd.h:416
CDErrT Complete(bool collective=true, bool update_preservations)
Completes a CD.
CDErrT CDAssertNotify(bool test_true, const SysErrT *error_to_report=0)
User-provided detection function for failing a CD.
SysErrLocT error_location_
Location of error.
Definition: cd.h:454
uint64_t syndrome_len_
Length of syndrome.
Definition: cd.h:417
std::vector< SysErrT > Detect(CDErrT *err_ret_val=0)
Check whether any errors occurred while the CD executed.
CDErrT SetCurrentCD(const CDHandle *cd)
Accessor function for setting the current active CD.
(info includes message info)
Definition: cd.h:307
Within a part of a core.
Definition: cd.h:352
Processor.
Definition: cd.h:354
A core.
Definition: cd.h:353
Type for specifying errors and failure.
Definition: cd.h:452
CDInternalPtr cd_instance_
Definition: cd.h:1371
CDHandle * Create(char *name=0, CDModeT type=kStrict, uint error_name_mask=0, uint error_loc_mask=0, CDErrT *error=0)
Single-task non-collective Create.
uint level
Level within the tree (root=0)
Definition: cd.h:171
SysErrLocT
Type for specifying errors and failure location names.
Definition: cd.h:351
uint64_t get_pa_start()
Starting physical address.
virtual void Recover(CDInternalPtr *cd_instance, uint error_name_mask, uint error_location_mask, std::vector< SysErrT > errors)
Recover method to be specialized by inheriting and overloading.
Definition: cd.h:1508
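A sketch of what specializing Recover() by inheritance might look like. The stand-in declarations below exist only so the example compiles on its own; the real RecoverObject, CDInternalPtr, and SysErrT come from cd.h and will differ. CountingRecover is a hypothetical user class.

```cpp
#include <cassert>
#include <vector>

// Stand-in declarations (assumptions, not the cd.h definitions).
typedef unsigned int uint;
struct SysErrT {};
struct CDInternalPtr {};

// Stand-in for the base recovery class; only the Recover() signature
// is modeled on the one listed in this reference.
class RecoverObject {
 public:
  virtual ~RecoverObject() {}
  virtual void Recover(CDInternalPtr* cd_instance, uint error_name_mask,
                       uint error_location_mask,
                       std::vector<SysErrT> errors) {
    // Placeholder default; the real default behavior is defined by cd.h.
    (void)cd_instance; (void)error_name_mask;
    (void)error_location_mask; (void)errors;
  }
};

// Hypothetical user specialization: records how often recovery ran
// and which error names were reported last.
class CountingRecover : public RecoverObject {
 public:
  int calls = 0;
  uint last_names = 0;

  void Recover(CDInternalPtr*, uint error_name_mask, uint,
               std::vector<SysErrT>) override {
    ++calls;
    last_names = error_name_mask;
  }
};
```

An instance of such a class would presumably be handed to RegisterRecovery() through its recover_object parameter, so that the runtime calls the overridden method instead of the default.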
Interface for specifying regeneration functions for preserve/restore.
Definition: cd.h:526
PreserveMechanismT
Type for specifying preservation methods.
Definition: cd.h:499
uint DeclareErrName(const char *name_string)
Create a new error/failure type name.
CDExecutionModeT
Type for specifying whether the current CD is executing for the first time or is currently reexecutin...
Definition: cd.h:206
Entire system.
Definition: cd.h:359
includes affected PC and perhaps bounds on the error?)
Definition: cd.h:310
uint64_t va_start_
Starting virtual address.
Definition: cd.h:414
Data to be preserved will be modified by this CD.
Definition: cd.h:219
An object that provides a handle to a specific CD instance.
Definition: cd.h:661
Reexecution.
Definition: cd.h:207
CDErrT UndeclareErrLoc(uint error_name_id)
Free a name that was created with DeclareErrLoc()
class SysErrInfo
Definition: cd.h:378
uint64_t get_va_start()
Starting virtual address.
CDHandle * CreateAndBegin(uint_t color, uint_t num_tasks_in_color, char *name=0, CDModeT type=kStrict, uint error_name_mask=0, uint error_loc_mask=0, CDErrT *error=0)
Collective Create+Begin.
CD, essentially equivalent to kShared for CDs.
Definition: cd.h:242
CDErrT Preserve(void *data_ptr, uint64_t len, uint_t preserve_mask=kCopy, const char *my_name=0, const char *ref_name=0, uint64_t ref_offset=0, const RegenObject *regen_object=0, PreserveUseT data_usage=kUnsure)
Preserve data to be restored when recovering (typically reexecuting the CD from right after its Begin...
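To illustrate the idea behind copy-style preservation (the kCopy default), here is a minimal sketch that saves a byte copy of each region and writes it back on restore. MiniPreserver is a hypothetical stand-in and does not reflect how the CD runtime actually stores or restores data.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in, not the CD runtime: keeps a private copy of
// each preserved region and copies it back when Restore() is called.
class MiniPreserver {
 public:
  // Copy len bytes starting at data_ptr into an internal buffer.
  void Preserve(void* data_ptr, std::uint64_t len) {
    assert(data_ptr != nullptr);
    if (len == 0) return;  // nothing to save
    Entry e;
    e.ptr = data_ptr;
    e.copy.resize(static_cast<std::size_t>(len));
    std::memcpy(e.copy.data(), data_ptr, e.copy.size());
    entries_.push_back(e);
  }

  // Write every saved copy back to its original address, as a
  // reexecution from the start of the CD would require.
  void Restore() const {
    for (const Entry& e : entries_) {
      std::memcpy(e.ptr, e.copy.data(), e.copy.size());
    }
  }

 private:
  struct Entry {
    void* ptr;
    std::vector<unsigned char> copy;
  };
  std::vector<Entry> entries_;
};
```

In this sketch, data preserved before being overwritten returns to its preserved value after Restore(), which mirrors the state a CD expects when it reexecutes from right after its Begin().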
Module.
Definition: cd.h:356
CDHandle * GetParent()
Get CDHandle to this CD's parent.
std::vector< uint64_t > va_starts_
Starting virtual addresses.
Definition: cd.h:434
CDErrT CDAssert(bool test_true, const SysErrT *error_to_report=0)
User-provided detection function for failing a CD.
std::vector< uint64_t > get_va_starts()
Starting virtual addresses.
std::vector< uint64_t > pa_starts_
Starting physical addresses.
Definition: cd.h:433
Init called more than once.
Definition: cd.h:471
CDErrT CDAssertFail(bool test_true, const SysErrT *error_to_report=0)
User-provided detection function for failing a CD.
CDErrT UndeclareErrName(uint error_name_id)
Free a name that was created with DeclareErrName()
A class that represents the interface to the internal implementation of an actual CD...
Definition: cd.h:1429
CDNameT GetName()
Get the name/location of this CD.
float RequireErrorProbability(SysErrT error_type, uint error_num, float probability, bool fail_over=true)
Request the CD framework to reach a certain error/failure probability.
SysErrNameT
Type for specifying system errors and failure names.
Definition: cd.h:300
uint64_t pa_start_
Starting physical address.
Definition: cd.h:413
CDErrT Begin(bool collective=true)
Begins a CD.
Call did not execute as expected.
Definition: cd.h:472
No errors/failures.
Definition: cd.h:300
CDHandle * Init(bool collective=true, CDErrT *error=0)
Initialize the CD runtime.
uint DeclareErrLoc(const char *name_string)
Create a new error/failure location name.
An object that provides an event identifier to a non-blocking CD runtime call.
Definition: cd.h:1392
Some grouping of cabinets.
Definition: cd.h:358
std::vector< uint64_t > lengths_
Lengths of affected regions.
Definition: cd.h:435
by the CD (treated as Read/Write for now, but may be optimized later)
Definition: cd.h:216
SysErrNameT error_name_
Name of error.
Definition: cd.h:453
uint64_t get_length()
Length of affected access.
CDErrT SetPGASOwnerWrites(void *data_ptr, uint64_t len, bool owner_writes=true)
Simplify optimization of discarding relaxed CD log entries.
uint64_t get_syndrome_len()
Length of syndrome.