Risk management of a commercial computer system starts when the system is specified and purchased, continues with installation and operation, and ends when the system is taken out of service and all critical data have been successfully migrated to a new system.
Risk evaluation and assessment. Define the severity, probability, and risk of each hazard, for example, by using past experience from the same or similar systems. Determine acceptable levels of risk and identify the hazards that would need mitigation to reach those levels. Identify and implement steps to mitigate risks.
Ongoing (re)evaluation and control. Evaluate the system continually for new hazards and changes in risk levels. Adjust the risk assessment and mitigation strategy as necessary.
These activities should follow a risk management plan, and the results should be documented in a risk management report.
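The evaluation step can be illustrated with a minimal sketch. It assumes a three-level ordinal scale and a multiplicative scoring rule (risk = severity x probability), neither of which is prescribed by the text. The `Hazard` class, the `needs_mitigation` helper, the threshold, and the ratings given to the sample hazards are all hypothetical and serve only as illustration.

```python
from dataclasses import dataclass

# Assumed ordinal scale; a real risk management plan would define its own.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass
class Hazard:
    description: str
    severity: str      # "low", "medium", or "high"
    probability: str   # "low", "medium", or "high"

    @property
    def risk(self) -> int:
        # Assumed scoring rule: risk = severity x probability.
        return LEVELS[self.severity] * LEVELS[self.probability]


def needs_mitigation(hazards, acceptable_risk=3):
    """Return the hazards whose risk exceeds the acceptable level, highest first."""
    above = [h for h in hazards if h.risk > acceptable_risk]
    return sorted(above, key=lambda h: h.risk, reverse=True)


# Illustrative ratings only; actual values come from experience with the system.
hazards = [
    Hazard("Hard drive failure on the server", "high", "medium"),
    Hazard("Loss of network connectivity", "medium", "low"),
    Hazard("No plan for system backup", "high", "high"),
]

for h in needs_mitigation(hazards):
    print(f"risk {h.risk}: {h.description}")
```

With these assumed ratings, the backup and hard drive hazards exceed the threshold and would be candidates for mitigation, while the network hazard falls within the acceptable level. Rerunning the same evaluation after ratings change is one simple way to support the ongoing re-evaluation phase.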
The first step in the risk management process is the risk analysis, sometimes called risk identification or Preliminary Hazard Analysis (PHA). The output of this phase is the input for risk evaluation. Inputs for risk analysis include:
- specifications of equipment including hardware and software;
- user experience with the same system already installed;
- user experience with similar systems;
- IT staff experience with the same or similar network equipment;
- experience with the vendor of the system;
- failure rates of the same or similar systems (for example, mean time between failures) and the resulting system downtime;
- trends of failures;
- service records and trends;
- internal and external audit results.
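Failure rates and downtime from service records can be turned into numbers that feed the probability estimate. The sketch below assumes the standard definitions: mean time between failures (MTBF) is operating time divided by the number of failures, and steady-state availability is MTBF / (MTBF + MTTR), where MTTR is the mean time to repair. The function names and the sample downtime figures are hypothetical.

```python
def mtbf_hours(operating_hours: float, failures: int) -> float:
    """Mean time between failures: operating time divided by failure count."""
    if failures == 0:
        raise ValueError("no failures recorded; MTBF is undefined for this period")
    return operating_hours / failures


def availability(mtbf: float, mttr: float) -> float:
    """Steady-state availability from MTBF and mean time to repair (MTTR)."""
    return mtbf / (mtbf + mttr)


# Hypothetical service record: hours of downtime per failure over one year.
downtime_hours = [4.0, 2.0, 6.0]
period_hours = 8760.0

operating_hours = period_hours - sum(downtime_hours)
mtbf = mtbf_hours(operating_hours, len(downtime_hours))
mttr = sum(downtime_hours) / len(downtime_hours)

print(f"MTBF: {mtbf:.0f} h, MTTR: {mttr:.1f} h, "
      f"availability: {availability(mtbf, mttr):.4%}")
```

Computing the same figures for successive periods gives a simple view of failure trends, which is another of the inputs listed above.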
Inputs can come, for example, from operators, the validation group, Information Technology (IT) administrators, or Quality Assurance (QA) personnel as a result of findings from internal or external audits.
Examples of hazards and their possible consequences:
- Hard drive failure on local personal computers (PCs) or on the server computer can cause severe system downtime and loss of data.
- Loss of network connectivity due to hardware failure, for example of the network interface card, can cause system downtime.
- System overloads can cause a slow-down of operations and system downtime.
- Inadequate vendor qualification or absent specifications for vendor support in purchasing agreements can result in reduced uptime because support is missing when hardware, firmware, or software problems occur.
- Inadequate or absent documentation of installation can make it difficult to diagnose a problem.
- Inadequate or absent verification of security access functions can result in unauthorized access to the system.
- An insufficient or absent plan for system backup can result in data loss in case of system failure.
- Poor or absent documentation of hardware and software changes can make it difficult to diagnose a problem.
- Inadequate quality assurance policies and procedures or inadequate reviews can lead to poor system quality.