Software Design For Resilient Computer Systems
by Igor Schagaev
2016 / English / PDF
This book addresses the question of how system software should be
designed to account for faults, and which fault tolerance
features it should provide for the highest reliability. The authors
first show how the system software interacts with the hardware to
tolerate faults. They analyze and further develop the theory of
fault tolerance to understand the different ways to increase the
reliability of a system, with special attention to the role of
system software in this process. They further develop the general
algorithm of fault tolerance (GAFT) with its three main
processes: hardware checking, preparation for recovery, and the
recovery procedure. For each of the three processes, they analyze
the requirements and properties theoretically and describe possible
implementation scenarios and the system software support they require.
Based on the theoretical results, the authors derive an
Oberon-based programming language with direct support for the
three processes of GAFT. In the last part of the book, they
introduce a simulator and use it as a proof-of-concept
implementation of a novel fault-tolerant processor architecture
(ERRIC) and its newly developed runtime system, assessing both in
terms of features and performance. The content applies to
industries such as military, aviation, intensive health care,
industrial control, and space exploration.