ISSN: 2168-9792
+44-77-2385-9429
Research Article - (2017) Volume 6, Issue 1
The dependency of complex embedded Safety-Critical Systems across, Avionics and Aerospace domains, on their underlying software and hardware components has gradually increased with progression in time. Such application domain systems are developed based on a complex integrated architecture, which are modular in nature. Engineering practices assured with system safety standards to manage the failure, faulty and unsafe operational conditions are very much necessary. System safety analyses involves the analysis of complex software architecture of the system, a major aspect in leading to fatal consequences in the behavior of Safety-Critical Systems, provides high reliability and dependability factors during their development. In this paper, we propose an architecture fault modeling and the safety analyses approach that will aid in identifying and eliminating the design flaws. The formal foundations of SAE Architecture Analysis and Design Language (AADL) augmented with the Error Model Annex (EMV) are discussed. The fault propagation, failure behavior and the composite behavior of the design flaws/ failures are considered for architecture safety analysis. The illustration of the proposed approach is validated by implementing the Speed Control Unit of Power Boat Autopilot (PBA) system.
The Error Model Annex (EMV) guides with the pattern of consideration and inclusion of probable failure scenarios and propagation of fault conditions in the Speed Control Unit of Power Boat Autopilot (PBA). This helps in validating the system architecture with the detection of the error event in the model and its impact in the operational environment. This also provides an insight of the Certification impact that these exceptional conditions pose upon at various criticality levels, design assurance levels and its implications in verifying and validating the designs.
<Keywords: Architecture analysis and Design language (AADL), Error annex (EMV), Fault-tree analysis (FTA),Primary events database (PED), Safety analyses; Safety-critical systems
Systematic analyses of the architectural models modelled using the Model-Based Engineering (MBE) [1] practices, early and at every abstraction level imbibes a greater confidence in the integration of the system. The creation and analysis of architectural models of a system supports prediction and understanding the system's capabilities and its operational quality attributes. These attributes include performance, reliability, reusability, safety and security. All along the developmental lifecycle, the faults such as their failure modes and their propagation effects, at system-level can be predicted. Such issues remain un-noticed until system integration and testing. This proves to be a costly rework resulting in an unaccounted project time, cost and maintenance.
For safety critical advanced complex embedded systems, the system design and development is in compliance with the safety standards, and engineered with practices as specified by MIL-STD882 [2], SAE ARP- 4761 [3] and DO-178B/C [4]. The process of development, management and controlling these systems in conformance with the safety practices, proves to have an impact on the system requirements, post system integration and test. With the evolution of the system, availability and reliability of these models are to be consistent and this poses a great challenge.
These safety practices include various availability and reliability prognosis with the help of system architectural models. Model-Based Engineering approaches for safety analyses address these issues and prove to provide consolidated information about the informal requirements and the architecture model of the system. The safety analyses performed on a system also takes into consideration, the physical environment of its deployment and functioning. Due to insufficient support of the formal languages trend is to make use of architecture description languages such as Architecture Analysis and Design Language (AADL), and Society of Automotive Engineers (SAE) standard. AADL, a highlevel architectural descriptive language, basically provides a platform for overall integration of various system recommended components via formal semantics and syntax. This component-based modelling language is extended with the introduction of sublanguages as Annexes. AADL is packaged with multiple Annex sublanguages such as Error Model Annex (EAnnex) and Behaviour Annex (BAnnex) as standards. The EAnnex standard is suitably augmented with safety semantics and ontology of fault propagation, supporting error annotations on the architectural models [5]. This thus enables the component error models and their interactions to be considered in context to the system architecture modeled using AADL.
This paper presents our contributions as a case study implementation (Speed Control Unit of Power-Boat Autopilot), to the standard approach for the illustration of its application. The paper is organized as follows: Firstly, we summarize the concept of Architecture Analysis and Design Language (SAE AADL) and Error Model Annex (EAnnex/EMV2). Next we provide an illustration of the architecture fault model specification for Speed Control Unit of a Power-Boat Autopilot (PBA). We also discuss the various safety analyses methods involved in MIL-STD882 safety practice. Finally, we conclude the paper with the assessment of these safety analyses based on the architecture fault models.
Architecture Analysis and Design Language (AADL), an SAE International standard, a unified framework providing extensive formal foundations for Model-Based Engineering (MBE) practices. These practices extend throughout the system design, integration and assurance with safety standards. AADL distinctly represents a system hardware and software components and their interactions via interfaces. Critical real-time computational factors such as performance, dependability, safety, security and data integrity can be rigorously analysed with AADL.
AADL also integrates custom analyses and specification techniques during the engineering process. This allows in the development and analysis of a single, unified system architectural model. AADL can be extended using the specialized language constructs that can be attached to the components of the architectural model defined by AADL. These components are reinforced with additional characteristics and requirements, referred to as Annex languages. The architectural model components are annotated with these properties and annex language clauses for functional and non-functional analyses. Error Model Annex (EMV), which is an extension of AADL aids in describing the failure conditions and fault propagations as error events, propagations, occurrence and their distribution properties. With the integration of these constructs in the AADL model/s, as shown in Figure 1 [6], the existing components are extended as current models liable for Safety Evaluation and Analyses. This can be done with the help of the algorithms in OSATE or by using other third party tools.
Error annex
The Error Model Annex (EAnnex) is a sublanguage of AADL This sublanguage extension includes the analyses of the runtime architectures. The EAnnex [7,8] annotates the hardware and the software component architectures with error states, error events, error transitions and error propagations that may affect the component interacting with each other. In Error Model Annex sub clause conditions can be specified under which the errors are propagated through designated component ports. Error Model Annex basically helps in defining the fault models, hazards, fault propagation, failure modes and effects, as well as specifying compositional fault behavior. AADL Error Model Annex supports architectural fault modeling at three levels of abstraction [9].
1. Modeling of faults in systems and their implications on other dependent components of the physical environment of its operation through propagation of these faults. (Includes Hazard identification, fault impact analysis).
2. Modeling of faults occurring in a component of the system and analyzing the behavior of the same across various modes termed as failure modes and its effects on other components and its related propagations. It is also inclusive of the recovery strategies involved.
3. Compositional abstraction of system error behavior in terms of its subsystems.
Error Model Annex (EMV2) overlays major focus on the standards set of error types and error propagation, defined by AADL as a standard syntactic construct through the introduction of annex libraries. These annex libraries provides an overlook of the formally specified error propagation behaviors [10,11]. Some of the common error types being [9];
1. Commission and omission errors: Represents loss of message/ command, failure to provide readings from a component.
2. Timing errors: Arrival rate, service too early or late, unsynchronized rate.
3. Value errors: Individual service item error or errors in a sequence of values
4. Replication errors: Replicates of states or services being communicated.
5. Concurrency errors: Accessing shared logical or physical resources.
Along with these the error model types can be referenced in the Error Model Annex sub clause. The constructs for the EMV2 are similar to the syntax and style as defined for AADL. An exceptional being that, any set of textual language constructs can be included within an annex, that includes Object Constraint Language (OCL) [12] or a temporal logic notation [13].
Implementation of proposed research: In this section we exhibit the architecture fault modeling in AADL, along with the extension of EMV2, at three levels of abstraction with a suitable case study, Speed Control Unit of Power-Boat Autopilot (PBA). This unit is a simplified speed control model, including a pilot interface unit for input of relevant Power-Boat Autopilot information, a speed sensor that sends speed data to the PBA, the PBA controller, a throttle actuator that responds to the commands specified by the PBA controller and a display unit. The type definitions defining the component, component names, their runtime category and interfaces are identified and defined. The speed sensor, pilot interface, throttle actuator and the display unit are modeled as devices, while the PBA control functions are represented as process, as shown in Figure 2. With all these we perform the safety analyses with the specification of the source of error and its propagation across the system and its components. This is carried out by defining the error states and their corresponding compositional fault behaviour. This is followed by the expansion of the fault logic with respect to its error behaviour related with each component of the system and its response to the failures.
Specification of error source and propagation
The source of errors and their propagation with respect to each component of a system in PBA Speed Control Unit is defined, as shown in Figure 3. In the case study on_flow_src related to the device, pilot interface unit is sourcing the fault. The component error propagations are also defined with the error No-Value and No-Service.
The component error behaviour is also defined for the system components that correlate to the faults that are possible to occur. Here in this system the No-Value due to failure passes on from the pilot interface unit to the throttle actuator. The same is being conveyed to the display unit feature status. In addition to this fault, there occurs another propagation of error i.e. No-Service. This fault results in the failed state of the system. Here we can observe that the specification is automatically inherited by the instances of each component and their interactive neighbours. The error propagation paths inherent in such system architecture AADL models form a basis, as a need for the representation of Failure Mode and Effect Analysis (FMEA) and Common Cause Analysis (CCA).
Composite error behaviour
The error model annex library is associated to the state machine defined for the system component model using the declaration use behaviour, as shown in Figure 4. This maps the error state behaviour of the sub-components (both hardware as well as software components) onto the error states of the system itself. In this case study of Speed Control Unit of PBA, we have two error states defined for each component i.e. Failure and Failed. But here we have considered only the Failed state as the sub-component error state and the state Operational as the recovery state. We can see in this example that the system error behaviour is mapped from the sub-component behaviours defined as;
[throttle. Failed and display_ unit_ inter. Failed]-> Failed;
We assume that the system fails if either of the devices i.e. throttle actuator or the display unit behaves in the Failed state. While it tends to recover from the Failed state and remains to be Operational even if the display unit fails, as the speed control unit mainly depends on the throttle command in maintaining and controlling the speed of the PBA.
[display_ unit_ inter. Failed]-> Operational;
This provides a scope for redundancy management for fault management capability of the system as well analyse for extensive solutions for reliability and availability analyses through various hierarchical levels of the system architecture. This methodology is not advisable for Markov Chains as the systems tends to grow quickly with their dependencies among various components within a system, as the number of components increases.
Component error behaviour
The modeller will have the flexibility of analysing the possible error behaviour that may correspond to individual components of a system. This also provides an insight into the component internal failures and the divergent factors that may result in failure mode, in turn having an impact on other components. The case study in this paper specifies that there might be multiple failure modes like Failure and Failed. In Failed mode the entire component is assumed to be redundant while in the Failure the component is working but having erroneous outputs/ output states, as shown in Figure 5.
The failure modes are represented using the error states with more likely coupled error behaviour of the sub-system/component. The consistency checker associated with the Error Model Annex abstracts the propagation specification to introduce unique and distinctive error types. While the modeling tool associated with the Error Model Annex validates for the organization of the component error behaviour along with the propagation specification specific to each of the component in the system architecture. The actual system architecture must include the safety system component/s that regulates the fault management and aids in safety analyses (Figure 6).
Safety [14] Analyses involves various analytical processes such as Consistency checks, Fault tree analysis (FTA), Failure modes and effect analysis (FMEA), Functional hazard assessment (FHA), and Common mode assessment (CMA) of the architectural model. The architecture model and its associated fault model is designed and developed in Open Source AADL Tool Environment (OSATE) [15]. It is an Eclipse based AADL modeling framework. There is also need the safety analysis tool such as OpenFTA [14] an Open-Source tool for FTA is integrated into Eclipse environment, to assist in generation of FTA and its relevant documents. While CMA, FMEA, FHA reports are generated as a builtin feature from OSATE.
Consistency checks
The consistency checks at the system integration level checks for the consistency in their functionality and the interfaces between various models/components, as shown in Table 1. This thereby strengthens the Virtual integration and analysis of the architecture model of the system. The consistency of various models deals with their integration feasibility while the consistency of the internal components in a model concentrates on the propagation capabilities, redundancies etc. With Error Model Annex the concept of consistency across the error models as specified checks for the consistency with respect to the component error behaviour along with the composite error behaviour of the system. It helps in defining the correctness of the error state as per the components specified in the architectural model. This may be proven with the substantial inclusion of Behaviour Annexes (B-Annex) [16] along with the Error Model Annex. The consistency report generated by the OSATE plugin for the case study is as shown below:
Consistency Report |
---|
Warning! Complete_PBA_speed_control_ab_Instance:C13: component Complete_PBA_speed_control_ab_Instance does not define occurrence for and state Failed |
Complete_PBA_speed_control_ab_Instance:C13: component Complete_PBA_speed_control_ab_Instance has consistent probability values for state Operational |
Warning! Complete_PBA_speed_control_ab_Instance:C13: component Complete_PBA_speed_control_ab_Instance does not define occurrence for and state Failed |
Warning! Complete_PBA_speed_control_ab_Instance:C13: component Complete_PBA_speed_control_ab_Instance does not define occurrence for and state Failed |
Warning! Complete_PBA_speed_control_ab_Instance:C13: component Complete_PBA_speed_control_ab_Instance does not define occurrence for and state Failed |
Complete_PBA_speed_control_ab_Instance:C13: component Complete_PBA_speed_control_ab_Instance has consistent probability values for state Failed |
Table 1: Consistency report.
Consistency report
Warning! Complete_PBA_speed_control_ab_Instance:C13: component Complete_PBA_speed_control_ab_Instance does not define occurrence for and state Failed.
Complete_PBA_speed_control_ab_Instance:C13: component Complete_PBA_speed_control_ab_Instance has consistent probability values for state Operational.
Warning! Complete_PBA_speed_control_ab_Instance:C13: component Complete_PBA_speed_control_ab_Instance does not define occurrence for and state Failed.
Warning! Complete_PBA_speed_control_ab_Instance:C13: component Complete_PBA_speed_control_ab_Instance does not define occurrence for and state Failed.
Warning! Complete_PBA_speed_control_ab_Instance:C13: component Complete_PBA_speed_control_ab_Instance does not define occurrence for and state Failed.
Complete_PBA_speed_control_ab_Instance:C13: component Complete_PBA_speed_control_ab_Instance has has consistent probability values for state Failed.
Fault tree analysis (FTA)
A widely used safety and reliability analysis [17] feature in aerospace, medical electronics and industrial automation industries [18]. In this analysis the major focus is on the top level event (Minimal Cut-Set), from a set of combinations of basic events (Faults). It provides a hierarchical representation of the errors of the system (toplevel event) from the basic events, related to components as specified in component error behaviour, in the form of a tree. OSATE depicts this composite error behaviour of the system from the underlying component error behaviours as a fault tree that represents specific error state of the system. This is achieved in the form of two files from OSATE for the representation of the fault tree, one being the database of primary events (.ped), as shown in Figure 4, causing the top-level error event and the fault tree analysis file (.fta). These files are viewed using Open FTA, as shown in Figure 7.
The FTA analysis is in conformance with MIL-STD882 standard and the generated fault tree is validated, as shown in Figure 8.
The artifacts related to FTA as specified by MIL-STD882 deals with error composites and error events. FTA is a top-down approach of analysis [19-21]. The Minimal Cut Set is evaluated in the OpenFTA tool and is as shown below in Figure 9.
Failure modes and effects analysis (FMEA) and functional hazard assessment (FHA)
Analysis of the failure modes associated with the system and the determination of its effects over the hierarchical evolution, performed systematically with a bottom-up approach is FMEA. With respect to the errors of the system, FMEA provides the information about the deficient component/models and their related effects. It also provides sufficient overview of the failing component such its phase of failure, severity/impact, etc. FMEA is based on the artifacts that include error propagation paths (error source, error path and the error sink). FHA provides with the possible list of error upon the synthesis of the architectural model of the system. The major artifacts from FHA comprise of the source of the error and the error events, as shown in Figure 10. The details of FHA are processed from the OSATE tool after the model is instantiated and the relevant error information is suitably extracted from these architecture models. The report will be in the form an excel spreadsheet with the specification of the error event details.
In this paper, we have proposed a novel approach of safety analyses of Safety-Critical Systems using AADL and the related Error Model Annexes. In spite of the comprehensive activities involved in safety analyses, the needs for such approaches are proved to be very much necessary. This is achieved and projected with the implementation of a suitable case study, Speed-Control Unit of Power-Boat Autopilot. The employment of analysis techniques such as Fault-Tree Analysis (FTA), Functional Hazard Analysis (FHA) and Consistency of the model along with the conduction of qualitative and quantitative reliability analyses as part of these techniques can assess the system hazards and faults. The assessment covers the generation of suitable reports justifying the analyses. These methodologies or techniques provides grant for early identification and probability of the occurrence of potential problems. This also provides a perspective to explore additional architectural properties. Re-use and analysis of the evolved models, provided with suitable extensions with limited effort can be achieved with this approach. The overall effect induces a greater confidence over abstracted stages of development and safety analyses of these architectural models of the system. Also analysing the system based on the Safety-Critical Requirements, with the expectation of exceptional conditions, hazards expedites in the development of Safety System architecture models which will have an impact in certifying the same. This also avoids the unnecessary certification costs by understanding the change impact or the exceptional causes that impact during system engineering.
Our thanks to the Director, CSIR-NAL, Bengaluru, for supporting this work.