This election year spurred me into researching the cyber resilience of OT / IoT systems, of which electronic election systems are a subset. I discovered there was quite a bit of synergy between cyber resilience and mod-sim. As I explored the relationship I became convinced that any OT or IoT system of consequence could significantly improve its resilience posture with a modest investment in mod-sim. And the cost of producing a model and / or simulation of any significance is coming down due to a variety of tools, standards, and freely available resources as I discuss below. So explore this article yourself – I think you will agree with me that mod-sim is a good investment if you are designing and building OT / IoT systems that will need to scale. Let me know if you have other mod-sim tools for OT/IoT systems that you feel are noteworthy as well.
The adoption of digital prototypes, simulated cyber-physical control systems, and AI/ML analytics at the edge by a wider user base of engineers is of paramount importance for sustaining innovation and improving the resilience of new Operational Technology (OT) and IoT systems. New support structures are arriving to help, including new standards, new tooling, and new methods. At the forefront of these changes is the increasing adoption of Model-Based Systems Engineering (MBSE) and advanced modeling tools for simulating system features, designing algorithms, assessing life cycle changes, and addressing the increased complexity of systems engineering. In conjunction with MBSE, integrating the physical and virtual worlds of OT and IoT systems through the use of digital twins helps engineers improve system analysis, better predict failures or downtime, and produce more accurate maintenance schedules – all elements that help to improve cyber resiliency.
And improving cyber resiliency is becoming a critical priority: the security, safety, and performance design trade-offs of modern OT and IoT systems are becoming increasingly complex, rendering traditional engineering methods insufficient for their successful realization. These systems have become more complex due to many factors; to name a few:
- Increased spectrum of technologies: complex systems have become cyber-physical systems (CPS) and now depend upon the seamless integration of computational algorithms and various physical components,
- Increased customer demand for more sophisticated systems, driven by market or military competition,
- Systems consist of a large number of components interacting in a network structure, and these components are usually physically and functionally heterogeneous.
An MBSE approach has been used to describe a prototype system, its use cases, and alternative scenarios, and to identify any potential risks resulting from safety and security issues. These are then explored further using simulation, which evolves the model; finally, the model is used to generate the required system specification or digital twin. As modeling technology has matured, modeling is providing even better economics by accelerating learning (simulation) and providing better insights into the physical world (digital twins). Both simulation and digital twins are important for designing, developing, and evaluating systems. Although models are not a perfect representation of a system, they provide knowledge and feedback sooner and more cost-effectively than implementation alone. And they allow simulation of complex system and system-of-systems interactions with appropriate fidelity to accelerate learning. In practice, engineers use models to gain knowledge and to serve as a guide for system implementation, testing, and evaluation.
Applying the Unified Architecture Framework (UAF), previously known as the Unified Profile for DoDAF and MODAF (UPDM), with an MBSE approach makes architecture modeling an integral part of systems engineering (SE). It helps the systems integrator develop interoperable systems, with traceability to requirements and across views and domains, using one integrated architecture model that enables impact analysis, gap analysis, trade studies, simulations, and engineering analysis. Moreover, the scope of UAF extends beyond defense architectures: it has been generalized to apply to architecting systems of systems in any domain.
MBSE provides a number of ways to create, validate, and verify complex system designs; unfortunately, the key advantages of MBSE (such as managed complexity, reduced risk and cost, and improved communication across a multidisciplinary team) have not been exploited enough from a security perspective. For example, certain system properties, such as cost, schedule, and performance, can constrain a system’s ability to maintain the intended security posture. Without a way to integrate security properties with these systems engineering properties, it is difficult to objectively compare solutions and highlight trade-offs between security objectives and other system requirements.
There are, however, new MBSE-based security requirements engineering processes and modeling methods, tools, and standards emerging, as well as an MBSE security profile, which is formalized with the UML 2.5 profiling capability. The new UML-based security profile conforms to the ISO/IEC 27001 information security standard. This security profile helps security engineers and systems engineers work together, using a joint design process or framework, to define security aspects in a common model. It also provides useful techniques such as model validation (e.g., checking whether the current level of risk is acceptable) and change impact analysis (e.g., checking which assets will be impacted if a security requirement is changed). Coverage analysis (e.g., checking how many risks are not linked with security controls) and model simulation (e.g., checking whether an attack scenario executes correctly) are also useful techniques. The use of model-based techniques ensures that the security and system artifacts are aligned in the early phases of system design and that MBSE benefits are extended to the security engineering domain. Simulation is also supported through the Action Language for Foundational UML (Alf). Alf is the OMG-standard textual language specifically designed for specifying executable behavior in the context of graphical, executable UML and SysML models. It is the best alternative to using complicated activity diagrams or scripting languages not designed for use in models.
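To make the coverage analysis idea concrete, here is a minimal sketch in Python. It is not tied to any particular UML/SysML tool API; the risk names, levels, and control names are hypothetical. It simply flags risks in a model that are not linked to at least one security control.

```python
# Illustrative coverage analysis: find risks with no linked security control.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Risk:
    name: str
    level: str                                          # e.g. "low", "medium", "high"
    controls: List[str] = field(default_factory=list)   # linked security controls

def uncovered_risks(risks: List[Risk]) -> List[Risk]:
    """Return the risks that have no linked security control."""
    return [r for r in risks if not r.controls]

model_risks = [
    Risk("Tampering with field device firmware", "high", ["secure boot"]),
    Risk("Unauthenticated Modbus write", "high"),        # no control linked yet
    Risk("Loss of historian data", "medium", ["backups"]),
]

for risk in uncovered_risks(model_risks):
    print(f"UNCOVERED: {risk.name} (level={risk.level})")
```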
One example of a MBSE-based security engineering tool is Model-Based Risk Assessment (MBRA) from KDM Analytics. MBRA is a risk assessment tool that emphasizes the use of rigorous MBSE-aligned models, analytics, and best practices for repeatable assessments of the cybersecurity of systems. MBRA is also aligned with the NIST Risk Management Framework workflow and the NIST Cybersecurity Framework. MBRA leverages the Unified Architecture Framework (UAF) and SysML to identify, analyze, classify and understand cybersecurity threats and related risks.
MBRA also uses the Tools Output Integration Framework (TOIF) XMI schema, which is a common reporting format for source and machine code weaknesses. TOIF XMI is the core part of a protocol that integrates weakness findings, from multiple static code analysis tools, related to a single system under assessment. TOIF defines a common format for normalizing vulnerability reporting protocols with the following key goals:
- Creating a basis for composite vulnerability analysis tools on top of existing off-the-shelf vulnerability detection tools
- Improving the breadth and accuracy of vulnerability analysis
- Improving the rigor of assessments by bringing vulnerability detection into architecture context
TOIF helps to create internally consistent models that not only support concept exploration and analysis but also conform to modeling guides, ontologies, and constraints levied by middleware and the need for data interchange.
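The sketch below is a much-simplified illustration of what this kind of normalization accomplishes; the actual TOIF XMI schema is far richer, and the tool names and field names here are hypothetical. The point is that findings from different static analysis tools, each with its own reporting format, are mapped into one common record so they can be merged and deduplicated for a single system under assessment.

```python
# Map tool-specific finding formats into one common record (tool, CWE, file, line).
def normalize_tool_a(finding):
    return {"tool": "tool_a", "cwe": finding["cwe_id"],
            "file": finding["path"], "line": finding["line_no"]}

def normalize_tool_b(finding):
    return {"tool": "tool_b", "cwe": finding["weakness"],
            "file": finding["source_file"], "line": finding["location"]}

raw_a = [{"cwe_id": "CWE-120", "path": "rtu/modbus.c", "line_no": 88}]
raw_b = [{"weakness": "CWE-120", "source_file": "rtu/modbus.c", "location": 88},
         {"weakness": "CWE-476", "source_file": "plc/logic.c", "location": 15}]

normalized = [normalize_tool_a(f) for f in raw_a] + [normalize_tool_b(f) for f in raw_b]

# Deduplicate findings that multiple tools agree on (same CWE, file, line).
unique = {(f["cwe"], f["file"], f["line"]): f for f in normalized}
for (cwe, path, line), finding in unique.items():
    print(f"{cwe} at {path}:{line}")
```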
MBSE tools also have plugins for Safety and Reliability. For example, the Cameo Safety and Reliability Analyzer Plugin enables a model-based approach to safety and reliability analysis. This functionality integrates into the No Magic MBSE toolkit. The plugin supports:
- The failure mode, effects, and criticality analysis (FMECA) according to IEC 60812:2006 standard
- Hazard analysis according to ISO 26262 (Road vehicles – Functional safety),
- Hazard analysis according to the following medical standards: IEC 62304 and ISO 14971:2007, corrected version 2007-10-01 (Medical devices – Application of risk management to medical devices).
The Cameo Safety and Reliability Analyzer can demonstrate that risks are addressed by safety requirements/risk control measures, design elements, and critical quality attributes (CQAs), and can automatically validate the model to ensure that the entire design has gone through safety and reliability analysis.
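As a small illustration of the FMECA-style analysis such plugins automate, here is a sketch of the common risk priority number (RPN) calculation, where each failure mode is scored for severity, occurrence, and detection and ranked by RPN = S × O × D. The failure modes, scores, and action threshold below are hypothetical.

```python
# Rank hypothetical failure modes by risk priority number (RPN = S x O x D).
failure_modes = [
    {"mode": "Sensor drift",        "severity": 6, "occurrence": 5, "detection": 4},
    {"mode": "Relay fails to open", "severity": 9, "occurrence": 2, "detection": 7},
    {"mode": "Firmware hang",       "severity": 8, "occurrence": 3, "detection": 3},
]

RPN_THRESHOLD = 100  # hypothetical action threshold

for fm in failure_modes:
    fm["rpn"] = fm["severity"] * fm["occurrence"] * fm["detection"]

for fm in sorted(failure_modes, key=lambda f: f["rpn"], reverse=True):
    flag = "ACTION REQUIRED" if fm["rpn"] >= RPN_THRESHOLD else "monitor"
    print(f'{fm["mode"]:<22} RPN={fm["rpn"]:<4} {flag}')
```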
While the benefits of MBSE and SysML for tackling complexity are well established, until recently it wasn’t possible to harness this power to design larger and more complex Industrial IoT applications. A new MagicDraw plug-in for RTI Connext provides a robust way to connect applications running across different computers, especially when the security and quality of service of individual data flows matter. Based on a new profile for SysML, the plug-in can generate the artifacts that configure the DDS databus (Topics, Data Types, QoS, etc.) and use the Connext SDK to generate the adapters to native code (e.g., C++ or Java).
There is also SysML-Sec. SysML-Sec is an environment to design safe and secure embedded systems with an extended version of the SysML language. SysML-Sec targets both the software and hardware components of these systems. SysML-Sec is fully supported by the free and open-source toolkit TTool, and includes the following methodology stages:
- Requirements capture, with specific stereotypes for security-related requirements (e.g., confidentiality, authenticity, integrity, etc.)
- Attack graphs, which are an enriched version of attack trees. Attack graphs are formally defined and can therefore be checked against properties such as the reachability of a given attack (see the sketch after this list).
- Security-aware system architecture definition and exploration, that is, defining the functions, the hardware architectures (CPUs, buses, etc.), and the mapping of functions – and their communications – onto the hardware nodes. During that phase, the impact of security mechanisms on the safety and performance of the overall system can be studied, e.g., the additional latencies induced by security mechanisms. Safety and performance properties can be verified with TTool’s built-in model checker. That model checker takes into account the characteristics of hardware nodes; for example, in the case of a CPU, it accounts for the pipeline size, cache memory, etc.
- Design of software components, including the ones related to safety and security. From the design diagrams – built upon SysML block and state machine diagrams – safety and security proofs can be performed. Those proofs rely on external toolkits: UPPAAL for safety proofs and ProVerif for security proofs.
- Prototyping of software components: executable code can be generated from design diagrams. TTool offers specific support to facilitate the execution of that code in a virtual prototyping environment (SoCLib), in order to exercise the software components in a more realistic environment than the PC running TTool.
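The sketch referenced above illustrates the kind of reachability question attack graphs support: given attack steps as nodes and "enables" relations as edges, can an attacker starting from an entry point reach a target attack? The graph and step names below are hypothetical, and the check is a plain breadth-first search rather than anything TTool-specific.

```python
# Reachability over a hypothetical attack graph via breadth-first search.
from collections import deque

attack_graph = {
    "phishing_email":                  ["engineer_workstation_compromise"],
    "usb_drop":                        ["engineer_workstation_compromise"],
    "engineer_workstation_compromise": ["steal_plc_credentials"],
    "steal_plc_credentials":           ["modify_plc_logic"],
    "modify_plc_logic":                [],
}

def reachable(graph, start, target):
    """Return True if `target` can be reached from `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

print(reachable(attack_graph, "phishing_email", "modify_plc_logic"))  # True
```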
New tools are also being introduced to help scale simulation and enable the “democratization of simulation.” A new web portal called Rev-Sim.org was launched, sponsored by OnScale – a cloud-based, pay-as-you-simulate offering. The web portal is intended to support a growing industry-wide movement to make engineering simulation more accessible, efficient, and reliable; not just for CAE experts but also for non-specialists – to accelerate innovation. OnScale is providing their expertise in CAE and funding to support the initiative. “Extending simulation across the entire product development team is critical for today’s compressed product cycles,” said Rev-Sim.Org Co-Founding Principal, Malcolm Panthaki. The Rev-Sim.org website provides access to the latest success stories, news, articles, whitepapers, thought leadership blogs, presentations, videos, webinars, best practices and other reference materials to help industry democratize the power of simulation across their engineering, manufacturing, service, supply chain, and R&D organizations.
Another simulation tool developed by Sandia National Labs is called SCEPTRE. SCEPTRE is an application that uses an underlying network emulation and analytics platform (Emulytics™) to model, simulate, emulate, test, and validate control system security and process simulations. Traditionally, tools and techniques for simulating and emulating control system field devices have been limited because the physical processes being controlled are omitted. SCEPTRE leverages proven technologies and techniques to integrate the device and process simulations, with control hardware in the loop, providing an integrated system capable of representing realistic responses in a physical process as events occur in the control system and vice versa. SCEPTRE is a proven control system environment platform, having been fielded for many R&D applications, operational joint tests, and exercises supporting testing, training, validation, and mission rehearsal.
SCEPTRE comprises simulated control system devices, such as remote terminal units (RTUs), programmable logic controllers (PLCs), and protection relays, and simulated processes, such as electric power transmission systems, refinery processes, and pipelines. The simulated control system devices are capable of communicating over Internet Protocol (IP) networks using standard Industrial Control System (ICS) protocols such as Modbus, DNP3, IEC 61850, and others.
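The toy sketch below is not SCEPTRE itself (which is Sandia's platform), but it illustrates the core idea the paragraph describes: couple a simulated physical process to a simulated control device so that control actions affect the process and the process state is reflected back in the device's register map, as a Modbus-style RTU would expose it. The process model, register layout, and control logic are entirely hypothetical.

```python
class TankProcess:
    """Very simple tank level process: inflow when the pump runs, constant outflow."""
    def __init__(self, level=50.0):
        self.level = level

    def step(self, pump_on: bool):
        self.level += (5.0 if pump_on else 0.0) - 2.0   # inflow vs. outflow per step
        self.level = max(0.0, min(100.0, self.level))

class SimulatedRTU:
    """Modbus-style register map: holding register 0 = level, coil 0 = pump command."""
    def __init__(self):
        self.holding_registers = {0: 0}
        self.coils = {0: False}

process, rtu = TankProcess(), SimulatedRTU()

for t in range(10):
    # Simple control logic (would normally run in a PLC): keep level between 40 and 60.
    rtu.coils[0] = process.level < 40 or (rtu.coils[0] and process.level < 60)
    process.step(pump_on=rtu.coils[0])
    rtu.holding_registers[0] = int(process.level)        # process state visible to SCADA
    print(f"t={t} level={process.level:.1f} pump={'ON' if rtu.coils[0] else 'OFF'}")
```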
All of the tools and methods discussed so far will improve the design, building, and operation of cyber resilient OT/IoT systems. But how resilient will they be? How do you measure resilience? According to this Mitre FAQ, cyber resiliency is defined as “the ability to anticipate, withstand, recover from, and adapt to adverse conditions, stresses, attacks, or compromises on cyber resources.” The FAQ also includes a definition for critical infrastructure resilience – “…the ability to prepare for and adapt to changing conditions and withstand and recover rapidly from disruptions. Resilience includes the ability to withstand and recover from deliberate attacks, accidents, or naturally occurring threats or incidents.” [WH 2013].
Alexander Kott, chief scientist of the ARL and Army ST—senior research scientist—for cyber resilience points out in an article in the August edition of Signal Magazine that cyber resilience is especially important for the Army as the service relies on cyber operations to an increasing degree. “We need to learn how to measure cyber resilience,” Kott states. He adds that no engineering discipline ever has achieved any degree of maturity and sophistication without determining how to measure its properties. He declares, “That means we need rigorous tools that can measure cyber resilience. Only then can we actually improve our cyber resilience.”
It seems to me that a model and/or simulation can be used to help measure cyber resilience. My point is that any complex problem can be parsed into sub-models, which can clarify and ultimately improve measurement. Modeling a cyber scenario involves defining the elements that contribute to a potential loss event – the asset(s) at risk, the threat(s) to those assets, the protective, containment, mitigation, and recovery controls that are relevant to the scenario, and the forms of loss that could materialize. Agile threat modeling is a form of cyber modeling that can contribute to the measurement of cyber resilience. Threagile, a toolkit publicly released on GitHub and Docker in August 2020, lets teams model a system’s architecture and assets as a YAML file directly inside the integrated development environment (IDE). When the toolkit is executed, 40 built-in risk rules – and any custom rules created – are checked against the architecture model.
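As illustration only, the sketch below shows the general idea of "architecture as YAML plus risk rules as code." The YAML keys are simplified stand-ins, not the actual Threagile schema, and the example assumes the PyYAML package is installed; the asset names and the custom rule are hypothetical.

```python
# Architecture lives as YAML data; risk rules are functions run against it.
import yaml  # requires PyYAML (pip install pyyaml)

architecture_yaml = """
technical_assets:
  - id: historian
    internet_facing: false
    encrypted_comms: true
  - id: remote_hmi
    internet_facing: true
    encrypted_comms: false
"""

model = yaml.safe_load(architecture_yaml)

def rule_unencrypted_internet_facing(model):
    """Custom risk rule: internet-facing assets must use encrypted communications."""
    return [a["id"] for a in model["technical_assets"]
            if a["internet_facing"] and not a["encrypted_comms"]]

for asset in rule_unencrypted_internet_facing(model):
    print(f"RISK: {asset} is internet-facing without encrypted communications")
```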
The ability of a design to be resilient to a variety of classes of cyberattack is the focus of a suite of tools provided by Systems and Technology Research (STR). STR’s Security Policy and Resiliency Tools and ANalysis (SPARTAN) tool chain leverages standard systems engineering artifacts to expose and prioritize the protections that must be added to the system, from the earliest phases of design, to be resilient to cyberattack. The tool chain can process system architecture models represented in either the Architecture Analysis and Design Language (AADL) or the Systems Modeling Language (SysML) to reason about which data transfers between system components are allowed, under what conditions, and which connections are not. From the modeling artifacts, SPARTAN builds a mathematical representation of the complete space of Cyber Requirements (CRs), which enumerate all data transfers between components that shall not occur in the design. Failure of the system to obey a CR, which allows unintended accesses of system components to occur, represents a fault. A resilient system needs to display fault tolerance: the ability to retain critical mission functionality in the presence of faults. To reason about resiliency, SPARTAN leverages the modeling capabilities of SysML to capture the system-level failures that may be triggered by component-level faults. The resulting Cyber Resiliency Requirements (CRRs) are then assessed for impact on system failure risk using agent-based modeling and Monte Carlo simulations.
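The following is not SPARTAN itself, but a small Monte Carlo sketch of the assessment style the paragraph describes: sample component-level faults from assumed probabilities and estimate how often a system-level failure (loss of mission function) results. The component names, probabilities, and fault-to-failure logic are hypothetical.

```python
import random

FAULT_PROB = {"primary_controller": 0.05, "backup_controller": 0.05, "sensor_bus": 0.02}
TRIALS = 100_000

def mission_fails(faults):
    # Mission function is lost if both controllers fault, or the sensor bus faults.
    return (faults["primary_controller"] and faults["backup_controller"]) or faults["sensor_bus"]

failures = 0
for _ in range(TRIALS):
    faults = {component: random.random() < p for component, p in FAULT_PROB.items()}
    failures += mission_fails(faults)

print(f"Estimated mission failure risk: {failures / TRIALS:.4f}")
```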
As computing moves to the edge and into distributed IoT devices, cyber resilience takes on new dimensions outside of the traditional data center. The edge data center can be very small and have power limitations. There can be severe environmental conditions, so systems must be adaptable to the location (temperature, vibration, etc.). Many edge data centers supporting IoT will be unstaffed – no human on site – and will therefore require remote support. Mesh networking will be used to connect edge data centers to users and central management, facilitate M2M connectivity, and enable other stakeholders. The ability of a network to withstand outages and reroute traffic while meeting latency and jitter requirements is a key property that needs to be measured for OT network resiliency. Measures of the resiliency of these properties must enter into the cyber resilience equation.
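Here is a small sketch (assuming the networkx package) of one such measurable property: whether every edge site can still reach central management after any single mesh link fails. The topology and site names are hypothetical; latency and jitter checks would layer on top of this kind of connectivity analysis.

```python
import networkx as nx

mesh = nx.Graph()
mesh.add_edges_from([
    ("central_mgmt", "edge_site_a"), ("central_mgmt", "edge_site_b"),
    ("edge_site_a", "edge_site_b"),  ("edge_site_b", "edge_site_c"),
    ("edge_site_a", "edge_site_c"),
])

def survives_single_link_failures(graph, sites, core="central_mgmt"):
    """True if every site can reach the core after any one link is removed."""
    for link in list(graph.edges()):
        degraded = graph.copy()
        degraded.remove_edge(*link)
        if not all(nx.has_path(degraded, site, core) for site in sites):
            return False
    return True

print(survives_single_link_failures(mesh, ["edge_site_a", "edge_site_b", "edge_site_c"]))
```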
Testing of component models helps improve the resiliency and stability of code changes. Applying Test-Driven Development (TDD) practices helps teams build quality into their systems early, facilitating the continuous small changes found in Agile software development. TDD relies on writing the tests first: tests should be written before the functionality that is to be tested. Test-first creates a rich suite of cases that allows developers to more reliably make changes without causing errors elsewhere in the system. Rich, automated tests are critical to creating a Continuous Delivery Pipeline. Lean practices encourage testable, executable models (when feasible) to reduce the waste associated with downstream errors. Resilience properties can therefore be added as part of the test cases, early in the design and coding process, and component models can be tested against whatever resilience assessment criteria exist for the domain or discipline. For example, for cyber-physical systems: mechanical models test for physical and environmental resilience issues, electrical models test for logic, software models test for anomalies, and executable system models test for system behavior. Most tools provide the ability to check models or to create scripts that can iterate across the models and identify anomalies. Also, applying continuous integration helps by providing revertible checkpoints, thereby increasing the resilience of the system.
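A test-first sketch in this spirit is shown below: a resilience property (here, that a controller fails over to its backup within a bounded number of cycles) is written as an automated test against a simple executable component model. The FailoverController model and the three-cycle bound are hypothetical; in a real pipeline the test function would run under a framework such as pytest.

```python
class FailoverController:
    """Toy executable model: switches to the backup channel after 2 missed heartbeats."""
    def __init__(self):
        self.active = "primary"
        self.missed = 0

    def tick(self, primary_alive: bool):
        self.missed = 0 if primary_alive else self.missed + 1
        if self.missed >= 2:
            self.active = "backup"

def test_failover_within_three_cycles():
    ctrl = FailoverController()
    for _ in range(3):                  # primary goes silent
        ctrl.tick(primary_alive=False)
    assert ctrl.active == "backup"      # resilience property under test

if __name__ == "__main__":
    test_failover_within_three_cycles()
    print("failover property holds")
```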
Effective cyber resilient applications require speed in operation, which means that time to detection and time to containment of cyber attacks or system failures could be appropriate measures of cyber resilience. The speed necessary for these actions to be effectively resilient for OT systems may often require the absence of human involvement in order to meet the time scales required. A DoD/DOE program called MOSAICS is working on key aspects of resilient OT systems while integrating government and commercial technologies to meet the need for speed in cyber response for OT systems. MOSAICS is in the process of transitioning the integrated cyber response capabilities to industry.
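The two candidate measures named above can be computed mechanically once incident timestamps are recorded; here is a simple sketch with hypothetical timestamps. In a MOSAICS-style automated response, these intervals would be expected to shrink toward machine time scales.

```python
from datetime import datetime

incidents = [
    {"onset": "2021-03-01T10:00:00", "detected": "2021-03-01T10:00:12", "contained": "2021-03-01T10:01:05"},
    {"onset": "2021-03-04T02:30:00", "detected": "2021-03-04T02:31:40", "contained": "2021-03-04T02:35:00"},
]

def seconds_between(a, b):
    return (datetime.fromisoformat(b) - datetime.fromisoformat(a)).total_seconds()

ttd = [seconds_between(i["onset"], i["detected"]) for i in incidents]    # time to detection
ttc = [seconds_between(i["detected"], i["contained"]) for i in incidents]  # time to containment

print(f"mean time to detection:   {sum(ttd) / len(ttd):.1f} s")
print(f"mean time to containment: {sum(ttc) / len(ttc):.1f} s")
```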
Robust, open APIs remain the single most important criterion for current and future integration to gain speed and scale for OT systems. Virtualization is also a key technology for gaining speed and scale, as we are starting to see virtual PLCs and analytics occurring at the edge node that synchronize with a local controller to enable “hot swap” of control functions between the edge node and the local controller. This type of synchronization requires deterministic IP forwarding for the process automation network, and effectively converges the OT and IT networks. Sharing of indicators of compromise, attack behaviors, and mitigation playbooks can also improve the speed and scale of cyber resilient operations. Cyber resilient OT applications should also provide transparency about what is under the hood – what software you have running on the network and devices – down to the component level. Standardizing such a software bill of materials (SBOM) is being undertaken by a few groups, including NTIA.
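A brief sketch of why an SBOM helps: given a simplified, CycloneDX-style bill of materials for a device, you can mechanically check whether any component matches a known-vulnerable component and version. The SBOM content and the vulnerable-component list below are hypothetical, and only a small subset of the real CycloneDX fields is shown.

```python
import json

sbom_json = """
{
  "bomFormat": "CycloneDX",
  "components": [
    {"type": "library", "name": "openssl", "version": "1.0.2k"},
    {"type": "library", "name": "busybox", "version": "1.31.1"}
  ]
}
"""

known_vulnerable = {("openssl", "1.0.2k")}   # hypothetical advisory data

sbom = json.loads(sbom_json)
for component in sbom["components"]:
    if (component["name"], component["version"]) in known_vulnerable:
        print(f"Vulnerable component on device: {component['name']} {component['version']}")
```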
Can digital twins also be employed to measure the impact of cyber-attacks on critical SCADA operations? Can digital twins help us to evaluate the complex interactions between cyber-attacks, defensive actions, and Cyber-Physical Systems? Can we use digital twins to train operators to prevent or contain cyber attacks on critical infrastructure? Network digital twins offer some help to satisfy these needs through safe, scalable and cost-effective solutions.
Digital twins can be used to bridge the gap between physical and digital domains. A digital twin is a virtual instance of a physical system synchronized through the physical twin’s operational data [i.e., as supplied by IoT devices] such as performance, maintenance, and health. This cross-domain capability of digital twins enables a deeper understanding of the impact of design choices throughout the product lifecycle. Digital twins support business agility by better predicting when future enhancements and product upgrades will be necessary to make product roadmaps more accurate. And they can uncover new business opportunities by enabling deeper analysis and elaboration on requirements, architecture trade-offs with parametric evaluations, and better design documentation. They are also used in operational settings to enable predictive maintenance of systems.
However, there are several challenges to the employment of digital twins:
- Digital twins lack standardization, definitions and common language,
- Digital twins can be difficult to apply across the product lifecycle,
- Often there are multiple digital twins, versions or views that don’t interoperate,
- Digital twin technology often needs to fit within a legacy environment.
These challenges are being addressed through the OMG-based Digital Twin Consortium™. The consortium coalesces industry, government and academia to drive consistency in vocabulary, architecture, security and interoperability of digital twin technology. It advances the use of digital twin technology from aerospace to natural resources. Its global membership is committed to using digital twins throughout their operations and supply chains and capturing best practices and standards requirements for themselves and their clients.
Another tool, called Syndeia from Intercax, helps to address the digital twin interoperability issue by providing a platform to interconnect models and digital twins in Cameo Systems Modeler and tools for requirements, design, simulation, and project management. It enables engineering teams to collaboratively and concurrently develop and manage a digital thread for any complex system/product or set of digital twins by federating models and data from diverse ecosystems of modeling and simulation tools, enterprise applications, and data repositories. This diverse set of tools and repositories includes, for example, SysML modeling tools (e.g., MagicDraw, Rhapsody), PLM systems (e.g., Teamcenter, Windchill), CAD systems (e.g., NX, Creo), ALM systems (e.g., GitHub, JIRA), project management systems (e.g., JIRA), requirements management systems (e.g., Jama, DOORS-NG), simulation tools (e.g., MATLAB/Simulink), databases (e.g., MySQL), and other data sources (e.g., Excel). Syndeia provides a rich set of services for building, managing, analyzing, querying, and visualizing the digital thread of the product/system through its lifecycle. Syndeia builds on a variety of open standards (e.g., REST/HTTP, JDBC, JSON, STEP, OSLC, and FMI), open-source projects and libraries, and production-ready APIs.
New tools are also becoming available to build digital twins, supply sensor data via IoT devices, and support IoT applications through the Eclipse Foundation. The Foundation introduced the Eclipse IoT Packages project in 2019. Since then, the IoT Packages project has been expanded to include edge technology by adding the Eclipse ioFog project to the package. The resulting package gives developers a pre-integrated, open source, cloud-to-edge stack to address edge use cases:
- Eclipse Hono for device connectivity,
- Eclipse Ditto for managing and organizing digital representations of physical objects – digital twins (see the sketch below),
- Eclipse hawkBit for rolling out and managing software updates to IoT devices,
- Eclipse ioFog for building and running applications at the edge at enterprise scale.
Developers can install a single, off-the-shelf package and immediately start using the technologies within it without the time and effort required to manually integrate them.
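The sketch referenced above shows how a device-side script might push state into its digital twin over Ditto's HTTP things API. It assumes the requests package and a locally running Eclipse Ditto instance; the host, credentials, thing ID, and feature names are hypothetical, so consult the Ditto documentation for the exact API details of the version you deploy.

```python
import requests

DITTO = "http://localhost:8080"
THING_ID = "org.example:pump-42"          # hypothetical namespace:name
AUTH = ("ditto", "ditto")                 # demo credentials only; change in production

twin = {
    "attributes": {"location": "substation-7"},
    "features": {
        "telemetry": {"properties": {"flow_lpm": 118.4, "vibration_mm_s": 2.1}}
    },
}

# Create or update the thing (digital twin) with the latest device state.
resp = requests.put(f"{DITTO}/api/2/things/{THING_ID}", json=twin, auth=AUTH)
resp.raise_for_status()
print(f"twin {THING_ID} updated: HTTP {resp.status_code}")
```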
To conclude: MBSE, simulation, digital twins, test-driven development, M2M automation, improved performance and more secure gateways and APIs, threat-centric AI/ML, along with a growing list of supporting vendors and standards, are emerging to help make OT/IoT systems more cyber resilient. What approaches are you taking to make your systems more cyber resilient? What is your experience with the tools and methods described in this article?
And thanks to my subscribers and visitors to my site for checking out ActiveCyber.net! Please give us your feedback because we’d love to know some topics you’d like to hear about in the area of active cyber defenses, PQ cryptography, risk assessment and modeling, autonomous security, digital forensics, securing OT / IIoT and IoT systems, Augmented Reality, or other emerging technology topics. Also, email chrisdaly@activecyber.net if you’re interested in interviewing or advertising with us at Active Cyber™.
Hi Chris,
In many ways I see this as following both my concerns and hopes from some recent work I have been doing in AI. As they advance, Generative Adversarial Networks (GANs) have the potential to solve many of the security and testing issues that are difficult to test or reproduce today.
Great article. We can talk more offline.