My journey into OT security has led me to a question: can we safely operate automated security orchestration platforms in an OT environment or a converged IT/OT environment? After all, many OT systems are already highly automated operations. What are the ramifications of delivering security responses in a highly automated manner? I learned that many considerations and conditions must be accounted for before taking the plunge into highly automated security operations. The first condition is a move to a software-centric approach, which is already underway in most respects as IIoT/IoT takes hold in these environments. But what else is needed to move OT security into the 21st century? Take a look at the article below and give me your thoughts.

Until recently, the Operational Technology (OT) world was built primarily on proprietary architectures with a strong focus on reliability, safety and longevity, albeit at the cost of flexibility and scalability. OT solutions were typically custom-built, hardware-centric and leveraged heterogeneous compute architectures. These proprietary solutions use a variety of network protocols and command syntaxes that are tightly bound to their controllers, which in turn are tightly bound to particular supervisory software. That’s why mixing and matching solutions at different levels of the control hierarchy is so difficult, and why what should otherwise be a simple PLC swap-out, for example, often requires replacing an entire skid, resulting in expensive downtime, lost production and lost revenue.

Most OT environments have a mish-mash of legacy apps that require a lot of effort to coordinate and really don’t interoperate at cyber-relevant speeds. These legacy OT systems haven’t kept up with the march of technology and still use design concepts from 15-20 years ago. Many were designed to be deployed as isolated networks, using the air-gap approach and the Purdue model, making them difficult to scale, reconfigure or extend whenever supply chains are updated. Even when these systems are in good working condition, their monolithic, inflexible design makes them slow to respond to changes in technology, such as IoT/IIoT and new connectivity standards such as 5G. The industry is now realizing that this monolithic, isolated approach, while it delivers on safety, reliability and longevity, has also resulted in a massive technology debt.

Trends in technology are driving the convergence of IT and OT, where IT systems increasingly “consume” data from OT systems to improve performance analytics, reduce maintenance time, and help predict customer needs through the application of AI and machine learning. The demand for minimizing outages and downtime has also introduced the need for OT system vendors to remotely monitor their equipment, thereby increasingly compromising the air-gap model that was the cornerstone of security for these OT systems when they were originally designed and implemented. These market forces, along with increased concerns over security, are pushing operators to realize that they can no longer sacrifice flexibility and security for the sake of reliability, safety and longevity. To move forward, OT operators are looking to adopt a software-driven approach. Organizations such as the Open Process Automation Forum (OPAF) have come together to accelerate the shift from a hardware-centric approach to a software-driven one. This new approach provides flexibility from the get-go, enabling key capabilities such as software updates, redundancy, vendor interoperability and the transition from a device-centric view to a systems-and-solutions view. A software-driven approach can break down proprietary barriers by deploying software-programmable “white label” devices, particularly switches, which allow for on-the-fly swap-outs and topology changes under remote (and potentially centralized) software control.

The new software-driven approach is also impacting embedded system developers in OT environments, bringing relevant cloud-native technologies to the OT world. For example, OT developers are increasingly able to leverage containers in edge applications like IoT gateways, industrial control systems, on-premises data lakes, deep learning-based security, autonomous driving systems, Radio Access Network (RAN) products, and a wide range of network appliances. Cloud-native architectures are particularly critical as the OT world transitions to flexible systems-and-solutions approaches. However, extending container technologies to the OT domain requires a different way of thinking. Compared to IT compute nodes, where the variance between nodes is relatively minor, compute environments in the OT world vary significantly. This is typically because of long deployment lifecycles, which can result in new-generation hardware sitting right next to hardware that is 10+ years old.

Because of these challenges, most embedded systems have either continued to be implemented using traditional “bare metal” physical architectures or have adopted VM-based virtualization approaches that are likely sub-optimal in terms of agility, portability, footprint and/or load time. However, over the last few years the developer community has been adding the capabilities needed for OT systems. For example, today’s commercial Linux offerings now provide the necessary applications, tools, documentation and other resources for embedded system developers looking to leverage or deploy systems using a cloud-native model, as well as pre-integrated components from the Cloud Native Computing Foundation (CNCF), configured to deliver a fully functional solution for embedded systems such as edge appliances.

Embedded developers can also take advantage of the Manufacturer Usage Description (MUD) – a standard defined by the IETF (RFC 8520) that allows IoT device makers to advertise device specifications, including the intended communication patterns for their device when it connects to the network. The network can then use this intent to author a context-specific access policy, so the device functions only within those parameters. In this manner, MUD becomes the authoritative identifier and enforcer of policy for devices on the network.
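To make the MUD idea concrete, here is a minimal sketch of turning a device's advertised intent into default-deny access rules. The JSON below is a deliberately simplified, hypothetical subset of a MUD file – real RFC 8520 files embed full IETF ACL (YANG) structures – and the hostnames and rule format are illustrative assumptions:

```python
import json

# Simplified, hypothetical subset of an RFC 8520 MUD file (illustrative only;
# real MUD files carry full IETF access-control-list structures).
MUD_DOC = """
{
  "ietf-mud:mud": {
    "mud-version": 1,
    "mud-url": "https://vendor.example/mud/tempsensor",
    "systeminfo": "Example temperature sensor"
  },
  "simplified-aces": [
    {"name": "cloud-report", "dst-dnsname": "telemetry.vendor.example",
     "protocol": "tcp", "dst-port": 443},
    {"name": "ntp", "dst-dnsname": "ntp.vendor.example",
     "protocol": "udp", "dst-port": 123}
  ]
}
"""

def mud_to_policy(doc: str) -> list[str]:
    """Derive default-deny allow rules from the device's advertised intent."""
    mud = json.loads(doc)
    rules = [
        f"allow {ace['protocol']} to {ace['dst-dnsname']}:{ace['dst-port']}"
        for ace in mud["simplified-aces"]
    ]
    rules.append("deny all")  # anything the maker didn't advertise is blocked
    return rules
```

The key property is the trailing "deny all": the device gets exactly the communication patterns its maker advertised and nothing else.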

For vehicle manufacturers, the transition to a software-driven approach has been underway since 2008 as embodied in AUTOSAR (AUTomotive Open System ARchitecture) – a worldwide development partnership of over 200 vehicle manufacturers, suppliers, service providers and companies from the automotive electronics, semiconductor and software industries. This partnership is focused on the joint development and establishment of open industry standards for automotive E/E software architecture. Goals include scalability to different vehicle and platform variants, transferability of software, consideration of availability and safety requirements, collaboration between various partners, sustainable use of natural resources, and maintainability throughout the whole product lifecycle. The newest specification – the Adaptive Platform – is intended to support Car-2-X applications, which require interaction with other vehicles and off-board systems. That means the system has to provide secure on-board communication, support for cross-domain computing platforms, smartphone integration, integration of non-AUTOSAR systems, and so on. Also, cloud-based services will require dedicated means for security, such as secure cloud interaction and emergency vehicle preemption. They will enable remote and distributed services, such as remote diagnostics, over-the-air (OTA) updates, repair, and exchange handling.

Legacy OT environments defined by proprietary, air-gapped islands of technology have left operators with few options when it comes to implementing security policies that can cover these new operations and technology trends. Legacy apps are growing increasingly difficult to secure due to their complex interconnections, the variety of stakeholders involved, and the wide range of assets they control. In such environments, there is often no centralized configuration capability, little to no auditing of configurations, and limited security controls at the switch itself. If you want to inspect or change port settings, you have to physically visit the switch. This often leads to inefficient performance or simply wrong network configurations. From a security standpoint, it also means operators cannot see patterns of activity across the network that could pose potential threats. As a result, they are unable to respond to those threats swiftly and with minimal disruption. On top of these realities, OT network managers of critical infrastructure can’t simply shut down part of their network when threats occur or when a system has to be patched for vulnerabilities.

As a starting point toward a new converged IT/OT security reality, there is a fundamental need to intervene at the network level – particularly, to enforce network segmentation to better control access to critical OT systems. Enterprises are increasingly adopting network segmentation projects to protect their OT environments. As remote access connections explode in the post-COVID era, zero trust network access and other scalable, cloud-based solutions may outpace traditional VPNs and firewalls as the technology of choice for network segmentation. Orchestration of control-plane functions, as well as the secure distribution of security-related information, can be based on zero trust networks that split the control layer from the data layer, much like a software-defined network. Some zero trust network implementations also leverage cryptographic protocols to cloak and segment the network – making it invisible to attackers and enabling secure enclaves. By separating the control and data layers in these ways, you can define security zones, event thresholds and other control features in software rather than in hardware, allowing OT to achieve the same flexible, policy-driven distributed network management as IT while overcoming OT’s inherent security and management challenges. Examples of zero trust architectures and remote access solutions for OT environments include Tempered Networks’ Airwall and a variety of offerings based on the Software Defined Perimeter specification from the Cloud Security Alliance (CSA) by companies such as Waverley Labs, Akamai, Appgate, Perimeter 81, Onclave, and others.
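Here is a minimal sketch of what defining security zones "in software rather than in hardware" can look like. The zone names, services, and permitted flows are illustrative assumptions, not any vendor's schema:

```python
# Zones and the flows permitted between them are plain data, changeable
# under software control. Zone names, services, and pairs below are
# illustrative assumptions, not any vendor's schema.
POLICY = {
    ("it-corp", "ot-dmz"):          {"https"},        # historian replication
    ("ot-dmz", "ot-supervisory"):   {"opc-ua"},       # supervisory data pull
    ("ot-supervisory", "ot-field"): {"modbus-tcp"},   # control traffic
}

def is_allowed(src_zone: str, dst_zone: str, service: str) -> bool:
    """Default-deny: a flow passes only if its zone pair explicitly lists it."""
    return service in POLICY.get((src_zone, dst_zone), set())
```

Note that there is no direct path from it-corp to ot-field at all: reaching field devices requires crossing each zone boundary in turn, and a new rule is a data change rather than a switch visit.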

Essentially, as more IoT/IIoT capabilities come online, the need to support the Internet of Things and the expected M2M traffic explosion will require smart control of network and security resources on a service-by-service basis. Some aspects of this smart network will be provided by 5G network slicing, which enables the multiplexing of virtualized, independent logical networks on the same physical network infrastructure. Each network slice is an isolated end-to-end network tailored to fulfill the diverse requirements of a particular application. 5G services will be implemented as edge computing on cloud-based hardware in Edge Data Centers (EDCs). In this context, 5G will become a major demand driver for distributed data centers, since 5G signal propagation is reduced compared to 4G LTE and will require many more antennas and more EDCs to support the growth and scale of 5G deployments. This important change means that many EDCs will increase data vulnerability due to the larger number of access points and risk vectors. It is imperative that security be integrated into the design of the EDC to control unnecessary exposure to risk. Given the assumption of principally unmanned data centers supporting the scale of the edge, it will be important for the design of these facilities to treat security automation and orchestration as an essential approach to control. 5G also requires a shared security responsibility, much like that in the public cloud, and every organization must keep that in mind.

As industrial companies increasingly integrate IIoT/IoT capabilities to expand their businesses, it’s also becoming apparent that OT infrastructure managers now require the security orchestration that their IT counterparts have been using. Security Orchestration, Automation and Response (SOAR) tools built specifically for the OT environment should include critical system requirements such as:

  • Network-wide situational visibility. This capability identifies what nodes (devices and ports) are talking to what other nodes, what protocols are in use, what devices are present on the network and where, traffic volumes between nodes, etc. It is important that visibility can be delivered using passive means in an OT environment.
  • Anomaly identification. This feature shows how the current situation differs from historical norms or from what’s expected (for instance, traffic spikes on a port talking to the Internet during non-working hours).
  • OT Policy / playbook orchestration manager. This involves developing and running playbooks to set alerts or to take other actions automatically when pre-defined anomalies occur. It also addresses the reaction to alerts as well as proactive steps such as isolating subnets from the rest of the network, disconnecting a device, or shutting down a subnet as a last resort.
  • Single pane of glass. This term refers to the ability of operators to do all the tasks listed above without having to switch between interfaces, learn device-specific commands, or read equipment vendors’ technical manuals.
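A playbook of the kind described above can be sketched in a few lines. The thresholds, work hours, device names, and the graduated alert-then-isolate logic below are all illustrative assumptions; production baselines would be learned per node pair rather than hard-coded:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class FlowStat:
    """One observation from passive network-wide visibility."""
    device: str
    dest_is_internet: bool
    bytes_per_min: float
    when: datetime

# Illustrative numbers -- in practice these come from learned baselines.
BASELINE_BPM = 5_000          # historical norm for this link
WORK_HOURS = range(6, 20)     # site-specific assumption

def run_playbook(stat: FlowStat) -> str:
    """Graduated response: alert first; isolate only as a last resort."""
    off_hours = stat.when.hour not in WORK_HOURS
    spike = stat.bytes_per_min > 10 * BASELINE_BPM   # anomaly vs. baseline
    if spike and stat.dest_is_internet and off_hours:
        return f"isolate {stat.device}"              # disconnect the device
    if spike:
        return f"alert: traffic spike on {stat.device}"
    return "no action"
```

The point of the sketch is the ordering: the drastic action (isolation) fires only when several independent anomaly conditions coincide, reflecting the OT constraint that disruption itself carries operational risk.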

Proxies that support fine-grained role-based access control (RBAC) and command whitelisting, one-way data diodes, and distributed gateway solutions that provide multi-protocol translation are also needed to enable controlled interfaces between the IT and OT worlds. These gateways and proxies (such as NGINX) can also merge control planes for legacy (brownfield) and new microservices-based (greenfield) applications, helping to accelerate the modernization of the OT environment. XDR solutions are also making their way into converged IT/OT environments as cross-layered detection and response is powered by AI/ML. XDR tools collect and automatically correlate data across multiple security layers – email, endpoint, server, cloud workloads, and network – so threats can be detected faster, while enabling security analysts and SOAR tools to improve investigation and response times. NIST has published NISTIR 8259A – IoT Device Cybersecurity Capability Core Baseline – a set of device capabilities generally needed to support common cybersecurity controls that protect an organization’s IoT/IIoT devices as well as OT device data, systems, and ecosystems.
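The RBAC-plus-command-whitelisting check such a proxy performs can be sketched very simply. The role names and commands below are hypothetical:

```python
# Hypothetical role and command names for an OT command-proxy sketch.
ROLE_COMMANDS = {
    "operator": {"read_register", "read_status"},
    "engineer": {"read_register", "read_status", "write_setpoint"},
}

def authorize(role: str, command: str) -> bool:
    """Default-deny whitelist: unknown roles and unlisted commands are
    both rejected before the command ever reaches the OT side."""
    return command in ROLE_COMMANDS.get(role, set())
```

The design point is that the proxy evaluates the command itself, not just the connection: an authenticated operator session still cannot push a setpoint write through the boundary.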

However, even some of these devices may have security issues and be susceptible to attacks, as recently revealed through research by Trend Micro, which discovered flaws in protocol translation gateways. To link OT assets such as programmable logic controllers (PLCs) to Ethernet, Wi-Fi and mobile networks, the industry uses devices known as protocol gateways or protocol translators, which receive encapsulated packets over one protocol and translate them to a different protocol or between different physical layers of the same protocol – for example, Modbus TCP (Ethernet) to Modbus RTU (serial). If a protocol gateway fails, communication between the control systems and machinery stops and operators lose visibility over the system, making them unable to tell whether machines or generators are running properly. Translation failure can also prevent the operator from issuing commands to troubleshoot problems.
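To illustrate what such a gateway actually does, here is a minimal sketch of the Modbus TCP-to-RTU translation path: strip the 7-byte MBAP header, keep the unit id and PDU, and append the RTU CRC. Real gateways also handle timing, retries, and the reverse direction; this is only the core frame conversion:

```python
def crc16_modbus(data: bytes) -> bytes:
    """Modbus RTU CRC-16 (poly 0xA001, init 0xFFFF), appended low byte first."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc.to_bytes(2, "little")

def tcp_to_rtu(adu: bytes) -> bytes:
    """Drop the 7-byte MBAP header; the unit id (last MBAP byte) becomes
    the RTU station address, and a CRC is appended to the frame."""
    unit_and_pdu = adu[6:]
    return unit_and_pdu + crc16_modbus(unit_and_pdu)

# Read Holding Registers (function 0x03) from unit 0x11, starting at
# register 0x006B, count 3 -- the classic example from the Modbus spec.
tcp_frame = bytes.fromhex("000100000006" + "1103006b0003")
rtu_frame = tcp_to_rtu(tcp_frame)
```

Seen this way, the researchers' concern is easy to appreciate: every command and sensor reading crosses this thin translation layer, so a flaw here lets an attacker alter or suppress traffic that both sides believe is intact.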

For their research, the Trend Micro researchers focused on gateways that translate between different versions of Modbus, because Modbus is one of the most widely used protocols on OT networks. Protocol gateways that translate between completely different protocols were left as a target for future research. According to the researchers’ paper, with a single command an attacker can deactivate the critical sensors for monitoring a motor’s performance and safety (temperature and tachometer) while keeping the motor running. The motor could then exceed safe operating conditions unnoticed by field engineers and operators, since the disabled sensors will neither make the condition visible nor trigger any alarms. The researchers also pointed out that I/O mapping tables are a crucial source of information for attackers during the attack development and tuning phase, and may provide the key piece of information an attacker is looking for to bring the facility down. In addition, any unauthorized modification to the I/O mapping table will tamper with the operation of the HMI, PLCs, and devices connected to the data station.
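One mitigation this finding suggests is integrity monitoring of the I/O mapping table itself. The sketch below keeps a hash baseline of the table and flags any deviation; the register addresses and tag names are illustrative assumptions:

```python
import hashlib
import json

def fingerprint(io_map: dict) -> str:
    """Stable hash of an I/O mapping table via a canonical JSON encoding
    (sort_keys makes the hash independent of insertion order)."""
    canonical = json.dumps(io_map, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

# Hypothetical Modbus register-to-tag mapping for a monitored motor.
baseline_map = {"40001": "motor_temp", "40002": "tachometer"}
baseline = fingerprint(baseline_map)

# A silent remap of the tachometer register changes the fingerprint,
# which a periodic integrity check can surface as an alert.
tampered = dict(baseline_map, **{"40002": "unused"})
```

A SOAR playbook could run this comparison on a schedule and raise an alert on mismatch, turning the mapping table from an attacker's roadmap into a tripwire.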

One of the questions facing security managers of converged IT/OT environments is how to deploy a security orchestration capability – centralized, distributed, or hierarchical. And how do you account for cross-impacts on operations – i.e., how does an automated security response affect continuity of operations across both environments? Emergent or divergent behaviors can become a byproduct of automated security orchestration services, causing disruption to the OT environment. Cybersecurity devices that filter unauthorized network traffic could cripple an OT network simply by preventing important data from reaching its destination and, in some cases, can cause failures just by delaying data. Software used to scan a network for vulnerabilities may send unfamiliar messages to OT devices and cause them to fail, which is especially bad if the device is actively controlling a process. Even something as simple as antivirus software can have a detrimental impact on the performance of these critical systems and may be impossible to implement altogether. Remediation actions may have to be delayed due to operational needs, undercutting the value of an automated response. The complexity associated with the sensing and control loop(s) that are central to OT systems must be well addressed in any design, and must be accommodated by any playbook and framework of tools used for security orchestration.

One approach for deploying SOAR tools into a converged environment involves redundant SOAR tools – one placed in the IT environment and another in the OT environment. Each tool would have a playbook customized to the environment it serves. In this way, the OT SOAR capability could be served from edge computing centers while the IT environment could be serviced centrally from a cloud-based source. Any AI/ML algorithms for OT security could be pushed down to the edge data centers to support the SOAR tool. The IT and OT security responses may require some level of synchronization and pass-through of tasks and results via proxies or gateways between the environments.

MOSAICS is an example of a government project co-led by Sandia Labs and the Navy that is examining the unique requirements for security orchestration in OT environments. I will have more on MOSAICS in an upcoming interview, so stay tuned. Also, at NIST, researcher Timothy Zimmerman is leading a group working to produce guidelines, test methods, metrics and tools based on measurement science and standards to give industry the confidence it needs to effectively apply cybersecurity protections to OT systems without negatively affecting their performance, safety or reliability. Their work has already resulted in a manufacturing profile using the Cybersecurity Framework, which outlines a risk-based approach to help manufacturers implement, manage and improve their cybersecurity posture using industry standards and best practices. And to put the manufacturing profile to the test, NIST will be using it to protect its own robotic and process control testbed under many different configurations and scenarios while measuring the performance impacts to the system.

And thanks to my subscribers and visitors for checking out my site! Please give us your feedback, because we’d love to know what topics you’d like to hear about in the areas of active cyber defenses, PQ cryptography, risk assessment and modeling, autonomous security, digital forensics, securing OT/IIoT and IoT systems, Augmented Reality, or other emerging technology topics. Also, email us if you’re interested in interviewing or advertising with us at Active Cyber™.