I have been seeing quite a number of references lately to the terms “cyber ready” and “cyber readiness.” Some of these references reflect an enterprise view of readiness – defined as “…the state of being able to detect and effectively respond to computer security breaches and intrusions, malware attacks, phishing attacks, theft of data and intellectual property from both outside and inside the network.” Another reference, published by the Potomac Institute for Policy Studies and led by Melissa Hathaway, provides a more macro view of cyber readiness with the Cyber Readiness Index (CRI) 2.0. The CRI 2.0 is a methodology to evaluate and measure a country’s preparedness levels for certain cybersecurity risks, resulting in a readiness score. DHS CISA has a starter kit for building a Culture of Cyber Readiness as a key step to managing risks. There are quite a few commercial guides to becoming cyber ready, and there is the Cyber Readiness Institute’s free Cyber Readiness Program “designed to help small and medium-sized enterprises become more secure against today’s most common cyber vulnerabilities.” These different takes on cyber readiness made me think – is a system really cyber ready if it is secure but unsafe? Like possibly an autonomous vehicle? Or is a system really cyber ready if you don’t know how secure the supply chains of its software modules are? I started to come up with my own version of cyber ready – let me know what you think of it.
We live in a data-centric society where we face a fire hose of data every way we turn. This fire hose is expected to increase exponentially as digital transformation takes hold, as 5G becomes a mainstay, and as adoption builds for hyperscalers’ new dedicated products [such as AWS Wavelength or Google Distributed Cloud Environment (GDCE)] to power data-intensive 5G applications at the edge. Up to now, we have created and operated disconnected silos for handling data and performing corresponding functions, usually by leveraging highly specialized providers, procedures, tools, and personnel for each silo. Each silo often has its own set of proprietary data types, dedicated business rules for data handling, and unique protocols, taxonomies and lingo. For example, we have our security silo and our network silo – SOC versus NOC. In security we even have separate sub-silos with specializations for IT vs IoT vs OT. We have a rich variety of security data as well – threat data, compliance data, risk data, forensics data, vulnerability data and much more – with a diverse set of sources and destinations for each type of data.
In addition to security data, organizations also collect other types of data – to measure and evaluate equipment, products, personnel, business units, market segments, etc. We collect safety data, performance data, reliability data, test data, customer data, market data, efficiency data, financial data, HR data, etc., with each collection and analytic activity singularly focused on that specific data entity and mission area. This diversity and specialized focus, within security and across the different silos, complicates security itself. It makes finding threats a very difficult problem in an enterprise, where data possibly related to threats is collected across hundreds of telemetry sources and different systems’ silos, and where threats often look identical to normal activity.
Each silo also continues to add dimensions / data / definitions / metrics / tests. For example, from a security silo perspective, when you want to assess your vulnerability state, you have to define what that means. Are you talking about your entire attack surface? Or just the Internet-facing assets? Are you talking about active vulnerabilities, ones under investigation, or ones that have been fixed? Or ones that have been fixed and resurfaced? Security has complications that depend a lot on use cases and definitions. What are you really trying to measure? What are you trying to optimize? How can I optimize when I am drowning in data? Only now are we beginning to create tools (e.g., AI/ML) with frameworks and approaches that can start to make better sense of this data deluge. However, applying AI/ML to security data is not an easy problem to solve due to disparate data types and definitions, the need to tag data, single-threaded algorithms, bias in algorithms, etc.
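To make the definitional point concrete, here is a minimal sketch showing how three reasonable definitions of “vulnerability state” produce three different numbers from the same inventory. The asset names, statuses, and counts are entirely hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Vuln:
    asset: str
    internet_facing: bool
    status: str  # "active", "investigating", "fixed", or "resurfaced"

# Hypothetical vulnerability inventory for illustration only.
vulns = [
    Vuln("web01",  True,  "active"),
    Vuln("web01",  True,  "resurfaced"),
    Vuln("db01",   False, "active"),
    Vuln("hr-app", False, "fixed"),
]

# Three different "vulnerability state" definitions, three different answers:
whole_attack_surface = sum(v.status != "fixed" for v in vulns)          # everything unfixed
internet_facing_only = sum(v.internet_facing and v.status != "fixed"
                           for v in vulns)                              # external scope only
active_or_resurfaced = sum(v.status in ("active", "resurfaced")
                           for v in vulns)                              # excludes "investigating"

print(whole_attack_surface, internet_facing_only, active_or_resurfaced)  # 3 2 3
```

The numbers diverge as soon as the scope or status definitions diverge – which is exactly why a readiness metric must state its definitions up front.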
How should organizations cope with this volume, variety, and velocity of disconnected data? How can an organization gain true insight into its cyber posture and achieve a high degree of situational awareness into its network operations and asset utilization? Without a comprehensive understanding of connected information assets, deep insight into the threat landscape, and awareness of how its systems and networks operate and inter-operate, how its systems support business operations, and what information is moving in, out and through its networks, an organization cannot achieve cyber readiness.
I define cyber readiness as a measure of a system’s protection and response properties after full consideration of competing concerns – a measure commensurate with the system’s importance or criticality, and one that results in a readiness rating reflecting a holistic view of all aspects of the system, such as resiliency, usability, security, reliability, performance, privacy, and safety. Therefore, cyber readiness is not something that is strictly defined, measured, and reported by the security domain of an organization, but broadly encompasses all aspects of a system under management.
We have known for some time that each of these system aspects can interact with and impact the others. For example, safety and security missions overlap, and optimizing one can have negative effects on the other. The same goes for performance and security. Optimizing these trade-offs is complex and depends on what you are trying to accomplish. Today, it is the ability to accomplish the entire mission workload within a system-of-systems environment and set of constraints that matters. Being cyber ready means accounting for the complexity of the overlapping concerns of a mission workload. Failing to accommodate the complexity of these systemic trade-offs can lead to big problems. Spectre and Meltdown are examples of systemic complexity flaws at the processor level. It is the interactions between different elements within this systemic complexity that caused those problems, and such interactions are a primary issue for security. One industry at the forefront of solving this complex set of challenges is the autonomous system (AS) – or, for that matter, any system that runs at the edge. Edge devices include vehicles, drones, and mobile devices. ASs are challenging since there is no central control unit; each AS must deal with a distributed ecosystem of other autonomous systems, weather, etc. – basically, the edge is a system of systems. Edge systems are turning to distributed AI/ML both to provide valuable functionality and as the solution to optimization.
From a security perspective, the engineers designing edge systems must push beyond the cyber compliance mentality to one of cyber readiness – a readiness that encompasses more than responsiveness to cyber threats and how well security risks are managed, but also accommodates the trade-offs of a dynamic system and its subsystems, including security, resiliency, reliability, performance, safety, privacy, sustainability, manageability, and more. Optimizing these trade-offs must involve new observations and tests to build new theories about security concepts and their interrelationships with other mission silos – theories that show how and/or why a trade-off results in a particular phenomenon, such as a security vulnerability or an unsafe condition. These theories must also acknowledge the power/performance dynamics and real-time nature of many of these edge systems, and must accommodate the fact that the operational environment, as well as the development environment (driven by time-to-market), are not sufficiently static to support pre-definition of all the requirements (security or otherwise), thereby leaving quite a bit of uncertainty that needs to be risk-managed. I could see the application of a type of equilibrium model to handle the trade-offs of the different system aspects and to arrive at a cyber readiness measure. This could be done at a point in time for a static system snapshot, or a long-running model could be applied for a dynamic system. Measuring cyber readiness is something that should and can be done at each stage of a system’s life cycle, from requirements validation through operational test and into production.
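As a rough illustration of the kind of trade-off model I have in mind, the sketch below collapses per-aspect scores into a single readiness rating using a weighted geometric mean, so that a very low score in any one aspect (say, safety) drags the overall rating down regardless of how strong the others are. The aspect scores and weights are entirely hypothetical, not a standard:

```python
import math

# Hypothetical per-aspect scores in (0, 1], each from its own domain assessment.
aspects = {"security": 0.8, "safety": 0.9, "performance": 0.7,
           "resiliency": 0.75, "privacy": 0.85, "reliability": 0.9,
           "usability": 0.8}

# Hypothetical weights reflecting this system's criticality profile (sum to 1).
weights = {"security": 0.25, "safety": 0.25, "performance": 0.1,
           "resiliency": 0.15, "privacy": 0.1, "reliability": 0.1,
           "usability": 0.05}

def readiness(scores, weights):
    """Weighted geometric mean of aspect scores.

    Unlike an arithmetic average, a near-zero score in any single aspect
    pulls the overall rating toward zero - modeling the idea that a system
    that is secure but unsafe is not cyber ready.
    """
    return math.exp(sum(w * math.log(scores[a]) for a, w in weights.items()))

rating = readiness(aspects, weights)
print(round(rating, 3))
```

Run once for a static snapshot, or re-run continuously over streaming telemetry for a dynamic system, as suggested above.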
Cyber readiness for edge systems calls for design and development teams to turn to new design/verification and development/test flows, new security protections, and new architectures that account for these trade-offs. For example, edge systems are moving to machine learning (ML) as a way to provide intelligent autonomy. An interesting use case for ML at the edge is presented by Joe Weiss in how Windows HMIs should be replaced or augmented by ML. Machine learning places critical requirements on power and performance while dealing with the limited compute and storage resources of edge devices. These same resource limitations pose issues for placing security (e.g., crypto) at the edge, creating a natural trade-off between ML and security needs – not to mention the performance trade-offs with low-latency applications, such as drone detection, visual inspection, mixed reality services, and mobile private network solutions. Using off-the-shelf solutions to solve these issues is not practical. CPUs are too slow, GPUs/TPUs are expensive and consume too much power, and even generic machine learning accelerators can be overbuilt and are not optimal for power. Engineers must look to create new power/memory-efficient and secure hardware architectures/chipsets to meet next-generation processing requirements for the edge.
NIST has been working on the challenges facing ASs for some time with a focus on trustworthiness of cyber-physical systems and IoT, as described in this Active Cyber™ interview with Dr. Ed Griffor of NIST. A key element of NIST work on CPS/IoT trustworthiness is the study of trade-offs or interdependencies between the concerns, like security, safety and reliability, using reasoning. A key task of the CPS/IoT Trustworthiness Project has been building foundations for reasoning or calculation “under uncertainty.” The AS space has recognized that the car will perform reasoning tasks using its logical models/data about the operational driving domain. Additional NIST collaborations on the AS front are with MIT, Intel Mobileye and others.
With this level of complexity due to competing concerns, as one problem is solved, more become apparent. This makes it harder to define use cases a-priori in designing cyber ready systems. Addressing this complexity necessitates a more intelligent approach to designing and building cyber ready systems. According to Dr. Griffor of NIST, “a new language is needed where we can pose cyber-physical problems of ASs and solve them combinatorially – a unified mathematics of the cyber and the physical together.” Cybersecurity is part of this picture but far from the whole. We will not be able to design protections for vehicles, or any other AS, without this broader view. We also need to manage this broader view as part of our path to cyber readiness.
More sophisticated engineering, assessment and test frameworks and tools are needed to work through this complexity and ensure the trustworthy composition of these systems across different concerns. You also need a rigorous approach to testing at different levels of fidelity (simulation, closed course/cyber range, real world), executing scenarios with thousands of variations, all the while collecting data for the purpose of improvement and to assess your cyber readiness. Some examples of helpful tools and frameworks include:
- Frameworks to measure cyber resilience and readiness – such as the CPS Framework by NIST for cyber-physical systems, CERT’s Resilience Management Model, and NDIA’s white paper – A Path Towards Cyber Resilient and Secure Systems – Metrics and Measures Framework.
- Dynamic risk frameworks such as the Security, Agility, Resilience, and Risk (SARR) Framework.
- Model-based systems engineering – such as KDM Analytics’ MBRA tool.
- NIST’s systems security engineering approach outlined in NIST SP 800-160 and the Air Force’s Cyber Resiliency Office for Weapon Systems – Systems Security Engineering Guidebook 4.0.
- Modsim tools and network digital twins – such as Scalable’s EXata, now part of Keysight.
- Other types of digital twins, such as beamo.ai.
- DevSecOps frameworks and tools, such as these provided by the DoD.
- Testing and measurement tools to measure inputs and outputs to physical systems with sensors and actuators, or to validate or verify electronic designs, and to design smart machines, such as National Instruments’ LabVIEW.
- Cyber ranges to explore the interactions of systems and to help identify the readiness of cyber defenders and system protections.
- Taxonomies, knowledge bases, and semantic analysis tools to drive cyber readiness insights by comprehensive data discovery, classification and tagging, pattern matching / link analysis, smart navigation, and knowledge model capabilities.
- Data governance, integration tools and time series databases that can also address data lineage, data curation, and data authenticity needs, such as Informatica or Talend.
The tools, methods, and frameworks must be closely integrated and often require custom approaches to tackle today’s increasing trade-off optimization challenges to be cyber ready.
Collecting metrics and events is essential to understanding the dynamic trade-offs involved in the edge’s system-of-systems environment. One example of a tool that can be helpful in collecting data at the edge is Telegraf from InfluxData. Telegraf is a server-based agent for collecting and sending metrics and events from databases, systems, and IoT sensors, with reliable metric delivery guarantees. It also includes a scheduler, adjusts for clock drift, offers full streaming support, and allows you to parse, format, or serialize unstructured data before sending it to its final destination, saving time and storage space. It can collect and store all kinds of data:
- IoT sensors – Collect critical stateful data (pressure levels, temp levels, etc.) with popular protocols like MQTT, ModBus, OPC-UA, and Kafka.
- DevOps Tools and frameworks – Gather metrics from cloud platforms, containers, and orchestrators like GitHub, Kubernetes, CloudWatch, Prometheus and more.
- System telemetry – Metrics from system telemetry like iptables, Netstat, NGINX, and HAProxy help provide a full stack view of your apps.
Telegraf also has output plugins to send metrics to a variety of other datastores, services, and message queues, including InfluxDB, Graphite, OpenTSDB, Datadog, Librato, Kafka, MQTT, NSQ, and many others.
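As a minimal sketch of what such a pipeline might look like at the edge, the Telegraf configuration below wires an MQTT input (for sensor readings) to an InfluxDB output. The broker address, topics, endpoint, org, and bucket names are placeholders for illustration:

```toml
# Minimal Telegraf pipeline sketch: IoT sensor data in via MQTT, out to InfluxDB.
[agent]
  interval = "10s"          # how often inputs are gathered
  flush_interval = "10s"    # how often buffered metrics are written out

[[inputs.mqtt_consumer]]
  servers = ["tcp://broker.example.com:1883"]        # placeholder broker
  topics = ["sensors/+/pressure", "sensors/+/temperature"]
  data_format = "json"

[[outputs.influxdb_v2]]
  urls = ["http://influxdb.example.com:8086"]        # placeholder endpoint
  token = "$INFLUX_TOKEN"                            # supplied via environment
  organization = "example-org"
  bucket = "edge-telemetry"
```

Additional `[[inputs.*]]` and `[[outputs.*]]` blocks can be stacked in the same file to fan telemetry out to the other datastores and queues listed above.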
Getting metrics and better diagnostics is a necessary step to knowing how to create optimized, cyber ready edge systems. NIST and USDOT have also set in motion discussions of consensus measurement strategies for vehicle trustworthiness. Pre- and post-deployment measurement strategies must be coordinated. SAE J2980 (Functional Safety) and J3061 (Cybersecurity) are “recommended practices” that address safety and security. Most stakeholders agree that a common approach to safety and security is necessary. NIST’s focus on the primacy of measurement is a good fit with the community’s efforts in working on consensus measurement strategies for AS safety. These measurement strategies can inform standards and product safety and security efforts. One product diagnostics standards effort already evolving is the Service-Oriented Vehicle Diagnostics (SOVD) standard (aka HPC Diagnostics), which defines a standard interface to allow diagnostics on the vehicle – for example, in the workshop, via remote access, or as a tester directly in the vehicle (Proximity, Remote, InVehicle). The standard aims at providing one API for all diagnostic purposes as well as for software updates (cross-vehicle). It is a consistent approach that is used for new systems as well as for traditional sensor/actuator systems. Its focus is shown in the Figure. To simplify standardization and subsequent implementations, as many existing mechanisms and standards as possible are to be used, such as HTTP/REST, JSON, and OAuth.
AI/ML is at the center of everything that is happening at the edge. According to NVIDIA, a dominant player in edge AI systems, the efficacy of deploying AI models at the edge arises from three recent innovations.
- Maturation of neural networks: Neural networks and related AI infrastructure have finally developed to the point of allowing for generalized machine learning. Organizations are learning how to successfully train AI models and deploy them in production at the edge.
- Advances in compute infrastructure: Powerful distributed computational power is required to run AI at the edge. Recent advances in highly parallel GPUs have been adapted to execute neural networks.
- Adoption of IoT devices: The widespread adoption of the Internet of Things has fueled the explosion of big data. With the sudden ability to collect data in every aspect of a business — from industrial sensors, smart cameras, robots and more — we now have the data and devices necessary to deploy AI models at the edge. Moreover, 5G is providing IoT a boost with faster, more stable and secure connectivity.
AI/ML comes with its own set of issues that must be addressed for edge systems. Most of these issues involve data quality problems, bias in training data, and how the data and algorithm are fitted. A model may also degrade as data grows and drifts, causing it to become inaccurate over time. Models are often highly tuned and specific to a task. Together, these issues make AI/ML hard to manage efficiently at the edge. There is some promising research emerging for improving neural networks and making them more efficient. One example is being developed by AISquared, a start-up. According to AISquared, they are developing:
“a methodology and network representational structure which allows a pruned network to employ previously unused weights to learn subsequent tasks. We employ these methodologies on well-known benchmarking datasets for testing purposes and show that networks trained using our approaches are able to learn multiple tasks, which may be related or unrelated, in parallel or in sequence without sacrificing performance on any task or exhibiting catastrophic forgetting.”
These types of multi-task neural network models should help improve the efficiency of ASs at the edge, since they can reduce the overall management and maintenance workload for a model.
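To illustrate the underlying idea – this is my own toy sketch, not AISquared’s actual method – the NumPy example below prunes a weight matrix with a binary mask for one task, then “trains” a second task using only the complementary, previously unused weights. Because the two masks are disjoint, updates for the second task cannot disturb the first task’s weights, which is the structural property that avoids catastrophic forgetting of those parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))            # shared weight matrix

# Task A keeps only the largest-magnitude half of the weights (pruning).
threshold = np.median(np.abs(W))
mask_a = np.abs(W) >= threshold        # weights task A uses
mask_b = ~mask_a                       # previously unused weights, free for task B

# Simulated "training" for task B touches only the free weights.
update = rng.normal(size=W.shape)
W_after_b = W + update * mask_b

# Task A's effective weights are untouched, by construction.
assert np.allclose(W_after_b * mask_a, W * mask_a)
print("disjoint masks:", not np.any(mask_a & mask_b))   # disjoint masks: True
```

A real implementation would pick masks by trained-weight magnitude per layer and fine-tune each task against its own loss; the sketch only shows why disjoint masks isolate the tasks.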
Besides providing intelligence for the functioning of autonomous systems, AI/ML is also being leveraged for securing these devices – see this Active Cyber™ article for an example. In cyber, AI technologies can improve threat intelligence, prediction, and protection. They can also enable faster attack detection and response, while reducing the need for human cybersecurity experts — specialists who are in critically short supply. AI can learn from security analysts and improve its performance over time, leading to time savings and better decisions. Cyber AI/ML models are also being deployed to ASs and to IoT gateways, sensors, actuators, and other edge devices to enable faster responses to attacks at the edge and improve cyber readiness. AI/ML is also being employed more often to improve the delivery of gamified education exercises. AI is used to emulate human cognition (e.g., learning based on experiences and patterns rather than inference), and deep machine learning advancements enable solutions to ‘teach themselves’ how to build models for pattern recognition. This becomes particularly valuable in cyber skills development, where Natural Language Processing (NLP), a sub-category of AI, can communicate with a human during cyber exercises and aid in their progression through activities.
AI models are also a target for attackers and need to be protected from model extraction and data tampering / poisoning. Classic API protections, hardware integrity measures, and enterprise security controls can help mitigate against these types of attacks. Some examples of protection approaches are described in this article at Embedded Computing Design.
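One of the classic API protections mentioned above is a per-client query budget, which raises the cost of the high-volume querying that model extraction typically requires. The sketch below is a minimal, hypothetical illustration of that idea (class name, limits, and client IDs are all made up):

```python
import time
from collections import defaultdict, deque

class QueryThrottle:
    """Sliding-window query budget per client, a classic API-level
    mitigation against model-extraction attacks on an inference endpoint."""

    def __init__(self, max_queries, window_s):
        self.max_queries = max_queries
        self.window_s = window_s
        self.history = defaultdict(deque)   # client_id -> recent query timestamps

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        q = self.history[client_id]
        # Drop timestamps that have aged out of the sliding window.
        while q and now - q[0] > self.window_s:
            q.popleft()
        if len(q) >= self.max_queries:
            return False    # budget exhausted: deny (or degrade) the prediction
        q.append(now)
        return True

# Hypothetical usage: 3 queries allowed per 60-second window.
throttle = QueryThrottle(max_queries=3, window_s=60.0)
decisions = [throttle.allow("scraper", now=t) for t in (0, 1, 2, 3)]
print(decisions)   # [True, True, True, False]
```

In practice this would sit alongside the hardware integrity measures and enterprise controls noted above, and the denied branch might return a cached or deliberately coarsened prediction rather than a hard error.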
As previously mentioned, measurement is necessary to assess the cyber readiness of a system or application, including the personnel who operate it and use it. As Peter Drucker said, what gets measured, gets managed – and cyber readiness is no different. One example of a platform that provides cyber readiness assessments is from SightGain. The platform includes several modules:
- SOC Validation – continuously tests production SOC technologies, processes, and analysts’ ability to detect and respond to real threats.
- Automated Compliance – tests and answers security control requirements for compliance across leading frameworks, including CMMC, MITRE ATT&CK v10, Zero Trust, ISO 27001, NIST 800-171, and NIST 800-53.
- Risk Assessment – provides a comparison of an organization’s performance against leading threats to identify areas for improvement.
- Training – uses realistic, gamified scenarios to allow organizations to evaluate the overall cyber readiness of their production systems.
- Cyber BS Detector – a commercial test bed to empirically evaluate new solutions against current threats.
I see SightGain’s platform as extremely useful for assessing and maintaining cyber readiness as defined strictly within the security aspects of a system. However, my more all-encompassing definition of cyber readiness calls for measures that are reflective of an assessment process that takes into account other system domains such as safety, performance, etc. This view of cyber readiness, I believe, becomes more important as you move into edge systems. To assess cyber readiness at the edge, you really need system level cyber readiness tools – such as cyber ranges that have been built to support IoT or Industry 4.0 needs – since many of the complex problems that can degrade cyber readiness are system-level issues and interactions between systems.
One of the potential weak links in any organization’s cybersecurity readiness is third parties. Risks include sophisticated software supply chain attacks, like the recent SolarWinds hack. They can also happen when an attacker exploits a vendor’s weak security controls and moves up the digital supply chain until it finds its target. Therefore, supply chain assessments must be part of any cyber readiness assessment of a system.
The creation of a cyber ready system needs partnerships, and the inclusion of all stakeholders is critical. This is especially true for cyber-physical assets, like utilities and electrical grids and edge systems. Although the Federal government supplies the mandate for national security, most cyber-physical critical sectors are under the auspices of state or local government. In the era of cyber-warfare, this necessitates a greater than ever partnership for national security.
So what are your views on cyber readiness? How do you assess your systems’ cyber readiness? Let me know your views and comments on this evolving topic.
And thanks to my subscribers and visitors to my site for checking out ActiveCyber.net! Please give us your feedback because we’d love to know some topics you’d like to hear about in the area of active cyber defenses, authenticity, PQ cryptography, risk assessment and modeling, autonomous security, digital forensics, securing OT / IIoT and IoT systems, Augmented Reality, or other emerging technology topics. Also, email chrisdaly@activecyber.net if you’re interested in interviewing or advertising with us at Active Cyber™.