Escalating damages associated with international catastrophes, such as Hurricane Sandy and the Fukushima Daiichi Nuclear Meltdown, have spurred the Office of the President to release Executive Orders that direct government agencies to enhance national preparedness and resilience. However, there has been a struggle to comply with these directives as there is limited guidance on how to measure and design resilient systems. As a further challenge, the Defense Science Board states that resilience metrics must be focused, yet generalized, in order to be applied across cyber, defense, and energy systems in the Department of Defense and other agencies. We assert that metrics of resilience in the literature often fail due to existing conceptual issues that reduce their use, including the conflation of risk and resilience, and the necessity of reconciling engineering and ecological resilience definitions and objectives. We explain these conceptual issues and discuss military doctrine required to support the development of metrics that meet government agency needs. Furthermore, we provide a list of example metrics that overcome these barriers, and can be used across systems.
Resilience metrics must be general enough to support broad applications, yet precise enough to measure system-specific qualities. Such metrics are necessary to make resource and operations decisions in government agencies that manage diverse systems (e.g., cyber, defense, ecological, and energy), while fostering cooperation between agencies.
Risk assessment is the traditional method used to make decisions about adverse events. Resilience and risk are often conflated, but risk-based quantitative and qualitative methods provide limited guidance for emerging and unforeseen threats.
Multiple definitions of resilience exist (i.e., for engineering or ecological resilience), and metrics are often developed to support singular definitions. However, different definitions of resilience have complementary objectives, and metrics should incorporate multiple viewpoints to ensure efficacy for government agencies.
Network Centric Operations (NCO) doctrine organizes systems into four domains that can be managed for any event affecting the system. Combining this doctrine with an understanding of resilience processes sets a new conceptual foundation, separate from risk assessment, for developing and organizing resilience metrics.
We provide three tables of example metrics, organized via the U.S. National Academy of Sciences definition of resilience, that are representative of engineering, environmental, and cybersecurity systems.
As terrorist attacks and natural disasters become more frequent and costly, the U.S. Office of the President is initiating a national push to create a more resilient society1-3 that can recover from these events and persevere. Resilience, as defined by the U.S. National Academy of Sciences (NAS), is the ability to plan and prepare for, absorb, recover from, and more successfully adapt to adverse events.4 Here, resilience is a function of the physical losses at the time of the event, and social processes before and after that govern the management of known vulnerabilities, sustained damages, and adaptations needed to face future threats. When understood and implemented, this can offer system benefits across broad domains including engineering, ecology, cybersecurity, social sciences, and health. However, resilience remains difficult to apply in government agencies, as metrics used to assess and improve system resilience largely do not exist, and where they do, do not function across diverse systems. Metrics fail because they are developed based on concepts of risk instead of resilience, and do not consolidate different resilience definitions that relate to engineered, ecological, and social systems. As a result, recent attempts to address resilience by the Defense Science Board (DSB)5 demonstrate that the current science does not meet Department of Defense (DOD) needs. This article discusses two identified conceptual barriers to resilience: its conflation with risk, and the lack of a standard definition. It provides concepts that bridge resilience and military doctrine to support the development of more effective metrics for a wide variety of disciplines.
Breaking Down Barriers
To improve societal resilience, government agencies first need metrics to assess the state of resilience. Metrics are measurable quantities that are used to compare and justify resource and operations decisions. Data collection for these metrics is often carried out by engineers, scientists, and specialists who have area expertise and understand the limitations of built and natural environments. Data is then aggregated into metrics used by decision-makers to help direct decisions within a single agency, and across multiple agencies. This includes high-level government personnel who manage the funding and completion of resilience-focused projects, such as the implementation of cybersecurity measures that ensure the operation and recovery of critical resource systems under cyber threat (e.g., electric power, water, and financial). To support effective applications of resilience, it is necessary to identify which qualities of known metrics obstruct their use by government agencies. We assert that two of these issues are misconceptions: the conflation of resilience with concepts of risk6, and the fact that resilience has multiple application-dependent definitions that confuse the objective that metrics are trying to support.
Risk assessment7 is foundational for decisions associated with adverse events in many government agencies8. Its widespread acceptance impedes implementation of a successful resilience strategy, as risk and resilience are separate concepts, though complementary.9,10 Resilience describes the ability of a system to absorb and recover functionality, whereas risk quantifies known hazards and expected damages. Risk-based metrics for addressing resilience are not efficient as they require quantification of an exposure-effect relationship, which is not possible for emerging and unforeseen threats. They also tend to assess risks to individual components, ignoring system functionality as the result of interacting components (e.g., telecommunications, targeting systems, and personnel for defense operations). Rather than the static view of systems and networks in risk assessment, resilience adopts a dynamic view. This means resilience metrics must also consider the ability of a system to plan, prepare, and adapt as adverse events occur, rather than focus entirely on threat prevention and mitigation. Resilience depends upon specific qualities that risk assessment cannot quantify, such as system flexibility and interconnectedness. For these reasons, risk assessment cannot be used to establish metrics for measuring the resilience of systems.
Using risk assessment to measure system resilience only offers solutions to incremental, known risks, and does little to manage unforeseen events or perform under the stress of catastrophe. In complex engineering systems, resilience metrics fail to extend beyond risk assessment. Some proposed approaches to resilience merely expand the boundaries of traditional risk calculations (see examples11,12). These methods employ probabilistic risk assessment to link system component losses to system functionality losses. For example, risks to a building’s elevators could be linked to the response time of management and of maintenance personnel to estimate the loss of transportation capacity in the building system. This approach fails to extend beyond the typical exposure-effect-damage paradigm of risk assessment. Requiring fore-knowledge of the event type, probability of event occurrence, and potential system damage limits design efforts to the known. Successful resilience analysis extends decisions from probabilities to possibilities, and risk assessment does not support this.
While risk assessment and resilience can be complementary in concept, actual risk-based and resilience-based decisions often conflict. Thus, the widespread use of risk-threshold criteria for regulation by government agencies, such as by the Federal Aviation Administration,13 may hinder the implementation of resilience-based decisions. An example of a risk versus resilience decision would be to strengthen airport security measures to prevent a terrorist attack (risk) instead of increasing the number of airports, flight paths, or specialized personnel required to respond (resilience). The redundancy imposed in the resilient option does not reduce the risks, yet ensures that national travel receives less damage and can recover faster in the event of a terrorist attack.
Establishing metrics for government agencies is also obstructed by the multiple definitions of resilience that apply to systems under the jurisdiction of a single department or agency. Federal bureaucracy dictates that definitions of resilience must be transparent across agencies to foster cooperation, yet in practice, resilience has multiple definitions and objectives, the two most prominent being engineering and ecological.14 Engineering resilience is focused on the ability of a system to absorb and recover from damages from adverse events, while ecological resilience is focused on understanding how close a system is to collapse and reorganization. The engineering definition brings resilience principles such as robustness, redundancy, and modularity, while the ecological definition supports principles of flexibility, adaptability, and resourcefulness. An example of engineering resilience would be a bridge with specialized personnel that provide fast recovery of transportation functions after unforeseen equipment failure. Engineering resilience definitions are already in use by some government agencies,15 but to meet national policy objectives, ecological resilience must also be considered as it offers other benefits. For example, forests exhibit ecological resilience as they naturally burn down and re-grow in a cyclical pattern, a process that decreases chances for catastrophic forest fires. In each case, resilience is related to maintaining some prescribed function, whether continued transportation options or ecosystem services. A resilient system does not require continual provision of the function, but instead requires that in failure there is some form of recovery or adaptation so that the function can be maintained. 
The methods employed in complex systems science to measure resilience have relevance in engineering, ecological, social, and economic systems, as they can demonstrate both a need to maintain critical functionality, as well as emergent and unpredictable (complex) behavior. A strong resilience program should make clear use of each of these definitions.
Metrics of resilience tend to be compartmentalized into engineering or ecological terms, and thus fail to integrate knowledge across both, losing the ability to incorporate resilience improvement strategies from one system into the other. In addition, using a single definition for metrics limits the potential benefits that resilience offers. As each definition brings with it a set of conceptual guidelines that are complementary and desirable, a more pragmatic approach is to combine the two. Results from this hybrid resilience approach should be able to create more representative sets of metrics and analysis techniques that can guide the creation of systems resilient to a wider range of threats. Although no single solution can address all threats simultaneously,16 systems should be as resilient as possible. Furthermore, the bureaucratic nature of government agencies emphasizes the importance of a hybrid concept. Using both definitions in tandem facilitates communication across diverse systems found in engineering, environmental, disaster management, and health agencies, as well as across multiple levels of government. By establishing a consistent set of resilience metrics that stands on its own and encompasses a broad spectrum of definitions, a framework can be created that is adaptable in a wide array of disciplines.
Resilience in Defense
The difficulty these barriers create is illustrated in recent work by the DSB to establish metrics of cyber resilience. Since 1956, the DSB has played an important role in shaping national policy decisions for science and technology within US defense agencies. The DSB consists of military and scientific leaders across multiple DOD and federal science funding agencies. They advise the highest-level individuals in the US military on innovative technologies to help the DOD manage the vast array of infrastructure and manpower under its authority. This includes a complex array of cyber, defense, and energy infrastructure for global military operations. A recent DSB report addresses the resilience of military systems to advanced cyber threats17 and explains that the size, importance, and capabilities associated with US military systems make their resilience to unknown threats a national imperative. Due to the high level of risk associated with the loss of critical cyber systems, even minor changes in computer code (e.g., via malware) can result in catastrophic losses comparable to the nuclear threat of the Cold War. Cyberattacks have been easily implementable since the 1990s through simple and inexpensive computer code, yet the growing use of and reliance on cyber systems in military operations has increased costs for cybersecurity. The DSB calculates that there is no feasible way to prevent all threats to defense systems, and suggests that planned resilience should thus be a key component of cyber threat management.
Although the DSB indicates the need for resilience, the Presidential Policy Directive (PPD-21) and Executive Order (EO-13636) that direct the implementation of resilience programs in government agencies do not provide clear guidance on how to achieve resilience.18,19 The instructions provided in EO-13636 direct the creation of a cybersecurity framework by the National Institute of Standards and Technology. Such a framework must be populated with metrics that measure the cyber resilience of each piece of infrastructure. The DSB states that few such metrics exist and those that do are not suited to meet national defense needs. It further identifies two specific criteria for metrics which were not evident in the literature: they must be general enough to be used in a wide range of systems (e.g., weapons, operations, energy, and telecommunications), yet specific enough to relate to individual system objectives and components. The DSB’s 2013 report provides the development of a new metrics dashboard and a framework for implementation.20
This work demonstrates a gap between national policy objectives and the current state of real-world application which prevents a sustainable national resilience program. The limited guidance found in PPD-21 and EO-13636 forces multiple agencies to create their own metrics frameworks. However, this piecemeal approach limits the capability for national resilience, as each agency individually determines definitions and conducts metrics research. Instead, generalized metrics that can be applied across the wide range of engineered, environmental, and social systems managed by government agencies are needed.
Metrics that Support Resilience
Where the DSB sheds light on the needs of agencies, military doctrine can also be used to inform the development of better metrics. A defining characteristic of the modern age is the high level of connectivity between systems, and the DOD manages some of the most globally extensive and diverse personnel, cyber, and infrastructure systems. This ubiquitous connectivity21 introduces new vulnerabilities to military systems that are sometimes impossible to predict.22,23,24 In response, the DOD Command and Control Research Program has adopted a Network-Centric Operations (NCO) doctrine that is informative for advancing resilience thinking.25 NCO accelerates the ability to manage warfare by focusing on the control of large networks operating in physical, information, cognitive, and social domains, which can collectively describe any system. The domains are defined as:
- Physical: the engineering capabilities of infrastructure or devices, efficiencies, and network structures. This includes all data collection equipment and measurable real-life system components;
- Information: the usage of what we measure and know about the physical domain, including data use, transfer, analysis, and storage;
- Cognitive: human processes, i.e., translating, sharing, and acting upon knowledge to make, communicate, and implement decisions throughout the system; and
- Social: interactions and entities that influence how decisions are made, including government regulations, religions, cultures, and languages.
Military research into command and control has found that a highly connected system can enhance overall military operations through application of NCO principles.26,27 The primary principle, termed ‘power to the edge’,28 is a shift from traditional top-down administrative structures to one that implements information collection, sharing, and utilization networks. Implementation of this principle infuses military systems with characteristics that parallel and reinforce those associated with resilience, such as responsiveness, flexibility, versatility, and innovation.29 Linkov et al.30 were the first to link NCO doctrine and the NAS resilience definition in a framework to guide metrics development for resource and operations decisions. Their approach groups relevant system components into a ‘resilience matrix’ that requires the consideration of the NCO domains (physical, information, cognitive, and social) to fulfill each of the NAS-defined resilience functions:31 plan/prepare, absorb, recover, and adapt. Previous work by the authors demonstrates the validity of this approach to integrate metrics from multiple scientific disciplines to develop generalized metrics for cybersecurity32 and energy systems.33
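As a rough illustration of how such a matrix could be organized in practice, the sketch below represents it as a grid of four NCO domains by four resilience stages, with a list of metrics in each cell. The stage names follow the NAS definition quoted earlier (plan/prepare, absorb, recover, adapt). The function names and the two example metrics are hypothetical and are not taken from Linkov et al. or from Tables 1-3.

```python
from itertools import product

# NCO domains and NAS resilience stages, as named in the text.
DOMAINS = ("physical", "information", "cognitive", "social")
STAGES = ("plan/prepare", "absorb", "recover", "adapt")


def empty_matrix():
    """Return a 4 x 4 matrix with one empty metric list per (domain, stage) cell."""
    return {cell: [] for cell in product(DOMAINS, STAGES)}


def add_metric(matrix, domain, stage, name):
    """File a metric under a cell, rejecting unknown domains or stages."""
    if (domain, stage) not in matrix:
        raise KeyError(f"unknown cell: {(domain, stage)}")
    matrix[(domain, stage)].append(name)


def uncovered_cells(matrix):
    """List the cells that still have no metric assigned."""
    return [cell for cell, metrics in matrix.items() if not metrics]


# Hypothetical metric names, for illustration only.
matrix = empty_matrix()
add_metric(matrix, "physical", "absorb", "redundant components available")
add_metric(matrix, "cognitive", "plan/prepare", "response plans reviewed")
```

Listing the uncovered cells is one simple way to see where a system has no metric at all for a given domain and stage, which is the kind of gap the matrix is meant to expose.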
Still, cross-comparison of diverse systems such as engineering, environmental, and cyber requires greater detail of the resilience processes that govern interactions between NCO domains. Resilience processes represent the emergent skills or qualities of a system associated with its resilience. For example, as a professional sports team works together to win a game, emergent skills associated with the entire team (e.g., communication) govern how well it will succeed. In a similar way, the resilience processes of a system are the skills that benefit the system in becoming resilient. In the literature for complex engineering systems, four resilience processes provide an important example for future metrics development:34
- Sensing: the effort to measure new information about a system’s operating environment with special attention on anomalous data. Anomalous data can provide the greatest support for resilience design, as it can alert system evaluators of overlooked possibilities. This process connects components in the physical domain to the information domain.
- Anticipation: imagining multiple future states without reducing improbability to impossibility. This process connects components in the information domain to the cognitive domain.
- Adaptation: reacting to changing conditions or states, in novel ways, to restore critical functionality under altered system conditions or operating environments. This process connects the cognitive domain to the physical domain.
- Learning: observing external conditions and system responses to improve understanding of relationships and possible futures, identifying needs for system improvement where applicable. This process connects the physical, information, and cognitive domains together and can incorporate the social domain depending on the system studied.
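The links between processes and domains described in this list can be written down as a small lookup table. The sketch below is only one possible encoding: the dictionary and function names are assumptions made for illustration, and the optional inclusion of the social domain in learning is modeled as a flag because the text says it depends on the system studied.

```python
# Domains each resilience process connects, as described in the list above.
PROCESS_LINKS = {
    "sensing": ("physical", "information"),
    "anticipation": ("information", "cognitive"),
    "adaptation": ("cognitive", "physical"),
    "learning": ("physical", "information", "cognitive"),
}


def processes_touching(domain, learning_includes_social=False):
    """Return the processes that involve a domain, in alphabetical order."""
    links = dict(PROCESS_LINKS)
    if learning_includes_social:
        # Learning can extend to the social domain for some systems.
        links["learning"] = links["learning"] + ("social",)
    return sorted(name for name, domains in links.items() if domain in domains)
```

Under this encoding the social domain is reached only through learning, and only when the flag is set, which mirrors how the four processes are described here.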
NCO domains and resilience processes offer necessary conceptual guidelines to support the creation of metrics for government agency systems. A system example associated with multiple agencies is a dam network. The DOD, Environmental Protection Agency, and numerous community stakeholders are all involved in the building and maintenance of dam networks within the US. The physical domain includes the dam infrastructure, pumping stations, equipment, meters, and sensors used to monitor changes in infrastructure and the environment. The physical domain is represented as data and knowledge in the information domain, where it is presented and used for learning. The people working in the dam network, along with the government agencies and communities involved, use the information domain to anticipate outcomes, formulate decisions, and take action. The management structure and social capital inherent in the social domain have direct implications as to which anticipated decisions matter and how people will respond to actions. Relationships between domains depend upon how environmental and infrastructure changes are sensed, events are anticipated, actions are adapted, and learning occurs throughout all domains.
Metrics across Multiple Systems
Overcoming misconceptions that hampered previous metrics enables one to construct metrics that meet DSB and government agency criteria. As an example, we have developed metrics that can be used to measure the resilience of engineering, environmental, and cybersecurity systems (Tables 1-3). We began by conducting a literature review of cybersecurity and engineering infrastructure metrics.35,36,37,38,39,40,41 From the collected metrics, we established a list of those that did not appear to be risk- or resilience-definition biased and then determined which of this subset were conceptually sound in terms of NCO domains and resilience processes. We organized the resulting metrics into the resilience matrix described by Linkov et al.42 for each system. Therefore, the metrics found in Tables 1-3 are new metrics for any system, informed by resilience literature and showing distinct similarities to several metrics found in the following sources: Park et al.,43 MITRE,44 Cutter et al.,45 and Fischer et al.46
Parallel metrics from Tables 1 and 2 for engineering and ecological resilience were selected as appropriate to populate Table 3 for cyber resilience. Measuring each system’s resilience separately supports systemic changes to improve resilience operations. The matrix approach does not identify interrelated risks, but comparing across systems can identify the possibility of cascading effects and help in making resource decisions. When the basis for a metric is the same across systems, the loss of that resource will cause damage through all systems. For example, the majority of metrics presented have a direct connection to budgetary decisions, implying that a loss of budget can cause adverse effects to all systems simultaneously. Also, comparing metric values helps determine how resources should be appropriated. For instance, relative resource strength in the preparation stage of engineering, environmental, and cyber systems helps an agency tasked with managing the resilience of all to distribute resources amongst them.
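The cross-system comparison described above can be sketched as a simple set intersection: if every system has at least one metric resting on the same resource, losing that resource can affect all of them at once. The metric names and resource bases below are invented for illustration and do not come from Tables 1-3, and the function name is likewise an assumption.

```python
def shared_bases(systems):
    """Return the resource bases that every system's metrics depend on."""
    per_system = [
        {basis for _metric, basis in metrics} for metrics in systems.values()
    ]
    return set.intersection(*per_system) if per_system else set()


# Hypothetical (metric, resource basis) pairs for three systems.
systems = {
    "engineering": [
        ("spare parts stocked", "budget"),
        ("repair crews on call", "personnel"),
    ],
    "environmental": [
        ("monitoring stations funded", "budget"),
        ("habitat surveys completed", "field data"),
    ],
    "cyber": [
        ("backup systems maintained", "budget"),
        ("incident drills run", "personnel"),
    ],
}

common = shared_bases(systems)
```

With this toy data, budget is the only basis shared by all three systems, matching the budgetary example in the text. Personnel is shared by two systems but not all three, so it is not reported.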
Furthermore, Tables 1-3 provide an example as to how NCO domains and resilience processes can inform the development of better metrics in government agencies. The metrics presented in Tables 1-3 are not comprehensive, but system similarities indicate that these principles support the development of broadly useful metrics. In addition, the tables are organized based on the NAS definition of resilience to help orient them for national policy. Therefore, the tables show a cohesive framework that connects national goals, scientific opinion, military doctrine, and the needs of government agencies.
The concepts and examples provided help reorient resilience metrics to meet national policy objectives. The DOD has identified issues with metrics found in the literature, but military-science-based research can help inform ways to overcome inherent issues. Understanding the difference between risk and resilience and the necessity of employing qualities of both engineering and ecological resilience definitions helps lay the foundation for improved metrics. NCO domains and resilience processes offer constraints on metrics that are conceptually sound and do not limit broad applications. We were able to reanalyze available metrics by combining concepts of resilience into new metrics that can be used by government agencies.
About the authors:
Daniel A. Eisenberg is a Contractor to the Environmental Laboratory, US Army Engineer Research and Development Center, Vicksburg, MS, USA and also with the Ira Fulton Schools of Engineering, Arizona State University, Tempe, AZ, USA; Jeryang Park is with the School of Urban and Civil Engineering, Hongik University, Seoul, Republic of Korea; Matthew E. Bates and Cate Fox-Lent are with the Environmental Laboratory, US Army Engineer Research and Development Center, Vicksburg, MS, USA; Thomas P. Seager is with the Ira Fulton Schools of Engineering, Arizona State University, Tempe, AZ, USA; and Igor Linkov is with the Environmental Laboratory, US Army Engineer Research and Development Center, Vicksburg, MS, USA (Address correspondence for Igor Linkov by mail to: Igor Linkov, U.S. Army Corps of Engineers, 696 Virginia Rd., Concord, MA 01742, USA; email: firstname.lastname@example.org).
The authors would like to thank the editor and reviewers of The Solutions Journal and Zachary Collier for their helpful comments on the manuscript. Permission was granted by the USACE Chief of Engineers to publish this material. The views and opinions expressed in this paper are those of the individual authors and not those of the US Army, or other sponsor organizations. This material is based upon work supported by the NSF under Grant No. 1140190 and an NSF-funded IGERT-SUN fellowship Grant No. 1144616. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.