Operational Security 2019
Operations security has to do with the specific processes (operations) an organization undertakes. It tends to be in effect for a finite period of time and to be defined in terms of specific objectives. This post explains what operational security is and how organizations implement it.
Threats are identified relative to the operation, vulnerabilities are associated with the capabilities and intents of the specific threats to the operation, and defensive measures are undertaken to defeat those threats for the duration of the operation. These defenses tend to be temporary, one-time, unstructured, and individualized.
Operations consist of special-purpose efforts, typically to meet a crisis or an unusual one-off situation. The construction of the trans-Alaska pipeline was an operation requiring operations security, but its normal use requires operational security.
The bridge that collapsed in Oakland, California, due to a fuel truck fire was repaired in a matter of a few weeks, and this is clearly an exceptional case, an operation requiring operations security. However, the normal process of building and repairing roads is an operational security issue.
For operations, security is a one-off affair. It is therefore typically less systematic and thoughtful in its design, seeking a workable one-time solution rather than an optimized one, and costs are not controlled in the same way because long-term life-cycle costs are typically not considered.
Decisions to accept risks are far more common, largely because they are being taken once instead of many times, so people can be far more attuned and diligent in their efforts than will happen day after day when the same things are repeated.
In some cases, operations security is more intensive than operational security because it is a one-off affair, so more expensive and specialized people and things can be applied.
Also, there is little, if any, history to base decisions on because each instance is unique, even if a broader historical perspective may be present for experienced operations workers.
Operational security is a term we use for the security we need around normal and exceptional business processes. This type of security tends to continue indefinitely, to be repeated regularly, and not to be focused on a specific time frame or target.
In other words, these business processes are the day-to-day things done to make critical infrastructures work. Protection of normal operations tends to be highly structured and routine, revisited periodically, externally reviewed, and evolutionary.
Information protection addresses ensuring the utility of content. Content can be in many forms, as can its utility.
For example, names and addresses of customers and their current amounts due are useful for billing and service provisioning, but if that is the sole purpose of their presence, they lose utility when applied to other uses, such as being stolen for use in frauds or sold for advertising purposes.
Since utility depends on the context of the infrastructure, there is no predefined utility; information systems must be designed to maximize utility for each specific infrastructure provider, or they will not optimize the utility of content.
The cost of custom systems is high, so most information systems in most critical infrastructures are general purpose and thus leave a high potential for abuse.
In addition to the common uses of content such as billing, advertising, and so forth, critical infrastructures and their protective mechanisms depend on information for controlling their operational behaviors.
For example, SCADA systems are used to control the purification of water, the voltage and frequency of power distribution, the flow rates of pipelines, the amount of storage in use in storage facilities, the alarm and response systems of facilities, and many similar mechanisms, without the proper operation of which, these infrastructures will not continue to operate.
These controls are critical to the operation and if not properly operating can result in loss of service; temporary or long-term loss of utility for the infrastructure; the inability to properly secure the infrastructure;
damage to other devices, systems, and capabilities attached to the infrastructure; or, in some cases, inter-infrastructure collapse through the interdependency of one infrastructure on another.
For example, an improperly working SCADA system controlling stored water levels in a water tower could empty all of the storage tanks, thus leaving inadequate supply for a period of time. As the potential for negative consequences of lost information utility increases, so should the certainty with which that utility is ensured.
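To make the water-tower example concrete, here is a minimal sketch of the kind of guard logic that keeps a tank-level controller inside safe bounds. All names, units, and thresholds are hypothetical, invented for illustration rather than taken from any real SCADA product.

```python
# Hypothetical sketch: guard logic for a water-tower level controller.
# Names and thresholds are illustrative only.

MIN_LEVEL_M = 2.0   # never drain below this (reserve supply)
MAX_LEVEL_M = 9.5   # never fill above this (overflow protection)

def pump_command(level_m, demand_lps):
    """Decide whether to run the fill pump, clamped by hard limits."""
    if level_m <= MIN_LEVEL_M:
        return "PUMP_ON"          # hard floor: always refill
    if level_m >= MAX_LEVEL_M:
        return "PUMP_OFF"         # hard ceiling: never overfill
    # Between limits, run the pump when demand outpaces supply.
    return "PUMP_ON" if demand_lps > 40.0 else "PUMP_OFF"

assert pump_command(1.5, 0.0) == "PUMP_ON"     # near-empty: refill
assert pump_command(9.8, 100.0) == "PUMP_OFF"  # near-full: stop
```

A compromised or malfunctioning controller that lacks the hard floor in this sketch is exactly what could empty all of the storage tanks.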
Information is subject to regulatory requirements, contractual obligations, owner- and management-defined controls, and decisions made by executives. Many aspects of information and its protection are subject to audit and other sorts of reviews.
As such, a set of duties to protect are defined and there is typically a governance structure in place to ensure that controls are properly defined, documented, implemented, and verified to fulfill those duties.
Duties are codified in the documentation that is subject to audit, review, and approval and that defines a legal contract for carrying out protective measures and meeting operational needs.
Typically, we see policy, control standards, and procedures as the documentation elements defining what is to be done, by whom, how, when, and where. As tasks are performed, these tasks are documented and their performance reviewed with sign-offs in logbooks or other similar mechanisms.
These operational logs are then used to verify from a management perspective that the processes as defined were performed and to detect and correct deviations from policy.
The definition of controls is typically required to be done through an approved risk management process intended to match surety to risk to keep costs controlled while providing adequate protection to ensure the utility of the content in the context of its uses.
This typically involves identifying consequences based on a business model defining the context of its use within the architecture of the infrastructure, the threats and their capabilities and intents for harming the infrastructure, and the architecture and its protective features and lack thereof.
Threats, vulnerabilities, and consequences must be analyzed in light of the set of potentially complex interdependencies associated with both direct and indirect linkages. Risks can then be accepted, transferred, avoided, or mitigated to levels appropriate to the situation.
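The accept/transfer/avoid/mitigate decision above can be sketched as a simple rule, shown below. The scoring scale and cutoffs are assumptions for the example; a real risk management process would derive them from the organization's own consequence model.

```python
# Illustrative sketch of the accept/transfer/avoid/mitigate decision.
# Scales and cutoffs are invented for the example.

def risk_treatment(likelihood, consequence, mitigation_cost, budget):
    """Pick a risk treatment from a simple expected-loss proxy."""
    score = likelihood * consequence       # crude expected-loss proxy
    if score < 1.0:
        return "accept"                    # low risk: accept residual
    if mitigation_cost <= budget:
        return "mitigate"                  # affordable controls exist
    if consequence >= 8:
        return "avoid"                     # too severe to run at all
    return "transfer"                      # insure or contract it away

assert risk_treatment(0.1, 5, 100, 50) == "accept"
assert risk_treatment(0.8, 9, 1000, 50) == "avoid"
```

The point of the sketch is the ordering: treatments are chosen relative to both the risk level and the cost of control, matching surety to risk as described above.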
In large organizations, information protection is controlled by a chief information security officer or some similarly titled position.
However, most critical infrastructure providers are small local utilities that have only a few tens of workers in total and almost certainly do not have full-time information technology (IT) staff. If information protection is controlled at all, it is controlled by the local IT worker.
As in physical protection, deterrence, prevention, detection and response, and adaptation are used for protection. However, in smaller infrastructure providers, design for prevention is predominantly used as the means of control because detection and response are too complex and expensive for small organizations to handle and adaptation is too expensive in its redesigns.
While small organizations try to deter attacks, they are typically less of a target because of the more limited effects attainable by attacking them.
As is the case for physical security, information protection tends to be thought of in terms of layers of protection encircling the content and its utility; however, most information in use today gains much of its utility through its mobility. Just as in transportation, this limits use of protective measures based on situational specifics.
To be of use, information must be processed in some manner, taking information as input and producing finished goods in the form of information useful for other purposes at the other end of each step of its production. Information must be protected at rest, in motion, and in use to ensure its utility.
Control of the information protection system is typically more complex than that of other systems because information systems tend to be interconnected and remotely addressable to a greater degree than other systems.
While a pipeline has to be physically reached to do harm, a SCADA system controlling that pipeline can potentially be reached from around the world by using the interconnectedness of systems whose transitive closure reaches the SCADA system.
While physically partitioning SCADA and related control systems from the rest of the world is highly desirable, it is not the trend today. Indeed, regulatory bodies have forced the interconnection of SCADA systems to the Internet in an attempt to make more information more available in real time.
Further, for larger and interconnected infrastructures such as power and communications systems, there is little choice but to have long-distance connectivity to allow shared sourcing and distribution over long distances.
Increasingly complex and hard-to-understand and manage security barriers are being put in place to allow the mandated communication while limiting the potential for exploitation.
In addition, some efficiency can be gained through collaboration between SCADA systems, and this efficiency translates into a lot of money, exchanged for an unquantified amount of reduction in security.
SCADA systems are only part of the overall control system that functions within an infrastructure for protection.
Less time-critical control systems exist at every level, from the financial system within a nonfinancial enterprise to the governance system in which people are controlled by other people. All of these, including the paper system, are information systems.
All control systems are based on a set of sensors, a control function, and a set of actuators. These must operate as a system within limits or the system will fail. Limits are highly dependent on the specifics of the situation, and as a result, engineering design and analysis are typically required to define the limits of control systems.
These limits are then coded into systems with surety levels and mechanisms appropriate to the risks. Most severe failures come about when limits are improperly set or the surety of the settings or controls limiting those settings being applied is inadequate to the situation at hand. For example, the slew rate of a water valve might have to be controlled to prevent pipe damage.
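The slew-rate example above can be sketched in a few lines. This is a hypothetical limiter, with illustrative units, showing how an engineering limit gets coded into the control path between a commanded setpoint and the actuator.

```python
# Hypothetical slew-rate limiter for a valve actuator: an example of
# coding engineering limits into a control system. Units are illustrative.

MAX_SLEW = 2.0  # max allowed change in valve position (% per control tick)

def limited_setpoint(current, requested):
    """Move toward the requested position, never faster than MAX_SLEW."""
    delta = requested - current
    if delta > MAX_SLEW:
        return current + MAX_SLEW
    if delta < -MAX_SLEW:
        return current - MAX_SLEW
    return requested

# A commanded jump from 10% to 50% open is applied gradually over many
# ticks, preventing the water-hammer pipe damage mentioned above.
assert limited_setpoint(10.0, 50.0) == 12.0
assert limited_setpoint(50.0, 49.0) == 49.0
```

If the limit is set improperly, or an attacker can rewrite `MAX_SLEW` with inadequate surety, the protection the limiter provides disappears, which is the failure mode the text describes.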
In other infrastructures, such as financial systems, control systems may be far more complex, and it may not be possible to completely separate them from the Internet and the rest of the world.
For example, electronic payment systems today operate largely over the Internet, and individuals, as well as infrastructure providers, can directly access banking and other financial information and make transfers or payments from anywhere.
In such an infrastructure, a far more complex control system with many more actuators and sensors is required, and a far greater management structure is going to be needed.
In voting systems, to do a good job of ensuring that all legitimate votes are properly cast and counted, a paper trail or similar unforgeable, obvious, and hard-to-dispute record has to be apparent to the voters and the counters.
The recent debacles in voting associated with electronic voting have clearly demonstrated the folly of such trust in information systems when the risks are so high, the systems so dispersed, and the operators so untrained, untrusted, and inexperienced.
These systems have largely been out of control and therefore untrustworthy for this use.
Technology and Process Options
There are a lot of different technologies and processes that are used to implement protection. A comprehensive list would be infeasible to present without an encyclopedic volume, and the list changes all the time, but we would be remiss if all of the details were left out.
The lists of such things, being so extensive, are far more amenable to computerization than printing in books.
Rather than add a few hundred pages of lists of different things at different places, we have chosen to provide the information within a software package that provides what amounts to checklists of the different sorts of technologies that go in different places. To give a sense of the sorts of things typically included in such lists, here are some extracts.
In the general physical arena, we include perimeters, access controls, concealments, response forces, property location and geology, property topology and natural barriers, facility location, attack graph issues, observe-orient-decide-act (OODA) loops, perception controls, and locking mechanisms.
Given that there are specified business and operational needs, specified duties to protect, and a reasonably well-defined operating environment, proposed architectures and designs, along with all of the processes, management, and other elements that form the protection program and plan, need to be evaluated to determine whether protection is inadequate, adequate, or excessive; whether it is reasonably priced and performing for what is being gained; and to allow alternatives to be compared.
Unlike engineering, finance, and many other fields of expertise that exist in the world, the protection arena does not have well-defined and universally applied analysis frameworks.
Any electrical engineer should be able to compute the necessary voltages, currents, component values, and other things required to design and implement a circuit to perform a function in a defined environment.
Any accountant can determine a reasonable placement of entries within the double entry bookkeeping system. However, if the same security engineering problem is given to a range of protection specialists, there are likely to be highly divergent answers.
One of the many reasons for the lack of general agreement in the security space is that there is a vast array of knowledge necessary to understand the entire space and those who work in the space range over a vast range of expertise.
Another challenge is that many government studies on the details of things like fence height, distances between things, and so forth, are sensitive because if the details are known, they may be more systematically defeated, but on the whole, the deeper problem seems to stem from a lack of a coherent profession.
There are many protection-related standards, and to the extent that these standards are embraced and followed, they lead to more uniform solutions with a baseline of protection.
For example, health and safety standards mandate a wide range of controls over materials, building codes ensure that certain protective fences do not fall over in the wind or accidentally electrocute passersby;
standards for fire safety ensure that specific temperatures are not reached within the protected area for a defined period of time under defined external conditions, standards for electromagnetic emanations limit the readability of signals at a distance, and shredding standards make it very hard to reassemble most shredded documents when the standards are met.
While there are a small number of specialized experts who know how to analyze these specific items in detail, protection designers normally just follow the standards to stay out of trouble—or at least they are supposed to.
Unfortunately, most of the people who work designing and implementing protective systems are unaware of most of these standards, and if they are unaware, they most certainly do not know whether they are following these standards and cannot specify them as requirements or meet them in implementation.
From a pure analysis standpoint, there is a wide range of scientific and engineering elements involved in protection, and all of them come to bear in the overall design of protective systems for infrastructures.
However, the holy grail of protection comes in the form of risk management: the systematic approach to measuring risk and making sound decisions about risk based on those measurements.
The problem with this starts with the inability to define risk in a really meaningful way, followed by the inability to measure the components in most definitions, the high cost of accurate measurements, the difficulty in analyzing the effect of protective measures on risk reduction, and the step functions in results associated with minor changes in parameters.
Standard Design Approaches
The standard design approaches are based on the notion that in-depth protection science and/or engineering can be applied to define a design that meets the essential criteria that work for a wide range of situations.
By defining the situations for which each design applies, an organization can reduce or eliminate design and analysis time by simply replicating a known design where the situation meets the design specification criteria.
Thus, a standard fence for protecting highways from people throwing objects off of overpasses can be applied to every overpass that meets the standard design criteria, and “the paralysis of analysis” can be avoided.
The caveats that have to be watched carefully in these situations are that
(1) the implementations do indeed meet the design criteria, (2) the design actually does what it was intended to do, and (3) the criteria are static enough to allow for a common design to be reproduced in place after place.
It turns out that, to a close approximation, this works well at several levels. It works for individual design components, for certain types of composites, and for architectural level approaches.
By using such approaches, analysis, approval processes, and many other aspects of protection design and implementation are reduced in complexity and cost, and if done on a large scale, the cost of components can go down because of mass production and competition. However, mass production has its drawbacks.
For example, the commonly used mass production lock and key systems used on most doors are almost uniformly susceptible to the bump-key attack.
As the sunk cost of a defense technology increases and it becomes so standard that it is almost universal, attackers will start to define and create attack methods that are also readily reproducible and lower the cost and time of the attack. Standardization leads to common mode failures.
The cure to this comes in the combinations of protective measures put in place. The so-called defense-in-depth is intended to mitigate individual failures;
and if applied systematically with variations of combinations forming the overall defense, then each facility will have a different sequence of skill requirements for attack and the cost to the attackers will increase while their uncertainty increases as well.
They have to bring more and more expensive things to increase their chances of success unless they can gather intelligence adequate to give away the specific sequences required, and they have to have more skills, train longer, and learn more to be effective against a larger set of targets.
This reduces the threats that are effective to those with more capabilities and largely eliminates most of the low-level attackers (the so-called ankle biters) that consume much of the resources in less well-designed approaches.
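The effect of varying layer combinations across facilities can be illustrated with a small sketch. The layer names and skill costs below are invented for the example; the point is that an attacker who wants to cover multiple differently-layered facilities must master the union of the required skills.

```python
# Sketch: why varied layer combinations raise attacker cost.
# Layer names and skill costs are invented for illustration.

LAYER_SKILL_COST = {"fence": 1, "badge_reader": 2, "cctv": 2, "vault": 5}

def attack_cost(layers):
    """Total skill cost to defeat one facility's layer sequence."""
    return sum(LAYER_SKILL_COST[layer] for layer in layers)

# Two facilities using different three-layer subsets force an attacker
# who wants both to master the union of skills, not just one sequence:
fac_a = ("fence", "badge_reader", "vault")
fac_b = ("fence", "cctv", "vault")
skills_needed = set(fac_a) | set(fac_b)
assert len(skills_needed) == 4
assert attack_cost(fac_a) == 8
```

Uniform deployments collapse `skills_needed` to a single fixed set, which is the common mode failure described above.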
As it turns out, there is also a negative side effect of effective protection against low-level attacks. As fewer and fewer attackers show up, management will find less and less justification for defenses. As a result, budgets will be cut and defenses will start to decay until they fail altogether in a rather spectacular way.
This is why bridges fall down and power systems collapse and water pipes burst in most cases. They become so inexpensive to operate and work so well that maintenance is reduced to the point where it is inadequate. It works for a while and then fails spectacularly.
Consequently, where businesses run infrastructures and short-term profits are rewarded over long-term surety, management is highly motivated and rewarded to shirk maintenance and protection, leaving success in these areas to luck.
So we seem to have come full circle. Standard designs are good for being more effective with less money, but as you squeeze out the redundancy and the costs, you soon get to common mode failures and brittleness that cause collapses at some future point in time.
So along with standard designs, you need standard maintenance and operational processes that have most of the same problems, unless rewards are aligned with reliability and long-term effectiveness. Proper feedback, then, has to become part of the metrics program for the protection program.
Design Automation and Optimization
For protection fields, there is only sporadic design automation and optimization, and the tools that exist are largely proprietary and not sold widely on the open market.
Unlike circuit design, building design, and other similar fields, there has not been a long-term academic investigation of most areas of protection involving intentional threats that have moved to mature the field.
While there are many engineering tools for the disciplines involved in protection, most of these tools do not address malicious actions. The user can attempt to use these to model such acts, but these tools are not designed to do so and there are no widely available common libraries to support the process.
In the risk management area, as a general field, there are tools for evaluating certain classes of risks and producing aggregated risk figures, but these are rudimentary in nature, require a great deal of input that is hard to quantify properly, and produce relatively little output that has a material effect on design or implementation.
There are reliability-related tools associated with carrying out the formulas involved in fault tolerant computing and redundancy, and these can be quite helpful in determining maintenance periods and other similar things, but again, they tend to ignore malicious threats and their capacity to intentionally induce faults.
For each of the engineering fields associated with critical infrastructures, there are also design automation tools, and these are widely used, but again, these tools typically deal with the design issue, ignoring the protective issues associated with anything other than nature.
Control systems represent a different sort of IT than most designers and auditors are used to. Unlike the more common general-purpose computer systems in widespread use, these control systems are critical for the moment-to-moment functioning of mechanisms that, in many cases, can cause serious negative physical consequences.
Generally, these systems can be broken down into sensors, actuators, and programmable logic controllers (PLCs), themselves controlled by SCADA systems.
They control the moment-to-moment operations of motors, valves, generators, flow limiters, transformers, chemical and power plants, switching systems, floor systems at manufacturing facilities, and any number of other real-time mechanisms that are part of the interface between information technologies and the physical world.
When they fail or fail to operate properly, regardless of the cause, the consequences can range from a reduction in product quality to the deaths of tens of thousands of people, and beyond, and this is not just theory.
It is the reality of incidents like the chemical plant release in Bhopal, India, that killed thousands of people in a matter of hours, and the Bellingham, Washington, SCADA failure of the Olympic Pipe Line Company that, combined with other problems in the pipeline infrastructure at the time, resulted in multiple deaths and put the pipeline company out of business.
Control Systems Variations and Differences
Control systems are quite a bit different from general-purpose computer systems in several ways.
These differences, in turn, make a big difference in how such systems must be properly controlled and audited and, in many cases, make it impossible to do a proper audit on the live system. Some of the key differences to consider include, without limit, the following:
They are usually real-time systems. Denial of services or communications for periods of thousandths of a second or less can sometimes cause catastrophic failure of physical systems, which in turn can sometimes cause other systems to fail in a cascading manner.
This means that the real-time performance of all necessary functions within the operating environment must be designed and verified to ensure that such failures will not happen.
It also means that they must not be disrupted or interfered with except in well-controlled ways during testing or audits. It also means that they should be as independent as possible of external systems and influences.
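The real-time point above can be sketched as a deadline check around one control cycle: if the cycle overruns its budget, the system should go to a safe state rather than act late. The cycle budget and callback names are assumptions for illustration; real real-time systems enforce deadlines in the scheduler or in hardware, not in application code like this.

```python
# Sketch of a deadline check for a real-time control loop.
# The 2 ms budget and the callback structure are illustrative assumptions.
import time

CYCLE_BUDGET_S = 0.002  # control-cycle deadline (2 ms, invented)

def run_cycle(read_sensors, compute, drive_actuators, fail_safe):
    """One control cycle: act only if the computation met its deadline."""
    start = time.monotonic()
    outputs = compute(read_sensors())
    if time.monotonic() - start > CYCLE_BUDGET_S:
        fail_safe()            # overran the deadline: go to a safe state
        return "FAIL_SAFE"
    drive_actuators(outputs)   # met the deadline: drive the plant
    return "OK"

# With trivial (fast) callbacks, the cycle completes within budget:
assert run_cycle(lambda: 0, lambda s: s, lambda o: None, lambda: None) == "OK"
```

Note the design choice the sketch reflects: a missed deadline is treated as a failure in itself, because acting on stale data can be as damaging as not acting at all.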
They tend to operate at a very low level of interaction, exchanging data like register settings and histories of data values that reflect the state or rate of change of physical devices such as actuators or sensors.
That means that any of the valid values for settings might be reasonable depending on the overall situation of the plant they operate within and that it is hard to tell whether a data value is valid without a model of the plant in operation to compare the value to.
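A minimal illustration of that point: a raw register value can only be judged against a model of the plant. The register names and valid ranges below are hypothetical; a real plant model would also account for operating mode, rates of change, and cross-register consistency.

```python
# Sketch: judging a raw register value requires a plant model.
# Register names and ranges are hypothetical.

PLANT_MODEL = {
    # register: (min_valid, max_valid) for the current plant mode
    "flow_rate":  (0.0, 500.0),   # liters/second
    "tank_level": (0.0, 10.0),    # meters
    "valve_pos":  (0.0, 100.0),   # percent open
}

def plausible(register, value):
    """True if the value is within the modeled range for this register."""
    lo, hi = PLANT_MODEL[register]
    return lo <= value <= hi

assert plausible("flow_rate", 120.0)       # fine in isolation
assert not plausible("tank_level", 42.0)   # 42 m in a 10 m tank: bogus
```

Without `PLANT_MODEL`, both values above are just well-formed numbers in a register, which is exactly why low-level protocol data is so hard to validate.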
They tend to operate in place for tens of years before being replaced and they tend to exist as they were originally implemented. They do not get updated very often, do not run antivirus scanners, and, in many cases, do not even have general-purpose operating systems.
This means that the technology of 30 years ago has to be integrated into new technologies and that designers have to consider the implications over that time frame to be prudent. Initial cost is far less important than life-cycle costs and consequences of failure tend to far outweigh any of the system costs.
Most of these systems are designed to operate in a closed environment with no connection outside of the control environment. However, they are increasingly being connected to the Internet, wireless access mechanisms, and other remote and distant mechanisms running over intervening infrastructure.
Such connections are extremely dangerous, and commonly used protective mechanisms like firewalls and proxy servers are rarely effective in protecting control systems to the level of surety appropriate to the consequences of failure.
Current intrusion and anomaly detection systems largely fail to understand the protocols that control systems use and, even if they did, do not have plant models that allow them to differentiate between legitimate and illegitimate commands in context.
Even if they could do this, the response times for control systems are often too short to allow any such intervention, and stopping the flow of control signals is sometimes more dangerous than allowing potentially wrong signals to flow.
Control systems typically have no audit trails of commands executed or sent to them; have no identification, authentication, or authorization mechanisms; and execute whatever command is sent to them immediately unless it has a bad format.
They have only limited error detection capabilities, and in most cases, erroneous values are reflected in physical events in the mechanisms under control rather than error returns.
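One common compensating approach is to place a validating front end ahead of a legacy controller that would otherwise execute any well-formed command. The command set, argument ranges, and authorization flag below are hypothetical, sketched to show the shape of such a wrapper, not any particular product.

```python
# Sketch of a validating front end for a legacy controller that has no
# identification, authentication, or authorization of its own.
# The command set and ranges are hypothetical.

ALLOWED = {
    "OPEN_VALVE": lambda arg: 0 <= arg <= 100,   # percent open
    "SET_FLOW":   lambda arg: 0 <= arg <= 500,   # liters/second
}

def vet_command(cmd, arg, operator_authorized):
    """Reject commands the bare controller would blindly execute."""
    if not operator_authorized:
        return ("REJECT", "no authorization")
    check = ALLOWED.get(cmd)
    if check is None:
        return ("REJECT", "unknown command")
    if not check(arg):
        return ("REJECT", "argument out of range")
    return ("FORWARD", None)    # pass to the controller unchanged

assert vet_command("OPEN_VALVE", 50, True) == ("FORWARD", None)
assert vet_command("SET_FLOW", 9999, True)[0] == "REJECT"
```

As the text notes, even a wrapper like this cannot tell a legitimate in-range command from an illegitimate one without a plant model, so it raises the bar without making the system safe.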
When penetration testing is undertaken, it very often demonstrates that these systems are highly susceptible to attack.
However, this is quite dangerous because as soon as a wrong command is sent to such a system, or the system slows down during such a test, it runs the risk of doing catastrophic damage to the plant. For that reason, actual systems in operation are virtually never tested and should not be tested in this manner.
In control systems, integrity, availability, and use control are the most important objectives for operational needs, while accountability is vital to forensic analysis, but confidentiality is rarely of import from an operational standpoint at the level of individual control mechanisms. The design and review process should be clear in its prioritization.
This is not to say that confidentiality is not important. In fact, there are examples such as reflexive control attacks and gaming attacks against the financial system in which control system data have been exploited;
but given the option of having the system operate safely or leaking information about its state, the safe operation should be given precedence.
Questions to Probe
Finally, while each specific control system has to be individually considered in context, there are some basic questions that should be asked with regard to any control system and a set of issues to be considered relative to those questions.
Question 1: What Is the Consequence of Failure and Who Accepts the Risk?
The first question that should always be asked with regard to control systems is what the consequences of control system failure would be, followed by what surety level is applied to implement and protect those control systems. If the consequences are higher, then the surety of the implementation should be higher.
The consequence levels associated with the worst-case failure, ignoring protective measures in place, indicate the level at which risks have to be reviewed and accepted.
If lives are at stake, the chief executive officer (CEO) likely has to accept residual risks. If significant impacts on the valuation of the enterprise are possible, the CEO and chief financial officer (CFO) have to sign off.
In most manufacturing, chemical processing, energy, environment, and other similar operations, the consequences of a control system failure are high enough to require top management involvement and sign-off.
Executives must read the audit summaries and the chief scientist of the enterprise should understand the risks and describe these to the CEO and CFO before sign-off.
If this is not done, it should be determined who is making these decisions, and an audit team should report this result to the board as a high-priority item to be mitigated.
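The sign-off logic described above can be summarized as a simple lookup. The consequence classes and the officer tiers below are assumptions for illustration; real tiers would come from corporate policy and the applicable regulatory regime.

```python
# Illustrative mapping of worst-case consequence class to the required
# sign-off level. Classes and tiers are hypothetical.

SIGNOFF = {
    "lives_at_stake":       ["CEO"],           # CEO accepts residual risk
    "enterprise_valuation": ["CEO", "CFO"],    # both must sign off
    "minor":                ["plant_manager"], # delegated acceptance
}

def required_signoff(consequence_class):
    """Who must accept residual risk; unknown classes escalate upward."""
    return SIGNOFF.get(consequence_class, ["board_review"])

assert required_signoff("lives_at_stake") == ["CEO"]
assert required_signoff("unknown_case") == ["board_review"]
```

The default branch captures the audit point above: when nobody can say who is making the decision, the question escalates to the board rather than being silently accepted.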
Question 2: What Are the Duties to Protect?
Along with the responsibility for control systems comes civil and possibly criminal liability for failure to do the job well enough and for the decision to accept a risk rather than mitigate it.
In most cases, such systems end up being safety systems, having potential environmental impacts, and possibly endangering surrounding populations.
Duties to protect include, without limit, legal and regulatory mandates, industry-specific standards, contractual obligations, company policies, and possibly other duties.
All of these duties must be identified and met for control systems, and for most high-valued control systems, there are additional mandates and special requirements.
For example, in the automotive industry, safety mechanisms in cars that are not properly operating because of a control system failure in the manufacturing process might produce massive recalls, and there may be a duty to have records of inspections associated with the requirements for recalls that are unmet within some control systems.
Of course, designers, like auditors, should know the industry they operate in; without such knowledge, items such as these may be missed.
Question 3: What Controls Are Needed, and Are They in Place?
Control systems in use today were largely created at a time when the Internet was not widely connected. As a result, they were designed to operate in an environment where connectivity was very limited.
To the extent that they have remote control mechanisms, those mechanisms are usually direct command interfaces to control settings.
At the time they were designed, the systems were protected by limiting physical access to equipment and limiting remote access to dedicated telephone lines or wires that run with the infrastructure elements under control.
When this is changed to a nondedicated circuit, when the telephone switching system no longer uses physical controls over dedicated lines, when the telephone link is connected via a modem to a computer network connected to the Internet, or when a direct IP connection to the device is added, the design assumptions of isolation that made the system relatively safe are no longer valid.
When these systems are connected to the Internet, the connections are typically made without the knowledge necessary to make them safely. Given the lack of clarity in this area, such connections should probably not be made without having the best available experts consider the safety of those changes.
This sort of technology change is one of the key things that make control systems susceptible to attack, and most of the technology fixes put in place with the idea of compensating for those changes do not make those systems safe. Here are some examples of things we have consistently seen in reviews of such systems:
The claim of an “air gap” or “direct line” or “dedicated line” between a communications network used to control distant systems and the rest of the telephone network is almost never true, no matter how many people may claim it.
The only way to verify this is to walk from place to place and follow the actual wires, and every time we have done it, we have found these claims to be untrue.
The claim that “nobody could ever figure that out” seems to be a universal form of denial. Unfortunately, people do figure these things out and exploit them all the time, and of course, our teams have figured them out to present them to the people who operate the control systems, demonstrating that they can be figured out.
Remote control mechanisms are almost always vulnerable: less so between the SCADA system and the things it controls when the connections are fairly direct, but almost always for mobile control devices, any mechanism using wireless, any system with unprotected wiring, any system that can be checked on or managed from afar, and anything connected directly or indirectly to the Internet.
Encryption, VPN mechanisms, firewalls, intrusion detection sensors, and other similar security mechanisms designed to protect normal networks from standard attacks are rarely effective in protecting control systems connected to or through these devices from attacks that they face.
And many of these techniques are too slow, cause delays, or are otherwise problematic for control systems. Failures may not appear during testing or for years, but when they do appear, they can be catastrophic.
Insider threats are almost always ignored, and typical control systems are powerless against them. However, many of the attack mechanisms depend on a multistep process that starts with changing a limiter setting and is followed by exceeding the normal limits of operation.
If the detection of these limit-setting changes were done in a timely fashion, many of the resulting failures could be avoided.
Change management in control systems is often not able to differentiate between safety interlocks and operational control settings. Higher standards of care should be applied to changes of interlocks than changes in data values because the interlocks are the things that force the data values to within reasonable ranges.
As an example, interlocks are often bypassed by maintenance processes and sometimes not verified after the maintenance is completed.
The standard operating procedure should mandate safety checks, including verification of all interlocks and limiters against known good values, and an external review function should keep old copies and verify changes against them.
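As a minimal sketch of such a post-maintenance safety check, the snippet below compares live interlock and limiter settings against a known-good baseline kept by an external review function. All setting names and values are hypothetical illustrations, not any vendor's actual configuration format.

```python
"""Sketch: verify interlock and limiter settings against a known-good
baseline after maintenance. Names and values are hypothetical."""

# Known-good baseline, kept and reviewed outside the control system.
KNOWN_GOOD = {
    "boiler.pressure_limiter_psi": 300,
    "boiler.temp_interlock_enabled": True,
    "pump.flow_limit_gpm": 120,
}

def verify_settings(current: dict) -> list[str]:
    """Return a list of deviations from the known-good baseline."""
    deviations = []
    for name, expected in KNOWN_GOOD.items():
        actual = current.get(name)
        if actual != expected:
            deviations.append(f"{name}: expected {expected!r}, found {actual!r}")
    return deviations

# After maintenance, compare live settings to the baseline.
live = {
    "boiler.pressure_limiter_psi": 300,
    "boiler.temp_interlock_enabled": False,  # interlock left bypassed
    "pump.flow_limit_gpm": 120,
}
for problem in verify_settings(live):
    print("ALARM:", problem)
```

The point of the design is that the baseline lives outside the control system, so a maintenance process that bypasses an interlock and forgets to restore it is caught by an independent comparison rather than by operator memory.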
If accountability is to be attained, it must be done by an additional audit device that receives signals through a diode or similar mechanism that prevents the audit mechanism from affecting the system.
This device must itself be well protected to keep forensically sound information required for investigation. However, since there is usually poor or no identification, authentication, or authorization mechanism within the control system itself, attribution is problematic unless explicitly designed into the overall control system.
Alarms should be in place to detect loss of accountability information, and such loss should be immediately investigated. A proper audit system should be able to collect all of the control signals in a complex control environment for periods of many years without running out of space or becoming overwhelmed.
If information from the control system is needed for some other purpose, it should run through a digital diode for use.
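The audit-device idea above can be sketched as a receive-only collector behind a data diode: the control system emits sequence-numbered records, the collector hash-chains them for forensic soundness, and a gap in sequence numbers raises the loss-of-accountability alarm. The record format and field names are assumptions for illustration only.

```python
"""Sketch: one-way audit collection with loss detection. The audit
device only receives (via a data diode), so it cannot affect the
control system. Record format and names are hypothetical."""

import hashlib

class AuditReceiver:
    def __init__(self):
        self.expected_seq = 0
        self.records = []   # append-only store
        self.alarms = []

    def receive(self, seq: int, payload: str):
        # A gap in sequence numbers means accountability data was lost.
        if seq != self.expected_seq:
            self.alarms.append(
                f"gap: expected seq {self.expected_seq}, got {seq}")
        self.expected_seq = seq + 1
        # Hash-chain each record so later tampering is detectable.
        prev = self.records[-1][2] if self.records else "0" * 64
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.records.append((seq, payload, digest))

rx = AuditReceiver()
rx.receive(0, "setpoint pump1 120")
rx.receive(1, "valve3 open")
rx.receive(3, "setpoint pump1 400")  # record 2 never arrived: alarm
print(rx.alarms)
```

Because control signals are small and arrive at bounded rates, storing years of such records is feasible; the hash chain lets an investigator verify that the stored history was not altered after the fact.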
If remote control is really needed, that control should be severely limited and implemented only through a custom interface using a finite state machine mechanism with syntax checks in context, strict accountability, strong auditing, and specially designed controls for the specific functions of the specific systems.
It should fail into a safe mode, be carefully reviewed, and not allow safety interlocks or other similar settings to be changed from afar.
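A minimal sketch of such a finite-state-machine command interface follows: only commands legal in the current state are accepted, interlock changes are refused outright, and anything unexpected drops the interface into a safe state. The states, commands, and names are hypothetical.

```python
"""Sketch: finite-state-machine remote control interface. Only
commands legal in the current state are accepted; interlock changes
are never remotely permitted; unknown input fails safe. All names
are hypothetical illustrations."""

# Allowed transitions: state -> {command: next_state}
TRANSITIONS = {
    "idle":    {"start_pump": "running", "read_status": "idle"},
    "running": {"stop_pump": "idle", "read_status": "running"},
}
FORBIDDEN = {"set_interlock", "bypass_interlock"}  # never allowed remotely

class RemoteInterface:
    def __init__(self):
        self.state = "idle"

    def command(self, cmd: str) -> str:
        if cmd in FORBIDDEN:
            return self.fail_safe(f"interlock change refused: {cmd}")
        next_state = TRANSITIONS.get(self.state, {}).get(cmd)
        if next_state is None:
            return self.fail_safe(f"illegal in state {self.state}: {cmd}")
        self.state = next_state
        return f"ok: {cmd} -> {self.state}"

    def fail_safe(self, reason: str) -> str:
        self.state = "idle"  # fail into a known safe mode
        return f"REFUSED ({reason}); logged for audit"

iface = RemoteInterface()
print(iface.command("start_pump"))        # legal in "idle"
print(iface.command("bypass_interlock"))  # refused, fails safe
```

The design choice here is a whitelist in context: rather than filtering out known-bad commands, the interface enumerates the few commands that are legal in each state and rejects everything else, which is the property that makes syntax checking in context tractable to review.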
To the extent that distant communication is used, it should be encrypted at the line level where feasible; however, because of timing constraints, this may be of only limited value.
To the extent that remote control is used at the level of human controls, all traffic should be encrypted and the remote control devices should be protected to the same level of surety as local control devices.
That means, for example, that if you are using a laptop to remotely control such a mechanism, it should not be used for other purposes, such as e-mail, Web browsing, or any other nonessential function of the control system.
Nothing should ever be run on a control system other than the control system itself. It needs to have dedicated hardware, infrastructure, connectivity, bandwidth, controls, and so forth. The corporate LAN should not be shared with the control system, no matter what guarantees of quality of service are supposed to exist.
If voice over IP replaces plain old telephone service (POTS) throughout the enterprise, make sure it is not replaced in the control systems. Fight the temptation to share an Ethernet between more than two devices, to go through a switch or other similar device, or to use wireless, unless there is no other way.
Just remember that the entire chain of control for all of these infrastructure elements may cause the control system to fail and induce the worst case consequences.
Finally, experience shows that people believe a lot of things that are not true. This is more so in the security arena than in most other fields and more critical in control systems than in most other enterprise systems. When in doubt, do not believe them. Trust, but verify.
Perhaps more dangerous than older systems that we know have no built-in controls are modern systems that run complex operating systems and are regularly updated. Modern operating platforms that run control systems often slow down when updates are underway or at different times of the day or during different processes.
These slowdowns sometimes delay control functions unnecessarily. If an antivirus update flags a critical piece of software as a false positive, the control system could crash, and if a virus can enter the control system, the control system is not secure enough to handle medium- or high-consequence control functions.
Many modern systems have built-in security mechanisms that are supposed to protect them, but the protection is usually not designed to ensure availability, integrity, and use control, but rather to protect confidentiality. As such, they aim at the wrong target, and even if they should hit what they aim at, it will not meet the need.
Within locking mechanisms, for example, we include a selection of lock types, electrical lock-out controls, mechanical lock-out controls, fluid lock-out controls, and gas lock-out controls, time-based access controls, location-based access controls, event sequence-based access controls, situation-based access controls, lock fail-safe features, lock default settings, and lock tamper-evidence.
Similar sorts of lists exist in other areas. For example, in technical information security, under network firewalls, we list outer router, routing controls, and limitations on ports, gateway machines, demilitarized zones (DMZs), proxies, virtual private networks (VPNs), identity-based access controls, hardware acceleration, appliance or hardware devices, inbound filtering, and outbound filtering.
Each of these has variations as well. Under Operations Security, which is essentially a process methodology supported by some technologies in all areas of security, we list the time frame of the operation, the scope of the operation, threats to the operation, secrets that must be protected, indicators of those secrets, capabilities of the threats, intents of the threats, observable indicators present, vulnerabilities, the seriousness of the risk, and countermeasures identified and applied.
In the analysis of intelligence indicators, we typically carry out or estimate the effects of these activities that are common to many threats:
Review widely available literature;
Send intelligence operatives into adversary countries, businesses, or facilities;
Plant surveillance devices (bugs) in computers, buildings, cars, offices, and elsewhere;
Take inside and outside pictures on building tours;
Send e-mails in to ask questions;
Call telephone numbers to determine who works where, and to get other related information;
Look for or build up a telephone directory;
Build an organizational chart;
Cull through thousands of Internet postings;
Do Google and other similar searches;
Target individuals for elicitation;
Track the movement of people and things;
Track customers, suppliers, consultants, vendors, service contracts, and other business relationships;
Do credit checks on individual targets of interest;
Use commercial databases to get background information;
Access history of individuals including airline reservations and when they go where;
Research businesses people have worked for and people they know;
Find out where they went to school and chat with friends they knew from way back;
Talk to neighbors, former employers, and bartenders;
Read the annual report; and
Send people in for job interviews, some of whom get jobs.
It rapidly becomes apparent that (1) the number of alternatives is enormous for both malicious attacker and accidental events, (2) the number of options for protection is enormous and many options often have to be applied;
and (3) no individual can attain all of the skills and knowledge required to perform all of the tasks in all of the necessary areas to define and design the protective system of infrastructure.
Even if an individual had all of the requisite knowledge, they could not possibly have the time to carry out the necessary activities for a critical infrastructure of substantial size. Critical infrastructure protection is a team effort requiring a team of experts.
Protection Design Goals and Duties to Protect
In a sense, the goal of protection may be stated as a reduction in negative consequences, but in real systems, more specific goals have to be clarified. There is a need to define the duties to protect if those duties are going to be fulfilled by an organization.
The obvious duty that should be identified by people working on critical infrastructure protection is the duty to prevent serious negative consequences from occurring;
but as obvious as this is, it is often forgotten in favor of some other sort of duty, like making money for the shareholders regardless of the implications to society as a whole.
A structured approach to defining duties to protect uses a hierarchical process starting with the top-level definition of duties associated with laws, owners, directors, auditors, and top management.
Laws and regulations are typically researched by a legal team and defined for internal use. Owners and directors define their requirements through the set of policies and explicit directives.
Auditors are responsible for identifying applicable standards against which verification will be performed and the enterprise measured. Top executives identify day-to-day duties and manage the process.
Duties should be identified through processes put in place by those responsible; however, if this is not done, the protection program should seek out this guidance as one of its duties to be diligent in its efforts.
Identified duties should be codified in writing and be made explicit, but if this is not done by those responsible, it is again incumbent on the protection program to codify them in documentation and properly manage that documentation.
There is often resistance to any process in which those who operate the protection program seek to clarify or formalize enterprise-level decisions.
As an alternative to creating formal documents or forcing the issue unduly, the protection executive might take the tactic of identifying the duties that are clarified in writing and noting that no other duties have been stipulated as part of the documentation provided for the design of the protection program.
The operating environment has to be characterized to gain clarity in the context of protection. Just as a bridge designer has to know the loads that are expected for the bridge, the length of the span, the likely range of weather conditions;
and other similar factors to design the bridge properly, the protection designer has to know enough about the operating environment to design the protection system to operate in the anticipated operating conditions.
The parameters are highly specific to the infrastructure type and protection area. For example, physical security of long-distance telecommunications lines has different operating environment parameters than does personnel security in a mining facility.
Security-related operating environment issues tend to augment normal engineering issues because they include the potential actions of malicious actors in the context of the engineering environment.
While engineers design bridges to handle natural hazards, the protection specialist must find ways to protect those same bridges when they are attacked in an attempt to intentionally push them beyond design specifications.
The protection designer has to understand what the assumptions are and how they can be violated by intentional attackers; this forms the operating environment of the protection designer.
A systematic approach to design is vital to success in devising protection approaches. Without some sort of method to the madness, the complexity of all of the possible protection designs is instantly overwhelming. There are a variety of design methodologies.
There are many complaints in the literature about the waterfall process, in which specifications are developed, designs are undertaken, evaluations of alternatives are completed, and selections are made, with a loop for feedback into the previous elements of the process.
However, despite the complaints, this process is still commonly embraced by those who are serious about arriving at viable solutions to security design challenges. In fact, this process has been well studied and leads to many positive results, but there are many alternative approaches to protection design.
As an overall approach, one of the more meaningful alternatives is to identify the surety level of the desired outcome for the overall system and its component parts. Surety levels can be thought of in fairly simple terms: low, medium, and high, for example.
For low surety, a different process is undertaken because the consequences are too low to justify serious design effort. For medium consequences, a systematic approach is taken, but not pushed to the limit of human capability for design and analysis.
For high consequences, the most certain techniques available are used and the price is paid regardless of the costs.
Of course, realistic designers know that there is no unlimited cost project, that there are tradeoffs at all levels, and that such selection is only preliminary, and this sort of iterative approach to reducing the space of possibilities helps to focus the design process.
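This preliminary triage can be stated as a trivial mapping from worst-case consequence to design rigor. The dollar thresholds below are hypothetical placeholders; any real enterprise would set its own, and would revisit the assignment as tradeoffs emerge.

```python
"""Sketch: preliminary surety triage to narrow the design space.
Thresholds are hypothetical illustrations, not recommendations."""

def surety_level(worst_case_loss_usd: float) -> str:
    """Map worst-case consequence to a design rigor level."""
    if worst_case_loss_usd < 100_000:
        return "low"     # consequences too low to justify serious design effort
    if worst_case_loss_usd < 50_000_000:
        return "medium"  # systematic design, bounded analysis effort
    return "high"        # most certain techniques available, cost secondary

for loss in (10_000, 1_000_000, 100_000_000):
    print(loss, "->", surety_level(loss))
```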
Process, Policy, Management, and Organizational Approaches
This is very similar to other engineering disciplines, and rightly so. Protection system design is an engineering exercise, but it is also a process definition exercise in that, along with all of the things that are created, there are operational procedures and process requirements that allow the components to operate properly together to form the composite. Protection is a process, not a product.
The protection system and the infrastructure as a whole have to function and evolve over time frames, and in the case of the protection system, it has to be able to react in very short time frames as well as adapt in far longer time frames.
As a result, the process definitions and the roles and actions of the parties have to be defined as part of the design process, in much the same way as the control processes of a power station or water system require that people and process be defined while the plant is designed.
Except that in the case of infrastructures like power plants and water systems, the people in these specialty fields and their management typically already know what to expect. In protection, they do not.
The problem of inadequate knowledge at the management and operational level relating to protection will solve itself over time, but today, it is rather serious.
The technology has changed in recent years, and the changes in the threat environment have produced serious management challenges to Western societies, but in places like the former Soviet Union and in oppressive societies with internal distrust, these systems are well understood and have been in place for a long time.
The challenge is getting a proper mix of serious attention to protection and reasonable levels of trust based on reasonable assumptions.
A management process must be put in place in order to ensure that whatever duties are identified and policies mandated, they are managed so that they get executed, the execution is measured and verified, and failures in execution are mitigated in a timely fashion.
The protection designer must be able to integrate the technical aspects of the protection system into the management aspects of the infrastructure provider to create a viable system that allows the active components of the protection system to operate within specifications; otherwise, the overall protective system will fail.
This has to take into account the failures in the components of the active system, which include not only technology but also people, business process, management failures, and active attempts to induce failures.
For example, an inadequate training program for incident evaluation will yield responses that cause inadequate resources to be available where and when needed, leading to reflexive control attack weaknesses in the protection system.
These sorts of processes have to be deeply embedded in the management structure of the enterprise to be effective. Otherwise, management decisions about seemingly irrelevant matters will result in successful attacks.
A typical example is a common decision to put content about the infrastructure on the Internet for external use with business partners.
Once the information is on the Internet, it is available on a more or less permanent basis to attackers, many of whom constantly seek out and collect permanent records of all information on potential future targets.
It is common for job descriptions to include details of operating environments in place, which leads attackers to the in-depth internal knowledge of the systems in use.
Because there are a limited number of systems used within many infrastructure industries, a few hints rapidly yield a great deal of knowledge that is exploitable in attacks.
In one case, a listing of vendors was used to identify lock types, and a vulnerability testing group was then able to get copies of the specific lock types in use, practice picking those locks, and bring special pick equipment to the site for attacks.
This reduced the time to penetrate barriers significantly. When combined with a floor plan gleaned from public records associated with a recent renovation, the entry and exit plan for covert access to control systems was devised, practiced, and executed.
If management at all levels does not understand these issues and make day-to-day operational decisions with this in mind, the result will be the defeat of protective systems.
The recognition that mistakes will be made is also fundamental to the development of processes. It is not only necessary to devise processes for the proper operation of the protective system and all of the related information and systems; the processes in place also have to compensate for failures in the normal operating modes of these systems so that small failures do not become large failures.
In a mature infrastructure process, there will not be heroic individual efforts necessary for the protective system to work under stress. It will degrade gracefully to the extent feasible given the circumstance, according to the plan in place.
The policy is typically missing or wrong when infrastructure protection work is started, and it is not always fixed when the work is done. It is hard to get top management to make policy changes and all the harder in larger providers.
Policies have to be followed and have legal standing within companies, while other sorts of internal decisions do not have the same standing.
As a result, management is often hesitant to create a policy. In addition, the policy gives leverage to the protection function, which is another reason that the management in place may not want to make such changes.
Since security is usually not treated as a function that operates at top management levels, there is typically nobody at that level to champion the cause of security, and it gets short shrift.
Nevertheless, it is incumbent on the protection architects and designers to find ways to get policies in place that allow leverage to be used to gain and retain an appropriate level of assurance associated with their function.