Published Date:26-10-2017
CHAPTER 3 DIAGNOSING PROBLEMS AND MEASURING TECHNICAL SECURITY

The discipline of measurement underpins many of society’s most revered professions. Young physicists, chemists, and engineers are indoctrinated into science at an early age through the “scientific method,” a controlled and well-defined process for exploring and proving theories about the natural world. When investigating a phenomenon, the method requires the scientist to:

• Formulate a hypothesis about the phenomenon
• Design tests to support or disprove the hypothesis
• Rigorously conduct and measure the results of each test
• Draw a conclusion based on the evidence

Nearly everyone who has taken high school biology or chemistry has experienced the scientific method firsthand. All science lab experiments (dissecting a frog, generating “steam” using dry ice, or building a chemical “volcano”) have the same basic steps. In short: write a hypothesis, conduct the experiment, analyze the results, and write up the conclusion. Measurement is core to the scientific method. Without it, experiments cannot be reproduced; without reproduction, a scientist’s analysis and conclusions cannot be trusted.

Beyond the domain of the laboratory, hypothesis testing and rigorous measurement underpin certain disciplines in the commercial world, too. Modern management consulting, for example, focuses on marshalling empirical evidence to diagnose organizational problems and to confirm hypothetical strategies for fixing underperforming operations. It is no accident that the entrée engagement for McKinsey & Company is a short data-gathering and needs-assessment exercise called a diagnostic. The resemblance to the laboratory procedure of the same name is, I am quite sure, strictly intentional.

Usually, anyone using “scientific method” and “security” in the same sentence sends his or her listener into fits of giggles.
This is largely because little consensus exists on what questions to ask or how to measure. As my colleague Fred Cohen notes: “The vast majority of information security-related measurement measures what is trivial to measure and not what will help the enterprise make sensible risk management decisions. Very few people seem to know what they want to measure.”[1]

[1] Fred Cohen, e-mail to mailing list, January 1, 2006.

To my eyes, the historical lack of consensus about security measures only means that we have not applied enough thought to the endeavor. Thus, taking our cue from scientists and management consultants, the next two chapters describe how to use empirical measures to diagnose issues with an organization’s security controls. In security, metrics help organizations:

• Understand security risks
• Spot emerging problems
• Understand weaknesses in their security infrastructures
• Measure performance of countermeasure processes
• Recommend technology and process improvements

The need for metrics is great, because all companies suffer from security-related aches and pains. Sometimes the pain is sharp and incapacitating, such as when an intruder defaces a public-facing website. Perhaps, as with the now-defunct Egghead Software, an intruder successfully obtains sensitive customer data, and the resulting embarrassment causes business losses. Sharp pains are the kind that put companies “on the front page of the Wall Street Journal,” as the expression goes. Far more common are the dull aches: an unsettling feeling in the CIO’s stomach that something just isn’t right. Regardless of the source of the pain, security metrics can help with the diagnosis.

To that end, this chapter formally defines a collection of common security metrics for diagnosing problems and measuring technical security activities.
I have grouped them into four categories: perimeter defenses, coverage and control, availability and reliability, and applications. Chapter 4, “Measuring Program Effectiveness,” discusses metrics for measuring ongoing effectiveness: risk management, policy compliance, employee training, identity management, and security program management. Chapter 8, “Designing Security Scorecards,” uses both sets of metrics to build a “balanced security scorecard.” But first, a short case study will describe in more detail exactly what I mean by “diagnostic” metrics.

USING METRICS TO DIAGNOSE PROBLEMS: A CASE STUDY

A few years ago, my former employer was called in by the CTO of a large, well-known maker of high-end consumer electronics. This company, which prides itself on its progressive approach to IT management, operates a large, reasonably up-to-date network and a full suite of enterprise applications. The CTO, Barry Eiger,[2] an extremely smart man, is fully conversant in the prevailing technology trends of the day. In manner and in practice, he tends to be a conservative technology deployer. Unimpressed with fads and trends, he prefers to hydrofoil above the choppy technological seas with a slightly bemused sense of detachment. Facts, rather than the ebbs and flows of technology, weigh heavily in his decision-making. In our initial conversations, he displayed an acute awareness of industry IT spending benchmarks. We discovered later that he had spent significant sums of money over the years on advisory services from Gartner Group, Meta Group, and others.

If he is so well informed, I wondered, why did he call us in? Barry’s problem was simple. His firm had historically been an engineering-driven company with limited need for Internet applications. More recently, his senior management team had asked him to deploy a series of transactional financial systems that would offer customers order management, loan financing, and customer support services.
These public-facing systems, in turn, connected back to several internal manufacturing applications as well as to the usual suspects: PeopleSoft, SAP, Siebel, and Oracle. A prudent man, Barry wanted to make sure his perimeter and application defenses were sufficient before beginning significant deployments. He wanted to know how difficult it might be for an outsider to penetrate his security perimeter and access sensitive customer data, product development plans, or financial systems.

[2] Pseudonym.

Barry asserted that his team had done a good job with security in the past. “What if you can’t get in?” he asked rhetorically. Despite his confidence, his dull ache persisted. His nagging feeling compelled him to find out how good his defenses really were. He also wanted some benchmarks to see how well his company compared to other companies like his.

Barry wanted a McKinsey-style “diagnostic.” This kind of diagnostic first states an overall hypothesis related to the business problem at hand and then marshals evidence (metrics) that supports or undermines the theory. The essence of the McKinsey diagnostic method is quite simple:

• The analysis team identifies an overall hypothesis to be supported. Example: “The firm is secure from wireless threats by outsiders.”
• The team brainstorms additional subhypotheses that must hold for the overall hypothesis to be true. For example, to support the wireless hypothesis we just identified, we might pose these subhypotheses: “Open wireless access points are not accessible from outside the building” and “Wireless access points on the corporate LAN require session encryption and reliable user authentication.”
• The team examines each subhypothesis to determine if it can be supported or disproved by measuring something. If it cannot, the hypothesis is either discarded or decomposed into lower-level hypotheses.
• For each lowest-level hypothesis, the team identifies specific diagnostic questions. The answers to the questions provide evidence for or against the hypothesis. Diagnostic questions generally take the form of “The number of X is greater (or less) than Y” or “The percentage of X is greater (or less) than Y.” For example, “There are no open wireless access points that can be accessed from the building’s parking lot or surrounding areas” or “100% of the wireless access points on the corporate LAN require 128-bit WPA security.”

The diagnostic questions dictate our metrics. The primary benefit of the diagnostic method is that hypotheses are proven or disproven based on empirical evidence rather than intuition. Because each subhypothesis supports the hypothesis above it, the cumulative weight of cold, hard facts builds a supporting case that cannot be disputed. A secondary benefit of the diagnostic method is that it forces the analysis team to focus only on measurements that directly support or disprove the overall hypothesis. Extraneous “fishing expeditions” about theoretical issues that cannot be measured automatically filter themselves out.

So far, the sample hypotheses and diagnostic questions I have given are rather simplistic. Why don’t we return to our friend Barry’s company for a real-world example? Recall that Barry’s original question was “Is my company’s customer data secure from outside attack?” Our overall hypothesis held that, indeed, the company was highly vulnerable to attack from outsiders. To show that this statement was true (or untrue), we constructed subhypotheses that could be supported or disproven by asking specific questions whose answers could be measured precisely and empirically. Table 3-1 shows a subset of the diagnostics we employed to test the hypothesis. Note that these diagnostics do not exhaust the potential problem space.
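The decomposition described above amounts to a small tree: lowest-level hypotheses carry a measurable threshold test, and higher-level hypotheses are judged by their subhypotheses. The Python sketch below encodes the wireless example that way. The measurement names (`open_aps_reachable_from_outside`, `pct_lan_aps_requiring_wpa`) and the survey values are hypothetical, and treating a parent as supported only when all of its children are supported is a simplifying assumption, not a formal part of the method as described.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical measurements keyed by name (counts or percentages).
Measurements = dict[str, float]


@dataclass
class Hypothesis:
    statement: str
    # Lowest-level hypotheses carry a diagnostic question phrased as a
    # threshold test ("the number of X is less than Y").
    test: Optional[Callable[[Measurements], bool]] = None
    # Higher-level hypotheses are decomposed into subhypotheses instead.
    children: list[Hypothesis] = field(default_factory=list)

    def supported(self, m: Measurements) -> bool:
        if self.test is not None:
            return self.test(m)
        if not self.children:
            # Neither measurable nor decomposed: discard it or decompose it.
            raise ValueError(f"unmeasurable hypothesis: {self.statement!r}")
        # Simplifying assumption: a parent holds only if every child holds.
        return all(child.supported(m) for child in self.children)


wireless = Hypothesis(
    "The firm is secure from wireless threats by outsiders.",
    children=[
        Hypothesis(
            "Open wireless access points are not accessible from outside the building.",
            test=lambda m: m["open_aps_reachable_from_outside"] == 0,
        ),
        Hypothesis(
            "Corporate LAN access points require encryption and authentication.",
            test=lambda m: m["pct_lan_aps_requiring_wpa"] == 100.0,
        ),
    ],
)

# Hypothetical survey results: two open APs are reachable from the parking lot.
survey = {"open_aps_reachable_from_outside": 2, "pct_lan_aps_requiring_wpa": 100.0}
print(wireless.supported(survey))  # False
```

Raising an error for a node with no test and no children mirrors the rule that an unmeasurable hypothesis must be discarded or broken down further before it can count as evidence.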
Time and budget impose natural limits on the number and kind of diagnostics that can be employed.

Table 3-1 Diagnostic Metrics in Action

| Subhypotheses | Diagnostic Questions |
|---|---|
| The network perimeter is porous, permitting easy access to any outsider. | How many sites are connected directly to the core network without intermediate firewalls? How many of these sites have deployed unsecured wireless networks? Starting with zero knowledge, how many minutes are required to gain full access to network domain controllers? |
| An outsider can readily obtain access to internal systems because password policies are weak. | What percentage of user accounts could be compromised in 15 minutes or less? |
| Once on the network, attackers can easily obtain administrator credentials. | How many administrative-level passwords could be compromised in the same time frame? |
| An intruder finding a hole somewhere in the network could easily jump straight to the core transactional systems. | How many internal “zones” exist to compartmentalize users, workgroup servers, transactional systems, partner systems, retail stores, and Internet-facing servers? |
| Workstations are at risk for virus or worm attacks. | How many missing operating system patches are on each system? |
| Viruses and worms can spread to large numbers of computers quickly. | How many network ports are open on each workstation computer? How many of these are “risky” ports? |
| Application security is weak and relies too heavily on the “out of the box” defaults. | How many security defects exist in each business application? What is the relative “risk score” of each application compared to the others? |
| The firm’s deployments of applications are much riskier than those made by leaders in the field (for example, investment banking). | Where does each application rank relative to other enterprise applications @stake has examined for other clients? |

To answer the diagnostic questions we posed, we devised a four-month program for Barry’s company. We assessed its network perimeter defenses, internal networks, ten most significant application systems, and related infrastructure. When we finished the engagement and prepared our final presentation for Barry, his team, and the company’s management, the metrics we calculated played a key role in proving our hypothesis. The evidence was so compelling, in fact, that the initial engagement was extended into a much longer corrective program with a contract value of several million dollars.

The preceding story illustrates the role that metrics can play in diagnosing problems. The remainder of this chapter describes, in a more systematic way, the types of metrics that can be useful for diagnostics.

DEFINING DIAGNOSTIC METRICS

This chapter defines about seventy-five different metrics that organizations use to assess their security posture, diagnose issues, and measure security activities associated with their infrastructure. Because the list of potential metrics is long indeed, I have split the discussion into two chapters. This chapter focuses on technical metrics that quantify each of the following:

• Perimeter defenses: To help understand the risk of security incidents coming from the outside, organizations measure the effectiveness of their antivirus software, antispam systems, firewalls, and intrusion detection systems (IDSs).
• Coverage and control: Companies that run “tight ships” know how important it is to extend the reach of their control systems as widely as possible. Metrics can help us understand the extent and effectiveness of configuration, patching, and vulnerability management systems.
• Availability and reliability: Systems that companies rely on to generate revenue must stay up, without being taken out of service due to unexpected security incidents. Metrics like mean time to recover (MTTR) and uptime percentages show the dependencies between security and profits.
• Application risks: Custom and packaged line-of-business applications that enterprises depend on need to be developed in a safe manner that does not result in unnecessary security exposures. Application security metrics such as defect counts, cyclomatic complexity, and application risk indices help quantify the risks inherent in homegrown code and third-party software.

Each of the four metrics sections is largely self-contained and follows the same formula. First, the subject area is defined, explaining exactly what I mean by, say, “perimeter security and threats.” That is followed by a table containing a list of representative metrics for the subject area. Each metric has a purpose and a representative list of typical sources. Note that for the sources, I have generally listed the system originating the metric, not necessarily the one that calculates it. In many cases, the originator passes its data to a downstream system, such as a SIM/SEM system, for further processing.

Many of the metrics may not seem intuitively obvious; thus, after each table I explain selected metrics in more detail: the value they bring, who uses them, why, and whether they possess special characteristics you should be aware of. But most of the metrics should be fairly self-explanatory.
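One way to keep a catalog like this consistent is to give every metric the same record shape: category, name, unit, purpose, originating sources, and, optionally, the downstream system that actually computes it. The Python sketch below is a hypothetical data structure for that purpose. The field names and the `calculated_by` routing to a SIM/SEM system are assumptions; the two sample entries are taken from Table 3-2.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetricDefinition:
    category: str                        # one of the four subject areas
    name: str
    unit: str                            # "#", "%", "FTE", ...
    purpose: str
    sources: tuple[str, ...]             # systems that ORIGINATE the data
    calculated_by: Optional[str] = None  # downstream system that computes it, if any


# Two sample entries based on Table 3-2; the calculated_by value is hypothetical.
CATALOG = [
    MetricDefinition(
        category="Perimeter defenses",
        name="Outgoing viruses and spyware caught at gateway",
        unit="#",
        purpose="Indicator of internal infections",
        sources=("Gateway e-mail content filtering software",),
        calculated_by="SIM/SEM system",
    ),
    MetricDefinition(
        category="Perimeter defenses",
        name="Firewall labor",
        unit="FTE",
        purpose="Labor required to support business unit firewall needs",
        sources=("HR management system", "Manual data source"),
    ),
]


def by_category(catalog: list[MetricDefinition]) -> dict[str, list[str]]:
    """Group metric names under their subject area, preserving catalog order."""
    grouped: dict[str, list[str]] = {}
    for metric in catalog:
        grouped.setdefault(metric.category, []).append(metric.name)
    return grouped


print(by_category(CATALOG))
```

Keeping `sources` and `calculated_by` as separate fields preserves the distinction drawn above between the system that originates a metric and the one that calculates it.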
If you are reading this as a member of a company or organization looking to implement a metrics program, I hope that you will find the metrics presented in this chapter (and in the next one) helpful in your endeavors. But before getting into the details, please be aware of three caveats.

First, the metrics I discuss here should not be considered the last word. A large number of organizations and industry initiatives have begun creating metrics lists, notably the Corporate Information Security Working Group (CISWG), NIST, ISSEA, US CERT, and my own initiative.

Second, these metrics are mostly observed rather than modeled. I have derived the metrics herein from multiple sources: interviews with enterprise subject matter experts, publicly available documents, and personal experience. They are, in most cases, things that people count, rather than things a risk model says they should count. I do not offer a risk model that justifies the selection of particular metrics. In other words, you can use the metrics in this chapter to support or disprove your own hypotheses in targeted areas, but the collective set does not itself imply a grand hypothesis for the overall information security problem space.

Third, not all of these metrics are appropriate for all organizations; the list is not meant to be canonical. When you select metrics for your own use, each must pass the “So what?” test. This means that a particular metric needs to provide insights that you don’t already possess, arm you with information you can use to spend your organization’s dollars more wisely, or help you diagnose problems better. More important, the metrics you select should mean something to the people responsible for producing them or to their bosses. If the metric fails the test, do not use it, or pick another one that works better.
PERIMETER SECURITY AND THREATS

The classical conception of information security begins with the firewall, starting with the DEC SEAL product and the TIS Firewall Toolkit. In 1993, when it was considered cutting-edge to have an Internet connection (before companies like McAfee, PatchLink, and ArcSight persuaded corporate information security officers to write them into their budgets), most organizations instinctively knew that it was a good idea to buy an ANS InterLock or Raptor firewall. Desktop viruses were rare and spread slowly because floppy disk sharing, not e-mail or the web browser, was the primary propagation vector.

Many companies’ conceptions of what it meant to be secure focused on managing access to the Internet via the firewall. In the early days, the telecom group typically managed the firewall in conjunction with the centralized wide-area networking (WAN) group, never to be confused with (shudder) the “desktop” networking group. It made sense to have centralized, specialized perimeter security organizations manage centralized, specialized perimeter security products.

A few years later, companies began granting wide access to Internet resources. By 1997, nearly every employee had on his or her desktop a web browser, an e-mail client, and a monumentally insecure Windows operating system. The bad guys figured out this latter feature soon enough, and by 2000, Internet- and e-mail-borne viruses and worms had become the bane of IT departments everywhere. As a result, antivirus software became a standard corporate budget item. Research from the company I work for, for example, shows that in 2005, 99% of enterprises had deployed antivirus software company-wide.[3]
Continuing worries about Internet-based malware threats dominate the pages of most IT and trade publications, and a whole cottage industry has sprung up around them: antispyware software, vulnerability scanning tools, patch management software, and related threat- and perimeter-oriented security products.

Because of their history of centralization within organizations, long tenure on corporate budgets, and relative maturity of tools, metrics for perimeter security are arguably the best understood of all security measures. To put it more simply, IT groups have been buying firewalls and antivirus software for a long time. It is not surprising that companies think of these products first when thinking about security metrics.

[3] A. Jaquith, C. Liebert, et al., Yankee Group Security Leaders and Laggards survey, 2005.

Table 3-2 shows a representative list of perimeter defense metrics. Most of these metrics should be familiar to most security professionals.
Table 3-2 Perimeter Defense Metrics

| Metric (Unit of Measure) | Purpose | Sources |
|---|---|---|
| E-mail | | |
| Messages per day (#), per organizational unit | Velocity of legitimate e-mail traffic; establishes baselines | E-mail system |
| Spam detected/filtered (#, %) | Indicator of e-mail “pollution” | Gateway e-mail content filtering software |
| Spam not detected/missed (#, %) | Effectiveness of content filtering software | Gateway e-mail content filtering software |
| Spam false positives (#, %) | Effectiveness of content filtering software | Gateway e-mail content filtering software |
| Spam detection failure rate (%): not-detected plus false positives, divided by spam detected | Effectiveness of content filtering software | Gateway e-mail content filtering software |
| Viruses and spyware detected in e-mail messages (#, %) | Indicator of e-mail “pollution” | Gateway e-mail content filtering software; workgroup e-mail content filtering software |
| Antivirus and Antispyware | | |
| Viruses and spyware detected on websites (#, %) | Propensity of users to surf to sites containing web-based threats | Perimeter web filtering appliance or software |
| Spyware detected in user files (#): on servers, on desktops, on laptops | Indicator of infection rate on desktops and servers | Desktop antispyware |
| Viruses detected in user files (#): on servers, on desktops, on laptops | Infection rate of endpoints as determined by automated scans | Desktop antivirus software |
| Virus incidents requiring manual cleanup (#, % of overall virus incidents) | Shows relative level of manual effort required to clean up | Antivirus software; trouble-ticketing system; manual data sources |
| Spyware incidents cleanup cost, by business unit | Shows labor costs associated with cleanup | Antivirus software; trouble-ticketing system; manual data sources |
| Virus incidents cleanup cost, by business unit | Shows labor costs associated with cleanup | Antivirus software; trouble-ticketing system; manual data sources |
| Outgoing viruses and spyware caught at gateway (#) | Indicator of internal infections | Gateway e-mail content filtering software |
| Firewall and Network Perimeter | | |
| Firewall rule changes (#): by business unit, by group’s server type | Suggests level of security complexity required by each business unit | Firewall management system; time-tracking and charge-back systems |
| Firewall labor (# full-time equivalents) | Labor required to support business unit firewall needs | HR management system; manual data source |
| Inbound connections/sessions to Internet-facing servers (#): by TCP/UDP port, by server type or group | Absolute level of inbound Internet activity | Firewall management system |
| Sites with open wireless access points (#) | Suggests potential exposure to infiltration by outsiders | Wireless scanning tools (NetStumbler, AirSnort, and so on) |
| Remote locations connected directly to core transaction and financial systems without intermediate firewalls (#) | Indicates level of compartmentalization of sensitive business assets, and potential exposure to attack | Network mapping software; network diagrams |
| Attacks | | |
| Ratio of Internet web sessions to attackers (%) at three levels of event severity: prospects (initial IDS events), suspects (machine-filtered/escalated alerts), attackers (manual investigation by staff) | Shows the attack “funnel” by which low-level security events are triaged and escalated, as compared to the overall level of business | IDS; firewall; trouble-ticketing system; manual data sources |
| Number of attacks (#) | Absolute number of detected attacks, both thwarted and successful | IDS; manual data sources |
| Number of successful attacks (#, %): by affected business unit, by geography | Indicates the relative effectiveness of perimeter defenses | IDS; manual data sources |

E-MAIL

As you read through the “E-mail” section of Table 3-2, you will recognize some
familiar metrics. A chestnut of e-mail security vendors is the classic set of spam and gateway antivirus metrics: percentage of spam detected/filtered, and the number of viruses and spyware detected in e-mail messages. Research by the Robert Frances Group shows that 77% of organizations track the first metric, and 92% track the second.[4]

But it is dangerous to read too much into these. For example, the percentage of e-mails that are spam is often paraded around as evidence that the spam-control software is doing its job. But that metric really does not tell us much about the software’s accuracy, only about the overall level of “pollution” in the e-mail stream. In other words, it is an environmental indicator but not necessarily a measure of effectiveness. A better measure of effectiveness is either the percentage of missed spam (as reported by end users) or the percentage of false positives (that is, the messages marked as spam that were not actually spam). A minority of companies watch these metrics: 39% and 31%, respectively.[5] The two measures can also be combined into what Table 3-2 calls the “spam detection failure rate” metric.

[4] C. Robinson, “Collecting Effective Security Metrics,” CSO Online (2004), analyst/report2412.html.
[5] Ibid.

ANTIVIRUS AND ANTISPYWARE

Under the category of antivirus and antispyware metrics are the usual “fun facts”: the number of distinct pieces of malware detected by antimalware software scans. These data are easily gathered from desktop and server antimalware systems. But with a little “enrichment” from manual data sources and trouble-ticketing systems, we can add more context. For example, the “virus incidents requiring manual cleanup” metric tells us which virus outbreaks were bad enough that automated quarantine-and-removal processes could not contain them.
Dividing the number of incidents that required human intervention by the total number of incidents gives us a much more honest assessment of the effectiveness of the antivirus system. Labor costs associated with manual cleanup efforts also can give an organization a sense of where its break/fix dollars are going.

Another twist I have added to the traditional antivirus statistics is a simple metric documenting the number of outbound viruses or spyware samples caught by the perimeter mail gateway’s content filtering software. Why it matters is simple: it is an excellent indicator of how “clean” the internal network is. Organizations that practice good hygiene don’t infect their neighbors and business partners. My friend Dan Geer relates this quote from the CSO of a Wall Street investment bank: “Last year we stopped 70,000 inbound viruses, but I am prouder of having stopped 500 outbound.” In other words, the bank’s internal network is cleaner than the outside environment by a factor of 140 to 1 (70,000 ÷ 500).

FIREWALL AND NETWORK PERIMETER

Let us move on to firewall and network perimeter metrics for a moment. Recall that firewalls are rarely, in and of themselves, deterrents to attacks at the business or application layer. That said, they do serve an essential function by keeping unwanted Internet traffic away from protected network assets such as application servers. The converse is also true: firewalls also let traffic in. In many corporations, the firewall rules can be extremely complex as a result of continuous, “organic” growth in the number of access requests from business units. One aerospace company I am familiar with, for example, has over 50,000 active firewall rules and a less-than-clear understanding of exactly which business units all those rules are for.
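The ownership problem in the aerospace example becomes measurable once each rule records the business unit that requested it. The Python sketch below assumes a hypothetical rule export in which `owner_unit` is optional; it counts active rules per business unit and reports the percentage of rules with no recorded owner. The rule IDs and unit names are invented for illustration.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FirewallRule:
    rule_id: str
    owner_unit: Optional[str]  # business unit that requested the rule, if recorded


def rules_by_unit(rules: list[FirewallRule]) -> Counter[Optional[str]]:
    """Count active rules per owning business unit; the None key counts unowned rules."""
    return Counter(rule.owner_unit for rule in rules)


def unowned_share(rules: list[FirewallRule]) -> float:
    """Percentage of rules with no recorded owner (0.0 for an empty rule base)."""
    if not rules:
        return 0.0
    return 100.0 * sum(rule.owner_unit is None for rule in rules) / len(rules)


# Hypothetical rule export.
rules = [
    FirewallRule("r1", "Retail"),
    FirewallRule("r2", "Retail"),
    FirewallRule("r3", "Finance"),
    FirewallRule("r4", None),
]
counts = rules_by_unit(rules)
print(counts["Retail"], counts[None], unowned_share(rules))  # 2 1 25.0
```

Tracking the unowned share over time shows whether rule-review efforts are actually shrinking the pool of rules nobody can account for.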
Rhonda MacLean, CEO of MacLean Risk Partners and former CSO of Bank of America, turned firewall rule management into a creative set of metrics, which I have partially reprinted in Table 3-2 as the “firewall rule changes” and “firewall labor” metrics. Rhonda’s team counted the number and cost of changes in absolute terms to provide a view of the level of effort required to respond to new business requirements. They also broke down these numbers by business unit to encourage accountability and to justify charge-backs for services rendered.

In the “Firewall and Network Perimeter” section of the table, you will note two other metrics. The first counts the number of open wireless access points at an organization’s remote offices. By “open” we mean not requiring a WEP, WPA, or RADIUS password, and not restricting access by other means such as MAC address filtering. This isn’t necessarily the most critical metric for every organization. That said, open access points can present a security risk for firms with many far-flung offices in urban environments, especially when considered in combination with the other metric: the number of remote offices connected directly to core transaction networks. One electronics manufacturer I know, for example, found multiple open wireless access points in several overseas locations. In several cases, these locations were in dense urban neighborhoods; anybody with a laptop could obtain an IP address and sniff around the internal network. Even worse, the company had no concept of network zoning; it did not place any firewalls between the remote locations and its core enterprise resource planning (ERP), financial, customer relationship management (CRM), and order management systems.

ATTACKS

Quantifying security “attacks” is a difficult task, but it is getting easier thanks to continuing improvements to the accuracy of intrusion detection software and, in particular, SEIM software.[6]
Security vendors like ArcSight, IBM (Micromuse), and NetForensics attempt to identify attacks by filtering security information into three levels of criticality. The lowest level, security events, feeds into the SEIM from source systems. These events are processed by the SEIM and are not necessarily intended to be viewed by humans. If certain types of events correlate strongly, the system generates an alert and forwards it to a security dashboard, along with supporting data. If the incident response team feels that the alert represents an actual attack, they create an incident.

Naturally, we can and should count all of these items, and many organizations do. About 85% of organizations count incidents, and over half (54%) also count successful attacks.[7] These statistics are certainly interesting in and of themselves, but they are also interesting in relation to each other. When the corpus of event and incident data can be scoped down to a well-understood and well-defined group of assets, such as public web servers, we can use these numbers to create a “funnel” that shows the ratio of Internet web sessions to prospective attackers, suspected attackers, and actual (manually investigated) attackers.[8] A bank I visited in 2001, for example, charted these ratios regularly; during my visit, I noted that the ratio of valid web sessions to attackers was 500,000 to 1.

[6] SIM/SEM.
[7] C. Robinson, “Collecting Effective Security Metrics,” CSO Online (2004), analyst/report2412.html.

You’ll note that the “Attacks” section of Table 3-2 leaves out such common statistics as the most commonly attacked ports and the most “dangerous” external URLs. I have omitted them deliberately, because they don’t pass the “So what?” test. Reed Harrison, CTO of E-Security (now part of Novell), explains: “The typical ‘Top 10 Ports,’ ‘Top 10 Attacking IP Addresses,’ and ‘Top 10 URLs’ are really just watch lists.
Our customers don’t consider them compelling metrics; they prefer to measure operational efficiency and the effectiveness of their control environment.”[9] These are not even “so-what” metrics; they are just plain useless.

[8] In Table 3-2, I use the terms prospects, suspects, and attackers instead of events, alerts, and incidents.
[9] Interview with author, April 2006.

COVERAGE AND CONTROL

Coverage and control metrics characterize how successful an organization is at extending the reach of its security régime. Most security programs are full of good intentions, usually expressed formally through some sort of policy. But the reality of suboptimal implementation and poor end-user compliance often gives all those good intentions the shaft. Coverage and control metrics, then, are essential to helping managers understand the size of the gap between intentions and facts on the ground.

Let’s define the terms a bit more precisely. By coverage we mean the degree to which a particular security control has been applied to the target population of all resources that could benefit from that control. Coverage metrics measure the security organization’s ability to execute on its mandates. Are its eyes, as the saying goes, bigger than its stomach, or can the organization meet its coverage goals? Mark Kadrich, a manager formerly with Sygate, explains why achieving good coverage is essential to running an effective security program:

“We found that the dark matter on our network had a higher percentage of vulnerabilities when they were finally identified. Remotely managed systems could be systems that are controlled by patch management systems or centrally managed AV systems. You can track that metric over time and create trends that you can correlate to other events.”[10]
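Kadrich’s point can be made concrete with two numbers: coverage (managed hosts as a share of the target population) and the mean vulnerability count on managed hosts compared with the unmanaged “dark matter.” The Python sketch below assumes a hypothetical host inventory in which each record carries a `managed` flag and an open-vulnerability count from the latest scan; the host names and counts are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    managed: bool         # reached by central patch or AV management?
    vulnerabilities: int  # open findings from the latest scan


def coverage_pct(hosts: list[Host]) -> float:
    """Managed hosts as a percentage of the target population."""
    if not hosts:
        return 0.0
    return 100.0 * sum(h.managed for h in hosts) / len(hosts)


def mean_vulns(hosts: list[Host], managed: bool) -> float:
    """Mean open vulnerabilities for managed hosts, or for the unmanaged 'dark matter'."""
    group = [h.vulnerabilities for h in hosts if h.managed == managed]
    return sum(group) / len(group) if group else 0.0


# Hypothetical inventory: three managed workstations and one unmanaged lab box.
hosts = [
    Host("ws-01", True, 2),
    Host("ws-02", True, 4),
    Host("ws-03", True, 3),
    Host("lab-07", False, 11),
]
print(coverage_pct(hosts))       # 75.0
print(mean_vulns(hosts, True))   # 3.0
print(mean_vulns(hosts, False))  # 11.0
```

Recording both figures each reporting period gives the trend line Kadrich describes, which can then be correlated with incident data.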
Often, security organizations must grant dispensations or exceptions to embedded, turnkey, or managed systems. Olivier Caleff, who spent several years working for a major defense contractor, explains why: “With turnkey dedicated servers supplied by a vendor, the security group cannot do anything to the system because the vendor will no longer support you otherwise. You have to live with these vulnerable systems and put compensating controls in place to prevent problems from spreading.” Fragile legacy systems might also be excluded: “I have a customer who still runs PCs with OS/2 with his own applications. We allow them to connect to IT and production networks, but only with very strict security and very specific network filters.” In short, systems with dispensations cannot be included in coverage metrics.

[10] Mark Kadrich, e-mail to mailing list, March 20, 2006.

Control means the degree to which a control is being applied in a manner consistent with the security organization’s service standards, across the scope of covered resources. In other words, for the things we’ve got covered, are we getting the results we want?

Table 3-3 shows a sample set of recommended coverage and control metrics. For the purposes of this chapter, I have limited the list to metrics that an organization might reasonably measure with its technical infrastructure. Thus, Table 3-3 includes topics such as antivirus, patch management, host configuration, and vulnerability management.

Table 3-3 Coverage and Control Metrics

Metric | Purpose | Sources

Antivirus and Antispyware
Workstations, laptops covered by antivirus software (number #, percent %) | Extent of antivirus controls, for eligible hosts | Antivirus software; Network management system
Workstations, laptops covered by antispyware software (#, %) | Extent of antispyware controls, for eligible hosts | Antispyware software; Network management system
Servers covered by antivirus software (#, %) | Extent of antivirus controls, for eligible hosts | Antivirus software; Network management system
Workstations, laptops, servers with current antivirus signatures (#, %) | Service level agreement (SLA) attainment of antivirus | Antivirus software
Workstations, laptops, servers with current antispyware signatures (#, %) | SLA attainment of antispyware | Antispyware software

Patch Management
Hosts not to policy patch level (%): for workstations; for servers; for laptops; for critical systems; by OS (Windows, Linux, UNIX, Mac); by business unit; by geography | Identification of gaps in patch management process | Patch management software; Vulnerability management system; Systems management software
Number of patches applied per period (#, per node) | Identifies cumulative workload for previous reporting periods | Patch management software
Unapplied patches (mean and median per node): for critical patches; by business unit; by geography | Identifies current workload | Patch management software
Unapplied patch ratio (patches per host)[11] | Identifies relative patch workload per host | Patch management software
Unapplied patch latency (age of missing patch, per node)[12]: for critical patches; for noncritical patches; by business unit; by geography | Shows potential size of window of vulnerability for missing patches | Patch management software
Patch testing cycle time (time): for critical patches; for servers versus workstations; for noncritical patches | Measures time of exposure due to elapsed time between release of official patch and time of completion of patch testing | Patch management software
Patch distribution cycle (time): for critical patches | Measures time to apply patches | Patch management software
Patches applied outside of maintenance windows (#, %): for critical systems | Indicates whether control processes are “panicked” or predictable | Change control software; Manual controls
Patch SLA attainment (%): for all systems; for critical systems; trend versus previous month | SLA attainment for patch management process | Patch management software; Vulnerability management system; Systems management software
Cost of patch vulnerability group (cost)[13] | Total cost of applying a set of patches, including management software, hardware, and labor | Patch management software; Time-tracking software

Host Configuration
Workstation and laptop benchmark score | Standardized configuration benchmark characterizing the degree of lockdown measures applied to the operating system | Desktop benchmarking tools (such as CIS)
Workstations, laptops using standard build image (%) | Conformance of workstations to an organization’s standardized operating system build image | Desktop management software
% systems in compliance with approved configurations | Shows conformance against configuration standards, regardless of how the system was built | Change control software; Desktop management software
Network services ratio (services per host)[14]: all ports; unnecessary ports; by system type | Identification of potential network ingress points on nodes; suggests divergences from standard builds | Port scanning tools
Remote endpoint manageability (%) | Systems that can be remotely administered by security personnel and that are subject to antimalware and patch management controls | Systems management software; Patch management software; Antivirus/antispyware software
Business-critical systems under active monitoring (%) | Identifies the extent of uptime and security monitoring controls | Security event management system
Logging coverage (# of nodes, %) | Determines how many hosts forward system and security events to a centralized log server | Systems management software; Syslog server logs; SNMP traps
NTP server coverage (# of nodes, %) | Determines how many hosts synchronize clocks via a standardized time server | Systems management software; Time server logs
Emergency configuration response time (time)[15]: by business unit; by geography; by operating system | Time to reconfigure a given set of nodes in the event of zero-day outbreaks or security incidents | Time-tracking software

Vulnerability Management
Vulnerability scanner coverage (#, %, frequency): by business unit; by geography; by network or subnet | Shows the extent of vulnerability scanning operations as compared to the total number of IP addresses; frequency measures how often scans are performed | Vulnerability management software
Vulnerabilities per host (#)[16]: critical vulnerabilities; by system type; by asset class | Indicates the relative level of potential insecurity based on the number of vulnerabilities per host | Vulnerability management software
Monthly vulnerability counts (#): by criticality; by business unit; by geography; by system type | Raw numbers that, over time, paint a picture of the overall vulnerability workload | Vulnerability management software
Monthly net change (+/–) in vulnerability incidence: by criticality; critical assets; other assets | Shows the change in workload from month to month | Vulnerability management software
Vulnerability identification latency (time) | Shows the degree of responsiveness of the vulnerability triage process | Vulnerability management software; Time-tracking software
Time to close open vulnerabilities: for critical assets | Characterizes the level of responsiveness in fixing important vulnerabilities that affect critical assets | Vulnerability management software; Trouble-ticketing software
Time to fix 50% of vulnerable hosts (# days), aka “half-life”: for critical vulnerabilities | Identifies the “half-life” of the window of vulnerability for an organization’s assets; measures the effectiveness of remediation activities | Vulnerability management software
Systems requiring reimaging (# per period, % per period) | Trailing indicator of potential downstream workload impact due to (in)security | Spreadsheets; Manual tracking
Workstation, laptop, server survival (time) | System integrity/survivability; systems with few vulnerabilities should survive much longer | Honeypot software; Manual tracking
Time to re-create fully backed-up server from scratch (time as % of SLA) | Efficiency in restoring services to a resource within business requirements | Manual tracking

[11] P. Mell, T. Bergeron, and D. Henning, “Creating a Patch and Vulnerability Management Program,” NIST Special Publication 800-40, November 2005, SP800-40v2.pdf.
[12] NIST refers to this metric as “patch response time.”
[13] NIST, ibid.
[14] NIST, ibid.
[15] NIST, ibid.
[16] NIST refers to this as the “vulnerability ratio.”

ANTIVIRUS AND ANTISPYWARE

In the preceding section, I mentioned that the most common antivirus and antimalware statistics are best viewed as “fun facts.” My colleague Betsy Nichols has a different label for them: she calls them “happy metrics.” Regardless of the label, we both agree that trivial statistics about numbers of blocked viruses say nothing about the effectiveness of antivirus software in stopping unknown threats, and nothing about the program’s overall effectiveness. I do not discount their value in helping educate management about the sheer size of the malware problem, or their usefulness in justifying continued purchase of antivirus software. But to understand the consistency of implementation of desired antivirus controls (in other words, whether there is a gap between policy and practice), we need broader coverage and control metrics.

Coverage metrics identify implementation gaps: of the eligible workstations and servers, how many have antivirus and antispyware software? And of the machines covered by the software, how many have updated signature and policy files?
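The second question, whether covered machines have current signature files, is a control measure: its denominator is the covered population rather than the eligible one. A minimal sketch, assuming each covered host reports the date of its installed signature file, and assuming a hypothetical three-day threshold (`MAX_SIGNATURE_AGE`) standing in for whatever the organization's service standard actually specifies; the function name `signature_currency` and the sample data are illustrative only.

```python
from __future__ import annotations

from datetime import date, timedelta

# Hypothetical service standard: signatures older than this are "not current".
MAX_SIGNATURE_AGE = timedelta(days=3)


def signature_currency(sig_dates: dict[str, date], today: date) -> tuple[int, int, float]:
    """Return (# current, # covered, % current) for hosts that run antivirus.

    `sig_dates` maps each covered host to the date of its installed
    signature file; hosts without antivirus are not in the mapping at all.
    """
    covered = len(sig_dates)
    current = sum(1 for d in sig_dates.values() if today - d <= MAX_SIGNATURE_AGE)
    pct = 100.0 * current / covered if covered else 0.0
    return current, covered, pct


if __name__ == "__main__":
    today = date(2006, 4, 10)
    hosts = {
        "ws-01": date(2006, 4, 9),   # 1 day old: current
        "ws-02": date(2006, 3, 28),  # 13 days old: stale
        "srv-01": date(2006, 4, 7),  # 3 days old: current (at the threshold)
    }
    n, total, pct = signature_currency(hosts, today)
    print(f"{n} of {total} covered hosts have current signatures ({pct:.1f}%)")
```

With the sample data this prints `2 of 3 covered hosts have current signatures (66.7%)`. Reporting both the count and the percentage mirrors the two units of measure suggested for these metrics.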
Metrics such as “workstations covered by antivirus software” and “workstations/servers with current antispyware signatures” help administrators understand the extent of their control régime. For most of the antivirus and antispyware metrics, I have suggested using two units of measure: absolute numbers (#) and percentages (%) of the overall installed base. For the most part, absolute numbers matter less than percentages. In most large companies, knowing (for example) that 688 workstations run the approved antivirus software is less interesting than knowing that this number represents only 54% of the hosts that the security policy deems eligible.

I have not included nearly as many metrics in the “Antivirus and Antispyware” section of Table 3-3, in part because these controls are relatively easy to assess: a workstation or server either has the software on it, or it does not. In addition, I have left out many of the
