What is big data and how does it work?

Big Data: A Tool for Inclusion or Exclusion? Understanding the Issues
FTC Report, January 2016

Federal Trade Commission
Edith Ramirez, Chairwoman
Julie Brill, Commissioner
Maureen K. Ohlhausen, Commissioner
Terrell McSweeny, Commissioner

Contents
Executive Summary
I. Introduction
II. Life Cycle of Big Data
III. Big Data's Benefits and Risks
IV. Considerations for Companies in Using Big Data
  A. Potentially Applicable Laws
     Questions for Legal Compliance
  B. Special Policy Considerations Raised by Big Data Research
     Summary of Research Considerations
V. Conclusion
Appendix: Separate Statement of Commissioner Maureen K. Ohlhausen

Executive Summary

We are in the era of big data. With a smartphone now in nearly every pocket, a computer in nearly every household, and an ever-increasing number of Internet-connected devices in the marketplace, the amount of consumer data flowing throughout the economy continues to increase rapidly. The analysis of this data is often valuable to companies and to consumers, as it can guide the development of new products and services, predict the preferences of individuals, help tailor services and opportunities, and guide individualized marketing. At the same time, advocates, academics, and others have raised concerns about whether certain uses of big data analytics may harm consumers, particularly low-income and underserved populations.

To explore these issues, the Federal Trade Commission ("FTC" or "the Commission") held a public workshop, Big Data: A Tool for Inclusion or Exclusion?, on September 15, 2014. The workshop brought together stakeholders to discuss both the potential of big data to create opportunities for consumers and to exclude them from such opportunities. The Commission has synthesized the information from the workshop, a prior FTC seminar on alternative scoring products, and recent research to create this report. Though "big data" encompasses a wide range of analytics, this report addresses only the commercial use of big data consisting of consumer information and focuses on the impact of big data on low-income and underserved populations. Of course, big data also raises a host of other important policy issues, such as notice, choice, and security, among others. Those, however, are not the primary focus of this report.

As "little" data becomes "big" data, it goes through several phases. The life cycle of big data can be divided into four phases: (1) collection; (2) compilation and consolidation; (3) analysis; and (4) use.
This report focuses on the fourth phase and discusses the benefits and risks created by the use of big data analytics; the consumer protection and equal opportunity laws that currently apply to big data; research in the field of big data; and lessons that companies should take from the research. Ultimately, this report is intended to educate businesses on important laws and research that are relevant to big data analytics and provide suggestions aimed at maximizing the benefits and minimizing its risks.

Big Data's Benefits and Risks

Big data analytics can provide numerous opportunities for improvements in society. In addition to more effectively matching products and services to consumers, big data can create opportunities for low-income and underserved communities. For example, workshop participants and others have noted that big data is helping target educational, credit, healthcare, and employment opportunities to low-income and underserved populations. At the same time, workshop participants and others have noted how potential inaccuracies and biases might lead to detrimental effects for low-income and underserved populations. For example, participants raised concerns that companies could use big data to exclude low-income and underserved communities from credit and employment opportunities.

Consumer Protection Laws Applicable to Big Data

Workshop participants and commenters discussed how companies can use big data in ways that provide benefits to themselves and society, while minimizing legal and ethical risks. Specifically, they noted that companies should have an understanding of the various laws, including the Fair Credit Reporting Act, equal opportunity laws, and the Federal Trade Commission Act, that may apply to big data practices.

1. Fair Credit Reporting Act

The Fair Credit Reporting Act ("FCRA") applies to companies, known as consumer reporting agencies or CRAs, that compile and sell consumer reports, which contain consumer information that is used or expected to be used for credit, employment, insurance, housing, or other similar decisions about consumers' eligibility for certain benefits and transactions. Among other things, CRAs must implement reasonable procedures to ensure maximum possible accuracy of consumer reports and provide consumers with access to their own information, along with the ability to correct any errors. Traditionally, CRAs include credit bureaus, employment background screening companies, and other specialty companies that provide particularized services for making consumer eligibility decisions, such as check authorizations or tenant screenings. Some data brokers may also be considered CRAs subject to the FCRA, particularly if they advertise their services for eligibility purposes. The Commission has entered into a number of consent decrees with data brokers that advertise their consumer profiles for employment and tenant screening purposes.

Companies that use consumer reports also have obligations under the FCRA. Workshop panelists and commenters discussed a growing trend in big data, in which companies may be purchasing predictive analytics products for eligibility determinations. Under traditional credit scoring models, companies compare known credit characteristics of a consumer—such as past late payments—with historical data that shows how people with the same credit characteristics performed over time in meeting their credit obligations.
Similarly, predictive analytics products may compare a known characteristic of a consumer to other consumers with the same characteristic to predict whether that consumer will meet his or her credit obligations. The difference is that, rather than comparing a traditional credit characteristic, such as debt payment history, these products may use non-traditional characteristics—such as a consumer's zip code, social media usage, or shopping history—to create a report about the creditworthiness of consumers that share those non-traditional characteristics, which a company can then use to make decisions about whether that consumer is a good credit risk. The standards applied to determine the applicability of the FCRA in a Commission enforcement action, however, are the same. Only a fact-specific analysis will ultimately determine whether a practice is subject to or violates the FCRA, and as such, companies should be mindful of the law when using big data analytics to make FCRA-covered eligibility determinations.

2. Equal Opportunity Laws

Companies should also consider a number of federal equal opportunity laws, including the Equal Credit Opportunity Act ("ECOA"), Title VII of the Civil Rights Act of 1964, the Americans with Disabilities Act, the Age Discrimination in Employment Act, the Fair Housing Act, and the Genetic Information Nondiscrimination Act. These laws prohibit discrimination based on protected characteristics such as race, color, sex or gender, religion, age, disability status, national origin, marital status, and genetic information. Of these laws, the FTC enforces ECOA, which prohibits credit discrimination on the basis of race, color, religion, national origin, sex, marital status, age, or because a person receives public assistance.

To prove a violation of ECOA, plaintiffs typically must show "disparate treatment" or "disparate impact." Disparate treatment occurs when a creditor treats an applicant differently based on a protected characteristic. For example, a lender cannot refuse to lend to single persons, or offer them less favorable terms than married persons, even if big data analytics show that single persons are less likely to repay loans than married persons. Disparate impact occurs when a company employs facially neutral policies or practices that have a disproportionate adverse effect or impact on a protected class, unless those practices or policies further a legitimate business need that cannot reasonably be achieved by means that are less disparate in their impact. For example, if a company makes credit decisions based on consumers' zip codes, such decisions may have a disparate impact on particular ethnic groups, because certain ethnic groups are concentrated in particular zip codes; accordingly, the practice may violate ECOA. The analysis turns on whether the decisions have a disparate impact on a protected class and are not justified by a legitimate business necessity. Even if evidence shows the decisions are justified by a business necessity, if there is a less discriminatory alternative, the decisions may still violate ECOA.

Workshop discussions focused on whether advertising could implicate equal opportunity laws. In most cases, a company's advertisement to a particular community for a credit offer that is open to all to apply is unlikely, by itself, to violate ECOA, absent disparate treatment or an unjustified disparate impact in subsequent lending.
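To make the disparate impact discussion above more concrete, the following is a minimal sketch, in Python, of the kind of internal screening a company might run on its own decision data before any fuller, fact-specific legal review. The group labels, the sample data, and the 0.8 threshold (borrowed loosely from the "four-fifths" heuristic sometimes used in employment contexts) are illustrative assumptions only, not a legal test under ECOA.

```python
from collections import defaultdict

def approval_rates(decisions):
    """decisions: iterable of (group_label, approved: bool).
    Returns the approval rate observed for each group."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for group, ok in decisions:
        total[group] += 1
        approved[group] += int(ok)
    return {g: approved[g] / total[g] for g in total}

def disparate_impact_flags(decisions, threshold=0.8):
    """Flag groups whose approval rate falls below `threshold` times the
    best-performing group's rate (a rough four-fifths-style screen)."""
    rates = approval_rates(decisions)
    best = max(rates.values())
    return {g: rate / best for g, rate in rates.items() if rate / best < threshold}

# Hypothetical decision log for two zip-code-derived segments, A and B.
decisions = ([("A", True)] * 80 + [("A", False)] * 20
             + [("B", True)] * 50 + [("B", False)] * 50)
print(approval_rates(decisions))          # {'A': 0.8, 'B': 0.5}
print(disparate_impact_flags(decisions))  # {'B': 0.625} -> worth a closer look
```

A flag from a screen like this is only a signal to investigate further; as the report stresses, whether a practice actually violates ECOA turns on a fact-specific analysis, including any business justification and less discriminatory alternatives.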
Nevertheless, companies should proceed with caution in this area. For advertisements relating to credit products, companies should look to Regulation B, the implementing regulation for ECOA. It prohibits creditors from making oral or written statements, in advertising or otherwise, to applicants or prospective applicants that would discourage on a prohibited basis a reasonable person from making or pursuing an application. With respect to prescreened solicitations, Regulation B also requires creditors to maintain records of the solicitations and the criteria used to select potential recipients. Advertising and marketing practices could affect a creditor's subsequent lending patterns and the terms and conditions of the credit received by borrowers, even if credit offers are open to all who apply. In some cases, the Department of Justice has cited a creditor's advertising choices as evidence of discrimination.

Ultimately, as with the FCRA, whether a practice is unlawful under equal opportunity laws is a case-specific inquiry, and as such, companies should proceed with caution when their practices could result in disparate treatment or have a demonstrable disparate impact based on protected characteristics.

3. The Federal Trade Commission Act

Workshop participants and commenters also discussed the applicability of Section 5 of the Federal Trade Commission Act ("FTC Act"), which prohibits unfair or deceptive acts or practices, to big data analytics. Companies engaging in big data analytics should consider whether they are violating any material promises to consumers—whether that promise is to refrain from sharing data with third parties, to provide consumers with choices about sharing, or to safeguard consumers' personal information—or whether they have failed to disclose material information to consumers. In addition, companies that maintain big data on consumers should take care to reasonably secure consumers' data. Further, at a minimum, companies must not sell their big data analytics products to customers if they know or have reason to know that those customers will use the products for fraudulent or discriminatory purposes. The inquiry will be fact-specific, and in every case, the test will be whether the company is offering or using big data analytics in a deceptive or unfair way.

Research on Big Data

Workshop participants, academics, and others also addressed the ways big data analytics could affect low-income and underserved populations and protected groups. Some pointed to research demonstrating a potential for incorporating errors and biases at every stage—from choosing the data set used to make predictions, to defining the problem to be addressed through big data, to making decisions based on the results of big data analysis—which could lead to potential discriminatory harms. Others noted that these concerns are overstated or simply not new, and emphasized that, rather than disadvantaging minorities, big data can create opportunities for low-income and underserved populations. To maximize the benefits and limit the harms of big data, the Commission encourages companies to consider the following questions raised by research in this area:

- How representative is your data set? Companies should consider whether their data sets are missing information about certain populations, and take steps to address issues of underrepresentation and overrepresentation.
For example, if a company targets services to consumers who communicate through an application or social media, it may be neglecting populations that are not as tech-savvy.

- Does your data model account for biases? Companies should consider whether biases are being incorporated at both the collection and analytics stages of big data's life cycle, and develop strategies to overcome them. For example, if a company uses a big data algorithm that considers only applicants from "top tier" colleges to help it make hiring decisions, it may be incorporating previous biases in college admission decisions.

- How accurate are your predictions based on big data? Companies should remember that while big data is very good at detecting correlations, it does not explain which correlations are meaningful. A prime example that demonstrates the limitations of big data analytics is Google Flu Trends, a machine-learning algorithm for predicting the number of flu cases based on Google search terms. While, at first, the algorithm appeared to create accurate predictions of where the flu was more prevalent, it generated highly inaccurate estimates over time. This could be because the algorithm failed to take into account certain variables. For example, the algorithm may not have taken into account that people would be more likely to search for flu-related terms if the local news ran a story on a flu outbreak, even if the outbreak occurred halfway around the world.

- Does your reliance on big data raise ethical or fairness concerns? Companies should assess the factors that go into an analytics model and balance the predictive value of the model with fairness considerations. For example, one company determined that employees who live closer to their jobs stay at those jobs longer than those who live farther away. However, another company decided to exclude this factor from its hiring algorithm because of concerns about racial discrimination, particularly since different neighborhoods can have different racial compositions.

The Commission encourages companies to apply big data analytics in ways that provide benefits and opportunities to consumers, while avoiding pitfalls that may violate consumer protection or equal opportunity laws, or detract from core values of inclusion and fairness. For its part, the Commission will continue to monitor areas where big data practices could violate existing laws, including the FTC Act, the FCRA, and ECOA, and will bring enforcement actions where appropriate. The Commission will also continue to examine and raise awareness about big data practices that could have a detrimental impact on low-income and underserved populations, and promote the use of big data that has a positive impact on such populations.

I. Introduction

The era of big data has arrived. While companies historically have collected and used information about their customer interactions to help improve their operations, the expanding use of online technologies has greatly increased the amount of consumer data that flows throughout the economy. In many cases, when consumers engage digitally—whether by shopping, visiting websites, paying bills, connecting with family and friends through social media, using mobile applications, or using connected devices, such as fitness trackers or smart televisions—companies collect information about their choices, experiences, and individual characteristics.
The analysis of this consumer information is often valuable to companies and to consumers, as it provides insights into market-wide tastes and emerging trends, which can guide the development of new products and services. It is also valuable to predict the preferences of specific individuals, help tailor services, and guide individualized marketing of products and services.

The term "big data" refers to a confluence of factors, including the nearly ubiquitous collection of consumer data from a variety of sources, the plummeting cost of data storage, and powerful new capabilities[1] to analyze data to draw connections and make inferences and predictions. A common framework for characterizing big data relies on the "three Vs": the volume, velocity, and variety of data, each of which is growing at a rapid rate as technological advances permit the analysis and use[2] of this data in ways that were not possible previously.

Volume refers to the vast quantity of data that can be gathered and analyzed effectively. The costs of collecting and storing data continue to drop dramatically. And the ability to access millions of data points increases the predictive power of consumer data analysis.

[1] See, e.g., Exec. Office of the President, Big Data: Seizing Opportunities, Preserving Values 2–3 (2014) [hereinafter "White House May 2014 Report"], http://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_5.1.14_final_print.pdf; Jim Thatcher, Living on Fumes: Digital Footprints, Data Fumes, and the Limitations of Spatial Big Data, 8 Int'l J. of Commc'n 1765, 1767–69 (2014), http://ijoc.org/index.php/ijoc/article/view/2174/1158. See also Comment 00018 from Persis Yu, Nat'l Consumer L. Ctr., to Fed. Trade Comm'n, attached report at 10 (Aug. 15, 2014), https://www.ftc.gov/system/files/documents/public_comments/2014/08/00018-92374.pdf.

[2] See, e.g., Transcript of Big Data: A Tool for Inclusion or Exclusion?, in Washington, D.C. (Sept. 15, 2014), at 15 (Solon Barocas), 32 (Joseph Turow), 40–41 (Joseph Turow), 261 (Christopher Wolf) [hereinafter "Big Data Tr."], https://www.ftc.gov/system/files/documents/public_events/313371/bigdata-transcript-9_15_14.pdf. See also White House May 2014 Report, supra note 1, at 4–5; Comment 00067 from Mark MacCarthy, Software & Info. Indus. Assoc., to Fed. Trade Comm'n 2 (Oct. 31, 2014), https://www.ftc.gov/system/files/documents/public_comments/2014/10/00067-92918.pdf; Comment 00065 from Jules Polonetsky & Christopher Wolf, Future of Privacy Forum, to Fed. Trade Comm'n 2 (Oct. 31, 2014), https://www.ftc.gov/system/files/documents/public_comments/2014/10/00065-92921.pdf; Comment 00049 from Martin Abrams, Info. Accountability Found., to Fed. Trade Comm'n 3–4 & n.6, https://www.ftc.gov/system/files/documents/public_comments/2014/10/00049-92780.pdf; Comment 00031 from M. Gary LaFever & Ted Myerson, anonos, to Fed. Trade Comm'n 1 (Aug. 21, 2014), https://www.ftc.gov/system/files/documents/public_comments/2014/08/00031-92442.pdf. Others suggest that there is a "fourth V," veracity, to denote the accuracy and integrity of data used. See, e.g., Brian Gentile, The New Factors of Production and the Rise of Data-Driven Applications, Forbes (Oct. 31, 2011), http://www.forbes.com/sites/ciocentral/2011/10/31/the-new-factors-of-production-and-the-rise-of-data-driven-applications/.

Velocity is the speed with which companies can accumulate, analyze, and use new data.
Technological improvements allow companies to harness the predictive power of data more quickly than ever before, sometimes instantaneously.[3]

Variety means the breadth of data that companies can analyze effectively. Companies can now combine very different, once unlinked, kinds of data—either on their own or through data brokers or analytics firms—to infer consumer preferences and predict consumer behavior, for example.

Together, the three Vs allow for more robust research and correlation. Previously, finding a representative data sample sufficient to produce statistically significant results could be very difficult and expensive. Today, the scope and scale of data collection enables cost-effective, substantial research of even obscure or mundane topics (e.g., the amount of foot traffic in a park at different times of day).

Big data can produce tremendous benefits for society, such as advances in medicine, education, health, and transportation, and in many instances, without using consumers' personally identifiable information. Big data also can allow companies to improve their offerings, provide consumers with personalized goods and services, and match consumers with products they are likely to be interested in. At the same time, advocates, academics, and others have raised concerns about whether certain uses of big data analytics may harm consumers. For example, if big data analytics incorrectly predicts that particular consumers are not likely to respond to prime credit offers, certain types of educational opportunities, or job openings requiring a college degree, companies may miss a chance to reach individuals who desire this information. In addition, if big data analytics incorrectly predicts that particular consumers are not good candidates for prime credit offers, educational opportunities, or certain lucrative jobs, such educational opportunities, employment, and credit may never be offered to these consumers. Some fear that such incorrect predictions could perpetuate existing disparities.

To examine these issues, the Federal Trade Commission ("FTC" or "the Commission") held a public workshop, Big Data: A Tool for Inclusion or Exclusion?, on September 15, 2014.[4] In particular, the workshop explored the potential impact of big data on low-income and underserved populations. The workshop brought together academics, government representatives, consumer advocates, industry representatives, legal practitioners, and others to discuss the potential of big data to create opportunities for consumers or exclude them from such opportunities. The workshop consisted of four panels addressing the following topics: (1) current uses of big data; (2) potential uses of big data; (3) the application of equal opportunity and consumer protection laws to big data; and (4) best practices to enhance consumer protection in the use of big data. The Commission also received sixty-five public comments on these issues from private citizens, industry representatives, trade groups, consumer and privacy advocates, think tanks, and academics.

[3] White House May 2014 Report, supra note 1, at 5.

[4] The materials from the workshop are available on the FTC website at http://www.ftc.gov/news-events/events-calendar/2014/09/big-data-tool-inclusion-or-exclusion.
The Commission has synthesized the discussions and comments from the workshop—along with the record from a prior FTC seminar on alternative scoring products[5] and recent research—to create this report, which focuses on the impact of big data on low-income and underserved populations. The report is divided into four sections. First, the report describes the "life cycle" of big data and how "little" data turns into big data. Second, it discusses some of the benefits and risks created by the use of big data. Third, it describes some of the consumer protection laws that currently apply to big data. Finally, it discusses certain research in the field of big data and lessons that companies should take from the research in order to help them maximize the benefits of big data while mitigating risks. Importantly, though the term "big data" encompasses a wide range of analytics, this report addresses only the commercial use of big data consisting of consumer information.[6]

II. Life Cycle of Big Data

The life cycle of big data can be divided into four phases: (1) collection; (2) compilation and consolidation; (3) data mining and analytics; and (4) use.[7]

As to the first step, not all data starts as big data. Rather, companies collect bits of data from a variety of sources.[8] For example, as consumers browse the web or shop online, companies can track and link their activities. Sometimes consumers log into services or identify themselves when they make a purchase. Other times, techniques such as tracking cookies,[9] browser or device fingerprinting,[10] and even history sniffing[11] identify who consumers are, what they do, and where they go. In the mobile environment, companies track and link consumers' activities across applications as another method of gathering information about their habits and preferences. More broadly, cross-device tracking offers the ability to interact with the same consumer across her desktop, laptop, tablet, wearable, and smartphone, using both online and offline information.[12] Companies also are gathering data about consumers across the Internet of Things—the millions of Internet-connected devices that are in the market.[13] Finally, data collection occurs offline as well, for example, through loyalty programs, warranty cards, surveys, sweepstakes entries, and even credit card purchases.[14]

After collection, the next step in the life cycle of big data is compilation and consolidation. Commercial entities that compile data include online ad networks, social media companies, and large banks or retailers.[15] One important category of commercial entities that compile and consolidate data is data brokers. They combine data from disparate sources to build profiles about individual consumers. Indeed, some data brokers store billions of data elements on nearly every U.S. consumer.[16]

The third step is data analytics. One form of analytics is descriptive—the objective is to uncover and summarize patterns or features that exist in data sets.[17] By contrast, predictive data analytics refers to the use of statistical models to generate new data.[18] Developing and testing the models that find patterns and make predictions can require the collection and use of copious amounts of data.[19]

In a market context, a common purpose of big data analytics is to draw inferences about consumers' likely choices. Companies may decide to adopt big data analytics to better understand consumers, potentially by using data to attribute to an individual the qualities of those who appear statistically similar, e.g., those who have made similar decisions in similar situations in the past. Thus, a retail firm might use data on its customers' past purchases, web searches, shopping habits, and prices paid to create a statistical model of consumers' purchases at different prices. With that model, the retailer could then compare a prospective consumer's characteristics or past purchases, web searches, and location information to predict how likely the consumer is to purchase a product at various price points.

The final step in the life cycle of big data is use. The Commission's May 2014 report entitled Data Brokers: A Call for Transparency and Accountability focused on the first three steps in the life cycle of big data within that industry—collection, compilation, and analytics.[20] It discussed how information gathered for one purpose (e.g., paying for goods and services) could be compiled and analyzed for other purposes, such as for marketing or risk mitigation. In contrast, this report focuses on certain uses of big data. It examines the question of how companies use big data to help consumers and the steps they can take to avoid inadvertently harming consumers through big data analytics.

[5] On March 19, 2014, the Commission hosted a seminar on alternative scoring products and received nine public comments in connection with the seminar. Spring Privacy Series: Alternative Scoring Products, Fed. Trade Comm'n (Mar. 19, 2014), http://www.ftc.gov/news-events/events-calendar/2014/03/spring-privacy-series-alternative-scoring-products.

[6] The report does include some examples from non-commercial fields, but it is intended to guide companies as they use big data about consumers.

[7] See, e.g., Nat'l Consumer L. Ctr. Comment 00018, supra note 1, attached report at 11–12. In May 2014, the Commission released a report studying data brokers, which focused on the first three phases of the life cycle of big data. Fed. Trade Comm'n, Data Brokers: A Call for Transparency and Accountability (2014) [hereinafter "Data Brokers Report"], https://www.ftc.gov/system/files/documents/reports/data-brokers-call-transparency-accountability-report-federal-trade-commission-may-2014/140527databrokerreport.pdf.

[8] See generally Comment 00055 from Daniel Castro, Ctr. for Data Innovation, to Fed. Trade Comm'n (Oct. 23, 2014), https://www.ftc.gov/system/files/documents/public_comments/2014/10/00055-92856.pdf; Comment 00026 from Daniel Castro, Ctr. for Data Innovation, to Fed. Trade Comm'n (Aug. 15, 2014), https://www.ftc.gov/system/files/documents/public_comments/2014/08/00026-92395.pdf; Comment 00024 from Alvaro Bedoya, Ctr. on Privacy & Tech. at Geo. L., to Fed. Trade Comm'n (Aug. 15, 2014), https://www.ftc.gov/system/files/documents/public_comments/2014/08/00024-92434.pdf; Nat'l Consumer L. Ctr. Comment 00018, supra note 1; Comment 00016 from James Steyer, Common Sense Media, to Fed. Trade Comm'n (Aug. 15, 2014), https://www.ftc.gov/system/files/documents/public_comments/2014/08/00016-92371.pdf; Comment 00015 from Nathan Newman, N.Y.U. Info. L. Inst., to Fed. Trade Comm'n (Aug. 15, 2014), https://www.ftc.gov/system/files/documents/public_comments/2014/08/00015-92370.pdf; Comment 00010 from Thomas Lenard, Tech. Pol'y Inst., to Fed. Trade Comm'n (July 28, 2014), https://www.ftc.gov/system/files/documents/public_comments/2014/07/00010-92280.pdf; Comment 00003 from Jeff Chester, Ctr. for Dig. Democracy, & Edmund Mierzwinski, U.S. PIRG Educ. Fund, to Fed. Trade Comm'n (May 9, 2014), https://www.ftc.gov/system/files/documents/public_comments/2014/05/00003-90097.pdf.

[9] Tracking cookies are a specific type of cookie that is distributed, shared, and read across two or more unrelated websites for the purpose of gathering information or presenting customized data to a consumer. See Tracking Cookie, Symantec, https://www.symantec.com/security_response/writeup.jsp?docid=2006-080217-3524-99 (last visited Dec. 29, 2015).

[10] "'Browser fingerprinting' is a method of tracking web browsers by the configuration and settings information they make visible to websites, rather than traditional tracking methods" such as cookies. Panopticlick: Is Your Browser Safe Against Tracking?, Elec. Frontier Found., https://panopticlick.eff.org/aboutbrowser-fingerprinting (last visited Dec. 29, 2015).

[11] History sniffing is the practice of tracking which sites a user has or has not visited. See Ben Schott, History Sniffing, N.Y. Times (Dec. 8, 2010), http://schott.blogs.nytimes.com/2010/12/08/history-sniffing/?_r=0. See also Brian Krebs, What You Should Know About History Sniffing, Krebs on Sec. (Dec. 6, 2010), http://krebsonsecurity.com/2010/12/what-you-should-know-about-history-sniffing/.

[12] In November 2015, the Commission held a workshop to study the various alternative techniques used to track consumers across their devices. See Cross-Device Tracking, Fed. Trade Comm'n (Nov. 16, 2015), https://www.ftc.gov/news-events/events-calendar/2015/11/cross-device-tracking.

[13] In January 2015, the Commission released a staff report entitled, Internet of Things: Privacy & Security in a Connected World, recommending steps businesses can take to enhance and protect consumers' privacy and security as it relates to Internet-connected devices. Fed. Trade Comm'n, Internet of Things: Privacy and Security in a Connected World (2015), https://www.ftc.gov/system/files/documents/reports/federal-trade-commission-staff-report-november-2013-workshop-entitled-internet-things-privacy/150127iotrpt.pdf.

[14] See, e.g., Data Brokers Report, supra note 7, at 11–15.

[15] See generally Nat'l Consumer L. Ctr. Comment 00018, supra note 1; N.Y.U. Info. L. Inst. Comment 00015, supra note 8; Ctr. for Dig. Democracy & U.S. PIRG Educ. Fund Comment 00003, supra note 8.

[16] See, e.g., Data Brokers Report, supra note 7, at 46–47.

[17] See, e.g., Big Data Tr. 17 (Solon Barocas) ("We can define data mining as the automated process of extracting useful patterns from large data sets, and in particular, patterns that can serve as a basis for subsequent decision making."). See also Jure Leskovec et al., Mining of Massive Data Sets 1, 1 (2014), http://www.mmds.org/ (characterizing "data mining" as "the construction of a statistical model, that is, an underlying distribution from which the visible data is drawn") (emphasis in original).

III. Big Data's Benefits and Risks

Companies have been analyzing data from their own customer interactions on a smaller scale for many years,[21] but the era of big data is still in its infancy. As a result, mining large data sets to find useful, non-obvious patterns is a relatively new but growing practice in marketing, fraud prevention, human resources, and a variety of other fields.
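As a rough illustration of the descriptive-then-predictive pattern described in the life-cycle discussion above (for example, a retailer estimating how likely a prospective customer is to buy at a given price point by looking at statistically similar past customers), here is a minimal Python sketch. The segment names, price buckets, and history records are hypothetical, and a real system would use far richer models and data.

```python
from collections import defaultdict

def purchase_rates(history):
    """history: iterable of (segment, price_bucket, purchased: bool).
    Descriptive step: summarize how often similar customers bought at each price."""
    bought = defaultdict(int)
    seen = defaultdict(int)
    for segment, bucket, purchased in history:
        seen[(segment, bucket)] += 1
        bought[(segment, bucket)] += int(purchased)
    return {key: bought[key] / seen[key] for key in seen}

def predicted_likelihood(rates, segment, bucket, default=0.0):
    """Predictive step: attribute to a prospective customer the observed
    behavior of statistically similar customers (same segment and price bucket)."""
    return rates.get((segment, bucket), default)

# Hypothetical purchase history: (customer segment, price bucket, bought?)
history = [
    ("frequent_buyer", "$20-29", True), ("frequent_buyer", "$20-29", True),
    ("frequent_buyer", "$20-29", False), ("new_visitor", "$20-29", False),
    ("new_visitor", "$20-29", True), ("new_visitor", "$30-39", False),
]
rates = purchase_rates(history)
print(predicted_likelihood(rates, "frequent_buyer", "$20-29"))  # ~0.67
print(predicted_likelihood(rates, "new_visitor", "$30-39"))     # 0.0
```

The point of the sketch is only the structure: a descriptive summary of past behavior becomes a prediction when it is attributed to a new individual who merely resembles those past customers, which is also why the quality and representativeness of the underlying data matter so much.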
Companies are still learning how to deal with big data and unlock its potential[22] while avoiding unintended or unforeseen consequences. Appropriately employing big data algorithms on data of sufficient quality can provide numerous opportunities for improvements in society. In addition to the market-wide benefits of more efficiently matching products and services to consumers, big data can create opportunities for low-income and underserved communities.[23] Workshop participants and others have noted that big data is already being used to:

- Increase educational attainment for individual students. Educational institutions have used big data techniques to identify students for advanced classes who would otherwise not have been eligible for such classes based on teacher recommendations alone.[24] These institutions have also used big data techniques to help identify students who are at risk of dropping out and in need of early intervention strategies.[25] Similarly, organizations have used big data analytics to demonstrate how certain disciplinary practices, such as school suspensions, affect African-American students far more than Caucasian students, thereby partly explaining the large discrepancy between the graduation rates of these two groups.[26]

- Provide access to credit using non-traditional methods. Companies have used big data to provide alternative ways to score populations that were previously deemed unscorable.[27] For example, LexisNexis has created an alternative credit score called RiskView.[28] This product relies on traditional public record information, such as foreclosures and bankruptcies, but it also includes educational history, professional licensure data, and personal property ownership data. Thus, consumers who may not have access to traditional credit, but, for instance, have a professional license, pay rent on time, or own a car, may be given better access to credit than they otherwise would have.[29] Furthermore, big data algorithms could help reveal underlying disparities in traditional credit markets and help companies serve creditworthy consumers from any background.[30]

- Provide healthcare tailored to individual patients' characteristics. Organizations have used big data to predict life expectancy, genetic predisposition to disease, likelihood of hospital readmission, and likelihood of adherence to a treatment plan in order to tailor medical treatment to an individual's characteristics.[31] This, in turn, has helped healthcare providers avoid one-size-fits-all treatments and lower overall healthcare costs by reducing readmissions.[32] Ultimately, data sets with richer and more complete data should allow medical practitioners more effectively to perform "precision medicine," an approach for disease treatment and prevention that considers individual variability in genes, environment, and lifestyle.[33]

- Provide specialized healthcare to underserved communities. IBM, for example, has worked with hospitals to develop an Oncology Diagnosis and Treatment Advisor. This system synthesizes vast amounts of data from textbooks, guidelines, journal articles, and clinical trials to help physicians make diagnoses and identify treatment options for cancer patients. In rural and low-income areas, where there is a shortage of specialty providers, IBM's Oncology Diagnosis and Treatment Advisor can provide underserved communities with better access to cancer care and lower costs.[34]

[18] See, e.g., Galit Shmueli, To Explain or Predict?, 25 Statistical Sci. 289, 291 (2010), http://www.stat.berkeley.edu/aldous/157/Papers/shmueli.pdf. See also Mike Wu, Big Data Reduction 2: Understanding Predictive Analytics, Sci. of Social Blog (Mar. 26, 2013, 9:41 AM), http://community.lithium.com/t5/Science-of-Social-blog/Big-Data-Reduction-2-Understanding-Predictive-Analytics/ba-p/79616 ("Predictive analytics is all about using data you have to predict data that you don't have.") (emphases in original).

[19] Cf. Comment 00014 from Pam Dixon & Robert Gellman, World Privacy Forum, to Fed. Trade Comm'n 8 (Aug. 14, 2014), https://www.ftc.gov/policy/public-comments/2014/08/14/comment-00014.

[20] See generally Data Brokers Report, supra note 7.

[21] See, e.g., Big Data Tr. 31–32 (Gene Gsell), 32–33 (Joseph Turow), 34 (Mallory Duncan), 107–08 (Pamela Dixon).

[22] See, e.g., Big Data Tr. 31–32 (Gene Gsell), 32–33 (Joseph Turow), 78 (danah boyd), 233 (Michael Spadea).

[23] See, e.g., Big Data Tr. 83–85 (Mark MacCarthy), 250–51 (Christopher Wolf). See generally Comment 00076 from William Kovacs, U.S. Chamber of Commerce, to Fed. Trade Comm'n (Oct. 31, 2014), https://www.ftc.gov/system/files/documents/public_comments/2014/10/00076-92936.pdf; Comment 00073 from Michael Beckerman, The Internet Assoc., to Fed. Trade Comm'n (Oct. 31, 2014), https://www.ftc.gov/system/files/documents/public_comments/2014/10/00073-92923.pdf; Comment 00066 from Carl Szabo, NetChoice, to Fed. Trade Comm'n (Oct. 31, 2014), https://www.ftc.gov/system/files/documents/public_comments/2014/10/00066-92920.pdf; Comment 00063 from Peggy Hudson, Direct Mktg. Assoc., to Fed. Trade Comm'n (Oct. 31, 2014), https://www.ftc.gov/system/files/documents/public_comments/2014/10/00063-92909.pdf; Ctr. for Data Innovation Comment 00055, supra note 8; Comment 00027 from Jules Polonetsky, Future of Privacy Forum, to Fed. Trade Comm'n (Aug. 15, 2014), https://www.ftc.gov/system/files/documents/public_comments/2014/08/00027-92420.pdf; Ctr. for Data Innovation Comment 00026, supra note 8; Comment 00017 from Mike Zaneis, Interactive Advert. Bureau, to Fed. Trade Comm'n (Aug. 15, 2014), https://www.ftc.gov/system/files/documents/public_comments/2014/08/00017-92372.pdf; Tech. Pol'y Inst. Comment 00010, supra note 8.

[24] See, e.g., Big Data Tr. 47–48 (Gene Gsell). Cf. Ctr. for Data Innovation Comment 00055, supra note 8, attached report entitled, The Rise of Data Poverty in America, at 4–6.

[25] See, e.g., Big Data Tr. 84–85 (Mark MacCarthy). See also Software & Info. Indus. Assoc. Comment 00067, supra note 2, at 6–7; Ctr. for Data Innovation Comment 00026, supra note 8, at 2.

[26] See, e.g., Big Data Tr. 250 (Christopher Wolf). See also Future of Privacy Forum Comment 00027, supra note 23, attached report entitled, Big Data: A Tool for Fighting Discrimination and Empowering Groups, at 9.

[27] See, e.g., Big Data Tr. 49–51 (Gene Gsell), 83–84 (Mark MacCarthy), 102–06 (Stuart Pratt), 231–32 (Michael Spadea). See also Software & Info. Indus. Assoc. Comment 00067, supra note 2, at 5–6; Tech. Pol'y Inst. Comment 00010, supra note 8, at 5–6 & attached report entitled, Big Data, Privacy and the Familiar Solutions, at 7.

[28] See, e.g., Software & Info. Indus. Assoc. Comment 00067, supra note 2, at 5–6.

[29] See, e.g., Rent Reporting for Credit Building Consulting, Credit Builders All., http://creditbuildersalliance.org/rent-reporting-credit-building-consulting (last visited Dec. 22, 2015).
- Increase equal access to employment. Companies have used big data to help promote a more diverse workforce.[35] Google, for example, recognized that its traditional hiring process was resulting in a homogenous work force. Through analytics, Google identified issues with its hiring process, which included an emphasis on academic grade point averages and "brainteaser" questions during interviews. Google then modified its interview practices and began asking more structured behavioral questions[36] (e.g., how would you handle the following situation?). This new approach helped ensure that potential interviewer biases had less effect on hiring decisions.

While recognizing these potential benefits, some researchers and others have expressed concern that the use of big data analytics to make predictions may exclude certain populations from the benefits society and markets have to offer. This concern takes several forms. First, some workshop participants and commenters expressed concerns about the quality of data, including its accuracy, completeness, and representativeness.[37] Another concern is that there are uncorrected biases in the underlying consumer data.[38] For example, one academic has argued that hidden biases in the collection, analysis, and interpretation stages present considerable risks.[39]

[30] See, e.g., Ctr. for Data Innovation Comment 00055, supra note 8, attached report entitled, The Rise of Data Poverty in America, at 9. See generally Fair Isaac Corp., Can Alternative Data Expand Credit Access, Insights White Paper No. 90 (2015), http://www.fico.com/en/latest-thinking/white-papers/can-alternative-data-expand-credit-access (finding that alternative scoring can help lenders safely and responsibly extend credit to many of the more than fifty million U.S. adults who do not currently have FICO scores).

[31] See, e.g., Ctr. for Data Innovation Comment 00026, supra note 8, at 2. See also Shannon Pettypiece & Jordan Robertson, Hospitals are Mining Patients' Credit Card Data to Predict Who Will Get Sick, Bloomberg (July 3, 2014), http://www.bloomberg.com/bw/articles/2014-07-03/hospitals-are-mining-patients-credit-card-data-to-predict-who-will-get-sick.

[32] See, e.g., Ctr. for Data Innovation Comment 00055, supra note 8, attached report entitled, The Rise of Data Poverty in America, at 6–8; Future of Privacy Forum Comment 00027, supra note 23, attached report entitled, Big Data: A Tool for Fighting Discrimination and Empowering Groups, at 4; Ctr. for Data Innovation Comment 00026, supra note 8, at 2. Cf. Software & Info. Indus. Assoc. Comment 00067, supra note 2, at 4–5.

[33] See, e.g., David Shaywitz, New Diabetes Study Shows How Big Data Might Drive Precision Medicine, Forbes (Oct. 30, 2015), http://www.forbes.com/sites/davidshaywitz/2015/10/30/new-diabetes-study-shows-how-big-data-might-drive-precision-medicine/.

[34] See, e.g., Big Data Tr. 84 (Mark MacCarthy). See also Software & Info. Indus. Assoc. Comment 00067, supra note 2, at 4.

[35] See, e.g., Big Data Tr. 126 (Mark MacCarthy), 251 (Christopher Wolf); Software & Info. Indus. Assoc. Comment 00067, supra note 2, at 7; Future of Privacy Forum Comment 00027, supra note 23, attached report entitled, Big Data: A Tool for Fighting Discrimination and Empowering Groups, at 1–2. See also Lauren Weber, Can This Algorithm Find Hires of a Certain Race?, Wall St. J. (Apr. 30, 2014), http://blogs.wsj.com/atwork/2014/04/30/can-this-algorithm-find-hires-of-a-certain-race/.
If the process that generated the underlying data reflects biases in favor of or against certain types of individuals, then some statistical relationships revealed by that data could perpetuate those biases. When not recognized and addressed, poor data quality can lead to inaccurate predictions, which in turn can lead to companies erroneously denying consumers offers or benefits. Although the use of inaccurate or biased data and analysis to justify decisions that have harmed certain populations is not new,[40] some commenters worry that big data analytics may lead to wider propagation of the problem and make it more difficult for the company using such data to identify the source of discriminatory effects and address it.[41]

[36] See, e.g., Big Data Tr. 251 (Christopher Wolf). See also Future of Privacy Forum Comment 00027, supra note 23, attached report entitled, Big Data: A Tool for Fighting Discrimination and Empowering Groups, at 2; David Amerland, 3 Ways Big Data Changed Google's Hiring Process, Forbes (Jan. 21, 2014), http://www.forbes.com/sites/netapp/2014/01/21/big-data-google-hiring-process/; Adam Bryant, In Head-Hunting, Big Data May Not Be Such a Big Deal, N.Y. Times (June 19, 2013), http://www.nytimes.com/2013/06/20/business/in-head-hunting-big-data-may-not-be-such-a-big-deal.html?pagewanted=1&%2359&adxnnlx=1371813584-7rFFVvpSQsf/NlnpuVABGQ&%2359;_r=3.

[37] See, e.g., Big Data Tr. 21–22 (Solon Barocas), 29–31 (David Robinson), 100–02 (Dr. Nicol Turner-Lee); Transcript of Spring Privacy Series: Alternative Scoring Products, in Washington, D.C. (Mar. 19, 2014), at 44–45 (Pamela Dixon) [hereinafter "Alternative Scoring Tr."], https://www.ftc.gov/system/files/documents/public_events/182261/alternative-scoring-products_final-transcript.pdf. See also Ctr. for Data Innovation Comment 00055, supra note 8, attached report entitled, The Rise of Data Poverty in America, at 2; Nat'l Consumer L. Ctr. Comment 00018, supra note 1, attached report entitled, Big Data: A Big Disappointment for Scoring Consumer Risk, at 9, 27; Ctr. for Dig. Democracy & U.S. PIRG Educ. Fund Comment 00003, supra note 8, at 2. See generally Nir Grinberg et al., Extracting Diurnal Patterns of Real World Activity from Social Media (The 9th Int'l Conference on Web and Social Media, Working Paper 2013), http://sm.rutgers.edu/pubs/Grinberg-SMPatterns-ICWSM2013.pdf.

[38] See, e.g., Big Data Tr. 23–25 (Solon Barocas); Alternative Scoring Tr. 93 (Claudia Perlich). See also Cynthia Dwork & Deirdre Mulligan, It's Not Privacy and It's Not Fair, 66 Stan. L. Rev. Online 35, 36–37 (2013), http://www.stanfordlawreview.org/sites/default/files/online/topics/DworkMullliganSLR.pdf.

[39] Kate Crawford, The Hidden Biases in Big Data, Harv. Bus. Rev. (2013), https://hbr.org/2013/04/the-hidden-biases-in-big-data.

[40] See generally Helen F. Ladd, Evidence on Discrimination in Mortgage Lending, 12(2) J. of Econ. Perspectives 41 (1998), https://www.aeaweb.org/atypon.php?return_to=/doi/pdfplus/10.1257/jep.12.2.41.

[41] See, e.g., Big Data Tr. 40–41 (Joseph Turow).
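The data-quality and representativeness concerns discussed above lend themselves to a simple quantitative screen. Below is a minimal Python sketch that compares each group's share of an analysis sample with its share of a reference population and flags large gaps; the group labels, reference shares, and tolerance are assumptions chosen only for illustration, and passing such a screen does not by itself make a data set unbiased.

```python
def representation_gaps(sample_counts, population_shares, tolerance=0.05):
    """Compare each group's share of the analysis sample with its share of a
    reference population, flagging groups that are under- or over-represented
    by more than `tolerance` (absolute difference in share)."""
    total = sum(sample_counts.values())
    gaps = {}
    for group, pop_share in population_shares.items():
        sample_share = sample_counts.get(group, 0) / total
        diff = sample_share - pop_share
        if abs(diff) > tolerance:
            gaps[group] = {"sample_share": round(sample_share, 3),
                           "population_share": pop_share,
                           "gap": round(diff, 3)}
    return gaps

# Hypothetical example: a mobile-app-only data set versus census-style shares.
sample_counts = {"urban_18_34": 700, "urban_35_plus": 200, "rural": 100}
population_shares = {"urban_18_34": 0.30, "urban_35_plus": 0.40, "rural": 0.30}
print(representation_gaps(sample_counts, population_shares))
# Flags urban_18_34 as over-represented and the other groups as under-represented.
```

A screen like this only surfaces where a data set may be missing or over-counting populations; deciding what to do about the gaps, and whether they translate into the kinds of harms the report describes, still requires human judgment about how the data were generated and how the model will be used.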
