How to learn Big Data Analytics

OliviaCutts, France, Teacher
Published: 01-08-2017
Business Analytics the way we see it

Big Data: Next-Generation Analytics

Introduction: What is big data and why is it different?

Big data has emerged as a key topic for CEOs and CIOs as a result of new business drivers and opportunities that make it necessary to use data as a corporate asset in order to become more competitive and to create real business value.

There is no unanimity as to what big data actually is. The classic definition is in terms of Volume (the sheer size of the data), Velocity (the speed with which the data is collected and needs to be processed), and Variety (the different formats of data) – but there are other definitions. Most discussions suggest that some or all of the following features are found in big data:

- Data volumes are higher than a given organization is accustomed to processing.
- Data volumes are larger than can be handled by traditional database technology.
- External data is brought into the business from third-party or public sources.
- Some of the data may come from social media.
- A significant amount of data may be highly unstructured (e.g. voice or video).
- Various data sets of different types are integrated together for analysis.
- Real-time or near real-time analysis is sometimes required.
[Figure 1: Big data incorporates many different features – a word cloud of terms such as "Data as an asset", "Customer behaviour", "Business value", "Social media", "Unstructured", "Outside-in", "Predictive analytics", and "Real-time analysis".]

However, in our view, none of these features is necessary or sufficient to define big data. We find it more useful to view the concept in terms of three elements:

- The data itself.
- The process for dealing with the data.
- The holistic view that it can enable.

While identifying characteristic tendencies of big data in each of these areas, we'll argue that the data and process aspects are, in fact, not so different from what has gone before; the real novelty of big data lies in the opportunity it provides to see aspects of the organization's business in new and more holistic ways, and, importantly, as a source of business value.
The data

Recent years have seen exponential growth of data in businesses and society at large. Contributors to this growth, which is enabled by low-cost storage, include:

- Online business, mobile computing, and social media.
- Adoption of digital sensors and RFID tags, connected to ever larger sensor networks, together with IP-addressable objects, leading to the "internet of things".
- Digitization of voice and multimedia.
- Automation of processes like client interactions.

Large data volumes mean different things to different organizations. At the very large end, where Google and eBay are dealing with petabytes of data, these activities clearly fall into the "big data" category. The grey area comes lower down the scale: many large corporations are already managing data warehouse applications with hundreds of terabytes of data, but for smaller organizations the same volumes would be seen as "big".

The lack of a clear threshold in terms of volume means that we are drawn to a definition in terms of technology: "big data" could refer to instances where new distributed architectures (e.g. Hadoop) are needed to achieve affordable storage of larger volumes of data. However, traditional databases are evolving quickly and coming down in cost, so even this definition is marginal. We therefore need to look to other factors for a definition.

More characteristic of big data is the importance of external data from beyond the firewall. Organizations are moving from an inside-out orientation – where they analyze and report on data from within the organization – to an outside-in one, where data from outside the enterprise is brought inside to provide new value. This is the really game-changing feature of the big data concept: data from outside the organization, or from its edge, joining with data from inside to provide a new, more holistic view.

Insights derived from big data typically inform the front office's interactions with the customer. That means an investment in big data is likely to produce more worthwhile returns than a comparable investment in back-office processes, which have been undergoing improvement for years. But big data still offers significant benefits in the back office too.
The process

The big data concept is also associated with a process or approach. Clearly, it is not enough to collect huge volumes of data. Big data initiatives have to identify the right data, organize it into a form that can be explored with analytics, and then use those analytics to derive insights – to allow the business to "thrive on data", as we have called it elsewhere (see Capgemini's Technovision: Thriving on Data²).

Big data builds on existing technologies like data warehousing, business intelligence, data mining, enterprise content management, and search (although it combines them in new ways), and extends or replaces them. In many cases this is about data streaming and real-time analysis. These aspects are covered in more depth in later sections.

Exploiting big data entails dealing with:

- A multiplicity of sources – everything from internal ERP systems and document management systems to external sources like the internet and business partner systems, or those of third-party suppliers of market or demographic data.
- Both structured and unstructured formats – structured data from database tables, semi-structured data in forms and XML files, and unstructured free text, voice, and video data.
- A range of customer data types – from "exact" client multi-channel interaction and sales data to volatile online behavior data and social media data.
- A range of technical data types – from formally approved and managed product design and engineering data to "fuzzy" sensor data.

Combining and analyzing all this data provides a richness and depth of understanding never achievable before.
The holistic view that big data can enable

Increasing volumes and complexity of data have led to the emergence of new technologies and processes. However, big data's real differentiator is its ability to provide radically new, and far more complete, views of many aspects of the business, including customers, processes, supply chains, and products. Organizations are achieving previously unattainable levels of insight by bringing in data from sources that either did not exist before or were seen as out of reach because they were external to the business – and, crucially, by combining that new data with data from within the enterprise. Organizations are starting to view all this data as an integrated whole, recognizing that to get "the whole picture" of their business they need to bring data together regardless of source, format, or type. Consequently, more and more organizations are viewing their huge data collections as a primary source of business value.

This is a realistic view because, for the first time, we now have both the business drivers and the technological capabilities to turn data from a by-product of automation into a first-class corporate asset for a competitive business.

Big data brings new challenges

The truly distinctive feature of big data, then, is the business's expectation that it can treat large volumes of diverse data as an integrated whole, and a source of value. This expectation creates a number of new organizational, as well as technological, challenges.

Analyzing more customer data, social media, and web transactions also means that organizations need to be conscious of privacy and security requirements, and of the significant variations between countries. Overenthusiastic use of personal data can land organizations in court – not to mention the impact on corporate image.

Organizational challenges mostly relate to the integration of data. In Capgemini's recent survey¹, business users were asked to identify their biggest impediment to using big data for effective decision-making. Most (54%) cited the existence of "silos" preventing data from being pooled for the benefit of the entire organization.

² www.capgemini.com/services-and-solutions/technology/technovision/clusters/thriving-on-data
These barriers need to be broken down, and data shared as the corporate asset it is.

The technological challenges are well documented. There is a need for technology that can integrate and handle data regardless of source, format, and type, along with the ability to achieve real-time or near real-time processing. A further requirement is for analytics to transform huge volumes of data into relevant information and practical insights.

The more data gets shared, the more its quality depends on mature data management, governance, and stewardship. An effective master data management strategy is critical to the data integration challenge. As organizations increasingly automate decisions based on big data analytics, data quality will grow ever more critical. A simple master data error could result in the wrong response to a customer, with all the damage to the relationship which that entails.

¹ "The Deciding Factor: Big Data & Decision Making", June 2012.

The business opportunity

Big data provides opportunities across the spectrum of business activity, but the prime benefits can be grouped under three headings:

1. Improving interaction with the ecosystem, particularly with customers.
2. Improving business processes.
3. Risk mitigation.

We'll look at each of these in turn before contrasting strategic opportunities with action-oriented ones.
Improving interaction with the ecosystem, particularly with customers

Many of the sources of big data are external to the enterprise (the outside-in view mentioned above) and generated by business partners (mainly customers, but also vendors and third parties) via social media, web transactions, and goods movements. These are allowing a much more intimate relationship – a better understanding of partners' behavior, be it how they consume electricity or their preferences on pretzels.

For example, a mobile phone operator might group individual account holders into households, so that it can target offers based on knowledge of the whole household's product mix (such as whether the household has a broadband connection). This will enable step changes in targeted marketing, tailored service offerings, and customer retention.

This will bring a whole new level of customer experience, where your service or product provider will treat you as an individual and will have a level of knowledge about you that might seem frightening. But today's new consumers are embracing this type of experience. They understand that it will mean a much better level of personal service, where they are presented only with offers and services that are relevant to them, often at preferential prices.

An example which is emerging in the smart energy solutions market is a home energy management application. This could record power usage for home devices, identify energy-inefficient equipment (e.g. a freezer), and share the data with partner device manufacturers, who then prompt the customer with the cost-savings case for buying a more efficient replacement.

Improving business processes

Many of the benefits here overlap with improving partner interaction, as they aim to improve customer or vendor business processes. However, improving business processes is a broader category.

Improving a business process depends on having the data to understand it in more detail. The data might be smart grid information, telemetry from aircraft or other transport systems, or readings from electronic devices (such as medical equipment or factory machines). Combined with analytical tools, these types of data are enabling better prediction of future activity and performance, and allowing organizations to adjust processes to achieve the best outcomes.

For example, monitoring plant and equipment lets you know in advance when parts are likely to fail, so that you can take preventive action.
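To make the household-grouping example concrete, here is a minimal Python sketch. The account records, household identifiers, and product names are invented purely for illustration and are not taken from any real operator's data model.

```python
from collections import defaultdict

# Hypothetical account records; in practice these would come from billing
# and CRM systems, with household membership inferred or declared.
accounts = [
    {"account": "A1", "household": "H1", "products": {"mobile"}},
    {"account": "A2", "household": "H1", "products": {"mobile", "broadband"}},
    {"account": "A3", "household": "H2", "products": {"mobile"}},
]

def household_product_mix(accounts):
    """Aggregate each household's product mix across its account holders."""
    mix = defaultdict(set)
    for acc in accounts:
        mix[acc["household"]] |= acc["products"]
    return dict(mix)

def broadband_upsell_targets(accounts):
    """Households with no broadband on any account are upsell candidates."""
    mix = household_product_mix(accounts)
    return sorted(h for h, products in mix.items() if "broadband" not in products)
```

The point of the sketch is the unit of analysis: targeting decisions are made per household, not per account, so an offer is suppressed if any member already holds the product.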
RFID can tell you how a product is moving through the supply chain and enable you to optimize the flow dynamically. In healthcare, providers are using information about patient experience and outcomes to inform product development decisions.

Risk mitigation

Big data is enabling organizations to understand and quantify risk. Understanding the performance of components in a large manufacturing or processing plant will alert operators to risks early on, allowing corrective action to be taken.

Big data also provides external risk mitigation – for instance, sentiment analysis allows organizations to find out what is being said about them in discussion forums in order to pre-empt problems. In the financial sector, risk is high on the list, and big data analysis is becoming a key part of managing risk compliance.

Your competitors are already seizing the opportunity

In our recent survey of over 600 business leaders, 57% agreed or strongly agreed that most of their competitors are already using big data to their strategic advantage. Many were already seeing significant benefit themselves from their own use of big data, and expected this trend to accelerate in the next three years.

An attitude of "if it ain't broke, don't fix it" will not work when your competitors are being more efficient in their marketing because they are better targeted, or when they are improving their call response by optimizing their mobile workforce based on better analysis. Those who do not take advantage of this new source of business insight will be left behind.

To realize the opportunities, big data has to be used in ways that release its value. Later sections of this report discuss how best to do that.
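The sentiment-analysis idea under Risk mitigation can be sketched very crudely in Python. A real deployment would use a trained sentiment model; the keyword list and threshold below are toy stand-ins, purely to show the shape of a forum-monitoring alert.

```python
# Illustrative negative-signal keywords; a real system would use a trained
# sentiment classifier rather than a hand-picked word list.
NEGATIVE = {"refund", "broken", "lawsuit", "outage", "scam"}

def flag_risky_posts(posts, threshold=2):
    """Flag posts whose count of negative keywords reaches the threshold,
    most negative first, so a risk team can pre-empt problems."""
    flagged = []
    for post in posts:
        words = {w.strip(".,!?") for w in post.lower().split()}
        score = len(words & NEGATIVE)
        if score >= threshold:
            flagged.append((score, post))
    return sorted(flagged, reverse=True)
```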
Strategy versus action

Some opportunities are longer-term, helping to define a more informed strategy or forecast, while others are action-oriented – for example, automated up-sell based on customer profiles in web and call center interactions. Big data plays in both areas.

Often it is the longer-term strategy element that determines what immediate actions are feasible. For example, analysis of buying patterns will provide a framework for determining which cross-sell opportunities are likely to be effective in an individual interaction. The intended use determines the timescale in which data is required, as discussed in our section on business analytics.

First, however, we review the technologies and techniques that are available to help organizations deal with big data.

Traditional information management techniques remain essential

We have seen that, as well as conventional structured data, big data solutions often need to deal with a variety of unstructured data such as social media content, documents, and streaming audio and video. There may also be instrument data from devices like RFID tags, sensors, or smart meters, logs from databases and firewalls, and more. Content is dynamic and may need real-time or near real-time processing.

As we have noted, however, big data is not about technology, either new or old: it is about data management, leading to business value. As business value from data is a traditional goal, it is not surprising to find that traditional information techniques and disciplines are often called for, either in conjunction with or instead of the newer ones. Furthermore, many of the vendors of traditional technologies are also scaling up their solutions to deal with these new challenges.

To see how these traditional solutions address big data requirements, let's consider the four major steps required to implement a big data solution: acquisition, marshaling, analysis, and action. For each step, we will indicate which traditional techniques are useful, and we will point out where traditional techniques complement newer technologies; this mainly applies to data marshaling and, to a lesser extent, analysis.

The next section discusses some of the new and emerging technologies designed to deal specifically with these big data challenges.
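The four-step shape of a big data solution – acquisition, marshaling, analysis, action – can be sketched as a toy pipeline. The record format, the averaging "analysis", and the threshold are invented for illustration; each function is a placeholder for real tooling (ETL, storage, analytics, BPM).

```python
# Skeleton of the four-step process model; each stage is deliberately trivial.
def acquire(sources):
    """Step 1: collect raw records from internal and external sources."""
    return [rec for source in sources for rec in source]

def marshal(records):
    """Step 2: filter out garbage and keep records fit for analysis."""
    return [r for r in records if r.get("value") is not None]

def analyze(records):
    """Step 3: derive an insight - here, just a simple average."""
    values = [r["value"] for r in records]
    return sum(values) / len(values) if values else None

def act(insight, threshold=10.0):
    """Step 4: turn the insight into an action a human or machine can take."""
    return "investigate" if insight is not None and insight > threshold else "ok"

sources = [[{"value": 12.0}, {"value": None}], [{"value": 14.0}]]
decision = act(analyze(marshal(acquire(sources))))
```

The value of the skeleton is the separation of concerns: each step can be swapped for a traditional or new-generation technology without disturbing the others.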
Figure 2: Big data process model

- Acquisition – collection of data from sources. Traditional ETL, but often real-time "constant acquisition" due to volume and velocity. As data is often external, there are issues of security and trust, licences for data, and privacy issues for external data, as well as open data (publicly available sources like http://data.gov.uk).
- Marshaling – sorting (and storing) of data. Large volumes and constant feeds; the need to consider how data will be consumed (real-time, as soon as possible, or historical) and to filter appropriately; structured, semi-structured, and unstructured formats; modelling (from raw form to highly structured, depending on source and use); data lifecycle (transient versus long-term storage/archival); master data management and governance.
- Analysis – finding insights and predictive modelling. Forward-looking (prediction) rather than historic; modelling behaviour (how will customers react?); probabilistic rather than definitive.
- Action – using insights to change business outcomes. Outputs are human (e.g. reports and analysis that people then act on) or machine (more common with big data – for example, automatic assessment of a customer to adjust an offer, as with Amazon's proposed products, which often fit customer needs); BPM technology and real-time decisioning.

Step 1: Data acquisition

This includes two elements: extraction/transfer and integration.
- Extraction/transfer. Large volumes of data must be obtained from a range of sources, including external and mobile ones. Source format and content often change – for example, when smart meter or network antenna software is upgraded, or when the hardware is physically replaced. Extracting and transferring data is therefore likely to require a capability to manage complex projects and multiple teams, constraints, and technologies, always with an eye on budgets. There are also non-technical points to address: legal licenses for data storage, for example, and privacy issues in the case of external data. These issues become particularly important when you are accessing sensitive private data like geographic localization, or content from social networks. Publicly available data, and data from suppliers or partners, raise complexities of their own.

- Integration. Extract, transform and load (ETL) tools can integrate non-structured as well as structured data. To achieve continuous data acquisition, they can work in a quasi-continuous mode, being configured to spool data every five minutes, or every time 10GB of data is waiting to be loaded, for instance. Integration is also the right time to compute metadata – for example, the metadata needed to extract what has been said from a recorded sound, or to take real text content from a web page. This metadata can then be stored once and for all, to save computing it again for every new analysis.

Both aspects of data acquisition can be handled by classic tools for extraction/transfer and integration. Classic enterprise application integration (EAI) technologies, such as the Enterprise Service Bus (ESB), are also frequently used as a complementary solution to integrate messaging data flows.

Complex Event Processing (CEP) tools sit somewhere between ETL and ESB, providing more transformation and rule capabilities than an ESB but less ability than ETL to manage high-volume database integration. They often have the ability to keep some historical data in memory and compare it with incoming data to detect exceptions and produce alerts. These are classic technologies, but they are now being used in a new mode.
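The CEP pattern described above – keep recent history in memory and compare each incoming value against it – can be sketched in a few lines of Python. The window size and tolerance are illustrative; a real CEP engine would apply configurable rules over event streams.

```python
from collections import deque

# Minimal sketch of in-memory exception detection: recent readings are kept
# in a bounded window and each new reading is compared with their mean.
class ExceptionDetector:
    def __init__(self, window=5, tolerance=2.0):
        self.history = deque(maxlen=window)  # recent readings held in memory
        self.tolerance = tolerance

    def observe(self, value):
        """Return an alert string if the value deviates from the recent mean,
        otherwise None; the value always joins the history window."""
        alert = None
        if len(self.history) == self.history.maxlen:
            mean = sum(self.history) / len(self.history)
            if abs(value - mean) > self.tolerance:
                alert = f"exception: {value} deviates from recent mean {mean:.1f}"
        self.history.append(value)
        return alert
```

Feeding the detector a steady stream produces no alerts; a sudden jump triggers one, which is the essence of the "detect exceptions and produce alerts" behaviour the text describes.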
Step 2: Data marshaling

Not all big data has the same destination. Most data will go to a single destination, some may go to more than one destination, and some nowhere at all. The choices depend on intended use, i.e. on whether the data is wanted for historical, real-time, or "as soon as possible" analysis, as the record of truth, or as a source of signal data.

Much data storage will be in traditional architectures:

- Structured data storage on very large databases of the type associated with classic business intelligence (BI) data warehouses.
- Large enterprise content management (ECM) solutions – big data doesn't mean ECM solutions no longer work.
- Dedicated big data solutions like Apache Hadoop, which complement these traditional architectures by providing a low-cost option for storing large volumes of more or less free-form data (in fact, there does not have to be a data model at all).
- Search engine indexes. Some content will directly feed the indexes to speed up future searches (metadata generated during integration is often used to feed these indexes efficiently).
- Archiving solutions, which may be used, for instance, to guarantee (for legal purposes) that stored data has not been altered since it was stored, or to store at low cost old data from databases that is no longer used but still needs to be kept.
- Transient data. Not all data needs to be held. It may just pass through, being acted on in the moment or leaving some summary information.
- Garbage. Not all data has to be stored – and big data tends to contain a high proportion of garbage.

Another relatively recent addition is specialist in-memory databases, which can be used as in the CEP example discussed above, or to launch a fast analytics procedure of the type needed for fraud analysis.

Generally, even if new big data technologies are used, it is likely to be in conjunction with traditional architectures. Whatever combination of old and new is chosen, the choice of storage architecture should be made with the entire lifecycle in mind.
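The routing decisions in the marshaling step can be sketched as a simple dispatch function. The destination names and record flags below are invented for illustration; they stand in for real routing rules in an integration layer.

```python
# Sketch of marshaling-step routing: each record may go to one destination,
# several, or none at all, depending on its type and intended use.
def route(record):
    """Return the list of storage destinations for a record (possibly empty)."""
    destinations = []
    if record.get("garbage"):
        return destinations                  # not everything has to be stored
    if record.get("structured"):
        destinations.append("warehouse")     # classic BI data warehouse
    else:
        destinations.append("hadoop")        # low-cost free-form storage
    if record.get("searchable"):
        destinations.append("search_index")  # feed indexes at load time
    if record.get("legal_hold"):
        destinations.append("archive")       # tamper-evident archiving
    return destinations
```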
Step 3: Analysis

When it comes to the quest to turn data into insights, once again classic tools will exist alongside, and sometimes in place of, specialist ones. SQL, improved SQL, and extraction to SAS or SPSS are all options here.

Traditional BI is often talked about as a "rear-view mirror" on the business: it tells you where you have been. With big data there is an emphasis on forward analytics (i.e. prediction) rather than historic ones. Modeling emphasizes behavior – for example, predicting how customers will react to a change. Predictions tend to be probabilistic rather than definitive, e.g. trying to work out the optimum time to replace parts.

Sometimes these requirements are best tackled using the new generation of big data-oriented tools – for example, as will be discussed in the next section, MapReduce and R can be used along with Hadoop. But once again, these are very likely to be found alongside the classic tools.

Step 4: Action

To get value from big data, analysis must lead to fast action – the action has to be an integral part of the process. Actions can be carried out by three types of agent:

- A human acting on a report or analysis. However, a human can readily understand only certain types of content, such as figures reported against key performance indicators, documents such as PDFs or spreadsheets, and outputs from search engines. Providing big data analytics to humans without further interpretation is therefore of limited use.
- A computer – which is more likely to be the case with big data. For example, a site like Amazon's may automatically assess a customer's characteristics in order to propose an offer that fits that customer's needs. In this case, the content needed to make the right decisions is much larger and more complex than for humans. It is likely to include predictive models, a full list of products, probabilities per type of customer, and so on.
- A mixture of human and computer. For example, a contact center agent may use intelligence about spending patterns to decide whether to advance credit. In this case, statistical models are applied to give the agent some further interpretation through pre-computed solutions, but the final choice is left to the agent.
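The machine-agent and mixed-agent cases can be sketched together: score pre-computed offer probabilities for a customer segment, act automatically when confident, and otherwise defer to a human. The segments, offers, and probabilities below are invented for illustration; they stand in for the predictive models and per-segment probabilities the text describes.

```python
# Toy lookup of pre-computed probabilities: (customer segment, offer) -> P(accept).
# In a real system these would come from predictive models, not a literal dict.
OFFER_PROBABILITIES = {
    ("young", "phone_upgrade"): 0.62,
    ("young", "insurance"): 0.08,
    ("family", "broadband"): 0.55,
    ("family", "phone_upgrade"): 0.20,
}

def best_offer(segment, auto_threshold=0.5):
    """Pick the most likely offer for a segment; act automatically only if the
    probability clears the threshold, otherwise route to a human agent."""
    candidates = [(p, offer) for (seg, offer), p in OFFER_PROBABILITIES.items()
                  if seg == segment]
    if not candidates:
        return None, "refer_to_human"
    p, offer = max(candidates)
    action = "present_offer" if p >= auto_threshold else "refer_to_human"
    return offer, action
```

Raising `auto_threshold` shifts decisions from the machine agent to the human one, which is exactly the trade-off in the credit-advance example.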
The action step is where big data, with its need for immediate action, diverges most from traditional approaches, where action tends to be slow and "cold". Nonetheless, the disciplines are once again those of traditional IT. Business process management (BPM) techniques and real-time decisioning, for example, can be used to integrate big data results into an existing process, as required by the credit-scoring example above.

Hadoop: A new technology for big data

All the major software vendors have solutions out in the market for big data. These include extensions to data warehousing and content management technologies, in-memory solutions, and parallel processing and advanced analytics. Often, one or more of these solutions will be the right answer for a client's big data needs.

However, this paper does not aim to compare and contrast these solutions. Instead, we will talk about one new technology – Hadoop – because it represents a departure from previous technologies and brings with it new challenges and opportunities. There are many players competing in this space, but most big vendors have invested in Hadoop, and there are signs that it is becoming a de facto standard.
Hadoop is designed to deal with very large datasets by distributing the data over many servers. To understand this technology, it is useful to consider it in relation to the four major steps identified earlier – acquisition, marshaling, analysis, and action. Hadoop and its related technologies really only address the marshaling and analysis steps. Within these steps, they perform the following functions:

1. Marshaling
- Storage and data management technologies to deal with terabytes, even exabytes, of complex data in a wide range of forms.
- Processing this high volume of complex data in a distributed manner.
- Administration of the big data environment.

2. Analysis
- Data mining and predictive analysis to identify patterns, find the insight, and obtain value from the data.

Each of these functions is discussed below.

1. Storage and data management
Before big data can be used for business purposes, it must be stored and managed efficiently. Distributed file systems, elastic storage systems, and distributed and massively parallel processing databases enable storage of much larger volumes at low cost. Open source frameworks like Apache Hadoop have had a major impact on the storage of high-volume, complex data in a distributed manner across large numbers of servers.

The Hadoop Distributed File System (HDFS) is the heart of the Hadoop framework and has two major components: the NameNode and DataNodes. The NameNode manages the file system metadata, while the DataNodes store the data. The entire Hadoop framework is built using Java, and hence all the major components of the framework can interact with each other using Java.

Other important storage mechanisms in the Hadoop ecosystem include:

- HBase. An open source, distributed, versioned, column-oriented store that is in a true sense a NoSQL database. HBase provides BigTable-like capabilities on top of Hadoop.
- Hive. Structures data into well-understood database concepts like tables, columns, rows, and partitions.
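The NameNode/DataNode split can be illustrated with a toy in-memory model: the NameNode holds only metadata (which blocks make up a file and where their replicas live), while DataNodes hold block contents. The tiny block size and replication factor are purely for illustration; real HDFS uses large blocks (typically 128 MB) and this sketch says nothing about the actual Hadoop APIs.

```python
import itertools

class DataNode:
    """Holds block contents only; knows nothing about files."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}              # block_id -> bytes

class NameNode:
    """Holds metadata only: filename -> ordered (block_id, replica nodes)."""
    def __init__(self, datanodes, block_size=4, replication=2):
        self.datanodes = datanodes
        self.block_size = block_size
        self.replication = replication
        self.metadata = {}
        self._ids = itertools.count()

    def put(self, filename, data):
        """Split data into blocks and replicate each block across DataNodes."""
        entries = []
        for i in range(0, len(data), self.block_size):
            block_id = next(self._ids)
            chunk = data[i:i + self.block_size]
            nodes = [self.datanodes[(block_id + r) % len(self.datanodes)]
                     for r in range(self.replication)]
            for node in nodes:
                node.blocks[block_id] = chunk
            entries.append((block_id, [n.name for n in nodes]))
        self.metadata[filename] = entries

    def get(self, filename):
        """Reassemble a file by fetching each block from any replica."""
        data = b""
        for block_id, node_names in self.metadata[filename]:
            node = next(n for n in self.datanodes if n.name in node_names)
            data += node.blocks[block_id]
        return data
```

Because each block lives on more than one DataNode, a read can succeed even if one replica's node is unavailable, which is the fault-tolerance idea behind HDFS replication.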
- Further (non-Hadoop) NoSQL databases. These include Cassandra (a multi-column store); CouchDB and MongoDB (document databases); Redis, Riak, and Membase (key-value stores); and Neo4j (a graph database).

2. Processing and data integration
Before meaningful analysis can be performed, there has to be a robust mechanism to churn and crunch a high volume of data, and to integrate a variety of types of data, all in an acceptable time frame. Important elements here include:
- MapReduce. A programming model for efficient distributed processing, designed to perform computation reliably, and in parallel, on large volumes of data spread across HDFS.
- Pig. A high-level data processing language for analysing datasets; Pig is an abstraction layer on top of MapReduce.
- HiveQL. A SQL-like interface for querying data structured in Hive.
- JAQL. A scripting language for large-scale, semi-structured data analysis.
- Cascading. An API which makes it easier to perform complex operations such as grouping and aggregation.

3. Administration
Because big data solutions usually involve massively parallel and distributed processing, a dedicated effort is needed to manage the entire environment from a performance, optimization, and load balancing perspective. Relevant tools include:

- Hadoop On Demand (HOD). A system for provisioning and managing independent Hadoop MapReduce and HDFS instances on a shared cluster of nodes.
- ZooKeeper. Cluster management, load balancing, and similar coordination tasks.
- Chukwa. A data collection system for monitoring distributed systems.

It is worth noting that administration is the least developed of the new technology categories, and it is an important one, because large distributed networks need a lot of administration. This is a fast-evolving part of the market.

4. Data mining and analysis
Thanks to the new capabilities for processing and integrating data, organizations are now in a position to mine all the available data, instead of just a sample data set as would have been the case in the past. The result is more accurate analysis and better predictions.

Although traditional predictive analytical tools used against relational databases still play an important role when the data is stored in a distributed landscape such as Hadoop, some additional tools are needed for this environment. These include:

- MapReduce. This has data mining and machine learning algorithms which can help to drill into huge volumes of data, and to mine the data using previously developed models. MapReduce is always necessary to access the data in the Hadoop infrastructure and turn it into a "big grid" computing solution that analytical tools can use.
- Apache Mahout. A Java library of machine learning and data mining algorithms, many (but not all) of which are designed to run on Hadoop. The algorithms are categorized into four main use cases: recommendation mining, clustering, classification, and frequent itemset mining.
- R. Although R is a standard analytical language used against relational databases, it is also used in Hadoop environments, as it can generate MapReduce code.

New and old must co-exist

We have shown that in implementing big data solutions, it is necessary to combine traditional tools and techniques with new technologies designed especially with big data requirements in mind.

It makes no sense to retrieve large amounts of data without being able to manage it and link it to business meaning. Master data management (MDM) is not an option for big data – it is part of it. We therefore turn next to the management strategies that companies need to adopt in order to derive business advantage from big data.
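To make the MapReduce model discussed above concrete, here is the canonical word-count example written in plain Python, with the three phases – map, shuffle, reduce – made explicit. This is only an illustration of the programming model; it involves no Hadoop and no actual distribution, but the same map and reduce functions are what a framework would run in parallel across many nodes.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (key, value) pair for every word occurrence."""
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values independently - hence in parallel."""
    return {key: sum(values) for key, values in groups.items()}

documents = ["Big data big value", "data everywhere"]
word_counts = reduce_phase(shuffle(map_phase(documents)))
```

The key property is that map calls and reduce calls are independent of one another, which is what lets a framework spread them reliably across a cluster.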
Mastering big data

To get value from big data, it's necessary to access and use large and disparate information sets to gain insights into markets and opportunities. That means, above all, being able to link different types of data, internal and external, together.

Before you can do that, you have to "master" the data – and that requires not only the right techniques but also the right organizational structure to be in place. This can be regarded as an additional aspect of "marshaling" data.

Importance of data quality

The garbage in, garbage out (GIGO) rule applies to any system. The difference with big data is that the consequences of getting the inputs wrong can be catastrophic. The larger the data volumes, the greater the impact of poor quality becomes – and the smaller the opportunity to correct problems by hand.

Structuring big data: the POLE framework

The secret of mastering big data is to structure it correctly, which above all means correctly structuring its "core" entities. The core of the data consists of the common reference points – people, organizations, accounts, and so on – that link together the mass of data, such as transactions or social media interactions, forming the really big part of big data.

Get the core right and you ensure consistency and correct interpretation (so that you know you are dealing with one customer and not five). With tight control of the core, analytics work better, and more value can be derived from big data.

In thinking about how to structure the core data, it is useful to think in terms of POLE: an acronym that stands for Parties, Objects, Locations and Events. The Parties, Objects and Locations are core data; the Events are the mass of transactions and interactions. If you are analyzing consumer behavior, a party might be a customer or a retailer, an object might be a product, and a location might be a customer's home, a retail outlet, or a social media channel.
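The POLE structure can be sketched with a few record types: Parties, Objects, and Locations are mastered core entities, and Events reference them by identifier only. The field names and event kinds below are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass

# Core (mastered) entities: the common reference points.
@dataclass(frozen=True)
class Party:
    party_id: str
    name: str

@dataclass(frozen=True)
class Object:
    object_id: str
    description: str

@dataclass(frozen=True)
class Location:
    location_id: str
    description: str

# Events are the high-volume part: transactions and interactions that
# point at core entities rather than duplicating their details.
@dataclass(frozen=True)
class Event:
    party_id: str
    object_id: str
    location_id: str
    kind: str                 # e.g. "purchase", "social_mention"

def events_for_party(events, party_id):
    """All transactions/interactions linked to one mastered party."""
    return [e for e in events if e.party_id == party_id]
```

Because events carry only identifiers, a new information source (say, a new channel as a Location) can be added without re-engineering the event records themselves.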
The consequences of poor inputs are particularly serious when trying to make sense of customer data. If you fail to identify a customer correctly in different transactions, you could form the mistaken impression that you are looking at five customers instead of one customer buying five products. That can impact not only the company’s internal analysis but also the customer’s experience.

Structuring big data: the POLE framework
The secret of mastering big data is to structure it correctly, which above all means correctly structuring its “core” entities. The core of the data consists of the common reference points – people, organizations, accounts and so on – that link together the mass of data, such as transactions or social media interactions, forming the really big part of big data.

Correctly identifying the P, O and L of the POLE gives a clear structure for the core entities, which can in turn be used to structure event information. This type of framework can accommodate new dimensions and information sources that may be required in future, without massive re-engineering of the solution.

Loading Events into the POL structure
After defining the framework and implementing it on your chosen technology platform, it becomes possible to load the Events – the transactions. Each event should get linked to the POL elements in such a way that information is consistent across channels and interpreted consistently in all parts of the organization.

Data governance
Before defining the structure of big data, two types of governance need to be in place:

1. Governance of standards, which define the structural format and definitions of information:
- When are two objects, locations or parties considered equivalent?
- What core information is required to identify these elements?
- What rules govern the relationships between them?

2. Governance of policies, which define how the standards will be enforced:
- Is there a central cleansing team?
- At which point is a relationship considered valid?
- What will be done with “possible” matches?
- Who gets to decide when a standard needs to change?

Governance of this type should not apply only to big data, or to a single information area, but to all the common elements that occur across the enterprise and in its external interactions.

External data may need significant manipulation to ensure it too is consistent; given that its volumes are also very large, this implies a need for significant processing power. That is particularly true of unstructured data, which can be linked to POL to provide the first stage of business comprehension but needs much more work to turn it into information that can be analyzed.

Consistency is, however, essential if the result is to be of sufficient quality to enable adaptable and accurate analysis. It is therefore important to choose a technology platform that has the capabilities and the scalability to carry out the transformation in the required timescale.

Not everything can be mastered
What is optimal from a standards perspective may well be suboptimal, or impossible, from an operational policy perspective. Forcing every individual to identify themselves by inputting 10 key attributes will probably reduce the amount that you sell. To get the best out of big data, the key is for a business to understand both what it would ideally like to achieve and what is practical. If there is no value in mastering or controlling a particular piece of information, don’t. In practice there will be a lot of data that cannot be mastered: it may not be as useful as mastered data, but it can still be valuable.

Iterative approach
Mastering data is not about achieving perfection instantly: it’s about improving quality and control over time, in a way that creates incremental value for the business. The mastering process is therefore usually an iterative one, with areas prioritized for the POLE treatment according to what analysis the business needs at a given time. This iterative approach works well because the organization learns more about big data techniques at each stage, and can put its new knowledge to use in subsequent stages.
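The “five customers instead of one” problem, and the governance question of when two parties are considered equivalent, can be illustrated with a deliberately simplified matching step. This is our own toy sketch, not a description of any MDM product: it resolves identities only by normalizing case and whitespace, whereas real matching rules are far richer.

```python
# Toy identity resolution: normalize the customer key so that
# superficially different records resolve to one mastered party.
def master_key(name: str, email: str) -> tuple:
    # Collapse internal whitespace and ignore case (invented rule).
    return (" ".join(name.split()).lower(), email.strip().lower())

transactions = [
    {"name": "Jo Bloggs",  "email": "jo@example.com", "product": "A"},
    {"name": "JO BLOGGS",  "email": "Jo@Example.com", "product": "B"},
    {"name": "Jo  Bloggs", "email": "jo@example.com", "product": "C"},
]

# Without mastering: naive distinct keys make one shopper look like three.
naive = {(t["name"], t["email"]) for t in transactions}

# With mastering: all three transactions resolve to a single party.
mastered = {master_key(t["name"], t["email"]) for t in transactions}

print(len(naive), len(mastered))
```

Even this crude rule shows why the governance questions matter: someone has to decide which normalizations are safe, and what to do with the “possible” matches a cleverer rule would produce.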
Adding value − business analytics for big data

Predictive analytics (PA) holds the key to generating value from big data. There are three ways in which big data can be used by PA: to enhance existing analytics, to create new analytics, and to enable better decisions.

Enhancing existing analytics
Big data techniques make it possible both to examine additional dimensions and patterns in traditional data, and to analyze it in conjunction with new types of data. When applied to customer behavior, these techniques can be extremely useful in formulating strategy.

An illustration is provided by the history of the telecom industry and its attempts to predict and manage churn (see Figure 3). Companies have traditionally relied on analyzing customer and product data, together with call detail record (CDR) data. To this they have added contact center data, and then data about network behavior – the latter allows a company to correlate its own performance with customer attrition. The recent availability of social media data has provided detailed information about customer preferences.

Applying new techniques, algorithms, and software to traditional data such as CDRs makes it possible to explore additional dimensions of that data, as does combining it with new data. For instance, companies can use social network analysis (SNA) to find out more about customer usage patterns, identifying influencer/follower relationships in social networks that may influence churn, and relating those to the CDR data they already have.

Text mining applied to unstructured data – such as call center notes, blogs, or customer feedback on product websites – can further increase insight into customer preferences and behavior.
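The SNA idea mentioned above can be sketched very simply: build a call graph from CDR-style records and look for heavily connected subscribers. This is our own toy illustration with invented data; real SNA uses dedicated graph tooling and much richer features than raw contact counts.

```python
from collections import defaultdict

# Toy CDRs: (caller, callee) pairs extracted from call detail records.
cdrs = [
    ("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
    ("bob", "alice"), ("carol", "dave"), ("eve", "alice"),
]

# Build an undirected call graph of distinct contacts per subscriber.
neighbours = defaultdict(set)
for caller, callee in cdrs:
    neighbours[caller].add(callee)
    neighbours[callee].add(caller)

# Subscribers with many distinct contacts are candidate "influencers":
# if they churn, the people who call them may churn too.
degree = {sub: len(n) for sub, n in neighbours.items()}
influencers = [s for s, d in degree.items() if d >= 3]
print(sorted(influencers))
```

Relating such graph features back to the CDR data a company already holds is what turns a traditional churn model into one that accounts for influencer/follower effects.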
Figure 3: Developments in analysing telecoms customer data. (The figure shows successive layers of increasing information value: customer and product data; CDR data analysed for customer calling patterns, usage and frequency; contact centre data; network data; and, at the big data end of the scale, social media data and CDR data analysed for “calling community” information.)

Integrating big data techniques with existing data mining and advanced analytics can not only strengthen a company’s ability to predict churn, but also build a retention strategy that targets the right customer segments. A similar approach can build and enhance marketing, cross-sell and up-sell models.

Creating new analytics
New types of analytics are becoming possible as a direct result of the additional types of data now available. These analytics can help a company to sell more products and provide a better service and overall experience for customers.

For example, supermarkets can tailor special offers for customers currently in the store, based not only on their profile, preferences and purchase history but also on the items they currently have in the trolley (which can be identified by an intelligent trolley that scans barcodes or RFID tags). If the customer has picked up baby toiletries, say, a baby food coupon could be offered.

Again, with the advent of smart meters, power companies will be able to collect detailed data about customer energy consumption. They will be able to offer customers guidance about the best times to use appliances, and can generate alerts to reduce wastage. They can also forecast usage with greater accuracy, both for individual consumers and to synchronize their power generation supply processes.

Enabling better decisions
Big data technology enables analytics to be implemented in (near) real time. If modeling data is collected more frequently, it can be used to improve business rules on the fly, so that those in customer-facing roles (for example, salespeople and contact center agents, or their virtual counterparts) can achieve real-time decisioning.

Consider a credit decision by a contact center agent (or “virtual agent”). Traditional scoring methods like customer segmentation may assign a high credit score to someone who has just lost their job, for instance. Now, customers can be scored on the basis of recent payment patterns, or additional credit taken up in the past few days, or even today, and the results relayed to contact center agents.

Big data analysis can also yield competitive intelligence about other companies’ offers, which should again result in better decisions by customer-facing personnel. For example, a television company’s call center agent talking to a customer who is threatening to leave should make different offers depending on whether a better deal is available in the marketplace.

Practicalities of analyzing big data: the need for collaboration
To date, predictive analytics and data mining have depended on a combination of science and art. Data analysis has tended to be iterative, discovery-based, and not completely automated. In a typical scenario, analysts extract data from the data warehouse into the analytical platform, where they perform data discovery, model development, and model evaluation. The final model is handed back to the IT team to implement and integrate into various BI and operational applications.

IT and analytics teams now need to change their work processes.
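The credit decision described above can be sketched as a simple rule in which recent behaviour overrides a static segment score. This is a toy illustration of the idea only: the thresholds, field names, and decision labels are all invented.

```python
# A static segment score alone would approve this customer outright.
# Folding in recent behaviour lets the rule react to events from the
# last few days, as described above. All thresholds are invented.
def credit_decision(segment_score: int, missed_payments_30d: int,
                    new_credit_taken_7d: float) -> str:
    if missed_payments_30d >= 2 or new_credit_taken_7d > 5000:
        return "refer"   # recent behaviour overrides the segment score
    return "approve" if segment_score >= 600 else "refer"

# High segment score, but two missed payments this month: refer.
print(credit_decision(segment_score=720, missed_payments_30d=2,
                      new_credit_taken_7d=0.0))

# Same segment score with clean recent behaviour: approve.
print(credit_decision(segment_score=720, missed_payments_30d=0,
                      new_credit_taken_7d=0.0))
```

The point is not the rule itself but the freshness of its inputs: relaying today’s payment patterns to the agent is what makes the decision a real-time one.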
It is only by integrating big data with existing enterprise data that you can get the “whole picture”, and that requires collaboration. IT people must become more adaptable in their role as custodians of data and of processing power and memory, in order to enable a new, less structured, and more discovery-oriented style of analytics. At the same time, predictive analytics teams need to harness their inventiveness so that the new results and models can be seamlessly integrated with business processes and applications.

If the functions fail to collaborate, organizations will not get the whole picture and are therefore likely to miss out on some of the intended benefits.

Making sure big data really adds value
Analyzing big data will cost money, and so it is important to measure the benefit gained. To do that accurately, you need to start from a reliable baseline: for example, the telecom company seeking to reduce churn needs a reliable picture of current churn before it starts, together with a measurement of the accuracy of its existing churn model. By continuing to measure churn, the organization can create a feedback loop, evaluating the success of the new techniques and changing them as necessary.

Strategic perspective
Most of all, it is important to approach this type of analytics from a strategic perspective, rather than starting with the technology or the data. The strategy should be developed to suit the enterprise as a whole, to avoid conflict between, or duplication of effort by, sales, marketing, and so on.

Speeding up response
Prevalent techniques tend to rely on analyzing historical data, while many applications of big data need predictive analytics to access current information, and to provide insights or modify models in (near) real time.
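One way to modify a model in near real time is incremental (online) updating, where each new observation nudges the model instead of triggering a batch retrain. The following is a minimal sketch of the idea with a one-feature logistic model; the data stream and learning rate are invented for illustration.

```python
import math

# Online (incremental) update of a one-feature logistic model:
# each labelled observation adjusts the weights immediately, so the
# model tracks current behaviour without a batch retrain.
w, b, lr = 0.0, 0.0, 0.5

def predict(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

# Stream of (feature, churned?) observations arriving over time.
stream = [(1.0, 1), (0.0, 0), (1.0, 1), (0.0, 0)] * 100

for x, y in stream:
    p = predict(x)
    # One stochastic gradient step per observation (logistic loss).
    w += lr * (y - p) * x
    b += lr * (y - p)

print(round(predict(1.0), 3), round(predict(0.0), 3))
```

Because every arriving observation adjusts the weights at once, the model’s scores reflect current behaviour, which is exactly what near real-time decisioning needs.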
Working in (near) real time necessitates new PA techniques that are in some sense “self-correcting” or “self-learning”, such as Bayesian methods or machine learning algorithms.

There are also ways to speed up traditional analytics using newer methods such as in-database or in-memory analytics. Here, certain phases of the model development lifecycle, such as data visualization and discovery, are (within limits) pushed into the database or into memory instead of being performed on the analytical platform. This not only makes the analytical process faster, but also eliminates the time taken for data extraction and loading.

Decide what you need to find out first, and then work out whether, and how, big data can help you do it. A small pilot to find out what is achievable is often a key first step to realizing value from big data.

Making the most of social media

The social media explosion has seen consumers publishing their lives online via tweets, photo uploads, “likes”, status updates, and so on. Much of this material is unstructured, has limited relevance to brands, and exists in vast quantities, making it challenging to get value from the data. In working with social media, many organizations are encountering these big data challenges for the first time.

Companies that have invested in a social media monitoring (SMM) tool tend to be disappointed with the results. A common complaint is inaccuracy. Many SMM tools have access to only a fraction of the available data, and struggle to classify content into (for example) positive, negative, neutral, or mixed categories. To compensate, organizations (or their agencies) often rely on humans to classify content manually – hardly a scalable approach.

So what is the best way to obtain business advantage from social media? We describe three important steps below: improve accuracy, work out what is relevant, and focus on action. We then discuss the need to incorporate social media into an overall data strategy.

Improve accuracy
100% accuracy is not achievable, but there are ways to improve on SMM tools – for example, with natural language processing. Some human intervention will almost certainly be required, for example to “train” the software to filter out noise and to interpret slang and sarcasm.

Semi-structured sources of social data are easier to analyze accurately. Numerical product ratings and “likes” can be aggregated to provide insight into how customers feel about products, and can be tied to a structured piece of data like a digital asset.

Work out what is relevant
To know which items in the sea of available data are relevant to your business, you need to integrate SMM with other data-related activities. If you are applying text analytics to social media data, then why not use it on other unstructured “verbatim” sources such as surveys and call center notes?

Add structured data to the mix and you are positioned to identify, for example, whether a spike in social media complaints is a sign of a new problem or relates to a known one. Once again, integrating different types of data to get the whole picture is the key to value.

Human intervention is still required, and must be fast enough to keep up with social media storms or viral campaigns. For B2C organizations where reputation is vital, the aim should be to create something like an air traffic control room, with screens displaying and blending data from multiple sources to allow fast, informed decisions on what action is needed: a product recall, a competitive campaign, and so on.

Focus on action
Insights must generate actions. Empower your “air traffic controllers” to make decisions, and make sure those decisions are relayed to the parts of the organization that must act on them: customer service, product development, sales and so on.
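The classification and aggregation steps discussed under “Improve accuracy” can be sketched with a crude lexicon-based classifier. This toy example, with invented word lists and posts, is far simpler than the NLP real SMM tools need, which is precisely why they also need human training for slang and sarcasm.

```python
# Crude lexicon-based classifier for social media posts, illustrating
# the positive/negative/neutral/mixed categories mentioned above.
POSITIVE = {"love", "great", "excellent"}
NEGATIVE = {"hate", "awful", "broken"}

def classify(post: str) -> str:
    words = set(post.lower().split())
    pos, neg = bool(words & POSITIVE), bool(words & NEGATIVE)
    if pos and neg:
        return "mixed"
    if pos:
        return "positive"
    if neg:
        return "negative"
    return "neutral"

posts = [
    "love this product",
    "screen is broken",
    "love the camera but battery is awful",
    "just bought it today",
]
labels = [classify(p) for p in posts]
print(labels)

# Semi-structured signals such as star ratings aggregate far more
# reliably than free text, as noted above.
ratings = [5, 1, 3, 4]
print(sum(ratings) / len(ratings))
```

Note how quickly the lexicon approach breaks down (“this phone is sick” would be misread), which is the accuracy gap that NLP and human training have to close.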
