Big Data Analytics and Ethnography
Advanced analytics techniques and Big Data Analytics transform all of the data into meaningful information that can be explored at various levels of the organization or society. This blog explains what Big Data Analytics and Ethnography are, with examples.
Using advanced analytics techniques such as text analytics, machine learning, predictive analytics, data mining, and natural language processing, analysts, researchers, and business users can analyze previously unusable data to gain new insights, resulting in better and faster decisions.
The Hype Around Big Data and Big Data Analytics
Is it possible to predict whether a person will get some disease 24 h before any symptoms are visible?
Is it possible to predict future virus hotspots?
Is it possible to predict when or where fraud or crime will take place before it actually happens?
Is it possible to predict traffic congestion up to three hours in advance?
Is it possible to predict terrorists’ future moves?
Can we better support the well-being of people?
On the one hand, were it not for the possibility to collect large amounts of data being generated, the development of advanced analytics techniques would be irrelevant. On the other hand, the availability of huge data would mean nothing without advanced analytics techniques to analyze it.
It is useful to distinguish among three different types of analytics: descriptive, predictive, and prescriptive. It should be noted that today, nonetheless, most of the efforts are directed towards predictive analytics.
Descriptive analytics helps to summarize and describe the past and is useful because it can shed light on behaviors that can be further analyzed to understand how they might influence future results. For example, descriptive statistics can be used to show the average pounds spent per household or the total number of vehicles in inventory.
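As a minimal sketch of the descriptive statistics mentioned above, the following Python snippet summarizes spend per household. The figures and the `describe` helper are hypothetical and serve only as an illustration.

```python
from statistics import mean, median

# Hypothetical weekly spend per household, in pounds (illustrative values only).
household_spend = [82.5, 40.0, 120.0, 65.5, 92.0]


def describe(values):
    """Return a few simple descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "total": sum(values),
        "mean": mean(values),
        "median": median(values),
    }


summary = describe(household_spend)
print(summary)
```

A summary of this kind only describes what has already happened; it makes no claim about the future.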
Predictive analytics help answer the question: What will or could happen? It uses statistical models and forecasting techniques (regression analysis, machine learning, neural networks, golden path analysis, and so on).
It helps to understand the future by means of providing estimates about the likelihood of a future outcome. For example, predictive analytics can be used to forecast customer purchasing patterns or customer behavior.
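To make the idea of a forecast concrete, here is a minimal sketch of one of the techniques named above, regression analysis, fitted by ordinary least squares. The monthly purchase counts and the `fit_line` helper are assumptions for illustration; a real predictive model would use far more data and validation.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: purchases made by one customer segment in months 1 to 5.
months = [1, 2, 3, 4, 5]
purchases = [10, 13, 13, 17, 17]

slope, intercept = fit_line(months, purchases)

# Estimate for month 6, extrapolating the fitted trend.
forecast = slope * 6 + intercept
print(round(forecast, 1))
```

The forecast is an estimate of a likely outcome, not a certainty, which is why predictive results are usually reported with some measure of uncertainty.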
Prescriptive analytics help answer the questions: How can we make it happen? What should we do? It uses optimization and simulation algorithms (algorithms, machine learning, and computational modeling procedures, among others).
It helps to advise on possible outcomes before decisions are made by means of quantifying the effect of future decisions. For example, prescriptive analytics can be used to optimize production or customer experience.
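A minimal sketch of the prescriptive idea, quantifying the effect of a decision before it is made, might look as follows. The demand scenarios, prices, and function names are all hypothetical; the search simply tries each candidate production quantity and keeps the one with the highest expected profit.

```python
def expected_profit(quantity, demand_scenarios, unit_cost, unit_price):
    """Expected profit of producing `quantity` units over weighted demand scenarios."""
    total = 0.0
    for demand, probability in demand_scenarios:
        sold = min(quantity, demand)  # cannot sell more than was produced or demanded
        total += probability * (sold * unit_price - quantity * unit_cost)
    return total


def best_quantity(candidates, demand_scenarios, unit_cost, unit_price):
    """Return the candidate quantity with the highest expected profit."""
    return max(
        candidates,
        key=lambda q: expected_profit(q, demand_scenarios, unit_cost, unit_price),
    )


# Hypothetical demand scenarios as (units demanded, probability) pairs.
scenarios = [(80, 0.25), (100, 0.5), (120, 0.25)]

best = best_quantity(range(60, 141, 10), scenarios, unit_cost=4.0, unit_price=10.0)
print(best, expected_profit(best, scenarios, unit_cost=4.0, unit_price=10.0))
```

An exhaustive search is used here only because the example is tiny; realistic problems would typically call for a dedicated optimization or simulation method.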
Despite the increased interest in exploring the benefits of Big Data emerging from performing descriptive, predictive, and/or prescriptive analytics, researchers and practitioners alike have not yet agreed on a unique definition of the concept of Big Data. We have performed a Google search for the most common terms associated with Big Data.
The fact that there is no unique definition of Big Data is not necessarily bad, since this allows the possibility to explore facets of Big Data that may otherwise be constrained by a definition.
At the same time, it is surprising if we consider that what was once considered to be a problem, that is, the collection, storage, and processing of a large amount of data, is no longer an issue today; as a matter of fact, advanced analytics technologies are constantly being developed, updated, and used.
In other words, we have the technology to support the finding of innovative insights from Big Data, while we lack a unified definition of Big Data. The truth is that this is not a problem. We may not all agree on what Big Data is, but we do all agree that Big Data exists and grows exponentially.
This is sufficient, because it means that we now have more information than ever before and the technology to perform analyses that could not be done before, when we had smaller amounts of data.
In one retail example, analysts were able to determine what customers purchase the most ahead of a storm. The answer was Pop-Tarts, a sugary pastry that requires no heating, lasts for an incredibly long period of time, and can be eaten at any meal.
Einav and Levin captured this view when they elegantly stated that Big Data’s potential comes from the “identification of novel patterns in behavior or activity, and the development of predictive models, that would have been hard or impossible with smaller samples, fewer variables, or more aggregation”.
Concomitantly, it is also true that despite the growth of the field, Big Data Analytics is still in its incipient stage and comprehensive predictive models that tie together knowledge, human judgment and interpretation, commitment, common sense, and ethical values, are yet to be developed.
And this is one of the main challenges and opportunities of our times. The true potential of Big Data is yet to be discovered.
What Is Big Data?
The term ‘Big Data’ was initially used in 1997 by Michael Cox and David Ellsworth to describe both the visualization of data and the challenges it posed for computer systems.
Today, besides the required technology, our ability to generate data has increased dramatically; as mentioned previously, we have more information than ever before.
What has really changed, however, is that we can now analyze and interpret the data in ways that could not be done before, when we had smaller amounts of data. This means that Big Data has the potential to revolutionize essentially any area of knowledge and any aspect of our life.
Big Data has received many definitions and interpretations over time and a unique definition has not yet been reached, as indicated previously. Today, it is customary to define Big Data in terms of data characteristics or dimensions, often with names starting with the letter ‘V’. The following four dimensions are among the most often encountered:
Volume: It refers to the large amount of data created every day globally, which includes both simple and complex analytics and which poses the challenge of not just storing it, but also analyzing it.
Velocity: It refers to the speed at which new data is generated as compared to the time window needed to translate it into intelligent decisions.
Without a doubt, in some cases the speed of data creation is more important than the volume of the data; IBM considered this aspect when they stated that “for time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value”.
Real-time processing is also essential for businesses looking to obtain a competitive advantage over their competitors, for example, the possibility to estimate the retailers’ sales on a critical day of the year, such as Christmas.
Variety: It encapsulates the increasingly different types of data, structured, semi-structured, and unstructured, from diverse data sources (e.g., web, video and audio data, sensor data, financial data, and transactional applications, log files and clickstreams, GPS signals from cell phones, social media feeds, and so on), and in different sizes from terabytes to zettabytes.
One of the biggest challenges is posed by unstructured data. Unstructured data is a fundamental concept in Big Data and it refers to data that has no rules attached to it, such as a picture or a voice recording. The challenge is how to use advanced analytics to make sense of it.
Veracity: It refers to the trustworthiness of the data. In its 2012 Report, IBM showed that 1 in 3 business leaders don’t trust the information they use to make decisions. One of the reasons for such a phenomenon is that there are inherent discrepancies in the data, most of which emerge from the existence of unstructured data.
This is even more interesting if we consider that, today, most of the data is unstructured. Another reason is the presence of inaccuracies. Inaccuracies can be due to the data being intrinsically inaccurate or from the data becoming inaccurate through processing errors.
Building upon the 4 Vs, Charles and Gherman argued that Big Data is essentially about the phenomenon that we are trying to record and the hidden patterns and complexities of the data that we attempt to unpack.
With this view, the authors advanced an expanded model of Big Data, wherein they included three additional dimensions, namely the 3 Cs: Context, Connectedness, and Complexity.
The authors stated that understanding the Context is essential when dealing with Big Data, because “raw data could mean anything without a thorough understanding of the context that explains it”;
Connectedness was defined as the ability to understand Big Data in its wider Context and within its ethical implications; and Complexity was defined from the perspective of having the skills to survive and thrive in the face of complex data, by means of being able to identify the key data and differentiate the information that truly has an impact on the organization.
Presenting all of the proposed dimensions goes beyond the scope of the present blog, but we hope to have provided a flavor of the various dimensions of Big Data. Having thus highlighted these, along with the existing debate surrounding the very definition of Big Data, we will now move towards presenting an overview of the social value that Big Data can offer.
The Social Value of Big Data
Value creation in a Big Data perspective includes both the traditional economic dimension of value and the social dimension of value. Today, we are yet to fully understand how organizations actually translate the potential of Big Data into the said value.
Generally, however, stories of Big Data’s successes have tended to come from the private sector, and less is known about its impact on social organizations. Big Data can, nonetheless, drive big social change, in fields such as education, healthcare, and public safety and security, just to mention a few.
Furthermore, the social value can be materialized as employment growth, increased productivity, increased consumer surplus, new products and services, new markets, better marketing, and so on.
Governments, for instance, can use big data to “enhance transparency, increase citizen engagement in public affairs, prevent fraud and crime, improve national security, and support the well-being of people through better education and healthcare”.
An in-depth systematic review of the Information Systems literature on the topic identified two socio-technical features of Big Data that influence value realization: portability and interconnectivity.
The authors further argue that, in practice, “organizations need to continuously realign work practices, organizational models, and stakeholder interests in order to reap the benefits from big data”.
As previously mentioned, the value that Big Data Analytics can unleash is great, but we are yet to fully understand the extent of the benefits. Empirical studies that consider the creation of value from Big Data Analytics are nowadays growing in number, but are still rather scarce. We, thus, join the call for further studies in the area.
In the following lines, we will proceed to explore how Big Data can inform social change and to this aim, we present some of the advancements made in three different sectors.
It should be noted that the information provided is not exhaustive, as our main intention is to provide a flavor of the opportunities brought about by the Big Data age.
Good… but What About the Bad?
Any given technology is argued to have a dual nature, bringing both positive and negative effects that we should be aware of. Below we briefly present two of the latter effects.
In the context of Big Data and advanced analytics, a negative aspect, which also represents one of the most sensitive and worrisome issues, is the privacy of personal information.
When security is breached, privacy may be compromised and loss of privacy can, in turn, result in other harms, such as identity theft and cyberbullying or cyberstalking.
“There is great public fear about the inappropriate use of personal data, particularly through the linking of data from multiple sources. Managing privacy is effectively both a technical and a sociological problem, and it must be addressed jointly from both perspectives to realize the promise of big data”.
In the age of Big Data, there is a necessity to create new principles and regulations to cover the area of privacy of information, although who exactly should create these new principles and regulations is a rather sensitive question.
On the one hand, behavioral economists would argue that humans are biased decision-makers, which would support the idea of automation. But on the other hand, what happens when individuals gain the skills necessary to use automation but know very little about the underlying assumptions and knowledge domain that make automation possible?
Without much doubt, the bad, or dark, side of the Big Data age cannot be ignored and should not be treated with less importance than it merits, but more in-depth research is needed to explore and gain a full understanding of its negative implications and how these could be prevented, diminished, or corrected.
Today, there are still many debates surrounding Big Data and one of the most prolific ones involves the questioning of the very existence of Big Data, with arguments in favor of or against Big Data.
But this is more than counter-productive. Big Data is here to stay. In a way, the Big Data age can be compared to the transition from the Stone Age to the Iron Age: it is simply the next step in the evolution of human civilization and it is, quite frankly, irreversible.
And just because we argue over its presence does not mean it will disappear. The best we can do is accept its existence as the natural course of affairs and instead concentrate all our efforts to chisel its path forward in such a way so as to serve a Greater Good.
We conclude by re-stating that although much has been achieved until the present time, the challenge remains the insightful interpretation of the data and the usage of the knowledge obtained for the purposes of generating the most economic and social value.
We join the observation made by other researchers according to which many more research studies are needed to fully understand and fully unlock the societal value of Big Data.
Until then, we will most likely continue to live in a world wherein individuals and organizations alike collect massive amounts of data with a ‘just in case we need it’ approach, trusting that one day, not too far away, we will come to crack the Big Data Code.
Big Data Analytics and Ethnography
Ethnography is generally positioned as an approach that provides deep insights into human behavior, producing ‘thick data’ from small datasets, whereas big data analytics is considered to be an approach that offers ‘broad accounts’ based on large datasets. Although perceived as antagonistic, ethnography and big data analytics have, in many ways, a shared purpose.
In this sense, this blog explores the intersection of the two approaches to analyzing data, with the aim of highlighting both their similarities and complementary nature.
Ultimately, this blog advances that ethnography and big data analytics can work together to provide a more comprehensive picture of big data, and can thus generate more societal value together than each approach on its own.
For thousands of years and across many civilizations, people have craved knowledge of the future. From asking the Oracle to consulting the crystal ball to reading the tarot cards, these activities show how people have always sought any help that could tell them what the future held, information that would help them make better decisions in the present.
Today, the craving for such knowledge is still alive and the means to meet it are big data and big data analytics. From traffic congestion to natural disasters, from disease outbreaks to terrorist attacks, from game results to human behavior, the general view is that there is nothing that big data analytics cannot predict. Indeed, the analysis of huge datasets has proven to have invaluable applications.
Big data analytics is one of today’s most famous technological breakthroughs that can enable organizations to analyze fast-growing immense volumes of varied datasets across a wide range of settings, in order to support evidence-based decision-making.
Over the past few years, the number of studies that have been dedicated to assessing the potential value of big data and big data analytics has been steadily increasing, which also reflects the increasing interest in the field.
Organizations worldwide have come to realize that in order to remain competitive or gain a competitive advantage over their counterparts, they need to be actively mining their datasets for newer and more powerful insights.
Big data can, thus, mean big money. But there seems to be at least one problem. A 2013 survey by the big data firm Infochimps, which looked at the responses from over 300 IT department staffers, indicated that 55% of big data projects do not get completed, with many others falling short of their objectives.
These statistics show that although big investments are taking place in big data projects, the generation of value does not match the expectations. The obvious question is, of course, why? Why are investments in big data failing, or in other words, why is having the data not sufficient to yield the expected results?
In 2014, Watson advanced that “the keys to success with big data analytics include a clear business need, strongly committed sponsorship, alignment between the business and IT strategies, a fact-based decision-making culture, a strong data infrastructure, the right analytical tools, and people skilled in the use of analytics”.
Although informative and without doubt useful, today nonetheless, these tips seem to be insufficient; otherwise stated, if we know what we need in order to succeed with big data analytics, then why don’t we succeed in creating full value?
The truth is that we are yet to profoundly understand how big data can be translated into economic and societal value and the sooner we recognize this shortcoming, the sooner we can find solutions to correct it.
In this blog, we advance that ethnography can support big data analytics in the generation of greater societal value. Although perceived to be in opposition, ethnography and big data analytics have much in common and in many ways, they have a shared purpose. In the following sections, we explore the intersection of the two approaches.
Ultimately, we advance that researchers can blend big data analytics and ethnography within a research setting and, hence, that big data analytics and ethnography together can inform the greater good to a larger extent than each approach on its own.
What Is Big Data?
‘Big data’: a concept, a trend, a mindset, an era. No unique definition, but great potential to impact on essentially any area of our lives.
The term big data is generally understood in terms of four Vs: volume, velocity, and variety, the three originally advanced by Gartner, plus veracity. In time, the number of Vs has increased, reaching up to ten Vs.
Other authors have further expanded the scope of the definition, broadening the original framework: Charles and Gherman, for example, advocated for the inclusion of three Cs: context, connectedness, and complexity.
One of the most elegant and comprehensive definitions of big data can be found in none other than the Oxford English Dictionary, which defines it as: “extremely large datasets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.”
Big data comes from various structured and unstructured sources, such as archives, media, business apps, public web, social media, machine log data, sensor data, and so on. Today, almost anything we can think of produces data and almost every data point can be captured and stored. Some would say: also, analyzed.
This may be true, but in view of the statistics presented in the introduction above, we are reluctant to say so. Undoubtedly, data is being continuously analyzed for better and deeper insights, even as we speak.
But current analyses are incomplete since, if we were able to fully extract the knowledge and insights that the datasets hold, we would most probably be able to fully capitalize on their potential, and not so many big data projects would fail in the first place.
The big data era has brought many challenges with it, which have rendered traditional data processing application software unfit to deal with them.
These challenges include networking, capturing data, data storage and data analysis, search, sharing, transfer, visualization, querying, updating and, more recently, information privacy.
But the list is not exhaustive and challenges are not static; in fact, they are dynamic, constantly mutating and diversifying. One of the aspects that we seem to generally exclude from this list of challenges is human behavior.
Maybe it is not too bold to say that one of the biggest challenges in the big data age is the extraction of insightful information not from the existing data, but from the data originating from emergent human dynamics that either haven’t happened yet or that would hardly be traceable through big data.
One famous example in this regard is Nokia, a company that in the 1990s and part of the 2000s was one of the largest mobile phone companies in the world, holding by 2007 a market share of 80% in the smartphone market.
Nevertheless, Nokia’s over-dependence on quantitative data led the company to fail to maintain its dominance of the mobile handset market.
In a post published in 2016, technology ethnographer Tricia Wang describes how she conducted ethnographic research for Nokia in 2009 in China, which revealed that low-income consumers were willing to pay for more expensive smartphones; this was a great insight at the time that led her to conclude that Nokia should change its then strategy from making smartphones for elite users to making smartphones for low-income users as well.
But Nokia considered that Wang’s sample size of 100 was too small to be reliable and that, moreover, her conclusion was not supported by the large datasets that Nokia possessed; they, thus, did not implement the insight. Nokia was bought by Microsoft in 2013 and Wang concluded that:
There are many reasons for Nokia’s downfall, but one of the biggest reasons that I witnessed in person was that the company over-relied on numbers.
They put a higher value on quantitative data, they didn’t know how to handle data that wasn’t easily measurable, and that didn’t show up in existing reports. What could’ve been their competitive intelligence ended up being their eventual downfall.
Netflix is at the other end of the game, illustrating how ethnographic insights can be used to strengthen a company’s position on the market. Without a doubt, Netflix is a data-driven company, just like Nokia. In fact, Netflix pays quite a lot of attention to analytics to gain insight into their customers.
In 2006, Netflix launched the Netflix Prize competition, which would reward with $1 million the creation of an algorithm that would “substantially improve the accuracy of predictions about how much someone is going to enjoy a movie based on their movie preferences”.
But at the same time, Netflix was open to learning from more qualitative and contextual data about what users really wanted. In 2013, cultural anthropologist Grant McCracken conducted ethnographic research for Netflix and what he found was that users really enjoyed watching episode after episode of the same series, engaging in a new form of consumption, now famously called binge-watching.
A survey conducted in the same year among 1500 TV streamers (online U.S. adults who stream TV shows at least once per week) confirmed that people did not feel guilty about binge-watching, with 73% of respondents actually feeling good about it.
This new insight was used by Netflix to re-design its strategy and release whole seasons at once, instead of releasing one episode per week. This, in turn, changed the way users consume media, and specifically Netflix’s products, and how they perceive the Netflix brand, while improving Netflix’s business.
What Is Ethnography?
Let us now consider the concept of ethnography and discuss it further. Ethnography, from the Greek words ethnos, meaning ‘folk, people, the nation’, and grapho, meaning ‘I write’ or ‘writing’, is the systematic study of people and cultures, aimed at understanding and making sense of social meanings, customs, rituals, and everyday practices.
For example, ethnography has been defined as:
The study of people in naturally occurring settings or ‘fields’ by methods of data collection which capture their social meanings and ordinary activities, involving the researchers participating directly in the setting, if not also the activities, in order to collect data in a systematic manner but without meaning being imposed on them externally.
Participant observation, ethnography, and fieldwork are all used interchangeably… they can all mean spending long periods watching people, coupled with talking to them about what they are doing, thinking and saying, designed to see how they understand their world.
Ethnography is about becoming part of the settings under study. “Ethnographies are based on observational work in particular settings”, allowing researchers to “see things as those involved see things”; “to grasp the native’s point of view, his relation to life, to realize his vision of his world”.
Consequently, the writing of ethnographies is viewed as an endeavor to describe ‘reality’, as this is being experienced by the people who live it.
Ethnography depends greatly on fieldwork. Generally, data is collected through participant or nonparticipant observation. The primary data collection technique used by ethnographers is, nonetheless, participant observation, wherein the researchers assume an insider role, living as much as possible with the people they investigate.
Participant observers interact with the people they study, they listen to what they say and watch what they do; otherwise stated, they focus on people’s doings in their natural setting, in a journey of discovery of everyday life.
Nonparticipant observation, on the other hand, requires the researchers to adopt a more ‘detached’ position. The two techniques differ, thus, from one another based on the weight assigned to the activities of ‘participating’ and ‘observing’.
Finally, ethnography aims to be a holistic approach to the study of cultural systems, providing ‘the big picture’ and depicting the intertwining between relationships and processes; hence, it usually requires a long-term commitment and dedication.
In today’s fast-paced environment, however, mini-ethnographies are also possible. A mini-ethnography focuses on a specific phenomenon of interest and as such, it occurs in a much shorter period of time than that required by a full-scale ethnography.
Big Data Analytics and Ethnography: Points of Intersection
Ford advanced that “data scientists and ethnographers have much in common, that their skills are complementary, and that discovering the data together rather than compartmentalizing research activities was key to their success”.
A more recent study postulated that “ethnographic observations can be used to contextualize the computational analysis of large datasets, while computational analysis can be applied to validate and generalize the findings made through ethnography”.
The latter further proposed a new approach to studying social interaction in an online setting, called big-data-augmented-ethnography, wherein they integrated ethnography with computational data collection.
To the best of our knowledge, the literature exploring the commonalities between big data analytics and ethnography is quite limited. In what follows, we attempt, thus, to contribute to the general discussion on the topic, aiming to highlight additional points of intersection.
Big data analytics comprise the skills and technologies for continuous iterative exploration and investigation of past events to gain insight into what has happened and what is likely to happen in the future. In this sense, data scientists develop and work with models.
Models, nonetheless, are simplified versions of reality. The models built thus aim to represent reality and, in this sense, are continuously revised, checked, improved upon, and tested to account for the extent to which they actually do so.
On the other hand, ethnographies are conducted in a naturalistic setting in which real people live, with the writing of ethnographies being viewed as an endeavor to describe reality.
Furthermore, just as big data analytics-informed models are continuously being revised, “ethnography entails continual observations, asking questions, making inferences, and continuing these processes until those questions have been answered with the greatest emic validity possible”.
In other words, both big data and ethnography are more concerned with the processes through which ‘reality’ is depicted rather than with judging the ‘content’ of such reality.
Changing the Definition of Knowledge
Both big data analytics and ethnography change the definition of knowledge and this is because both look for a more accurate representation of reality.
On the one hand, big data has created a fundamental shift in how we think about research and how we define knowledge, reframing questions about the nature and the categorization of reality and bringing about a profound change at the levels of epistemology and ethics.
On the other hand, ethnographies aim to provide a detailed description of the phenomena under study, and as such, they may reveal that people’s reported behavior does not necessarily match their observed behavior.
As a quote widely attributed to the famous anthropologist Margaret Mead states: “What people say, what people do, and what they say they do are entirely different things”.
Ethnographies can be, and generally are, performed exactly because they can provide insights that could lead to new hypotheses or revisions of existing theory or understanding of social life.
Searching for Patterns
Both data scientists and ethnographers collect and work with a great deal of data and their job is, fundamentally, to identify patterns in that data.
On the one hand, some say that the actual value of big data rests in helping organizations find patterns in data, which can further be converted into smart business insights.
Big data analytics or machine learning techniques help find hidden patterns and trends in big datasets, with a concern more towards the revelation of solid statistical relationships. Generally, this means finding out whether two or more variables are related or associated.
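One common way to check whether two variables are associated is the Pearson correlation coefficient. The sketch below computes it from first principles; the temperature and sales figures are hypothetical and chosen purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical observations: daily temperature (degrees C) and units sold.
temperature = [18, 21, 24, 27, 30]
units_sold = [120, 135, 160, 158, 190]

r = pearson(temperature, units_sold)
print(round(r, 2))
```

A coefficient close to 1 or -1 indicates a strong linear association, while a value near 0 indicates little or none. On its own, the coefficient says nothing about whether one variable causes the other.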
Ethnography, on the other hand, literally means to ‘write about a culture’ and in the course of so doing, it provides thick descriptions of the phenomena under study, trying to make sense of what is going on and reveal understandings and meanings.
By carefully observing and/or participating in the lives of those under study, ethnographers thus look for shared and predictable patterns in the lived human experiences: patterns of behavior, beliefs, and customs, practices, and language.
Aiming to Predict
A common application of big data analytics includes the study of data with the aim to predict and improve. The purpose of predictive analytics is to measure precisely the impact that a specific phenomenon has on people and to predict the chances of being able to duplicate that impact on future activities.
In other words, identifying patterns in the data is generally used to build predictive models that will aid in the optimization of a certain outcome.
On the other hand, it is generally thought that ethnography is, at its core, descriptive. But this is something of a misunderstanding.
Today, ethnographers’ aims are shifting, and ethnographic analyses can take the shape of predictions. Evidence of this is Wang’s ethnographic research for Nokia and McCracken’s ethnographic research for Netflix.
In the end, in practical terms, the reason why we study a phenomenon, irrespective of the method of data collection or data analysis used, is not just because we want to understand it better, but because we also want to predict it better.
The identification of patterns enables predictions, and as implied above, both big data analytics and ethnography can help in this regard.
Sensitive to Context
Big data analytics and ethnography are both context-sensitive; in other words, taken out of context, the insights obtained from both approaches will lose their meaning. On the one hand, big data analytics is not just about finding patterns in big data.
It is not sufficient to discover that one phenomenon correlates with another; or otherwise stated, there is a big difference between identifying correlations and actually discovering that one causes the other (cause and effect relationship). Context, meaning, and interpretation become a necessity and not a luxury.
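The gap between correlation and causation can be shown with a small simulation. In this sketch, which uses invented variable names and synthetic data, a hidden common cause (temperature) drives both ice cream sales and sunburn cases. The two are strongly correlated even though neither causes the other, and the correlation all but vanishes once the linear effect of the hidden variable is removed.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def residuals(ys, zs):
    """What is left of ys after subtracting a least-squares linear fit on zs."""
    n = len(ys)
    mean_y = sum(ys) / n
    mean_z = sum(zs) / n
    slope = (sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
             / sum((z - mean_z) ** 2 for z in zs))
    return [(y - mean_y) - slope * (z - mean_z) for y, z in zip(ys, zs)]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 2000
temperature = [rng.gauss(0, 1) for _ in range(n)]          # hidden common cause
ice_cream = [t + rng.gauss(0, 0.5) for t in temperature]   # driven by temperature
sunburn = [t + rng.gauss(0, 0.5) for t in temperature]     # also driven by temperature

r_raw = pearson(ice_cream, sunburn)
r_controlled = pearson(residuals(ice_cream, temperature),
                       residuals(sunburn, temperature))
print(f"raw correlation: {r_raw:.2f}")
print(f"after controlling for temperature: {r_controlled:.2f}")
```

The catch is that an analyst only knows to control for temperature if someone understands the context well enough to suspect it, which is where interpretation comes in.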
Analytics often happens in a black box, offering up a response without context or clear transparency around the algorithmic rules that computed a judgment, answer, or decision. Analytics software and hardware are being sold as a single-source, easy solution to make sense of today’s digital complexity.
The promise of these solutions is that seemingly anyone can be an analytics guru. There is a real danger in this do-it-yourself approach to analytics, however.
As with all scientific instruments and approaches, whether it be statistics, a microscope, or even a thermometer, without proper knowledge of the tool, expertise in the approach, and knowledge of the rules that govern the process, the results will be questionable.
On the other hand, inferences made by ethnographers are tilted towards the explanation of phenomena and relationships observed within the study group.
Ethnography supports the research endeavors of understanding the multiple realities of life in context; hence, by definition, ethnography provides detailed snapshots of contextualized social realities. Generalization outside the group is limited, and taken out of context, meanings will also be lost.
‘Learning’ from Smaller Data
Bigger data are not always better data. Generally, big data is understood as originating from multiple sources (the ‘variety’ dimension) or having to be integrated from multiple sources to obtain better insights.
In this sense, smaller data may be more appropriate for intensive, in-depth examination to identify patterns and phenomena, an area in which ethnography holds the crown.
Furthermore, data scientists have always been searching for new or improved ways to analyze large datasets to identify patterns, but one of the challenges encountered has been that they need to know what they are looking for in order to find it, something that is particularly difficult when the purpose is to study emergent human dynamics that haven’t happened yet or that will not show up that easily in the datasets.
Both big data analytics and ethnography can, thus, learn from smaller datasets (or even single case analyses).
On the other hand, we must acknowledge that even in fields where researchers generally rely on big data (such as clinical research), they sometimes have to rely on surprisingly small datasets; this is the case, for example, in clinical drug research that analyzes the data obtained after drugs are released on the market.
Presence of Behavioural Features
This has generally been presented as a difficulty that ethnographers, in particular, must fight to overcome. But it does not have to be so in the big data age. The truth is we need as many perspectives and as many insights as possible.
Fear that we might go wrong in our interpretations will only stop progression. A solution to this is posed by the collaboration between data scientists and ethnographers.
In this sense, ethnographers should be allowed to investigate the complexity of the data and come out with propositions and hypotheses, even when they are conflicting; and then data scientists could use big data analytics to test those propositions and hypotheses in light of statistical analyses and see if they hold across the larger datasets.
Studying human behavior is not easy, but both big data analytics and ethnography have a behavioral feature attached to them, in the sense that they are both interested in analyzing the content and meaning of human behavior. The ‘proximity’ between the two approaches is even more evident if we consider the change that the big data age has brought with it.
While in the not so distant past analytics would generally rely upon written artifacts that recorded past human behavior, today big data technology enables the recording of current human behavior as it happens (consider live feed data, for example). And as technology keeps evolving, the necessity of ‘collaboration’ between big data analytics and ethnography will become more obvious.
Unpacking Unstructured Data
Big data includes both structured (e.g., databases, CRM systems, sales data, sensor data, and so on) and unstructured data (e.g., emails, videos, audio files, phone records, social media messages, weblogs, and so on). According to a report by Cisco, an estimated 90% of the existing data is either semi-structured or unstructured.
Furthermore, a growing proportion of unstructured data is video, which constituted approximately 70% of all Internet traffic in 2013. One of the main challenges of big data analytics is precisely how to analyze all this unstructured data. Ethnography (and its newer addition, online ethnography) may have the answer.
Ethnography is a great tool to ‘unpack’ unstructured data. Ethnography involves an inductive and iterative research process, wherein data collection and analysis can happen simultaneously, without the need to have gathered all the data or even to look at the entire dataset.
Ethnography does not follow a linear trajectory, and this is actually an advantage that big data analytics can capitalize on. Ethnography is par excellence a very good approach to look into unstructured data and generate hypotheses that can be further tested against the entire datasets.
This could then inform and guide the overall strategy of massive data collection. Big data analytics could then analyze the data collected, testing the hypotheses proposed against these larger datasets.
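As an illustration of testing an ethnographic hypothesis against a larger dataset, here is a minimal sketch of a permutation test. Suppose fieldwork suggested that customers who take part in a community forum stay subscribed longer. The hypothesis, the numbers, and the function name are all invented for the example, and a real study would use far more records.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """One-sided permutation test for mean(group_a) > mean(group_b).

    Returns the observed difference in means and the share of random
    relabelings that produce a difference at least as large.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # pretend group labels were assigned at random
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical months of subscription for forum members and non-members.
forum_members = [14, 18, 11, 20, 16, 19, 15, 17, 13, 21]
non_members = [9, 12, 8, 14, 10, 11, 7, 13, 9, 10]

observed, p_value = permutation_p_value(forum_members, non_members)
print(f"difference in means: {observed:.1f} months, p = {p_value:.4f}")
```

A small p-value means the difference is unlikely to arise from chance labeling alone. It supports the field observation but does not replace the ethnographer's account of why the pattern exists.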
In this sense, ethnography can help shed light on the complexities of big data, with ethnographic insights serving as input for big data analytics, and big data analytics being used to generalize the findings.
Employing an ethnographic approach is generally understood in a traditional sense, which is that of having to undertake long observations from within the organization, with the researcher actually having to become an insider, a part of the organization or context that they decide to study.
The good news is that today, new methods of ethnography are emerging, such as virtual ethnography which may turn out to be of great help in saving time and tackling the usual problem of having to gain access to the organization.
The virtual world is now in its exponential growth phase, and virtual ethnography may be one of the best and most convenient ways to explore and benefit from understanding these new online contexts.
The web-based ethnographic techniques imply conducting virtual participant observation via interactions in online platforms such as social networks (such as Facebook or Twitter), blogs, discussion forums, and chat rooms. Conducting ethnographies in today’s world may, thus, be easier than it seems.
In this blog, we have aimed to discuss the points of intersection between big data analytics and ethnography, highlighting both their similarities and complementary nature.
Although the list is far from being exhaustive, we hope to have contributed to the discussions that focus on how the two approaches can work together to provide a more comprehensive picture of big data.
Ethnographers bring considerable skills to the table to contextualize and make greater meaning of analytics, while analytics and algorithms are presenting a new field site and complementary datasets for ethnographers.
One of the most important advantages of combining big data analytics and ethnography is that this ‘intersection’ can provide a better sense of the realities of the contexts researched, instead of treating them as abstract, reified entities.
And this better sense can translate into better understandings and better predictions, which can further assist in the creation of better practical solutions, with greater societal added value.
There are indeed many points of intersection between big data analytics and ethnography, which in many ways share a purpose. They are also complementary, as data scientists working with quantitative methods could supplement their own ‘hard’ methodological techniques with findings and insights obtained from ethnographies. As Goodall stated:
Ethnography is not the result of a noetic experience in your backyard, nor is it a magic gift that some people have and others don’t. It is the result of a lot of reading, a disciplined imagination, hard work in the field and in front of a computer, and solid research skills…
Today, we continue to live in a world that is being influenced by a quantification bias, the unconscious tendency to value the measurable over the immeasurable. We believe it is important to understand that big data analytics has never been one size fits all. We mentioned in this blog that many big data projects fail, despite the enormous investments that they absorb.
This is in part because many people still fail to comprehend that a deep understanding of the context in which a pattern emerges is not an option, but a must. Just because two variables are correlated does not necessarily mean that there is a cause-and-effect relationship between them.
Too often, Big Data enables the practice of apophenia: seeing patterns where none actually exist, simply because enormous quantities of data can offer connections that radiate in all directions.
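Apophenia is easy to reproduce. The sketch below generates 200 columns of pure random noise, with only 10 observations each, and then counts how many pairs of columns look strongly correlated. The sizes, the threshold of 0.8, and the seed are arbitrary choices made for the illustration.

```python
import random
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(7)  # fixed seed so the sketch is repeatable
n_variables, n_observations = 200, 10

# Pure noise: by construction, no column is related to any other.
columns = [[rng.gauss(0, 1) for _ in range(n_observations)]
           for _ in range(n_variables)]

strong_pairs = [
    (i, j)
    for i, j in combinations(range(n_variables), 2)
    if abs(pearson(columns[i], columns[j])) > 0.8
]
total_pairs = n_variables * (n_variables - 1) // 2
print(f"{len(strong_pairs)} of {total_pairs} pairs have |r| > 0.8")
```

Every one of those "strong" pairs is a coincidence. With enough variables to compare, some will always line up by chance, which is why a pattern needs context before it can be trusted.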
Ethnography can provide that so very necessary deep understanding. In this sense, big data analytics and ethnography can work together, complementing each other and helping in the successful handcrafting and implementation of bigger projects for a bigger, greater good.
Big Data: A Global Overview
More and more, society is learning how to live in a digital world that is becoming engulfed in data. Companies and organizations need to manage and deal with their data growth in a way that keeps pace with data getting bigger, faster and exponentially more voluminous. They must also learn to deal with data in new and different unstructured forms.
This phenomenon is called Big Data. This blog aims to present other definitions for Big Data, as well as technologies, analysis techniques, issues, challenges and trends related to Big Data. It also looks at the role and profile of the Data Scientist, in reference to functionality, academic background, and required skills.
The result is a global overview of what Big Data is, and how this new form is leading the world towards a new way of social construction, consumption, and processes.
The Origins of Big Data and How It Is Defined
From an evolutionary perspective, Big Data is not new. The advance towards Big Data is a continuation of ancient humanity’s search for measuring, recording and analyzing the world. A number of companies have been using their data and analytics for decades.
The most common and widespread definition of Big Data refers to the 3 Vs: volume, velocity, and variety. The 3Vs were originally pointed out by Doug Laney in 2001, in a Meta Group report. In this report, Laney identifies the 3Vs as future challenges in data management; they are nowadays widely used to define Big Data.
Although the 3Vs are the most established definition of Big Data, they are definitely not the only one. Many authors have attempted to define and explain Big Data from a number of perspectives, offering more detailed definitions that include technologies and data analysis techniques, the use and goals of Big Data, and the transformations it is imposing within industries, services, and lives.
The expression Big Data and Analytics has become synonymous with Business Intelligence (BI) among some suppliers, while for others Big Data and Analytics incorporates traditional BI with the addition of new elements such as predictive analyses, data mining, operational tools and approaches, and research and science.
Reyes defines Big Data as the process of assembling, analyzing and reporting data and information.
Big Data and Big Data and Analytics are often described as data sets and analytical techniques in voluminous and complex applications that require storage, management, analysis, and unique and specific visualization technologies. They also include autonomous data sources with distributed and decentralized controls.
Big Data has also been used to describe a large availability of digital data and financial transactions, social networks, and data generated by smartphones. It includes non-structured data with the need for real-time analysis.
Although one of Big Data’s main characteristics is the data volume, the size of data must be relative, depending on the available resources as well as the type of data that is being processed.
Mayer-Schonberger and Cukier believe that Big Data refers to the extraction of new ideas and new ways to generate value in order to change markets, organizations, the relationship between citizens and government, and so on. It also refers to the ability of an organization to obtain information in new ways, aiming to generate useful ideas and significant services.
Although the characteristics of the 3 Vs are intensely present in Big Data definitions throughout the literature, the concept has gained a wider meaning. A number of characteristics are related to Big Data, in terms of data source, technologies and analysis techniques, goals and generation of value.
In summary, Big Data consists of enormous datasets composed of both structured and non-structured data, often with the need for real-time analysis and the use of complex technologies and applications to store, process, analyze and visualize information from multiple sources. It plays a paramount role in the decision-making process in the value chain within organizations.
Big Data promises to fulfill the research principle of information systems, which is to provide the right information for the right use, in the precise volume and quality at the right time.
The goal of BI&A is to generate new knowledge (insights) that can be significant, often in real-time, complementing traditional statistical research and data source files that remain permanently static.
Big Data can make organizations more efficient through improvements in their operations, facilitating innovation and adaptability and optimizing resource allocation.
The ability to cross-reference private data about products and consumer preferences with information from tweets, blogs, product analysis, and social network data opens various possibilities for companies to analyze and understand the preferences and needs of their customers, predict demand and optimize resources.
The key to extracting value from Big Data is the use of Analytics, since collection and storage in themselves add little value. Data needs to be analyzed and its results used by decision makers and organizational processes.
The emergence of Big Data is creating a new generation of data for decision support and management and is launching a new area of practice and study called Data Science.
It encompasses techniques, tools, technologies, and processes to make sense of Big Data. Data Science refers to qualitative and quantitative applications to solve relevant problems and predict outputs.
There are a number of areas that can be impacted by the use of Big Data. Some of them include business, sciences, engineering, education, health, and society.
Within education, some examples of Big Data application are tertiary education management and institutional applications (including recruitment and admission processes), financial planning, donor tracking, and monitoring student performance.
What Is Big Data Transforming in the Data Analysis Sector?
Big Data brings three big changes to data analysis. The first is that the need for samples belonged to a time when information was limited.
The second is that the obsession with correct data and the concern for data quality were due to the limited availability of data. The last is the abandonment of the search for causality in favor of a focus on discovering the fact itself.
For the first big change, the argument rests on the definition of Big Data itself, understood in relative rather than absolute terms. It used to be unviable and expensive to study a whole population, and the argument is reinforced by the fact that nowadays some companies collect as much data as possible.
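The trade-off behind sampling can be sketched in a few lines. Here a synthetic "whole population" of 100,000 household spend figures is compared with a random sample of 1,000. The population, its mean of 50 and its spread of 15 are assumptions made only for the illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical whole population: spend per household for 100,000 households.
population = [rng.gauss(50, 15) for _ in range(100_000)]

# The traditional approach: study a random sample of 1,000 instead.
sample = rng.sample(population, 1_000)

population_mean = mean(population)
sample_mean = mean(sample)
# The standard error estimates how far the sample mean typically
# falls from the population mean.
standard_error = stdev(sample) / sqrt(len(sample))

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f} (standard error {standard_error:.2f})")
```

A well-drawn sample gets close to the population figure at a fraction of the cost, which is why sampling dominated when data was scarce. When the whole population is already recorded, the estimate and its margin of error are no longer needed for this kind of summary.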
The second big change refers to the obsession with correct data, which follows from the first change: data availability. Previously, data was limited, so it was very important to ensure the total quality of the data.
The increase in data availability opened the doors to inaccuracy, and Big Data transforms the numbers into something more probabilistic than precise. That is, the larger the scale, the more accuracy is lost.
Finally, the third big change in the Big Data era is that predictions based on correlations are central to Big Data. That means that Big Data introduces noncausal analyses that transform the way the world is understood. The mentality has changed on how data could be used.
The three changes described above turn some traditional perspectives of data analysis upside down, concerning not only the need for sampling or data quality but also integrity. It goes further when a new way to look at data and what information to extract from it is brought to the table.
Normally, Big Data refers to large amounts of complex data, and the data is often generated in a continuous way, implying that the data analysis occurs in real-time. Classical analysis techniques are not enough and end up being replaced by machine learning techniques.
Big Data’s analysis techniques encompass various disciplines, which include statistics, data mining, machine learning, neural networks, social network analysis, signal processing, pattern recognition, optimization methods, and visualization approaches. In addition to new processing and data storage technologies, programming languages like Python and R gained importance.
Modeling decision methods also include discrete simulation, finite elements analysis, stochastic techniques, and genetic algorithms among others. Real-time modeling is not only concerned about time and algorithm output, but it is also the type of work that requires additional research.
The opportunities for emerging analytical research can be classified into five critical technical areas: Big Data and Analytics, text analytics, web analytics, social network analytics, and mobile analytics. Some sets of techniques receive special names, based on the way the data was obtained and the type of data to be analyzed, as follows:
Text Mining: techniques to extract information from textual data, which involves statistical analysis, machine learning, and linguistics;
Audio Analytics: non-structured audio data analyses, also known as speech analytics;
Video Analytics: encompasses a variety of techniques to monitor, analyze and extract significant information out of video transmissions;
Social Network Analytics: analysis of both structured and non-structured data from social networks;
Predictive Analytics: embraces a number of techniques to predict future results based on historical and current data and can be applied to most disciplines.
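As a small illustration of the last item in the list, the sketch below fits a straight line to six months of historical sales by ordinary least squares and extends it one month ahead. The sales figures and the function name fit_line are made up for the example, and real predictive models are usually far richer than a single trend line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical historical data: units sold in months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 108, 121, 129, 142, 150]

slope, intercept = fit_line(months, sales)

# Predict month 7 by extending the fitted trend.
forecast = slope * 7 + intercept
print(f"trend: {slope:.2f} units per month, forecast for month 7: {forecast:.1f}")
```

The forecast is only as good as the assumption that the past trend continues, which is exactly the kind of assumption that contextual knowledge can confirm or challenge.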
Besides the data analysis techniques, visualization techniques are also fundamental in this discipline. Big Data visualization is the study of transforming data, information, and knowledge into an interactive visual representation.
Under the influence of Big Data’s technologies and techniques (large-scale data mining, time series analysis, and pattern mining), data like occurrences and logs can be captured in a low granularity with a long history and analyzed in multiple projections.
The analysis and exploration of datasets made data-driven analysis possible and has the potential to augment or even replace ad hoc analysis with other types of analysis: consumer behavior tracking, simulations and scientific experiments, and validation of hypotheses.
The Role and Profile of the Data Scientist
With the onset of the Big Data phenomenon, there emerges the need for skilled professionals to perform the various roles that the new approach requires: the Data Scientist. In order to understand the profile of this increasingly important professional, it is paramount to understand the role that they perform in Big Data.
Working with Big Data encompasses a set of different abilities from the ones organizations are used to. Because of that, it is necessary to pay attention to a key to success for this kind of project: people. Data Scientists are necessary for Big Data to make sense.
The managers of organizations need to learn what to do with new data sources. Some of them are willing to hire Data Scientists at high salaries and expect them to work magic. First, they need to understand the Data Scientist’s purpose and why it is necessary to have someone playing this role.
The role of the Data Scientist is to discover patterns and relationships that have never been thought of or seen before. These findings must be transformed into information that can be used to take actions and generate value for the organization. Data Scientists are people who understand how to fish out answers to important business questions from enormous volumes of non-structured information.
Data Scientists are highly trained and curious professionals with a taste for solving hard problems and a high level of education (often Ph.D.) in analytical areas such as statistics, operational research, computer science, and mathematics. Statistics and computing are together the main technologies of Data Science.
Data Science encompasses much more than algorithms and data mining. Successful Data Scientists must be able to visualize business problems from the data perspective. There is a thinking structure of data analysis and basic principles that must be understood.
The most basic universal ability of the Data Scientist is to write code, although the most dominant characteristic of the Data Scientist is intense curiosity. Data Scientists are a hybrid of a hacker, analyst, communicator, and trusted adviser.
Overall, it is necessary to think of Big Data not only in analytical terms but also in terms of developing high-level skills that enable the use of the new generation of IT tools and data collection architectures.
Data must be collected from several sources, stored, organized, extracted, and analyzed in order to generate valuable findings. These discoveries must be shared with the main actors of the organization who are looking to generate competitive advantage.
Analytics is a complex process that demands people with a very specific educational specialization and this is why tools are fundamental to help people to execute tasks.
These include tools and computer programming skills such as Python and R; knowledge of MapReduce and Hadoop to process large datasets; machine learning; and a number of visualization tools: Google Fusion Tables, Infogram, Many Eyes, Statwing, Tableau Public and DataHero.
Big Data intensifies the need for sophisticated statistics and analytical skills. With all their technical and analytical skills, Data Scientists are also required to have solid domain knowledge. In both contexts, a consistent time investment is required. In summary, Data Scientists need to gather a rich set of abilities, as follows:
Understand the different types of data and how they can be stored;
Communicate the findings through business reports.
For Data Science to work, a team of professionals with different abilities is necessary, and Data Science projects should not be restricted to data experiments. Besides that, it is necessary to connect the Data Scientist to the world of the business expert. It is very common for Data Scientists to work closely with people from the organization who have domain knowledge of the business.
Because of this, it is useful to consider analytical users on one side and data scientists and analysts on the other side. Each group needs to have different capabilities, including a mixture of business, data and analytical expertise. Analytical talents can be divided into three different types:
Specialists: who process analytical models and algorithms, generate results and present the information in a way that organizational leaders can interpret and act on;
Experts: who are in charge of developing sophisticated models and applying them to solve business questions;
Scientists: who lead the teams of experts and specialists and are in charge of constructing a story, creating innovative approaches to analyze data and producing solutions. Such solutions will be transformed into actions to support organizational strategies.
Having a very wide and yet specific profile, a mixture of technical and business knowledge, the Data Scientist is a rare professional. The difficulty in finding people with the technical abilities to use Big Data tools has not gone unnoticed by the media. Because of all those requirements, Data Scientists are not only scarce but also expensive.
Issues and Challenges
The different forms of data, ubiquity and dynamic nature of resources are a big challenge. In addition, the long reach of data, findings, access, processing, integration and physical world interpretation through data are also challenging tasks.
Several challenges and issues involving Big Data have arisen, not only in the context of technology or management issues but also legal matters.
The following issues will be discussed in this section: user privacy and security, risk of discrimination, data access and information sharing, data storage and processing capacity, analytical issues, skilled professionals, process change, marketing, the Internet of Things (IoT) and finally, technical challenges, which seem to be among the issues of most concern in the literature.
The first issue refers to privacy and security. Personal information combined with other data sources can be used to infer other facts about a person that may be secret or that the user does not want to share.
Users’ information is collected and used to add more value to a business or organization, often without the users being aware that their personal data is being analyzed.
The privacy issue is particularly relevant since there is data sharing between industries and for investigative purposes. That goes against the principle of privacy, which refers to avoiding data utilization.
The advances in Big Data and Analytics provided tools to extract and correlate data, making privacy violations easier. Preventing unauthorized data access is also important for security, both to guard against cyberattacks and to keep criminals from learning more about their targets.
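How combining data sources can reveal facts about a person is easiest to see in a toy example. In the sketch below, a set of "anonymized" health records with names removed is joined to a public register on three shared attributes. All records, names, and field names are invented for the illustration.

```python
# Hypothetical "anonymized" records: names removed, other attributes kept.
health_records = [
    {"zip": "02138", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02138", "birth_year": 1990, "sex": "M", "diagnosis": "migraine"},
]

# Hypothetical public register that includes names.
public_register = [
    {"name": "Alice Example", "zip": "02138", "birth_year": 1975, "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1982, "sex": "M"},
    {"name": "Carol Example", "zip": "02139", "birth_year": 1982, "sex": "F"},
]

# Attributes that are not names but can single a person out when combined.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

def link_records(anonymous, identified, keys=QUASI_IDENTIFIERS):
    """Attach names to anonymous rows whose attributes match exactly one person."""
    index = {}
    for person in identified:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    linked = []
    for row in anonymous:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the row
            linked.append((names[0], row["diagnosis"]))
    return linked

reidentified = link_records(health_records, public_register)
for name, diagnosis in reidentified:
    print(f"{name}: {diagnosis}")
```

Neither dataset is sensitive in the same way on its own. It is the combination that exposes a diagnosis next to a name, which is why removing names alone is rarely enough to protect privacy.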
Besides privacy matters, Big Data applications may raise ethical concerns such as social injustice or even discriminatory procedures, for example denying certain people job opportunities or health access, or even changing the social and economic standing of a particular group.
While one person may gain advantages from predictive analysis, someone else may be disadvantaged.
Big Data used in legal applications increases the chances that a person suffers consequences without having the right to object or, even further, without knowing that they are being discriminated against.
Issues about data access and information sharing refer to the fact that data is used for precise decision making at the right time. For that, data needs to be available at the perfect time and in a complete manner.
These demands make the process of management and governance very complex, with the additional need to make this data available for government agencies in a specific pattern.
Another issue about Big Data refers to storage and processing. The storage capacity is not enough for the amount of data being produced: social media websites are the major contributors, as well as sensors.
Due to the big demand, outsourcing data to the cloud can be an option but loading all this data does not resolve the problem since Big Data needs to relate data and extract information. Besides the time of data uploading, data changes very rapidly, making it even harder to upload data in real-time.
Analytical challenges are also posed in the Big Data context. A few questions need an answer: What if the volume is so big that it is not known how to deal with it? Does all data need to be stored? Does all data need to be analyzed? How to figure out what are the most relevant points? How can data bring more advantages?
As seen in the last section, the Data Scientist profile is not an easy one to fill. The required skills are still at an early stage. With emerging technologies, Data Science will have to be appealing to organizations and youth with a number of abilities.
These skills must not be limited to technical abilities but also must extend to research, analytics, data interpreting, and creativeness. These skills require training programs and the attention of universities to include Big Data in their courses.
The shortage of Data Scientists is becoming a serious limitation in some sectors. Universities and educational institutions must offer courses capable of providing all this knowledge for a new generation of Data Scientists.
University students should have the technical ability to conduct predictive analysis, knowledge of statistical techniques, and skill in handling the available tools.
The Master of Science curriculum should place more emphasis on Business Intelligence and Business Analytics techniques and on application development, using high-level technology tools to solve important business problems. Another challenge here is how fast universities can update their courses with so many new technologies appearing every day.
There are also challenges associated with change and the implementation of new processes and even business models, especially in reference to all the data made available through the internet and also in the marketing context.
The digital revolution in society and marketing has created huge challenges for companies, which encompass discussions on the effects on sales and business models, and the consequences of new digital channels and media with the prevailing data growth.
The four main challenges for marketing are:
The use of customer’s insights and data to compete in an efficient way
The power of social media for brands and customer relationships
New digital metrics and effective evaluation of digital marketing activities
The growing gap of talents with analytical capabilities in the companies.
In the context of the IoT or Internet of Everything (IoE), the following challenges are posed:
Assess maturity and capacity in terms of technologies and IT;
Understand the different types of IoT functionality that can be incorporated and how they will affect the value for the client;
Comprehend the role of machine learning and predictive analytical models;
Rethink business models and the value chain, based on the velocity of market change and relative responsiveness of the competition.
Finally, the technical challenges refer to error tolerance, scalability, data quality, and the need for new platforms and tools. With new technology, either an error must be tolerable or the task must be restarted. Some Big Data computing methods aim to increase error tolerance and to reduce the effort needed to restart a task.
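One common way to reduce the cost of restarting a task is checkpointing: progress is saved as the work proceeds, so a restart resumes from the last completed step rather than from the beginning. The following is only a minimal sketch of that idea, not a description of any particular Big Data platform. The function name process_with_checkpoint, the handle callback, and the JSON checkpoint file are all assumptions made for illustration.

```python
import json
import os


def process_with_checkpoint(records, checkpoint_path, handle):
    """Process records in order, resuming after the last completed one.

    `handle` is a hypothetical callback that does the real work on one
    record and may raise an exception. Returns the number of records
    processed during this call.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        # A previous run saved its progress: pick up where it stopped.
        with open(checkpoint_path) as fh:
            start = json.load(fh)["next_index"]

    for i in range(start, len(records)):
        handle(records[i])  # if this raises, earlier progress is kept
        # Write to a temporary file and rename it, so that a crash
        # never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump({"next_index": i + 1}, fh)
        os.replace(tmp_path, checkpoint_path)

    return len(records) - start
```

If the task fails on the third of four records, calling the function again with the same checkpoint file processes only the third and fourth records. The trade-off is the extra write after every record; real systems usually checkpoint less often and accept redoing a small amount of work.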
The scalability issue has already moved computing to the cloud, which aggregates workloads with varied performance requirements into large clusters and therefore requires a high level of resource sharing. These factors combine to raise a new concern: how to program such systems, even for complex machine learning tasks.
Collecting and storing a massive amount of data comes at a price. The expectation is that, as more data drives decision making or predictive analysis in the business, the results will improve.
This raises issues regarding the relevance, quantity, and precision of the data and the conclusions drawn from it. Data origin is another challenge: because Big Data allows data to be collected from many different sources, validating the data becomes hard.
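To make the data origin problem concrete, one option is to attach a source label to every record and to check both the label and a few basic quality rules before the data is used. The sketch below illustrates this under stated assumptions. The Record fields, the source names, and the rules themselves are hypothetical examples, not a standard.

```python
from dataclasses import dataclass


@dataclass
class Record:
    source: str        # where the record came from (hypothetical label)
    household_id: str
    spend: float       # amount spent, e.g. in pounds


def validate(records, trusted_sources):
    """Split records into accepted ones and rejected ones with reasons."""
    accepted, rejected = [], []
    for record in records:
        problems = []
        if record.source not in trusted_sources:
            problems.append("unknown source")
        if not record.household_id:
            problems.append("missing household_id")
        if record.spend < 0:
            problems.append("negative spend")
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected


records = [
    Record("crm", "H1", 120.0),
    Record("web_scrape", "H2", 80.0),  # source is not on the trusted list
    Record("crm", "", -5.0),           # two quality problems
]
accepted, rejected = validate(records, trusted_sources={"crm", "pos"})
```

Keeping the rejected records together with their reasons, instead of silently dropping them, makes it possible to see which sources produce the most problems.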
New tools and analytical platforms are required to solve complex optimization problems, to support the visualization of large data sets and the relationships within them, and to explore and automate multifaceted decisions in real time.
Some modern data protection laws make it possible for a person to find out which information about them is being stored. Beyond that, everyone should know when an organization is collecting data, for what purposes, whether the data will be made available to third parties, and what the consequences of not supplying the information are.
Big Data has brought challenges in many senses and in many unexpected ways. Amid this commotion, another important issue for Big Data is whether companies and organizations are measuring and perceiving the return on investment (ROI) of its implementation.