What is data visualization and why is it needed?

What data visualization is, why it is important, and how to use it
HalfoedGibbs,United Kingdom,Professional
Published Date:02-08-2017
Chapter 4: data visualization

Pictures make you smarter

[Photo caption: Now hold still… we want to get all the variables together in one shot.]

You need more than a table of numbers. Your data is brilliantly complex, with more variables than you can shake a stick at. Mulling over mounds and mounds of spreadsheets isn’t just boring; it can actually be a waste of your time. A clear, highly multivariate visualization can, in a small space, show you the forest that you’d miss for the trees if you were just looking at spreadsheets all the time.

New Army needs to optimize their website

New Army is an online clothing retailer that just ran an experiment to test web layouts. For one month, everyone who came to the website was randomly served one of these three home page designs.

[Figure: three New Army home page designs (Home Page 1, Home Page 2, Home Page 3), each with the tagline “On the CUTTING edge of fashion”, a “Buy these shirts now” banner, and navigation links for Men’s, Women’s, Children’s, and Pets.]

[Annotation: Here’s Home Page 1. This is their control, because it’s the stylesheet they’ve been using up to now.]

Here’s what the client wants

They had their experiment designers put together a series of tests that promise to answer a lot of their questions about their website design. What they want to do is find the best stylesheets to maximize sales and get people returning to their website.

The results are in, but the information designer is out

Now that they have a store of fantastic data from a controlled, randomized experiment, they need a way to visualize it all together.

So they hired a fancy information designer and asked him to pull together something that helped them understand the implications of their research. Unfortunately, all did not work out as planned.

Client: “We got a lot of crap back from the information designer we hired. It didn’t help us understand our data at all, so he got the ax. Can you create data visualizations for us that help us build a better website?”
Web guru from New Army: “What we want to see is which stylesheet or stylesheets maximize revenue, the time our visitors spend on the site, and return visits to the site.”

You’ll need to redesign the visualizations for the analysis. It could be hard work, because the experiment designers at New Army are an exacting bunch and generated a lot of solid data. But before we start, let’s take a look at the rejected designs. We’ll likely learn something by knowing what sort of visualizations won’t work.

The last information designer submitted these three infographics

The information designer submitted these three designs to New Army. Take a look at these designs. What are your impressions? Can you see why the client might not have been pleased?

[Figure: tag cloud titled “New Army favorite keyword clicks”.]
[Annotation: Keyword clicks… what does that mean?]
[Annotation: The size of the text must have something to do with the number of clicks.]
[Annotation: You can make tag clouds like this for free at http://www.wordle.net.]

[Figure: bar chart titled “Total page hits by stylesheet”, y-axis from 0 to 40, one bar each for Home Page 1, Home Page 2, and Home Page 3.]
[Annotation: Looks like this chart measures how many visits each home page got.]
[Annotation: It seems that they’re all about the same.]

[Figure: arrow diagram titled “Typical paths through the New Army website”, connecting Home, Men’s, Women’s, Pet’s, and About.]
[Annotation: What do those arrows mean?]

These visualizations are definitely flashy, but what’s behind them?

What data is behind the visualizations?

“What is the data behind the visualizations?” is the very first question you should ask when looking at a new visualization. You care about the quality of the data and its interpretation, and you’d hate for a flashy design to get in the way of your own judgments about the analysis.

What sort of data do you think is behind these visualizations?

[Figure: a “fancy visualization” standing in front of a pile of data.]
[Annotation: The data is what it’s all about.]
[Annotation: OK, lots of arrows on this one. What d’ya got back there?]
[Annotation: Here are some of New Army’s data sheets.]
[Annotation: These graphics can fit a lot of different data.]

Show the data

You can’t tell from these visualizations what data is behind them. If you’re the client, how could you ever expect to be able to make useful judgments with the visualizations if they don’t even say clearly what data they describe?

Show the data. Your first job in creating good data visualizations is to facilitate rigorous thinking and good decision making on the part of your clients, and good data analysis begins and ends with thinking with data.

[Figure: the three rejected designs again: “New Army favorite keyword clicks”, “Total page hits by stylesheet”, and “Typical paths through the New Army website”.]
[Annotation: You just don’t know what’s behind them until the designer tells you.]
[Annotation: And these graphs are not solutions to the problems of New Army.]

New Army’s actual data, however, is really rich and has all sorts of great material for your visualizations.

[Annotation: This is what it’s all about.]

Here’s some unsolicited advice from the last designer

You didn’t ask for it, but it appears that you’re getting it anyway: the outgoing information designer wants to put in his two cents about the project. Maybe his perspective will help…

Dan seems to think that an excess of data is a real problem for someone trying to design good data visualizations. Do you think that what he is saying is plausible? Why or why not?

[Annotation: Well, that’s “nice” of him to say.]
[Annotation: From the looks of the table on the facing page, it appears that Dan is correct.]

To: Head First
From: Dan’s Dizzying Data Designs
Re: Website design optimization project

Dear Head First,

I want to wish you the best of luck on the New Army project. I didn’t really want to do it anyway, so it’s good for someone else to get a chance to give it a shot. One word of warning: they have a lot of data. Too much, in fact.
Once you really dig into it, you’ll know what I mean. I say, give me a nice little tabular layout, and I’ll make you a pretty chart with it. But these guys? They have more data than they know what to do with. And they will expect you to make visuals of all of it for them. I just made a few nice charts, which I understand not everyone liked, but I’ll tell you they’ve set forward an insurmountable task. They want to see it all, but there is just too much.

Dan

[Annotation: Too much data to visualize it all, huh?]

Is Dan being reasonable when he says it’s too hard to do good visualizations when there is too much data?

This isn’t very plausible. The whole point of data analysis is to summarize data, and summarizing tools, like taking the average of a set of numbers, will work regardless of whether you have just a few data points or millions. And if you have a bunch of different data sets to compare to each other, really great visualizations facilitate this sort of data analysis just like all the other tools.

Too much data is never your problem

It’s easy to get scared by looking at a lot of data.

[Annotation: So… much… data]

But knowing how to deal with what seems like a lot of data is easy, too.

[Annotation: Some of this stuff is going to be useful to you.]
[Annotation: And some of it won’t be useful to you.]

If you’ve got a lot of data and aren’t sure what to do with it, just remember your analytical objectives. With these in mind, stay focused on the data that speaks to your objectives and ignore the rest.

[Annotation: What do you think the client’s looking for?]

“Duh. The problem is not too much data; the problem is figuring out how to make the data visually appealing.”

Oh, really? Do you think it’s your job as a data analyst to create an aesthetic experience for your clients?
Making the data pretty isn’t your problem either

If the data visualization solves a client’s problem, it’s always attractive, whether it’s something really elaborate and visually stimulating or whether it’s just a plain ol’ table of numbers.

[Figure: words floating around the client: Excitement, Insight, Pizazz, Charm, Splash, Beauty, Wow factor, Eye-appeal, Pop.]

Making good data visualizations is just like making any sort of good data analysis. You just need to know where to start.

So how do you use a big pile of data with a bunch of different variables to evaluate your objectives? Where exactly do you begin?

Data visualization is all about making the right comparisons

Web guru from New Army: “What we want to see is which stylesheet or stylesheets maximize revenue, the time our visitors spend on the site, and return visits to the site.”

[Annotation: Think about the comparisons that fulfill your client’s objectives.]

To build good visualizations, first identify the fundamental comparisons that will address your client’s objectives. Take a look at their most important spreadsheets:

[Figure: three spreadsheets, labeled “Here’s Home Page 1”, “Here’s Home Page 2”, and “Here’s Home Page 3”.]

While New Army has more data than these three sheets, these sheets have the comparisons that will speak directly to what they want to know. Let’s try out a comparison now…

Exercise

Take a look at the statistics that describe the results for Home Page 1. Plot dots to represent each of the users on the axes below.

Load this: www.headfirstlabs.com/books/hfda/hfda_ch04_home_page1.csv

Use your spreadsheet’s average formula (AVERAGE in Excel and OpenOffice Calc) to calculate the average Revenue and TimeOnSite figures for Home Page 1, and draw those numbers as horizontal and vertical lines on the chart.

[Figure: empty axes titled “Home Page 1”, with TimeOnSite on the x-axis (0 to 40).]
[Annotation: This value represents the goal New Army has for the average amount of money each user spends.]

How do the results you see compare to their goals for revenue and time on site?
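The exercise above can also be sketched outside a spreadsheet. What follows is a minimal Python sketch, not the book's method: it assumes a CSV with `TimeOnSite` and `Revenue` columns (the names come from the chart axes; the real file's layout is an assumption), and the three sample rows are made up purely so the snippet runs on its own. The function names are hypothetical. Plotting is optional and needs matplotlib, which is imported only if you call the plotting function.

```python
import csv
import io
from statistics import mean

# Made-up rows for illustration only; these are NOT New Army's data.
SAMPLE_CSV = """TimeOnSite,Revenue
12,40
30,20
18,60
"""


def load_columns(text, x_name="TimeOnSite", y_name="Revenue"):
    """Read two numeric columns from CSV text into parallel lists."""
    rows = list(csv.DictReader(io.StringIO(text)))
    xs = [float(row[x_name]) for row in rows]
    ys = [float(row[y_name]) for row in rows]
    return xs, ys


def averages(xs, ys):
    """Return (average x, average y): the spreadsheet AVERAGE step."""
    return mean(xs), mean(ys)


def plot_with_averages(xs, ys, title="Home Page 1"):
    """Scatterplot of one dot per user, plus lines at the two averages."""
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting

    avg_x, avg_y = averages(xs, ys)
    fig, ax = plt.subplots()
    # Open circles stay readable when points overlap.
    ax.scatter(xs, ys, facecolors="none", edgecolors="black")
    ax.axvline(avg_x)  # vertical line: average TimeOnSite
    ax.axhline(avg_y)  # horizontal line: average Revenue
    ax.set_xlabel("TimeOnSite")
    ax.set_ylabel("Revenue")
    ax.set_title(title)
    return fig


xs, ys = load_columns(SAMPLE_CSV)
avg_time, avg_revenue = averages(xs, ys)
print(f"average TimeOnSite = {avg_time:.1f}, average Revenue = {avg_revenue:.1f}")
```

To use it on real data, read the downloaded file's text and pass it to `load_columns`, then call `plot_with_averages(xs, ys)` and compare the two average lines with New Army's goals by eye.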
[Annotation: This value represents New Army’s goal for the average number of minutes each user spends on the website.]

Exercise solution

How did you visualize the Revenue and TimeOnSite variables for Home Page 1?

[Figure: scatterplot titled “Home Page 1”, with TimeOnSite on the x-axis (0 to 40) and Revenue on the y-axis (0 to 100).]
[Annotation: Here’s the average revenue.]
[Annotation: Here’s the average amount of time spent on the website.]

How do the results you see compare to their goals for revenue and time on site?

On average, the time people spend looking at the website under Home Page 1 is greater than New Army’s goal for that statistic. On the other hand, the average amount of revenue for each user is less than their goal.

Your visualization is already more useful than the rejected ones

Now that’s a nice chart, and it’ll definitely be useful to your client. It’s an example of a good data visualization because it…

• Shows the data
• Makes a smart comparison
• Shows multiple variables

[Figure: your Home Page 1 scatterplot (TimeOnSite against Revenue), with callouts marking a data point, the two variables, and the two summary lines, shown next to the three rejected designs.]
[Annotation: Here’s another feature of great visualizations.]
[Annotation, on the rejected designs: These charts are just a mess.]

So what kind of chart is that? And what can you actually do with it?

Use scatterplots to explore causes

Scatterplots are great tools for exploratory data analysis, which is the term statisticians use to describe looking around in a set of data for hypotheses to test. Analysts like to use scatterplots when searching for causal relationships, where one variable is affecting the other.
As a general rule, the horizontal x-axis of the scatterplot represents the independent variable (the variable we imagine to be a cause), and the vertical y-axis of a scatterplot represents the dependent variable (which we imagine to be the effect).

[Figure: a scatterplot with the x-axis labeled “Independent Variable” and the y-axis labeled “Dependent Variable”.]
[Annotation: Here’s a scatterplot.]
[Annotation: Here’s the cause.]
[Annotation: Here’s the effect.]
[Annotation: Each of these dots represents an observation, in this case a user on the website.]
[Annotation: It’s a good idea to use little circles for your scatterplots, because they’re easier to see when they overlap than dots.]

You don’t have to prove that the value of the independent variable causes the value of the dependent variable, because after all we’re exploring the data. But causes are what you’re looking for.

“That’s cool, but there is a lot more data than two variables, and a lot more comparisons to be made. Can we plot more variables than just two?”

The best visualizations are highly multivariate

A visualization is multivariate if it compares three or more variables. And because making good comparisons is fundamental to data analysis, making your visualizations as multivariate as possible makes it most likely that you’ll make the best comparisons.

And in this case you’ve got a bunch of variables.

[Annotation: You have multiple variables.]
[Annotation: There’s a lot of opportunity for comparisons here.]

How would you make the scatterplot visualization you’ve created more multivariate?

Show more variables by looking at charts together

One way of making your visualization more multivariate is just to show a bunch of similar scatterplots right next to each other, and here’s an example of such a visualization.

All of your variables are plotted together in this format, which enables you to compare a huge array of information right in one place. Because New Army is really interested in revenue comparisons, we can just stick with the charts that compare TimeOnSite, Pageviews, and ReturnVisits to revenue.
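The side-by-side layout described above can be sketched as a grid with one row per home page and one column per x-variable, always plotting Revenue on the y-axis. This is a rough Python illustration under stated assumptions, not how the original graphic was produced: the function names, the `data` dictionary shape, and the optional `goals` argument are all hypothetical, and the drawing function needs matplotlib (imported only when called). No goal values are supplied here because the text does not state them.

```python
from statistics import mean

PAGES = ["Home Page 1", "Home Page 2", "Home Page 3"]
X_VARS = ["TimeOnSite", "Pageviews", "ReturnVisits"]


def panel_layout(pages=PAGES, x_vars=X_VARS):
    """Return (row, col, page, x_var) for each panel of the grid."""
    return [
        (r, c, page, x_var)
        for r, page in enumerate(pages)
        for c, x_var in enumerate(x_vars)
    ]


def draw_small_multiples(data, goals=None):
    """Draw the grid of scatterplots.

    data:  {page name: list of dict rows holding the X_VARS and "Revenue"}
           (an assumed shape, chosen for this sketch).
    goals: optional {variable name: goal value}; drawn as dotted lines.
    """
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting

    fig, axes = plt.subplots(len(PAGES), len(X_VARS), sharey=True, squeeze=False)
    for r, c, page, x_var in panel_layout():
        xs = [row[x_var] for row in data[page]]
        ys = [row["Revenue"] for row in data[page]]
        ax = axes[r][c]
        ax.scatter(xs, ys, facecolors="none", edgecolors="black")
        ax.axvline(mean(xs))  # solid lines: this page's averages
        ax.axhline(mean(ys))
        if goals:
            ax.axvline(goals[x_var], linestyle=":")  # dotted lines: goals
            ax.axhline(goals["Revenue"], linestyle=":")
        ax.set_title(page)
        ax.set_xlabel(x_var)
        if c == 0:
            ax.set_ylabel("Revenue")
    return fig


layout = panel_layout()
print(len(layout), "panels")
```

Sharing the y-axis across panels is a deliberate choice in this sketch: it keeps Revenue on the same scale everywhere, so the home pages can be compared by eye.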
[Figure: a 3 × 3 grid of scatterplots. Rows: Home Page 1, Home Page 2, Home Page 3. Columns (x-axes): TimeOnSite (0 to 40), Pageviews (0 to 80), ReturnVisits (0 to 30). The y-axis of every panel is Revenue (0 to 80).]
[Annotation: The solid lines are the averages for that home page.]
[Annotation: The dotted lines represent New Army’s goals.]
[Annotation: Here’s the chart that you created.]
[Annotation: This graphic was created with an open source software program called R, which you’ll learn more about later.]

Exercise

You’ve just created a pretty complex visualization. Look at it and think about what it tells you about the stylesheets that New Army decided to test.

Do you think that this visualization does a good job of showing the data? Why or why not?

Just looking at the dots, you can see that Home Page 2 has a very different sort of spread from the other two stylesheets. What do you think is happening with Home Page 2?

Which of the three stylesheets do you think does the best job of maximizing the variables that New Army cares about? Why?

Exercise solution

Does the new visualization help you understand the comparative performance of the stylesheets?

Do you think that this visualization does a good job of showing the data? Why or why not?

Definitely. Each dot on each of the nine panels represents the experience of a single user, so even though the data points are summarized into averages, you can still see absolutely all of them. Seeing all the points makes it easy to evaluate the spread, and the average lines make it easy to see how each stylesheet performs relative to each other and relative to New Army’s goals.
Just looking at the dots, you can see that Home Page 2 has a very different sort of spread from the other two stylesheets. What do you think is happening with Home Page 2?

It looks like Home Page 2 is performing terribly. Compared to the other two stylesheets, Home Page 2 isn’t bringing in much revenue and also performs poorly on the Time on Site, Pageviews, and Return Visits figures. Every single user statistic is below New Army’s goals. Home Page 2 is terrible and should be taken offline immediately!

Which of the three stylesheets do you think does the best job of maximizing the variables that New Army cares about? Why?

Home Page 3 is the best. While 1 performs above average when it comes to the metrics besides Revenue, 3 is way ahead in terms of revenue. When it comes to Return Visits, 1 is ahead, and they’re neck-and-neck on Pageviews, but people spend more time on the site with 3. It’s great that 1 gets a lot of return visits, but you can’t argue with 3’s superior revenue.

Q: What software tool should I use to create this sort of graphic?

A: Those specific graphs are created in a statistical data analysis program called R, which you’re going to learn all about later in the book. But there are a number of charting tools you can use in statistical programs, and you don’t even have to stop there. You can use illustration programs like Adobe Illustrator and just draw visualizations, if you have visual ideas that other software tools don’t implement.

Q: What about Excel and OpenOffice? They have charting tools, too.

A: Yes, well, that’s true. They have a limited range of charting tools you can use, and you can probably figure out a way to create a chart like this one in your spreadsheet program, but it’s going to be an uphill battle.

Q: You don’t sound too hot on spreadsheet data visualizations.

A: Many serious data analysts who use spreadsheets all the time for basic calculations and lists nevertheless wouldn’t dream of using spreadsheet charting tools. They can be a real pain: not only is there a small range of charts you can create in spreadsheet programs, but often, the programs force you into formatting decisions that you might not otherwise make. It’s not that you can’t make good data graphics in spreadsheet programs; it’s just that there’s more trouble in it than you’d have if you learned how to use a program like R.

Q: So if I’m looking for inspiration on chart types, the spreadsheet menus aren’t the place to look?

A: No, no, no! If you want inspiration on designs, you should probably pick up some books by Edward Tufte, who’s the authority on data visualization by a long shot. His body of work is like a museum of excellent data visualizations, which he sometimes calls “cognitive art.”

Q: What about magazines, newspapers, and journal articles?

A: It’s a good idea to become sensitive to data visualization quality in publications. Some are better than others when it comes to designing illuminating visualizations, and when you pay attention to the publications, over time, you’ll get a sense of which ones do a better job. A good way to start would be to count the variables in a graphic. If there are three or more variables in a chart, the publication is more likely to be making intelligent comparisons than if there’s one variable to a chart.

Q: What should I make of data visualizations that are complex and artistic but not analytically useful?

A: There’s a lot of enthusiasm and creativity nowadays for creating new computer-generated visualizations. Some of them facilitate good analytical thinking about the data, and some of them are just interesting to look at. There’s absolutely nothing wrong with what some call data art. Just don’t call it data analysis unless you can directly use it to achieve a greater understanding of the underlying data.

Q: So something can be visually interesting without being analytically illuminating. What about vice versa?

A: That’s your judgement call. But if you have something at stake in an analysis, and your visualization is illuminating, then it’s hard to imagine that the graphic wouldn’t be visually interesting!

Let’s see what the client thinks…

The visualization is great, but the web guru’s not satisfied yet

You just got an email from your client, the web guru at New Army, assessing what you created for him. Let’s see what he has to say…

To: Head First
From: New Army Web Guru
Re: My explanation of the data

Your designs are excellent and we’re pleased we switched to you from the other guy. But tell me something: why does Home Page 3 perform so much better than the others?

All this looks really reasonable, but I still want to know why we have these results. I’ve got two pet theories. First, I think that Home Page 3 loads faster, which makes the experience of the website more snappy. Second, I think that its cooler color palette is really relaxing and makes for a good shopping experience. What do you think?

[Annotation: Nice!]
[Annotation: Here’s a reasonable question.]
[Annotation: Looks like your client has some ideas of his own about why the data looks the way it looks.]
[Annotation: He’s short and sweet.]

He wants to know about causality. Knowing what designs work only takes him so far. In order to make his website as powerful as possible, he needs some idea of why people interact with the different home pages the way they do. And, since he’s the client, we definitely need to address the theories he put forward.

What can you do with his request?