How does visualization work

Published Date: 02-08-2017
Chapter 1: Classifications of Visualizations

There are several ways to categorize and think about different kinds of visualizations. Here are four of the most useful. The first two are unrelated to the others; the last two are related to each other.

Complexity

One way to classify a data visualization is by counting how many different data dimensions it represents. By this we mean the number of discrete types of information that are visually encoded in a diagram. For example, a simple line graph may show the price of a company’s stock on different days: that’s two data dimensions. If multiple companies are shown (and therefore compared), there are now three dimensions; if trading volume per day is added to the graph, there are four (Figure 1-1).

Figure 1-1. Four data dimensions are shown in this graph. Adding more points within any of these dimensions won’t change the graph’s complexity.

This count of the number of data dimensions can be described as the level of complexity of the visualization. As visualizations become more complex, they are more challenging to design well, and can be more difficult to learn from. For that reason, visualizations with no more than three or four dimensions of data are the most common, though visualizations with six, seven, or more dimensions can be found.

Adding more volume or data points of the same data dimension doesn’t increase complexity. Showing 100 years of stock data for one stock isn’t more complex than one week of data; it’s just more voluminous. Showing 50 companies instead of two might make the display more crowded or complicated, but fundamentally it’s just more data points in the company dimension, and therefore isn’t making the graph more complex.

There are two main challenges to designing more complex visualizations. The first is that the more dimensions you need to encode visually, the more individual visual properties you need to use. Selecting properties is easy to do for the first few dimensions when most visual properties haven’t been used. However, as more dimensions are added, finding appropriate, unused visual properties becomes more difficult. (Bear in mind that a visualization shows not just types of information but also the relationships between and among those information types.) As this difficulty in design increases, intentionality in the decision-making process becomes ever more necessary. The way to succeed in the face of this challenge is to be intentional about which property to use for each dimension, and iterate or change encodings as the design evolves. This is the subject of Part II.

The second challenge for designing more complex visualizations is that there are relatively few well-known conventions, metaphors, defaults, and best practices to rely on. Because the safety net of convention may not exist, there is more of a burden on the designer to make good choices that can be easily understood by the reader.

Infographics versus Data Visualization

You may have heard the terms infographics and data visualization used in different ways, or interchangeably in different contexts, or even casually by the same person in a single sentence. You may also have heard these terms used politically, that is, with positive or negative connotations attached. Some people use infographic to refer to representations of information perceived as casual, funny, or frivolous, and visualization to refer to designs perceived to be more serious, rigorous, or academic.

The truth is, even though the art of representing statistical information visually is hundreds of years old, the vocabulary of the field is still evolving and settling. Among the general public, there is still confusion over what these two terms mean, but within the information design community, definitions for these terms are solidifying.
In short: The distinction between infographics and data visualizations (or information visualizations) is based on both form and origin (see Figure 1-2).

Figure 1-2. The difference between infographics and data visualization may be loosely determined by the method of generation, the quantity of data represented, and the degree of aesthetic treatment applied.

Infographics

We suggest that the term infographics is useful for referring to any visual representation of data that is:

• manually drawn (and therefore a custom treatment of the information);
• specific to the data at hand (and therefore nontrivial to recreate with different data);
• aesthetically rich (strong visual content meant to draw the eye and hold interest); and
• relatively data-poor (because each piece of information must be manually encoded).

Put another way, infographics are illustrations where the data representation is manually laid out or sketched, probably with drawing software such as Adobe Illustrator. Because of their manually-drawn process of creation, infographics have the option of being aesthetically rich (see Figure 1-3). Another consequence of their manual origins is they tend to be limited in the amount of data they can convey, simply due to the practical limitations of manipulating many data points. Similarly, it is difficult to change or update the data in an infographic, as any changes must be implemented manually.

This is not a complete, universal, or absolute definition, but may be a helpful way to think about and identify the category.

Figure 1-3. Flint Hahn’s Burning Man infographic is a great example of an aesthetically rich, manually-drawn piece. Flint Hahn (2010). Copyright © 2010, Flint Hahn. Permission to reproduce the likeness of Burning Man and the mark “Burning Man” granted by Burning Man.
Data Visualization

By contrast, we suggest that the terms data visualization and information visualization (casually, data viz and info viz) are useful for referring to any visual representation of data that is:

• algorithmically drawn (may have custom touches but is largely rendered with the help of computerized methods);
• easy to regenerate with different data (the same form may be repurposed to represent different datasets with similar dimensions or characteristics);
• often aesthetically barren (data is not decorated); and
• relatively data-rich (large volumes of data are welcome and viable, in contrast to infographics).

Data visualizations are initially designed by a human, but are then drawn algorithmically with graphing, charting, or diagramming software. The advantage of this approach is that it is relatively simple to update or regenerate the visualization with more or new data. While they may show great volumes of data, information visualizations are often less aesthetically rich than infographics.

As you will have inferred from the title of this book, it is this latter category of data visualizations with which we are primarily concerned here. However, the principles we present are relevant to the design of both infographics and data visualizations.

Exploration versus Explanation

Generally speaking, there are two categories of data visualization: exploration and explanation. The two serve different purposes, and so there are tools and approaches that may be appropriate only for one and not the other. For this reason, it is important to understand the distinction, so that you can be sure you are using tools and approaches appropriate to the task at hand.

Exploration

Exploratory data visualizations are appropriate when you have a whole bunch of data and you’re not sure what’s in it.
When you need to get a sense of what’s inside your data set, translating it into a visual medium can help you quickly identify its features, including interesting curves, lines, trends, or anomalous outliers.

Exploration is generally best done at a high level of granularity. There may be a whole lot of noise in your data, but if you oversimplify or strip out too much information, you could end up missing something important. This type of visualization is typically part of the data analysis phase, and is used to find the story the data has to tell you.

Explanation

By contrast, explanatory data visualization is appropriate when you already know what the data has to say, and you are trying to tell that story to somebody else. It could be the head of your department, a grant committee, or the general public.

Whoever your audience is, the story you are trying to tell (or the answer you are trying to share) is known to you at the outset, and therefore you can design to specifically accommodate and highlight that story. In other words, you’ll need to make certain editorial decisions about which information stays in, and which is distracting or irrelevant and should come out. This is a process of selecting focused data that will support the story you are trying to tell.

If exploratory data visualization is part of the data analysis phase, then explanatory data visualization is part of the presentation phase. Such a visualization may stand on its own, or may be part of a larger presentation, such as a speech, a newspaper article, or a report. In these scenarios, there is some supporting narrative (written or verbal) that further explains things.

Hybrids: Exploratory Explanation

It’s worth noting that there is also a kind of hybrid category, which involves a curated dataset that is nonetheless presented with the intention to allow some exploration on the reader’s part.
These visualizations are usually interactive via some kind of graphical interface that lets the reader choose and constrain certain parameters, thereby discovering for herself whatever insights the dataset may have to offer. These might even be insights the creator of the visualization hasn’t come across yet.

So in these hybrid designs there is a certain freedom-of-discovery aspect to the information presented, but it is usually not totally raw; it has been distilled and facilitated to some extent. See for an example.

Informative versus Persuasive versus Visual Art

We posit that there are three main categories of explanatory visualizations based on the relationships between the three necessary players: the designer, the reader, and the data.

This section refers to explanatory (or hybrid) visualizations exclusively, because it discusses designing visualizations of data with known parameters and stories. If you don’t yet know the message you intend to convey, then you’re still in an exploration phase, and probably aren’t designing for the same styles of consumption as this section describes.

The Designer-Reader-Data Trinity

It is useful to think of an effective explanatory data visualization as being supported by a three-legged stool consisting of the designer, the reader, and the data. Each of these “legs” exerts a force, or contributes a separate perspective, that must be taken into consideration for a visualization to be stable and successful. Chapter 2 will address the considerations of each of the three in much more detail, but we find it helpful to introduce the concept here.

Each of the three legs of the stool has a unique relationship to the other two. While it is necessary to account for the needs and perspective of all three in each visualization project, the dominant relationship will ultimately determine which category of visualization is needed (see Figure 1-4).

Figure 1-4.
The nature of the visualization depends on which relationship (between two of the three components) is dominant.

Informative

An informative visualization primarily serves the relationship between the reader and the data. It aims for a neutral presentation of the facts in such a way that will educate the reader (though not necessarily persuade him). Informative visualizations are often associated with broad data sets, and seek to distill the content into a manageably consumable form. Ideally, they form the bulk of visualizations that the average person encounters on a day-to-day basis, whether that’s at work, in the newspaper, or on a service-provider’s website. The Burning Man infographic (Figure 1-3) is an example of informative visualization.

Persuasive

A persuasive visualization primarily serves the relationship between the designer and the reader. It is useful when the designer wishes to change the reader’s mind about something. It represents a very specific point of view, and advocates a change of opinion or action on the part of the reader. In this category of visualization, the data represented is specifically chosen for the purpose of supporting the designer’s point of view, and is presented carefully so as to convince the reader of the same. See also: propaganda.

While an informative visualization may not have an intentional point of view in the manner that a persuasive visualization does, all visualizations are going to be biased to some degree, based on the fact that designers are human and have to make choices.

A good example of persuasive visualization is the Joint Economic Committee minority’s rendition of the proposed Democratic health care plan in 2010, shown in Figure 4-14.

Visual Art

The third category, visual art, primarily serves the relationship between the designer and the data.
Visual art is unlike the previous two categories in that it often entails unidirectional encoding of information, meaning that the reader may not be able to decode the visual presentation to understand the underlying information.

Whereas both informative and persuasive visualizations are meant to be easily decodable (bidirectional in their encoding), visual art merely translates the data into a visual form. The designer may intend only to condense it, translate it into a new medium, or make it beautiful; she may not intend for the reader to be able to extract anything from it other than enjoyment.

This category of visualization is sometimes more easily recognized than others. For example, Nora Ligorano and Marshall Reese designed a project that converts Twitter streams into a woven fiber-optic tapestry† (Figure 1-5; tic-tapestry). A project like this is abstract enough that most people intuitively recognize it as art: something to be appreciated rather than explicitly decoded.

But a project like the Planetary app from Bloom Studios is less easily categorized. Ostensibly, one may decode the information represented visually by noting the number of stars (representing artists), planets (representing albums), and moons (representing tracks) in a constellation or galaxy on the screen. But properties such as track length, encoded as the speed at which each moon orbits its album-planet, are encoded too subtly for the average user to decode, at which point it just becomes something pretty to look at. A worthy pursuit in its own right, perhaps, but better clearly labeled as visual art, and not confused with informative visualization.

† Nora Ligorano and Marshall Reese (2011). Copyright © 2011, Ligorano/Reese. -optic-tapestry

Figure 1-5. Participants address the Fiber Optic Tapestry by tweeting optictapestry and a primary color; the tapestry displays the colors in algorithmically-determined patterns.
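To make the complexity classification from earlier in this chapter concrete, here is a minimal Python sketch. It assumes each record is a dictionary whose keys are the data dimensions; the field names (day, company, price, volume), the sample values, and the two helper functions are hypothetical, chosen only to mirror the stock-chart example. It is one way to illustrate the idea, not a standard measure.

```python
# Hypothetical records mirroring the stock-chart example.
# Each key is one data dimension; the values are made up for illustration.
records = [
    {"day": "Mon", "company": "A", "price": 10.0, "volume": 1200},
    {"day": "Tue", "company": "A", "price": 10.5, "volume": 900},
    {"day": "Mon", "company": "B", "price": 22.0, "volume": 400},
]


def count_dimensions(rows):
    """Return the number of distinct data dimensions (field names) in rows."""
    dimensions = set()
    for row in rows:
        dimensions.update(row)  # adds the row's keys
    return len(dimensions)


def count_points(rows):
    """Return the number of data points, which measures volume, not complexity."""
    return len(rows)


more_data = records * 50  # many more points, same four dimensions

print(count_dimensions(records), count_points(records))
print(count_dimensions(more_data), count_points(more_data))
```

Both calls report four dimensions, while the point count grows from 3 to 150: more data, same complexity.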
Chapter 2: Source Trinity: Ingredients of Successful Visualizations

Let’s look a little more closely at the three major sources of influence on the design of your data visualization. It’s important to be clear that this applies mainly to explanatory visualization. While exploratory visualization is more about you finding out what’s in your data, explanatory visualization is about you as a designer telling the story of the data to your reader. These three components are your holy trinity when designing data visualizations.

Designer

As a designer, you have a goal. You may not be aware of it, but you are creating a visualization for some reason. Being aware of your motivations, goals, and priorities will help you design a successful visualization, rather than merely create an arbitrary visual representation of your data.

Why Are You Here?

Understanding and defining your goal is key to your success; it is the foundation of your process. Having a well-defined goal will inform your subsequent design decisions, and will give you a standard to evaluate your design against. And it will help you make appropriate choices long before you start assigning axes and plotting points.

As discussed in Chapter 1, there are different types of visualizations. Knowing which type of visualization you’re working with is an excellent first step in your design process. You need to know whether you have a specific story to tell with your data (explanation), or you are visualizing it to begin to see what’s there (exploration). If you have a story to tell, your visualization is almost certainly informative, persuasive, or visual art.

Once you know what type of visualization you’re creating, you can begin thinking about what kind of experience that visualization type should provide to your reader (even if that’s you). What information should they be able to learn from this visualization? What point or message are you trying to convey?
(Of course, if your piece is primarily meant to be an artistic work, your goal may be all about aesthetics, not information, and we would not presume to tell you how to make art.)

Keep your goal in mind. It is your touchstone, your guiding light. Consult it when you are about to be seduced by the siren song of circular layouts, the allure of extra data, the false prophet of “because I can.” These are distractions on your journey. As Bruce Lee would say, “It is like a finger pointing a way to the moon. Don’t concentrate on the finger or you will miss all that heavenly glory.”

To be clear, you should be open to iteration and evolution, serendipity, and the paths that new insight may reveal. But never lose sight of why you’re here in the first place. Your unique perspective is the value you bring to the table, and it should inform your design choices.

Reader

The second source of influence is the reader. As the intended recipient of your ideas, the reader holds a very special place in the trinity and can be your biggest ally or your biggest hurdle in clear communication, sometimes both.

You Are Creating This for Other People

At all stages of creating your visualization, it is important to put yourself in the shoes of your reader, and to take into consideration the unique viewpoint that he will bring with him. Why? Your success is measured by your reader’s success.

Remember, explanatory data visualization is a communication medium. You are selectively encoding specific information in such a way that the reader will be able to decode it and successfully receive your message. If that message is misinterpreted or poorly received, then you have not done a very good job of encoding it, have you? In order to be successful, you need to consider the various “distortions” or filters your readers will introduce.

Another benefit to putting yourself in the reader’s shoes is that it will force you to simplify your explanations a bit. This is not the same as simplifying your ideas.
It’s merely a process of breaking down those ideas until you can communicate them in clear and transparent terms.

If you find that you can’t explain your data or your thinking in a straightforward manner, it might be because you yourself don’t understand them well enough, or haven’t thought enough about a logical way to present them. This process can be a learning opportunity for you, too, and may ultimately strengthen your research.

They Are Not You

At the risk of stating the obvious, it’s important to note that your audience is not you. That is to say: if we agree that the purpose of the visualization is to take a story that is already known to you and tell it to somebody else, then it stands to reason that the somebody else is exactly that: not you, but an other.

Your grandmother, your boss, your niece, your neighbor: all these people bring different contexts to the table. And that doesn’t even begin to cover questions like, “What is the reader’s political identity and how do I characterize the borders between countries?”

Your audience is not like you. Even if you are creating a visualization for your team at work or for others in your own demographic group, you have the “curse” of too much knowledge, which lets you make too many assumptions.

For this reason, you must learn how to take yourself out of the picture when assessing how your message will be received.† We acknowledge: this is difficult to do. (But the payoff will be worth it in the end.)

Contextual Considerations for the Reader

Think of throwing a plastic disc with a friend in the park. If you want to be successful, you won’t just drop the disc on the ground in the same spot where you are standing: you will exert some physical effort to toss it to where your friend can reach it.
Further, if you are an experienced disc-tosser, you will, consciously or not, take into account considerations like how long your friend’s arms are, whether she is already in motion and, if so, in which direction she is moving, and how fast.

Effective communication is just like that. Your own position matters, but it is not the same as (and does not matter nearly as much as) the position of your receiver, whether they are receiving a plastic disc or a dataset.

The considerations you’ll need to make when designing a data visualization, then, are questions of identity, motivation, and language (i.e., specialized knowledge and vocabulary, such as professional jargon). Another thing to consider is learned social context. This encompasses questions such as:

• What do colors mean?
• Which direction is the reader used to reading in?
• Which icons is she familiar with?

† The ability to do this will help you in lots of other areas of your life, too.

We’ll take a closer look at how certain reader contexts affect encoding choices for attributes such as titles, tags, and labels; colors; and directional orientation in “Readers’ Context” on page 31.

Context of Use

To extend the plastic disc metaphor, a successful toss takes into consideration not only the attributes your friend possesses (arm-length, motion, speed), but also the attributes of the context surrounding your friend: things like whether the wind is blowing, whether the sun is in her eyes, and whether the terrain is even.

Similarly, a successful data visualization will take into account different time-frames the reader may be constrained to, the factors motivating him to understand your data, and the information he needs to gather to meet his own goals or make good decisions. The key questions to ask here are ones like:

• What information does my reader need to be successful?
• How much detail does she need?
• How long does she have to make it effective?
Once you understand the context in which the reader is operating, you can discern which information he needs; and once you understand the filters he may be using, you can discern how best to present that information to him.

Data

The third source of influence in designing a visualization is your data. The best visualizations will reveal what is interesting about the specific data set you’re working with. Different data may require different approaches, encodings, or techniques to reveal its interesting aspects. While default visualization formats are a great place to start, and may come with the correct design choices pre-selected, sometimes the data will yield new knowledge when a different visualization approach or format is used.

How do you choose a visualization format that shows your data’s best (and most interesting or informative) side? Know your data. Respect your data. Instead of shoehorning it into a format that seems slick but doesn’t really work, consider the inherent values, relationships, and structures of your data.

The type of basic questions you will want to ask about your data include:

• Is it a time-series? A hierarchy?
• How many dimensions does it have? Which are the most important ones?
• What sort of relationships do they have (e.g., one-to-one or many-to-many)?
• How variable are they?
• Are the values categorical? Discrete or continuous? Linear or non-linear? How are they bounded?
• How many categories are there?

If this sounds a little bit like a spec for a database table, it’s with good reason. You must understand what you’re dealing with in order to treat it well.
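Some of the questions above can be turned into a small profile of each field, much like a column spec. The following Python sketch is one hedged way to do that. The ColumnProfile class, the profile_column function, the sample values, and the rule that integers count as discrete and floats as continuous are all assumptions made for illustration; real data will usually need more careful treatment.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union

Value = Union[int, float, str]


@dataclass
class ColumnProfile:
    """Hypothetical summary of one data dimension."""
    name: str
    kind: str                # "categorical", "discrete", or "continuous"
    n_distinct: int          # number of categories or distinct values
    lower: Optional[float]   # bounds apply to numeric fields only
    upper: Optional[float]


def profile_column(name: str, values: Sequence[Value]) -> ColumnProfile:
    """Classify a non-empty, homogeneous list of values (an assumed simplification)."""
    if not values:
        raise ValueError("cannot profile an empty column")
    n_distinct = len(set(values))
    if all(isinstance(v, str) for v in values):
        return ColumnProfile(name, "categorical", n_distinct, None, None)
    if all(isinstance(v, int) for v in values):
        kind = "discrete"      # assumption: whole numbers are treated as discrete
    elif all(isinstance(v, (int, float)) for v in values):
        kind = "continuous"    # assumption: any float makes the field continuous
    else:
        raise ValueError("mixed text and numbers are not handled in this sketch")
    return ColumnProfile(name, kind, n_distinct, min(values), max(values))


# Made-up sample fields, reusing the stock example from Chapter 1.
print(profile_column("company", ["A", "B", "A"]))
print(profile_column("volume", [1200, 900, 400]))
print(profile_column("price", [10.0, 10.5, 22.0]))
```

A profile like this does not choose a visualization format for you, but it answers the categorical, discrete-or-continuous, bounds, and category-count questions before you start assigning visual properties.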