Computer Vision: Algorithms and Applications
Richard Szeliski
September 3, 2010 draft
© 2010 Springer

This electronic draft is for non-commercial personal use only, and may not be posted or re-distributed in any form. Please refer interested readers to the book's Web site at http://szeliski.org/Book/.

Chapter 1 Introduction

1.1 What is computer vision?
1.2 A brief history
1.3 Book overview
1.4 Sample syllabus
1.5 A note on notation
1.6 Additional reading

Figure 1.1 The human visual system has no problem interpreting the subtle variations in translucency and shading in this photograph and correctly segmenting the object from its background.

Figure 1.2 Some examples of computer vision algorithms and applications. (a) Structure from motion algorithms can reconstruct a sparse 3D point model of a large complex scene from hundreds of partially overlapping photographs (Snavely, Seitz, and Szeliski 2006) © 2006 ACM. (b) Stereo matching algorithms can build a detailed 3D model of a building façade from hundreds of differently exposed photographs taken from the Internet (Goesele, Snavely, Curless et al. 2007) © 2007 IEEE. (c) Person tracking algorithms can track a person walking in front of a cluttered background (Sidenbladh, Black, and Fleet 2000) © 2000 Springer. (d) Face detection algorithms, coupled with color-based clothing and hair detection algorithms, can locate and recognize the individuals in this image (Sivic, Zitnick, and Szeliski 2006) © 2006 Springer.

1.1 What is computer vision?

As humans, we perceive the three-dimensional structure of the world around us with apparent ease. Think of how vivid the three-dimensional percept is when you look at a vase of flowers sitting on the table next to you. You can tell the shape and translucency of each petal through the subtle patterns of light and shading that play across its surface and effortlessly segment each flower from the background of the scene (Figure 1.1). Looking at a framed group portrait, you can easily count (and name) all of the people in the picture and even guess at their emotions from their facial appearance. Perceptual psychologists have spent decades trying to understand how the visual system works and, even though they can devise optical illusions¹ to tease apart some of its principles (Figure 1.3), a complete solution to this puzzle remains elusive (Marr 1982; Palmer 1999; Livingstone 2008).

Researchers in computer vision have been developing, in parallel, mathematical techniques for recovering the three-dimensional shape and appearance of objects in imagery. We now have reliable techniques for accurately computing a partial 3D model of an environment from thousands of partially overlapping photographs (Figure 1.2a). Given a large enough set of views of a particular object or façade, we can create accurate dense 3D surface models using stereo matching (Figure 1.2b). We can track a person moving against a complex background (Figure 1.2c).
We can even, with moderate success, attempt to find and name all of the people in a photograph using a combination of face, clothing, and hair detection and recognition (Figure 1.2d). However, despite all of these advances, the dream of having a computer interpret an image at the same level as a two-year-old (for example, counting all of the animals in a picture) remains elusive. Why is vision so difficult? In part, it is because vision is an inverse problem, in which we seek to recover some unknowns given insufficient information to fully specify the solution. We must therefore resort to physics-based and probabilistic models to disambiguate between potential solutions. However, modeling the visual world in all of its rich complexity is far more difficult than, say, modeling the vocal tract that produces spoken sounds.

Figure 1.3 Some common optical illusions and what they might tell us about the visual system: (a) The classic Müller-Lyer illusion, where the lengths of the two horizontal lines appear different, probably due to imagined perspective effects. (b) The "white" square B in the shadow and the "black" square A in the light actually have the same absolute intensity value. The percept is due to brightness constancy, the visual system's attempt to discount illumination when interpreting colors. Image courtesy of Ted Adelson, http://web.mit.edu/persci/people/adelson/checkershadow_illusion.html. (c) A variation of the Hermann grid illusion, courtesy of Hany Farid, http://www.cs.dartmouth.edu/~farid/illusions/hermann.html. As you move your eyes over the figure, gray spots appear at the intersections. (d) Count the red Xs in the left half of the figure. Now count them in the right half. Is it significantly harder? The explanation has to do with a pop-out effect (Treisman 1985), which tells us about the operations of parallel perception and integration pathways in the brain.

The forward models that we use in computer vision are usually developed in physics (radiometry, optics, and sensor design) and in computer graphics. Both of these fields model how objects move and animate, how light reflects off their surfaces, is scattered by the atmosphere, refracted through camera lenses (or human eyes), and finally projected onto a flat (or curved) image plane. While computer graphics are not yet perfect (no fully computer-animated movie with human characters has yet succeeded at crossing the uncanny valley² that separates real humans from android robots and computer-animated humans), in limited domains, such as rendering a still scene composed of everyday objects or animating extinct creatures such as dinosaurs, the illusion of reality is perfect. In computer vision, we are trying to do the inverse, i.e., to describe the world that we see in one or more images and to reconstruct its properties, such as shape, illumination, and color distributions.

¹ http://www.michaelbach.de/ot/sze_muelue
² The term uncanny valley was originally coined by roboticist Masahiro Mori as applied to robotics (Mori 1970). It is also commonly applied to computer-animated films such as Final Fantasy and Polar Express (Geller 2008).
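To make the inverse-problem formulation above concrete, here is a minimal sketch (in Python with NumPy; the 1D blur kernel, noise level, and smoothness weight are illustrative assumptions, not values from the book) of why naive inversion fails and how a prior disambiguates the solution:

```python
import numpy as np

n = 50
x_true = np.zeros(n)
x_true[20:30] = 1.0  # the unknown "scene": a 1D step pattern

# Forward model (illustrative): Gaussian blur rows, y = A @ x + noise.
i, j = np.mgrid[0:n, 0:n]
A = np.exp(-0.5 * ((i - j) / 1.5) ** 2)
A /= A.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
y = A @ x_true + 0.01 * rng.standard_normal(n)  # the observed "image"

# Naive inversion: mathematically exact, practically useless,
# because A is ill-conditioned and small noise gets hugely amplified.
x_naive = np.linalg.solve(A, y)

# Regularized inversion: add a smoothness prior on neighboring values,
# i.e., minimize |A x - y|^2 + lam * |D x|^2 (Tikhonov regularization).
D = np.diff(np.eye(n), axis=0)  # finite-difference operator
lam = 0.1
x_reg = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ y)

print("naive reconstruction error:      ", np.linalg.norm(x_naive - x_true))
print("regularized reconstruction error:", np.linalg.norm(x_reg - x_true))
```

Because the blur matrix is ill-conditioned, the naive solve turns a tiny amount of sensor noise into a useless answer, while the smoothness prior, a simple stand-in for the physics-based and probabilistic models discussed above, recovers a solution close to the truth.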
It is amazing that humans and animals do this so effortlessly, while computer vision algorithms are so error prone. People who have not worked in the field often underestimate the difficulty of the problem. (Colleagues at work often ask me for software to find and name all the people in photos, so they can get on with the more "interesting" work.) This misperception that vision should be easy dates back to the early days of artificial intelligence (see Section 1.2), when it was initially believed that the cognitive (logic proving and planning) parts of intelligence were intrinsically more difficult than the perceptual components (Boden 2006).

The good news is that computer vision is being used today in a wide variety of real-world applications, which include:

• Optical character recognition (OCR): reading handwritten postal codes on letters (Figure 1.4a) and automatic number plate recognition (ANPR);

• Machine inspection: rapid parts inspection for quality assurance using stereo vision with specialized illumination to measure tolerances on aircraft wings or auto body parts (Figure 1.4b) or looking for defects in steel castings using X-ray vision;

• Retail: object recognition for automated checkout lanes (Figure 1.4c);

• 3D model building (photogrammetry): fully automated construction of 3D models from aerial photographs used in systems such as Bing Maps;

• Medical imaging: registering pre-operative and intra-operative imagery (Figure 1.4d) or performing long-term studies of people's brain morphology as they age;

• Automotive safety: detecting unexpected obstacles such as pedestrians on the street, under conditions where active vision techniques such as radar or lidar do not work well (Figure 1.4e; see also Miller, Campbell, Huttenlocher et al. (2008); Montemerlo, Becker, Bhat et al. (2008); Urmson, Anhalt, Bagnell et al. (2008) for examples of fully automated driving);

• Match move: merging computer-generated imagery (CGI) with live action footage by tracking feature points in the source video to estimate the 3D camera motion and shape of the environment. Such techniques are widely used in Hollywood (e.g., in movies such as Jurassic Park) (Roble 1999; Roble and Zafar 2009); they also require the use of precise matting to insert new elements between foreground and background elements (Chuang, Agarwala, Curless et al. 2002);

• Motion capture (mocap): using retro-reflective markers viewed from multiple cameras or other vision-based techniques to capture actors for computer animation;

• Surveillance: monitoring for intruders, analyzing highway traffic (Figure 1.4f), and monitoring pools for drowning victims;

• Fingerprint recognition and biometrics: for automatic access authentication as well as forensic applications.

Figure 1.4 Some industrial applications of computer vision: (a) optical character recognition (OCR) http://yann.lecun.com/exdb/lenet/; (b) mechanical inspection http://www.cognitens.com/; (c) retail http://www.evoretail.com/; (d) medical imaging http://www.clarontech.com/; (e) automotive safety http://www.mobileye.com/; (f) surveillance and traffic monitoring http://www.honeywellvideo.com/, courtesy of Honeywell International Inc.

David Lowe's Web site of industrial vision applications (http://www.cs.ubc.ca/spider/lowe/vision.html) lists many other interesting industrial applications of computer vision.
While the above applications are all extremely important, they mostly pertain to fairly specialized kinds of imagery and narrow domains. In this book, we focus more on broader consumer-level applications, such as fun things you can do with your own personal photographs and video. These include:

• Stitching: turning overlapping photos into a single seamlessly stitched panorama (Figure 1.5a), as described in Chapter 9;

• Exposure bracketing: merging multiple exposures taken under challenging lighting conditions (strong sunlight and shadows) into a single perfectly exposed image (Figure 1.5b), as described in Section 10.2;

• Morphing: turning a picture of one of your friends into another, using a seamless morph transition (Figure 1.5c);

• 3D modeling: converting one or more snapshots into a 3D model of the object or person you are photographing (Figure 1.5d), as described in Section 12.6;

• Video match move and stabilization: inserting 2D pictures or 3D models into your videos³ by automatically tracking nearby reference points (see Section 7.4.2) or using motion estimates to remove shake from your videos (see Section 8.2.1);

• Photo-based walkthroughs: navigating a large collection of photographs, such as the interior of your house, by flying between different photos in 3D (see Sections 13.1.2 and 13.5.5);

• Face detection: for improved camera focusing as well as more relevant image searching (see Section 14.1.1);

• Visual authentication: automatically logging family members onto your home computer as they sit down in front of the webcam (see Section 14.2).

Figure 1.5 Some consumer applications of computer vision: (a) image stitching: merging different views (Szeliski and Shum 1997) © 1997 ACM; (b) exposure bracketing: merging different exposures; (c) morphing: blending between two photographs (Gomes, Darsa, Costa et al. 1999) © 1999 Morgan Kaufmann; (d) turning a collection of photographs into a 3D model (Sinha, Steedly, Szeliski et al. 2008) © 2008 ACM.

The great thing about these applications is that they are already familiar to most students; they are, at least, technologies that students can immediately appreciate and use with their own personal media. Since computer vision is a challenging topic, given the wide range of mathematics being covered⁴ and the intrinsically difficult nature of the problems being solved, having fun and relevant problems to work on can be highly motivating and inspiring.

The other major reason why this book has a strong focus on applications is that they can be used to formulate and constrain the potentially open-ended problems endemic in vision. For example, if someone comes to me and asks for a good edge detector, my first question is usually to ask why? What kind of problem are they trying to solve and why do they believe that edge detection is an important component? If they are trying to locate faces, I usually point out that most successful face detectors use a combination of skin color detection (Exercise 2.8) and simple blob features (Section 14.1.1); they do not rely on edge detection. If they are trying to match door and window edges in a building for the purpose of 3D reconstruction, I tell them that edges are a fine idea but it is better to tune the edge detector for long edges (see Sections 3.2.3 and 4.2) and link them together into straight lines with common vanishing points before matching (see Section 4.3).

³ For a fun student project on this topic, see the "PhotoBook" project at http://www.cc.gatech.edu/dvfx/videos/dvfx2005.html.
⁴ These techniques include physics, Euclidean and projective geometry, statistics, and optimization. They make computer vision a fascinating field to study and a great way to learn techniques widely applicable in other fields.
Thus, it is better to think back from the problem at hand to suitable techniques, rather than to grab the first technique that you may have heard of. This kind of working back from problems to solutions is typical of an engineering approach to the study of vision and reflects my own background in the field. First, I come up with a detailed problem definition and decide on the constraints and specifications for the problem. Then, I try to find out which techniques are known to work, implement a few of these, evaluate their performance, and finally make a selection. In order for this process to work, it is important to have realistic test data, both synthetic, which can be used to verify correctness and analyze noise sensitivity, and real-world data typical of the way the system will finally be used.

However, this book is not just an engineering text (a source of recipes). It also takes a scientific approach to basic vision problems. Here, I try to come up with the best possible models of the physics of the system at hand: how the scene is created, how light interacts with the scene and atmospheric effects, and how the sensors work, including sources of noise and uncertainty. The task is then to try to invert the acquisition process to come up with the best possible description of the scene.

The book often uses a statistical approach to formulating and solving computer vision problems. Where appropriate, probability distributions are used to model the scene and the noisy image acquisition process. The association of prior distributions with unknowns is often called Bayesian modeling (Appendix B). It is possible to associate a risk or loss function with mis-estimating the answer (Section B.2) and to set up your inference algorithm to minimize the expected risk. (Consider a robot trying to estimate the distance to an obstacle: it is usually safer to underestimate than to overestimate.) With statistical techniques, it often helps to gather lots of training data from which to learn probabilistic models. Finally, statistical approaches enable you to use proven inference techniques to estimate the best answer (or distribution of answers) and to quantify the uncertainty in the resulting estimates.

Because so much of computer vision involves the solution of inverse problems or the estimation of unknown quantities, my book also has a heavy emphasis on algorithms, especially those that are known to work well in practice. For many vision problems, it is all too easy to come up with a mathematical description of the problem that either does not match realistic real-world conditions or does not lend itself to the stable estimation of the unknowns. What we need are algorithms that are both robust to noise and deviation from our models and reasonably efficient in terms of run-time resources and space. In this book, I go into these issues in detail, using Bayesian techniques, where applicable, to ensure robustness, and efficient search, minimization, and linear system solving algorithms to ensure efficiency.
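As a concrete instance of the expected-risk idea above, the following sketch (in Python; the Gaussian posterior and the 10:1 asymmetric loss are illustrative assumptions, not values from the book) picks the obstacle-distance estimate that minimizes expected loss when overestimating the distance is riskier than underestimating it:

```python
import numpy as np

# Posterior belief over the distance to an obstacle, represented by samples
# (illustrative; in practice these would come from a calibrated sensor model).
rng = np.random.default_rng(1)
samples = rng.normal(loc=5.0, scale=0.5, size=10_000)  # meters

def loss(estimate, true_dist):
    # Asymmetric loss: overestimating the distance (risking a collision)
    # is penalized 10x more heavily than underestimating it.
    err = estimate - true_dist
    return np.where(err > 0.0, 10.0 * err, -err)

# Bayes estimator: the candidate that minimizes the expected loss (risk)
# under the posterior distribution.
candidates = np.linspace(3.0, 7.0, 401)
risk = np.array([loss(c, samples).mean() for c in candidates])
best = candidates[risk.argmin()]

print(f"posterior mean:        {samples.mean():.2f} m")
print(f"minimum-risk estimate: {best:.2f} m")  # biased toward underestimation
```

The minimum-risk estimate comes out below the posterior mean, matching the intuition that a cautious robot should err on the short side; with a symmetric squared loss, the same machinery would simply return the posterior mean.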
Most of the algorithms described in this book are at a high level, being mostly a list of steps that have to be filled in by students or by reading more detailed descriptions elsewhere. In fact, many of the algorithms are sketched out in the exercises.

Now that I've described the goals of this book and the frameworks that I use, I devote the rest of this chapter to two additional topics. Section 1.2 is a brief synopsis of the history of computer vision. It can easily be skipped by those who want to get to "the meat" of the new material in this book and do not care as much about who invented what when. The second is an overview of the book's contents, Section 1.3, which is useful reading for everyone who intends to make a study of this topic (or to jump in partway, since it describes chapter inter-dependencies). This outline is also useful for instructors looking to structure one or more courses around this topic, as it provides sample curricula based on the book's contents.

1.2 A brief history

In this section, I provide a brief personal synopsis of the main developments in computer vision over the last 30 years (Figure 1.6); at least, those that I find personally interesting and which appear to have stood the test of time. Readers not interested in the provenance of various ideas and the evolution of this field should skip ahead to the book overview in Section 1.3.

Figure 1.6 A rough timeline of some of the most active topics of research in computer vision, spanning the 1970s through the 2000s. The topics, roughly in chronological order, include: digital image processing; blocks world and line labeling; generalized cylinders; pictorial structures; stereo correspondence; intrinsic images; optical flow; structure from motion; image pyramids; scale-space processing; shape from shading, texture, and focus; physically-based modeling; regularization; Markov random fields; Kalman filters; 3D range data processing; projective invariants; factorization; physics-based vision; graph cuts; particle filtering; energy-based segmentation; face recognition and detection; subspace methods; image-based modeling and rendering; texture synthesis and inpainting; computational photography; feature-based recognition; MRF inference algorithms; category recognition; learning.

1970s. When computer vision first started out in the early 1970s, it was viewed as the visual perception component of an ambitious agenda to mimic human intelligence and to endow robots with intelligent behavior. At the time, it was believed by some of the early pioneers of artificial intelligence and robotics (at places such as MIT, Stanford, and CMU) that solving the "visual input" problem would be an easy step along the path to solving more difficult problems such as higher-level reasoning and planning. According to one well-known story, in 1966, Marvin Minsky at MIT asked his undergraduate student Gerald Jay Sussman to "spend the summer linking a camera to a computer and getting the computer to describe what it saw"⁵ (Boden 2006, p. 781). We now know that the problem is slightly more difficult than that.⁶

⁵ Boden (2006) cites (Crevier 1993) as the original source. The actual Vision Memo was authored by Seymour Papert (1966) and involved a whole cohort of students.
⁶ To see how far robotic vision has come in the last four decades, have a look at the towel-folding robot at http://rll.eecs.berkeley.edu/pr/icra10/ (Maitin-Shepard, Cusumano-Towner, Lei et al. 2010).

What distinguished computer vision from the already existing field of digital image processing (Rosenfeld and Pfaltz 1966; Rosenfeld and Kak 1976) was a desire to recover the three-dimensional structure of the world from images and to use this as a stepping stone towards full scene understanding. Winston (1975) and Hanson and Riseman (1978) provide two nice collections of classic papers from this early period.

Early attempts at scene understanding involved extracting edges and then inferring the 3D structure of an object or a "blocks world" from the topological structure of the 2D lines (Roberts 1965). Several line labeling algorithms (Figure 1.7a) were developed at that time (Huffman 1971; Clowes 1971; Waltz 1975; Rosenfeld, Hummel, and Zucker 1976; Kanade 1980). Nalwa (1993) gives a nice review of this area.
The topic of edge detection was also an active area of research; a nice survey of contemporaneous work can be found in (Davis 1975).

Figure 1.7 Some early (1970s) examples of computer vision algorithms: (a) line labeling (Nalwa 1993) © 1993 Addison-Wesley, (b) pictorial structures (Fischler and Elschlager 1973) © 1973 IEEE, (c) articulated body model (Marr 1982) © 1982 David Marr, (d) intrinsic images (Barrow and Tenenbaum 1981) © 1973 IEEE, (e) stereo correspondence (Marr 1982) © 1982 David Marr, (f) optical flow (Nagel and Enkelmann 1986) © 1986 IEEE.

Three-dimensional modeling of non-polyhedral objects was also being studied (Baumgart 1974; Baker 1977). One popular approach used generalized cylinders, i.e., solids of revolution and swept closed curves (Agin and Binford 1976; Nevatia and Binford 1977), often arranged into parts relationships⁷ (Hinton 1977; Marr 1982) (Figure 1.7c). Fischler and Elschlager (1973) called such elastic arrangements of parts pictorial structures (Figure 1.7b). This is currently one of the favored approaches being used in object recognition (see Section 14.4 and Felzenszwalb and Huttenlocher 2005).

⁷ In robotics and computer animation, these linked-part graphs are often called kinematic chains.

A qualitative approach to understanding intensities and shading variations and explaining them by the effects of image formation phenomena, such as surface orientation and shadows, was championed by Barrow and Tenenbaum (1981) in their paper on intrinsic images (Figure 1.7d), along with the related 2½-D sketch ideas of Marr (1982). This approach is again seeing a bit of a revival in the work of Tappen, Freeman, and Adelson (2005).

More quantitative approaches to computer vision were also developed at the time, including the first of many feature-based stereo correspondence algorithms (Figure 1.7e) (Dev 1974; Marr and Poggio 1976; Moravec 1977; Marr and Poggio 1979; Mayhew and Frisby 1981; Baker 1982; Barnard and Fischler 1982; Ohta and Kanade 1985; Grimson 1985; Pollard, Mayhew, and Frisby 1985; Prazdny 1985) and intensity-based optical flow algorithms (Figure 1.7f) (Horn and Schunck 1981; Huang 1981; Lucas and Kanade 1981; Nagel 1986). The early work in simultaneously recovering 3D structure and camera motion (see Chapter 7) also began around this time (Ullman 1979; Longuet-Higgins 1981).

A lot of the philosophy of how vision was believed to work at the time is summarized in David Marr's (1982) book.⁸
In particular, Marr introduced his notion of the three levels of description of a (visual) information processing system. These three levels, very loosely paraphrased according to my own interpretation, are:

• Computational theory: What is the goal of the computation (task) and what are the constraints that are known or can be brought to bear on the problem?

• Representations and algorithms: How are the input, output, and intermediate information represented and which algorithms are used to calculate the desired result?

• Hardware implementation: How are the representations and algorithms mapped onto actual hardware, e.g., a biological vision system or a specialized piece of silicon? Conversely, how can hardware constraints be used to guide the choice of representation and algorithm? With the increasing use of graphics chips (GPUs) and many-core architectures for computer vision (see Section C.2), this question is again becoming quite relevant.

As I mentioned earlier in this introduction, it is my conviction that a careful analysis of the problem specification and known constraints from image formation and priors (the scientific and statistical approaches) must be married with efficient and robust algorithms (the engineering approach) to design successful vision algorithms. Thus, it seems that Marr's philosophy is as good a guide to framing and solving problems in our field today as it was 25 years ago.

1980s. In the 1980s, a lot of attention was focused on more sophisticated mathematical techniques for performing quantitative image and scene analysis.

Image pyramids (see Section 3.5) started being widely used to perform tasks such as image blending (Figure 1.8a) and coarse-to-fine correspondence search (Rosenfeld 1980; Burt and Adelson 1983a,b; Rosenfeld 1984; Quam 1984; Anandan 1989). Continuous versions of pyramids using the concept of scale-space processing were also developed (Witkin 1983; Witkin, Terzopoulos, and Kass 1986; Lindeberg 1990). In the late 1980s, wavelets (see Section 3.5.4) started displacing or augmenting regular image pyramids in some applications (Adelson, Simoncelli, and Hingorani 1987; Mallat 1989; Simoncelli and Adelson 1990a,b; Simoncelli, Freeman, Adelson et al. 1992).

⁸ More recent developments in visual perception theory are covered in (Palmer 1999; Livingstone 2008).

Figure 1.8 Examples of computer vision algorithms from the 1980s: (a) pyramid blending (Burt and Adelson 1983b) © 1983 ACM, (b) shape from shading (Freeman and Adelson 1991) © 1991 IEEE, (c) edge detection (Freeman and Adelson 1991) © 1991 IEEE, (d) physically based models (Terzopoulos and Witkin 1988) © 1988 IEEE, (e) regularization-based surface reconstruction (Terzopoulos 1988) © 1988 IEEE, (f) range data acquisition and merging (Banno, Masuda, Oishi et al. 2008) © 2008 Springer.

The use of stereo as a quantitative shape cue was extended by a wide variety of shape-from-X techniques, including shape from shading (Figure 1.8b) (see Section 12.1.1 and Horn 1975; Pentland 1984; Blake, Zimmerman, and Knowles 1985; Horn and Brooks 1986, 1989), photometric stereo (see Section 12.1.1 and Woodham 1981), shape from texture (see Section 12.1.2 and Witkin 1981; Pentland 1984; Malik and Rosenholtz 1997), and shape from focus (see Section 12.1.3 and Nayar, Watanabe, and Noguchi 1995). Horn (1986) has a nice discussion of most of these techniques.
Research into better edge and contour detection (Figure 1.8c) (see Section 4.2) was also active during this period (Canny 1986; Nalwa and Binford 1986), including the introduction of dynamically evolving contour trackers (Section 5.1.1) such as snakes (Kass, Witkin, and Terzopoulos 1988), as well as three-dimensional physically based models (Figure 1.8d) (Terzopoulos, Witkin, and Kass 1987; Kass, Witkin, and Terzopoulos 1988; Terzopoulos and Fleischer 1988; Terzopoulos, Witkin, and Kass 1988).

Researchers noticed that a lot of the stereo, flow, shape-from-X, and edge detection algorithms could be unified, or at least described, using the same mathematical framework if they were posed as variational optimization problems (see Section 3.7) and made more robust (well-posed) using regularization (Figure 1.8e) (see Section 3.7.1 and Terzopoulos 1983; Poggio, Torre, and Koch 1985; Terzopoulos 1986b; Blake and Zisserman 1987; Bertero, Poggio, and Torre 1988; Terzopoulos 1988); a representative energy of this kind is sketched at the end of this subsection. Around the same time, Geman and Geman (1984) pointed out that such problems could equally well be formulated using discrete Markov Random Field (MRF) models (see Section 3.7.2), which enabled the use of better (global) search and optimization algorithms, such as simulated annealing.

Online variants of MRF algorithms that modeled and updated uncertainties using the Kalman filter were introduced a little later (Dickmanns and Graefe 1988; Matthies, Kanade, and Szeliski 1989; Szeliski 1989). Attempts were also made to map both regularized and MRF algorithms onto parallel hardware (Poggio and Koch 1985; Poggio, Little, Gamble et al. 1988; Fischler, Firschein, Barnard et al. 1989). The book by Fischler and Firschein (1987) contains a nice collection of articles focusing on all of these topics (stereo, flow, regularization, MRFs, and even higher-level vision).

Three-dimensional range data processing (acquisition, merging, modeling, and recognition; see Figure 1.8f) continued being actively explored during this decade (Agin and Binford 1976; Besl and Jain 1985; Faugeras and Hebert 1987; Curless and Levoy 1996). The compilation by Kanade (1987) contains a lot of the interesting papers in this area.
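To make the regularization idea above concrete, a representative first-order energy of the kind developed in Section 3.7.1 can be written as follows (the exact notation here is mine; d is the observed data and f the unknown field, such as a depth or flow map):

    E(f) = \iint \left[ \big( f(x,y) - d(x,y) \big)^2 + \lambda \left( f_x^2(x,y) + f_y^2(x,y) \right) \right] dx \, dy

Minimizing the first (data) term alone is ill-posed; the quadratic smoothness term makes the problem well-posed by penalizing wiggly solutions, with λ trading data fidelity against smoothness. The discrete MRF formulation of Geman and Geman (1984) replaces the integral with a sum of local clique potentials over the pixel grid, which is what opens the problem to combinatorial optimizers such as simulated annealing (and, later, graph cuts).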
1990s. While a lot of the previously mentioned topics continued to be explored, a few of them became significantly more active.

A burst of activity in using projective invariants for recognition (Mundy and Zisserman 1992) evolved into a concerted effort to solve the structure from motion problem (see Chapter 7). A lot of the initial activity was directed at projective reconstructions, which did not require knowledge of camera calibration (Faugeras 1992; Hartley, Gupta, and Chang 1992; Hartley 1994a; Faugeras and Luong 2001; Hartley and Zisserman 2004). Simultaneously, factorization techniques (Section 7.3) were developed to solve efficiently problems for which orthographic camera approximations were applicable (Figure 1.9a) (Tomasi and Kanade 1992; Poelman and Kanade 1997; Anandan and Irani 2002) and then later extended to the perspective case (Christy and Horaud 1996; Triggs 1996). Eventually, the field started using full global optimization (see Section 7.4 and Taylor, Kriegman, and Anandan 1991; Szeliski and Kang 1994; Azarbayejani and Pentland 1995), which was later recognized as being the same as the bundle adjustment techniques traditionally used in photogrammetry (Triggs, McLauchlan, Hartley et al. 1999). Fully automated (sparse) 3D modeling systems were built using such structure from motion techniques (Beardsley, Torr, and Zisserman 1996; Schaffalitzky and Zisserman 2002; Brown and Lowe 2003; Snavely, Seitz, and Szeliski 2006).

Figure 1.9 Examples of computer vision algorithms from the 1990s: (a) factorization-based structure from motion (Tomasi and Kanade 1992) © 1992 Springer, (b) dense stereo matching (Boykov, Veksler, and Zabih 2001), (c) multi-view reconstruction (Seitz and Dyer 1999) © 1999 Springer, (d) face tracking (Matthews, Xiao, and Baker 2007), (e) image segmentation (Belongie, Fowlkes, Chung et al. 2002) © 2002 Springer, (f) face recognition (Turk and Pentland 1991a).

Work begun in the 1980s on using detailed measurements of color and intensity combined with accurate physical models of radiance transport and color image formation created its own subfield known as physics-based vision. A good survey of the field can be found in the three-volume collection on this topic (Wolff, Shafer, and Healey 1992a; Healey and Shafer 1992; Shafer, Healey, and Wolff 1992).

Optical flow methods (see Chapter 8) continued to be improved (Nagel and Enkelmann 1986; Bolles, Baker, and Marimont 1987; Horn and Weldon Jr. 1988; Anandan 1989; Bergen, Anandan, Hanna et al. 1992; Black and Anandan 1996; Bruhn, Weickert, and Schnörr 2005; Papenberg, Bruhn, Brox et al. 2006), with (Nagel 1986; Barron, Fleet, and Beauchemin 1994; Baker, Black, Lewis et al. 2007) being good surveys. Similarly, a lot of progress was made on dense stereo correspondence algorithms (see Chapter 11, Okutomi and Kanade (1993, 1994); Boykov, Veksler, and Zabih (1998); Birchfield and Tomasi (1999); Boykov, Veksler, and Zabih (2001), and the survey and comparison in Scharstein and Szeliski (2002)), with the biggest breakthrough being perhaps global optimization using graph cut techniques (Figure 1.9b) (Boykov, Veksler, and Zabih 2001).

Multi-view stereo algorithms (Figure 1.9c) that produce complete 3D surfaces (see Section 11.6) were also an active topic of research (Seitz and Dyer 1999; Kutulakos and Seitz 2000) that continues to be active today (Seitz, Curless, Diebel et al. 2006). Techniques for producing 3D volumetric descriptions from binary silhouettes (see Section 11.6.2) continued to be developed (Potmesil 1987; Srivasan, Liang, and Hackwood 1990; Szeliski 1993; Laurentini 1994), along with techniques based on tracking and reconstructing smooth occluding contours (see Section 11.2.1 and Cipolla and Blake 1992; Vaillant and Faugeras 1992; Zheng 1994; Boyer and Berger 1997; Szeliski and Weiss 1998; Cipolla and Giblin 2000).

Tracking algorithms also improved a lot, including contour tracking using active contours (see Section 5.1), such as snakes (Kass, Witkin, and Terzopoulos 1988), particle filters (Blake and Isard 1998), and level sets (Malladi, Sethian, and Vemuri 1995), as well as intensity-based (direct) techniques (Lucas and Kanade 1981; Shi and Tomasi 1994; Rehg and Kanade 1994), often applied to tracking faces (Figure 1.9d) (Lanitis, Taylor, and Cootes 1997; Matthews and Baker 2004; Matthews, Xiao, and Baker 2007) and whole bodies (Sidenbladh, Black, and Fleet 2000; Hilton, Fua, and Ronfard 2006; Moeslund, Hilton, and Krüger 2006).
Image segmentation (see Chapter 5) (Figure 1.9e), a topic which has been active since the earliest days of computer vision (Brice and Fennema 1970; Horowitz and Pavlidis 1976; Riseman and Arbib 1977; Rosenfeld and Davis 1979; Haralick and Shapiro 1985; Pavlidis and Liow 1990), was also an active topic of research, producing techniques based on minimum energy (Mumford and Shah 1989) and minimum description length (Leclerc 1989), normalized cuts (Shi and Malik 2000), and mean shift (Comaniciu and Meer 2002).

Statistical learning techniques started appearing, first in the application of principal component eigenface analysis to face recognition (Figure 1.9f) (see Section 14.2.1 and Turk and Pentland 1991a) and linear dynamical systems for curve tracking (see Section 5.1.1 and Blake and Isard 1998).

Perhaps the most notable development in computer vision during this decade was the increased interaction with computer graphics (Seitz and Szeliski 1999), especially in the cross-disciplinary area of image-based modeling and rendering (see Chapter 13). The idea of manipulating real-world imagery directly to create new animations first came to prominence with image morphing techniques (Figure 1.5c) (see Section 3.6.3 and Beier and Neely 1992) and was later applied to view interpolation (Chen and Williams 1993; Seitz and Dyer 1996), panoramic image stitching (Figure 1.5a) (see Chapter 9 and Mann and Picard 1994; Chen 1995; Szeliski 1996; Szeliski and Shum 1997; Szeliski 2006a), and full light-field rendering (Figure 1.10a) (see Section 13.3 and Gortler, Grzeszczuk, Szeliski et al. 1996; Levoy and Hanrahan 1996; Shade, Gortler, He et al. 1998). At the same time, image-based modeling techniques (Figure 1.10b) for automatically creating realistic 3D models from collections of images were also being introduced (Beardsley, Torr, and Zisserman 1996; Debevec, Taylor, and Malik 1996; Taylor, Debevec, and Malik 1996).

Figure 1.10 Recent examples of computer vision algorithms: (a) image-based rendering (Gortler, Grzeszczuk, Szeliski et al. 1996), (b) image-based modeling (Debevec, Taylor, and Malik 1996) © 1996 ACM, (c) interactive tone mapping (Lischinski, Farbman, Uyttendaele et al. 2006a), (d) texture synthesis (Efros and Freeman 2001), (e) feature-based recognition (Fergus, Perona, and Zisserman 2007), (f) region-based recognition (Mori, Ren, Efros et al. 2004) © 2004 IEEE.

2000s. This past decade has continued to see a deepening interplay between the vision and graphics fields. In particular, many of the topics introduced under the rubric of image-based rendering, such as image stitching (see Chapter 9), light-field capture and rendering (see Section 13.3), and high dynamic range (HDR) image capture through exposure bracketing (Figure 1.5b) (see Section 10.2 and Mann and Picard 1995; Debevec and Malik 1997), were re-christened as computational photography (see Chapter 10) to acknowledge the increased use of such techniques in everyday digital photography. For example, the rapid adoption of exposure bracketing to create high dynamic range images necessitated the development of tone mapping algorithms (Figure 1.10c) (see Section 10.2.1) to convert such images back to displayable results (Fattal, Lischinski, and Werman 2002; Durand and Dorsey 2002; Reinhard, Stark, Shirley et al. 2002; Lischinski, Farbman, Uyttendaele et al. 2006a).
In addition to merging multiple exposures, techniques were developed to merge flash images with non-flash counterparts (Eisemann and Durand 2004; Petschnigg, Agrawala, Hoppe et al. 2004) and to interactively or automatically select different regions from overlapping images (Agarwala, Dontcheva, Agrawala et al. 2004).

Texture synthesis (Figure 1.10d) (see Section 10.5), quilting (Efros and Leung 1999; Efros and Freeman 2001; Kwatra, Schödl, Essa et al. 2003) and inpainting (Bertalmio, Sapiro, Caselles et al. 2000; Bertalmio, Vese, Sapiro et al. 2003; Criminisi, Pérez, and Toyama 2004) are additional topics that can be classified as computational photography techniques, since they re-combine input image samples to produce new photographs.

A second notable trend during this past decade has been the emergence of feature-based techniques (combined with learning) for object recognition (see Section 14.3 and Ponce, Hebert, Schmid et al. 2006). Some of the notable papers in this area include the constellation model of Fergus, Perona, and Zisserman (2007) (Figure 1.10e) and the pictorial structures of Felzenszwalb and Huttenlocher (2005). Feature-based techniques also dominate other recognition tasks, such as scene recognition (Zhang, Marszalek, Lazebnik et al. 2007) and panorama and location recognition (Brown and Lowe 2007; Schindler, Brown, and Szeliski 2007). And while interest point (patch-based) features tend to dominate current research, some groups are pursuing recognition based on contours (Belongie, Malik, and Puzicha 2002) and region segmentation (Figure 1.10f) (Mori, Ren, Efros et al. 2004).

Another significant trend from this past decade has been the development of more efficient algorithms for complex global optimization problems (see Sections 3.7 and B.5 and Szeliski, Zabih, Scharstein et al. 2008; Blake, Kohli, and Rother 2010). While this trend began with work on graph cuts (Boykov, Veksler, and Zabih 2001; Kohli and Torr 2007), a lot of progress has also been made in message passing algorithms, such as loopy belief propagation (LBP) (Yedidia, Freeman, and Weiss 2001; Kumar and Torr 2006).

The final trend, which now dominates a lot of the visual recognition research in our community, is the application of sophisticated machine learning techniques to computer vision problems (see Section 14.5.1 and Freeman, Perona, and Schölkopf 2008). This trend coincides with the increased availability of immense quantities of partially labelled data on the Internet, which makes it more feasible to learn object categories without the use of careful human supervision.

1.3 Book overview

In the final part of this introduction, I give a brief tour of the material in this book, as well as a few notes on notation and some additional general references. Since computer vision is such a broad field, it is possible to study certain aspects of it, e.g., geometric image formation and 3D structure recovery, without engaging other parts, e.g., the modeling of reflectance and shading. Some of the chapters in this book are only loosely coupled with others, and it is not strictly necessary to read all of the material in sequence.
