Google Behemoth

ErrolFord,France,Professional
Published Date:03-08-2017
Chapter 7: Understanding the Google Behemoth

Google is a colossus that sits astride access to information on the World Wide Web. Ubiquitous, useful, and often imitated (but seldom equaled), Google has lent its name to a verb: to google something (or someone) is to search for the thing or person on the Web. Google is also a forward-looking corporation filled with brilliant thinkers and one of the largest companies in the world in terms of market capitalization. From its roots as a search engine company, Google has emerged as a leader in many spheres, from applications that are used to watch videos on the Web to on-demand office productivity software.

The primary focus of this book is making money with the Google advertising applications: the AdSense and AdWords programs. These programs are closely related to Google’s searching technology. An AdSense ad is placed on your website depending on the context of your site (in other words, Google’s analysis of how your site is likely to be found in response to a variety of searches). That very same ad, placed by an advertiser through AdWords, is targeted using keywords and phrases: the same keywords and phrases people use when searching for something with Google.

The close relationship between Google web searching technology and the advertising programs means that it is important to understand a little about the syntax of Google searches when working with the AdSense program or crafting AdWords campaigns. I don’t propose to teach you how to search with Google in this book. Rather, you need a sense of how others may be using Google to search when they come across your sites or ads.

To get the most out of working with Google, you also need to understand the parts of Google. It’s not easy to get a grasp of what Google is and what Google does in addition to web search. For one thing, the parts of Google don’t all work together seamlessly.
This chapter starts with an overview explaining the parts of Google and what they do so that you’ll get a sense of what Google resources may be available to you and how all these moving parts integrate (or, as the case may be, don’t integrate).

After explaining Google’s search syntax and exploring what Google has to offer generally, this chapter drills down on Google’s role as an automated advertising broker with these programs and explains, in general terms, how these programs are related and how you work with them.

The Parts of Google

Google’s parts can, roughly speaking, be divided into the following categories:

• Services: These let people do something (for example, search the Web or create a blog)
• Tools: Software to make chores easier (for example, the Google Toolbar or the Picasa image manipulation software)
• Developer tools: Programs aimed at software developers, such as the AdWords API
• Advertising solutions: Programs such as AdSense and AdWords
• Business solutions: Products intended to be used as part of an enterprise infrastructure, such as the Google Enterprise search appliance

Obviously, many of these aspects of Google are beyond the scope of this book, which focuses on making money with Google advertising and the AdSense and AdWords programs.

Google itself takes a more pragmatic tack in categorizing its applications that are available to consumers. On the More Google Products page, partly shown in Figure 7-1, you’ll find the following categories:

• Search
• Explore and innovate
• Communicate, show & share
• Go mobile
• Make your computer work better

Some of the parts of Google can be opened directly from the Google home page. If you don’t see the link you are interested in on the Google home page, open the More Google Products page by starting at Google’s home page. Click the More link found on the upper left of the page, then choose “even more” from the drop-down list.
You can reach the More Google Products page directly by opening http://www.google.com/options/ in your browser.

Figure 7-1. Google’s consumer applications are categorized on the More Google Products page

However you categorize this vast collection of applications and tools, you should at least be aware of the scope of what is available. This section explains the parts of Google you should know about, with a focus on the parts that are relevant to advertising.

Don’t be fazed because Google marks an application as still in beta. Google tends to call applications beta (supposedly meaning still in testing and not ready for release) long after most companies would declare the software complete.

In addition to the software shown on the More Google Products page, Google has numerous applications that have not yet made it to product status. You can test drive much of this software from the Google Labs page, partially shown in Figure 7-2.

Figure 7-2. The Google Labs page allows you to try Google software that hasn’t yet made it to product status

Even More Google Parts

Parts of Google not discussed in this section (because there’s no clear and obvious connection to advertising at the present time) include:

• Alerts (automatic notifications of news and search results by email)
• Blog Search (searches through current blogs by topic)
• Calendar (organizes your schedule and shares events with others)
• Checkout (an online payment mechanism that competes with PayPal)
• Google Chrome (a streamlined web browser)
• Desktop (searches the files on your desktop computer using an interface that looks like Google’s web search)
• Docs (online word processing, presentations, and spreadsheets)
• Earth (explore the earth and sky using maps, satellite photos, and more)
• Finance (customizable business news)
• Groups (bulletin board posts on every conceivable subject)
• Images (comprehensive image search on the Web)
• News (lets you search news items)
• Picasa (image management and lightweight image editing)
• Translate (automatic translation of text and web pages)

These Google parts may not be the primary focus of this book, but even a quick glance should give you respect for the breadth and depth of Google’s offerings. Also, Google has a near-miraculous ability to build genuinely useful applications and subsequently add advertising, in ways that actually add to the utility of the application. In the future, these applications may be used to host or locate advertising.

Google is a moving target; it’s constantly innovating, releasing software, and acquiring software companies. No static list of Google parts is ever likely to be up-to-date or final.

You’ll find links to almost all the parts of Google on the More Google Products page; I’ll also provide a direct address to each Google part I discuss in the following sections.
Blogger

Blogger is one of the largest hosted blogging services on the Web. Blogger hosts hundreds of thousands of blogs, and it is free and easy to use. From an advertiser’s viewpoint, Blogger and other hosted blogging services are interesting because they provide Google with a venue for AdWords contextual ads, categorized by the specific interest of the blog author. Some of the proceeds from these blogs go to the content creator, although you’ll probably make more money by hosting your own content and displaying AdSense units.

Book Search

Google Books lets users search through books submitted to the program by publishers and other copyright holders. Google scans the books and hosts the resulting pages on Google servers. These pages are then used by Google to display contextual ads. Google pays a portion of the revenue from the ads to the owner of the materials.

Google Book Search represents an interesting venue for advertisements, as well as a possible profit center for authors and publishers. At the time of this writing, Google has reached an extensive and general licensing agreement with publishers and authors. Assuming this agreement becomes effective, the content available under this program (and potential sites to host advertising) will greatly expand.

Directory

Google Directory uses the categorization scheme and sites selected by the Open Directory Project (ODP) to find information that has been vetted by volunteer editors familiar with a particular subject. As I explained in Chapter 2, the ODP is important to you if you want to drive traffic to your site. You can use Google Directory to explore Google’s use of the ODP taxonomy.

Gmail

Gmail is one of the best free email services on the Web, with good anti-spam technology, plentiful storage, and excellent searching capabilities. Gmail is used to host text and link ads that are relevant to a recipient’s email.
GOOG-411

GOOG-411, 1-800-GOOG-411, is a directory assistance program that will partially be supported by audio advertisements relevant to searches requested.

Google Health

Google Health is intended to help people organize their health records in one place online. This is an ambitious and laudable goal not intimately related to advertising, but note that the portions of the application that are used to search for health care providers do host contextual advertisements.

Maps

Google Maps is a service that provides maps and directions. When you search within Google Maps for an address, business, or landmark, the results page shows businesses that seem related to your search (for example, Japanese restaurants as in Figure 7-3), and also businesses located close to your search address.

Figure 7-3. Google Maps results are used to host geographically relevant ads

In Figure 7-3, the top “sponsored” link is a paid AdWords placement while the other listings come from natural search results. See Chapter 2 for more about the distinction.

For advertisers who draw business based on location, such as restaurants, the ability to place proximity-based ads on Google Maps may prove to be extremely significant.

Product Search

Google Product Search lets you search through many of the products available on the Web. Sellers can add products to Product Search using a data feed; there’s no cost for inclusion. Google does use search results pages to host advertisements.

Scholar

Google Scholar lets you search for academic, peer-reviewed articles and citations. Although Scholar has had some rather mixed reviews, it is certainly one of the largest free repositories online of scholarly material, and Google Scholar search results are another place Google displays contextual advertising.

YouTube

YouTube, shown in Figure 7-4, is the leading application for watching, uploading, and sharing videos.

Figure 7-4. YouTube is probably the most popular site for watching videos on the Web

There are several advertising programs on YouTube that are either operational or in the works. Google enrolls partners that draw substantial traffic on YouTube in a program that derives revenue from contextual ads. The ads display on the pages that host the content. Other ads display for certain content as video overlays.

These kinds of ad programs are interesting to content owners. For example, a website might promote videos on YouTube that generate revenue. For details, see Chapter 8.

Obviously, YouTube is also interesting to advertisers. Check out http://www.youtube.com/t/advertising for more information.

Content Versus Search

Google contextual ads appear on web content as in Figure 7-5 when the owner of the web content site signs up with the AdSense program.

Figure 7-5. Content ads appear on websites that have enrolled in the AdSense program

See Chapter 6 for a discussion of how well (or poorly) contextual relevance software actually works.

These ads also appear on search results pages, as shown in Figure 7-6.

Figure 7-6. Entries marked as sponsored (right) are paid ads, while the longer entries on the left are natural search results

Within the Google AdSense program, ads of the first kind are called content ads, and those of the second kind are search ads.

If a user goes to the Google home page and searches, only Google profits from the ads placed on the search results page. If, however, the search originates from an AdSense search unit, like the one shown in Figure 7-7, the owner of the site hosting the search box shares in any proceeds from the search.

See Chapter 8 for information about adding a search box to your site, and Chapter 10 for a comparison of content and search ads from the viewpoint of an advertiser.
Anatomy of a Search Query

The primary Google search interface, the Google home page, is famously simple and uncluttered (as shown in Figure 7-8). You enter a word or words, also called keywords, search terms, or queries, in the Google search form. As you probably know, when you click the I’m Feeling Lucky button, Google opens the page that is the top-ranked search result for your query.

Figure 7-7. When a search box is placed on a content site, ads on the search results page are profitable to the site owner

Figure 7-8. In its simplest form, Google search returns results for keywords entered in the search box

Experienced researchers don’t usually bother with the I’m Feeling Lucky button because it is unlikely you will find what you need this way and it wastes time, even if it is fun.

Clicking the Google Search button opens the first page of Google’s search results for your query. Google’s search results pages also display AdWords ads that are contextually relevant to the query that generated the pages.

Google Syntax and Operators

Google searches support a number of operators, including:

• AND: The AND operator tells Google to explicitly join two keywords in a query. It must be uppercase (cannot be written and).
• OR: The OR operator, which can also be written using the pipe character (|), matches any of the terms joined with this operator in a query. It must be uppercase (cannot be written or).
• + (plus): The “plus” operator, called the inclusion operator, forces Google to include words, such as stop words (defined shortly), in a search.
• - (minus): The “minus” operator, called the exclusion operator, looks for results that do not have the specified keyword in them. For example, a search for virus -computer finds results that have to do with viruses, but not computers (particularly useful if you are looking for biological viruses).
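
The operators in this list can also be combined by a program. The following Python sketch assembles a query string from plain terms, OR alternatives, and excluded keywords, and then URL-encodes it. The helper names build_query and search_url are hypothetical, introduced only for this illustration, and the use of the q parameter on https://www.google.com/search is an assumption about the public search URL rather than something this chapter documents.

```python
from urllib.parse import urlencode


def build_query(required=(), any_of=(), excluded=()):
    """Assemble a query string using the OR and minus operators.

    Hypothetical helper for illustration only.
    """
    parts = list(required)                          # plain terms, separated by spaces
    if any_of:
        parts.append(" OR ".join(any_of))           # OR must be uppercase
    parts.extend(f"-{term}" for term in excluded)   # minus sign excludes a keyword
    return " ".join(parts)


def search_url(query):
    """Return a search URL; assumes the query text travels in the `q` parameter."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query(required=["virus"], excluded=["computer"])
print(query)              # virus -computer
print(search_url(query))  # https://www.google.com/search?q=virus+-computer
```

Keeping query construction separate from URL encoding makes it easy to inspect the human-readable query (the form a searcher would type) before it is turned into a link.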
To avoid confusion, all search terms are printed in this book in literal font (as in Google AdWords). If quotes are shown in the search term, as in "Computer Programming", then those quotes are part of the search term and would be typed in by the user.

You should also know that Google searches omit many common words, called stop words. Stop words that are omitted include “and,” “for,” “the,” and most punctuation. If you want to include a stop word in your search, you need to include it within double quotes. Double quoting also serves the purpose of searching for an entire quoted string. For example, to search for the film Star Wars: Episode III, you could use the query "Star Wars III". Without the quotes, the III would be omitted as a stop word.

The Rules of Simple Search

Searching with Google can be really simple, but it helps to keep some basic rules of Google search syntax in mind:

• Implicit AND connection: Google assumes that two or more words in a query are connected by an AND operator, even when the AND is omitted. A search for Landscape Photography is the same as a search for Landscape AND Photography.
• All-word search: Google searches for all words in a query, unless they are stop words.
• Results can be anywhere: A successful search finds results anywhere in a document (such as in HTML and meta information), not just in its text.
• Word order matters: The words in a search are ordered in terms of importance from left to right.
• Proximity counts: Words in a query that are close together in a search result are returned ahead of results where the words are farther apart.
• Google is not case-sensitive: Google does not care about capitalization. A search for new york returns results that include New York, and a search for New York returns new york with lowercase initial letters.
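
Several of these rules can be seen in a toy model. The Python sketch below is not Google’s algorithm; it is a minimal illustration, under stated assumptions, of three rules from the list: the implicit AND, the dropping of stop words unless they are inside double quotes, and case-insensitive matching. The stop-word set contains only the three words this chapter names, the function names are hypothetical, and the sketch ignores the OR, plus, and minus operators as well as word order and proximity.

```python
import re

# Assumed stop-word list for illustration: only the words named in the text.
STOP_WORDS = {"and", "for", "the"}


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def query_terms(query):
    """Return query terms as word tuples; stop words survive only inside quotes."""
    terms = []
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        if phrase:
            terms.append(tuple(tokenize(phrase)))   # quoted string kept whole
        else:
            terms.extend((t,) for t in tokenize(word) if t not in STOP_WORDS)
    return terms


def matches(query, document):
    """True when every query term occurs in the document (implicit AND)."""
    words = tokenize(document)

    def contains(term):
        n = len(term)
        return any(tuple(words[i:i + n]) == term
                   for i in range(len(words) - n + 1))

    return all(contains(term) for term in query_terms(query))


print(matches("Landscape Photography", "A guide to landscape photography"))  # True
print(matches("new york", "Visiting New York in spring"))                    # True
print(matches("landscape photography", "Landscape painting basics"))         # False
```

For an advertiser, the practical point of the model is that a page (or an ad keyword) has to contain every non-stop word a searcher types, in whatever capitalization, before it can match at all.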
Effective Searching

Google searches tend to be more effective (producing better search results) if the following concepts are kept in mind:

• Google looks for words, not meaning: Google’s algorithms look for the occurrences of words and phrases, not the meaning of words. So it helps to think about how words are likely to be used in context and in web pages when formulating a search.
• Specificity and distinctiveness in keyword choice helps: If you search using generic words (words that are used in a great many documents on the Web), you won’t get as useful a result set as you would if you pinpointed more unusual words that are relevant to your search.
• Use singular, plural, and alternate word forms: Since Google is looking for words, not meaning, you may need to use alternative forms of words in your searches to get the widest results. Searches for photograph, photographs, and photography may each yield different results. You can use the OR operator to search for several forms of the same word: photograph OR photographs OR photography.

These concepts related to effective searching have big implications for participants in the AdWords program (see Part III for more about AdWords). An important part of AdWords is selecting the right keywords to target your ads against. It’s hard to cost-effectively target generic words that generate massive search results; it makes much more sense to target narrow, quirky words (and phrases).

Advanced Search

Google Advanced Search, shown in Figure 7-9, can be opened using the URL http://www.google.com/advanced_search.

Google Advanced Search implements the operators explained earlier in “Google Syntax and Operators” (and a number of additional operators that space considerations precluded me from explaining) using a visual interface, so you don’t need to enter the operators as part of a search query.

Figure 7-9. Google Advanced Search lets you implement sophisticated searching without understanding Google’s query language

The Search Results Page

A typical Google search results page is shown in Figure 7-10. It’s a good idea to learn a little more about what to expect on a results page and what ads to expect, because Google search results pages are where more than half of all AdWords ads turn up.

Ads placed with the Google network using AdWords show up on web content (via AdSense), on pages of third parties with whom Google has contracted, and on Google’s search results. The Google search results are the most important of these from a dollars-and-cents viewpoint and also have the best CTR.

For a given keyword, relevant results are returned in the order of their PageRank (the complex formula Google uses to determine the importance of a web page) in Google’s index. Each search results page provides statistics in the upper-right corner (above the actual search results) that show you an estimate of how many results were found and how long a search took.

Figure 7-10. A Google search results page provides a great deal of information in each result block as well as “sponsored links” (AdWords ads)

Each of the results on the page is represented by a snippet of text from the web page the result points to, called a search results block. A link to the web page is part of the search results block, with the title of the page as the text for the link if it is available (the page’s URL is used if the title isn’t available).

Each results block also provides a Cached link (if this is available) and a Similar Pages link. If you click the Cached link, a copy of the page saved by Google’s servers will open. This is useful in case the page has changed since it was indexed by Google. It’s also handy for finding where on a page the search terms are located: they are highlighted in the cached version.
The cache will also tell you if the query appears only in links pointing to the page, and not on the page itself.

The Similar Pages link opens pages that Google determines bear a close relationship to the page found in the search results.

Using the Google related: operator in a search is equivalent to clicking the Similar Pages link following a search result.

Following Similar Pages links for a search is a great technique for participants in AdWords to ferret out keyword alternatives. The sites that appear in the similar results may be the ones that the people you want to sell to are interested in visiting; you can get ideas from these sites about what keywords to bid on.

Learning More About Google Search

This section provides enough about the mechanics of working with Google search so that you can skillfully use the Google AdSense and AdWords programs. But, obviously, it is not a complete guide to becoming an experienced researcher with Google.

For more information about researching with Google, begin with the Google Help documentation. A good starting place on the Web is “The Essentials of Google Search”.

Google: The Missing Manual (O’Reilly) is a great introduction to Google search tools and techniques. Google Hacks: Tips & Tools for Finding and Using the World’s Information, Third Edition (O’Reilly) provides more in-depth technical information.

My own Building Research Tools with Google for Dummies (Wiley) explains how to use Google as a professional research tool, what information you can expect to find in Google (and what isn’t there), and how to evaluate the credibility of information you do find on the Web.

The Automated Ad Broker: AdWords

Google places ads on its own properties (most significantly on search results pages) and on websites that have signed up for the AdSense program. Google also places ads in third-party content networks to extend the range of its ads even further.
Advertisers (businesses and people with something to sell or promote) sign up with Google via the AdWords program. Working with AdWords, which involves bidding a maximum amount for particular keywords, is explained in detail in Chapter 10.

Google’s software sits like an automated advertising broker between the two halves of this equation, as shown in Figure 7-11.

It’s a really important point. If a content ad is hosted by Google and appears on your site, the advertiser and the publisher could theoretically cut out the company in the middle (Google), if each knew who the other was and could negotiate a price both felt was fair.

For example, if I publish a site with information about digital photography, and I notice that online camera stores often provide the Google AdSense ads that appear on my site, I could theoretically approach one of these camera stores and negotiate a deal to carry ads for the store on my site that did not pay Google a commission. However, that assumes that I know who to contact and want to take on the added responsibility of a direct interface with the stores. Since many people don’t want this extra responsibility, using Google as an intermediary turns out to be a good solution.

Like all successful intermediaries, Google’s job is introducing parties and establishing a market pricing mechanism that both sides feel is fair (or, at least, that they can live with). What’s unusual about Google is the success, scale, and automation with which it achieves this intermediation.

Figure 7-11. Google is the intermediary between AdSense accounts and AdWords advertisers

In the case of advertisements that are placed on websites participating in the AdSense content network, the business model is really simple. Google takes in money from the advertisers and pays out money to the owners of the web content.
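
The money flow on the content network can be made concrete with a little arithmetic. The Python sketch below splits what advertisers pay for a set of clicks between the publisher and the broker in the middle. Every number in it is invented for illustration: the chapter does not state click prices or the fraction Google passes on, so the publisher_share value and the Click and split_revenue names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Click:
    cost_to_advertiser: float  # what the AdWords advertiser pays for this click


def split_revenue(clicks, publisher_share):
    """Split click revenue between the publisher and the broker.

    publisher_share is a hypothetical fraction; the text does not state one.
    """
    total = sum(click.cost_to_advertiser for click in clicks)
    to_publisher = total * publisher_share
    return {
        "advertisers_pay": total,
        "publisher_receives": to_publisher,
        "broker_keeps": total - to_publisher,
    }


# Three made-up clicks on ads shown on one publisher's site.
clicks = [Click(0.50), Click(0.25), Click(0.25)]
print(split_revenue(clicks, publisher_share=0.5))
# {'advertisers_pay': 1.0, 'publisher_receives': 0.5, 'broker_keeps': 0.5}
```

The broker_keeps figure is the difference between what is taken in from advertisers and what is paid out for ad space, which is the margin the intermediary works to protect.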
Your goal, if you are a website content owner, should be to maximize your share of this revenue stream, and Google’s game is to make the most of the difference between what it has to pay for ad space (AdSense) and what it can take in placing ad inventory (AdWords).

The Google inventory of pages that can host ads is bifurcated, however, and Google’s model with its own search results pages is different and more complex. Google’s search results pages make Google a content owner of an incredibly valuable web property, one that is, however, difficult and expensive to maintain. Google’s profit in this portion of its business comes from taking in more ad revenue than it pays out to maintain and improve its search application (and, to some degree, the other parts of Google).

The goal of a participant in the AdWords program who is looking to place ads on the Google search network is to maximize the effectiveness of its expenditures on the AdWords program.

What About Click Fraud?

Click fraud means clicking on contextual ads with no interest in purchasing the goods or services advertised, usually with the intention of defrauding the advertiser or enriching the contextual publisher. Most often, click fraud occurs as part of an effort to raise expenses for a competitor (by making them pay for the bogus clicks) or as an attempt at self-enrichment by a publisher (by clicking on ads on its own pages).

Click fraud is a serious problem, at least in terms of perceptions, on the Internet for contextual advertising vendors. Google has major efforts underway to detect click fraud, which are in the aggregate fairly successful, but the details of these programs are (for obvious reasons) secret.

The bottom line:

• On a very small scale, it is possible to commit click fraud and get away with it. However, as a publisher, you should take care to be totally aboveboard. If Google suspects you of click fraud, it will most likely close your account and possibly ban your sites from the Google search index.
• Detecting click fraud is a statistical matter. Once the fraud becomes statistically significant, it will probably be detected.

Contextual advertising does work and delivers targeted prospects much more effectively than any other method. A small amount of click fraud is a fact of life; most advertisers regard it as a cost of doing business that does not diminish the relative effectiveness of CPC advertising.

Action Items

To become an effective and productive user of the Google AdSense and AdWords programs, you should:

• Get a sense of Google’s gigantic extent
• Spend some time getting a grasp on what the main parts of Google contain and how they relate to your advertising agenda
• Understand the difference between content and search advertising
• Learn the basics of Google’s search syntax
• Think about the queries users are likely to use to find your products or services (or products and services similar to yours)
• Understand Google’s brokering function between the AdSense and AdWords programs
• Try to see the world from both viewpoints: AdSense publishers will make more if they understand how AdWords works, and AdWords advertisers will be more effective if they consider the position of content owners who are signed up with AdSense