Bayesian statistics formula

how to use bayesian statistics and bayesian statistics with r
HalfoedGibbs,United Kingdom,Professional
Published Date:02-08-2017
6 Bayesian statistics

Get past first base

“He says he’s not like the other guys, but how different is he exactly?”

You’ll always be collecting new data. And you need to make sure that every analysis you do incorporates the data you have that’s relevant to your problem. You’ve learned how falsification can be used to deal with heterogeneous data sources, but what about straight-up probabilities? The answer involves an extremely handy analytic tool called Bayes’ rule, which will help you incorporate your base rates to uncover not-so-obvious insights with ever-changing data.

The doctor has disturbing news

Your eyes are not deceiving you. Your doctor has given you a diagnosis of lizard flu. The good news is that lizard flu is not fatal and, if you have it, you’re in for a full recovery after a few weeks of treatment. The bad news is that lizard flu is a big pain in the butt. You’ll have to miss work, and you will have to stay away from your loved ones for a few weeks.

Your doctor is convinced that you have it, but because you’ve become so handy with data, you might want to take a look at the test and see just how accurate it is.

Lizard flu test results
Date: Today
Name: Head First Data Analyst
Diagnosis: Positive

Here’s some information on lizard flu: Lizard flu is a tropical disease first observed among lizard researchers in South America. The disease is highly contagious, and affected patients need to be quarantined in their homes for no fewer than six weeks. Patients diagnosed with lizard flu have been known to “taste the air” and in extreme cases have developed temporary chromatophores and zygodactylous feet.

A quick web search on the lizard flu diagnostic test has yielded this result: an analysis of the test’s accuracy. 90%… that looks pretty solid.

Lizard flu diagnostic test
Accuracy analysis
If someone has lizard flu, the probability that the test returns positive for it is 90 percent.
If someone doesn’t have lizard flu, the probability that the test returns positive for it is 9 percent. (This is an interesting statistic.)

In light of this information, what do you think is the probability that you have lizard flu? How did you come to your decision?

You just looked at some data on the efficacy of the lizard flu diagnostic test. What did you decide were the chances that you have the disease?

“It looks like the chances would be 90% if I had the disease. But not everyone has the disease, as the second statistic shows. So I should revise my estimate down a little bit. But it doesn’t seem like the answer is going to be exactly 90% - 9% = 81%, because that would be too easy, so, I dunno, maybe 75%?”

The answer is way lower than 75%

75% is the answer that most people give to this sort of question. And they’re way off. Not only is 75% the wrong answer, but it’s not anywhere near the right answer. And if you started making decisions with the idea that there’s a 75% chance you have lizard flu, you’d be making an even bigger mistake.

There is so much at stake in getting the answer to this question correct. We are totally going to get to the bottom of this…

Let’s take the accuracy analysis one claim at a time

There are two different and obviously important claims being made about the test: the rate at which the test returns “positive” varies depending on whether the person has lizard flu or not. So let’s imagine two different worlds, one where a lot of people have lizard flu and one where few people have it, and then look at the claim about “positive” scores for people who don’t have lizard flu.

Take a closer look at the second statement and answer the questions below. Start here, and think really hard about this. Let’s really get the meaning of this statement: “If someone doesn’t have lizard flu, the probability that the test returns positive for it is 9 percent.”

Scenario 1: If 90 out of 100 people have it, how many people who don’t have it test positive?
Scenario 2: If 10 out of 100 people have it, how many people who don’t have it test positive?

Does the number of people who have the disease affect how many people are wrongly told that they test positive?

Scenario 1: This means that 10 people don’t have it, so we take 9% of 10 people, which is about 1 person who tests positive but doesn’t have it.
Scenario 2: This means that 90 people don’t have it, so we take 9% of 90 people, which is about 8 people who test positive but don’t have it.

We need more data to make sense of that diagnostic test…

How common is lizard flu really?

At least when it comes to situations where people who don’t have lizard flu test positively, it seems that the prevalence of lizard flu in the general population makes a big difference. In fact, unless we know how many people already have lizard flu, in addition to the accuracy analysis of the test, we simply cannot figure out how likely it is that you have lizard flu.

In Scenario 1, where lots of people have lizard flu, 9% of the people who don’t have it is just one false positive. In Scenario 2, where few people have the disease, 9% of the people who don’t have it is quite a lot of false positives.

You’ve been counting false positives

In the previous exercise, you counted the number of people who falsely got a positive result. These cases are called false positives: people who don’t have lizard flu but test positive anyway.

The opposite of a false positive is a true negative

In addition to keeping tabs on false positives, you’ve also been thinking about true negatives. True negatives are situations where people who don’t have the disease get a negative test result.

False positive rate: If someone doesn’t have lizard flu, the probability that the test returns positive for it is 9 percent.
True negative rate: If someone doesn’t have lizard flu, the probability that the test returns negative for it is 91%.

What term do you think describes this statement, and what do you think is the opposite of this statement? “If someone has lizard flu, the probability that the test returns positive for it is 90 percent.”
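
As a quick check on the two scenarios, here is a minimal sketch in Python. The function name `false_positives` and its parameters are made up for this illustration; only the 9 percent false positive rate and the 100-person scenarios come from the exercise. It multiplies the number of people who don’t have lizard flu by the false positive rate, which gives 0.9 (about 1 person) for Scenario 1 and 8.1 (about 8 people) for Scenario 2.

```python
def false_positives(population, sick, false_positive_rate):
    """Expected number of people who don't have the disease but test positive."""
    healthy = population - sick
    return healthy * false_positive_rate


FALSE_POSITIVE_RATE = 0.09  # from the accuracy analysis

# Scenario 1: 90 out of 100 people have lizard flu.
scenario_1 = false_positives(100, 90, FALSE_POSITIVE_RATE)
# Scenario 2: 10 out of 100 people have lizard flu.
scenario_2 = false_positives(100, 10, FALSE_POSITIVE_RATE)

print(f"Scenario 1: {scenario_1:.1f} false positives")
print(f"Scenario 2: {scenario_2:.1f} false positives")
```

The same test produces roughly nine times as many false positives in Scenario 2, purely because more people are healthy.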

If you don’t have lizard flu, the test result is either a false positive or a true negative.

What term would you use to describe the other part of the lizard flu diagnostic test?

True positive rate: If someone has lizard flu, the probability that the test returns positive for it is 90 percent.
False negative rate: If someone has lizard flu, the probability that he tests negative for it is 10%.

All these terms describe conditional probabilities

A conditional probability is the probability of some event, given that some other event has happened. Assuming that someone tests positive, what are the chances that he has lizard flu?

Here’s how the statements you’ve been using look in conditional probability notation:

P(+|~L) = 1 - P(-|~L)
P(+|L) = 1 - P(-|L)

The tilde symbol means that the statement (L) is not true. P(+|~L) is the probability of a positive test result, given that the person doesn’t have lizard flu. In the first equation, P(+|~L) represents the false positive and P(-|~L) the true negative. In the second, P(+|L) represents the true positive and P(-|L) the false negative.

Conditional Probability Notation Up Close

Let’s take a look at what each symbol means in this statement:

P(L|+)

P stands for probability, L for lizard flu, the bar for “given,” and + for a positive test result. Read together: the probability of lizard flu given a positive test result. Flip the order and you get P(+|L), the probability of a positive test result, given lizard flu.

What is the probability of lizard flu, given a positive test result?

You need to count false positives, true positives, false negatives, and true negatives

Figuring out your probability of having lizard flu is all about knowing how many actual people are represented by these figures.
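
To make the complement relationships concrete, here is a small Python sketch. The variable names are invented for this illustration; the 90 percent and 9 percent figures are the ones in the accuracy analysis. Each “negative” rate is one minus the matching “positive” rate under the same condition.

```python
# Rates taken from the accuracy analysis.
p_pos_given_flu = 0.90     # P(+|L), true positive rate
p_pos_given_no_flu = 0.09  # P(+|~L), false positive rate

# Within one condition, the test is either positive or negative,
# so the two probabilities must add up to 1.
p_neg_given_flu = 1 - p_pos_given_flu        # P(-|L), false negative rate
p_neg_given_no_flu = 1 - p_pos_given_no_flu  # P(-|~L), true negative rate

print(f"P(-|L)  = {p_neg_given_flu:.2f}")
print(f"P(-|~L) = {p_neg_given_no_flu:.2f}")
```

Note that the complements pair up within a condition (has it, or doesn’t have it). P(+|L) and P(+|~L) describe different groups of people, so they don’t need to add up to anything in particular.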

P(+|~L), the probability that someone tests positive, given that they don’t have lizard flu
P(+|L), the probability that someone tests positive, given that they do have lizard flu
P(-|L), the probability that someone tests negative, given that they do have lizard flu
P(-|~L), the probability that someone tests negative, given that they don’t have lizard flu

How many actual people fit into each of these probability groupings? P(L|+) is the figure you want. But first you need to know how many people have lizard flu. Then you can use these percentages to calculate how many people actually fall into these categories.

“Yeah, I get it. So how many people have lizard flu?”

1 percent of people have lizard flu

Here’s the number you need in order to interpret your test. Turns out that 1 percent of the population has lizard flu. In human terms, that’s quite a lot of people. But as a percentage of the overall population, it’s a pretty small number.

One percent is the base rate. Prior to learning anything new about individuals because of the test, you know that only 1 percent of the population has lizard flu. That’s why base rates are also called prior probabilities.

Watch out for the base rate fallacy

“I just thought that the 90% true positive rate meant it’s really likely that you have it.” That’s a fallacy.

Always be on the lookout for base rates. You might not have base rate data in every case, but if you do have a base rate and don’t use it, you’ll fall victim to the base rate fallacy, where you ignore your prior data and make the wrong decisions because of it.

In this case, your judgment about the probability that you have lizard flu depends heavily on the base rate, and because the base rate turns out to be 1 percent of people having lizard flu, that 90 percent true positive rate on the test doesn’t seem nearly so insightful.

Center for Disease Tracking is on top of lizard flu
Study finds that 1 percent of national population has lizard flu
The most recent data, which is current as of last week, indicates that 1 percent of the national population is infected with lizard flu. Although lizard flu is rarely fatal, these individuals need to be quarantined to prevent others from becoming infected.

Calculate the probability that you have lizard flu. Assuming you start with 1,000 people, fill in the blanks, dividing them into groups according to your base rates and the specs of the test. Remember, 1% of people have lizard flu.

Lizard flu diagnostic test
Accuracy analysis
If someone has lizard flu, the probability that the test returns positive for it is 90 percent.
If someone doesn’t have lizard flu, the probability that the test returns positive for it is 9 percent.

1,000 people
    The number of people who have it: ____
        The number who test positive: ____
        The number who test negative: ____
    The number of people who don’t have it: ____
        The number who test positive: ____
        The number who test negative: ____

The probability that you have it, given that you tested positive
    = (# of people who have it and test positive) / ((# of people who have it and test positive) + (# of people who don’t have it and test positive))
    = ____

What did you calculate your new probability of having lizard flu to be? 91% of people who’ve tested positively don’t have lizard flu.

1,000 people
    The number of people who have it: 10
        The number who test positive: 9
        The number who test negative: 1
    The number of people who don’t have it: 990
        The number who test positive: 89
        The number who test negative: 901

The probability that you have it, given that you tested positive
    = (# of people who have it and test positive) / ((# of people who have it and test positive) + (# of people who don’t have it and test positive))
    = 9 / (9 + 89)
    = 0.09

There’s a 9% chance that I have lizard flu. 9% of people who’ve tested positively have lizard flu.

Your chances of having lizard flu are still pretty low

Picture the 1,000 people. Nine of them are the true positives and one is the false negative. Of the people who don’t have it, 89 are the false positives and all the rest are the true negatives. You’re either a true positive or a false positive, and it’s a lot more likely that you’re a false positive.

Do complex probabilistic thinking with simple whole numbers

When you imagined that you were looking at 1,000 people, you switched from decimal probabilities to whole numbers. Probabilities aren’t quite as salient as whole numbers: you’ve got tools in your head for dealing with whole numbers. Because our brains aren’t innately well-equipped to process numerical probabilities, converting probabilities to whole numbers and then thinking through them is a very effective way to avoid making mistakes.

Bayes’ rule manages your base rates when you get new data

Believe it or not, you just did a commonsense implementation of Bayes’ rule, an incredibly powerful statistical formula that enables you to use base rates along with your conditional probabilities to estimate new conditional probabilities. If you wanted to make the same calculation algebraically, you could use this monster of a formula:

P(L|+) = P(L)P(+|L) / ( P(L)P(+|L) + P(~L)P(+|~L) )

This formula will give you the same result you just calculated.
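
To see that counting people and applying Bayes’ rule really do agree, here is a minimal Python sketch. The function names `posterior_by_counting` and `posterior_by_formula` are hypothetical, chosen for this illustration; the inputs are the 1 percent base rate and the 90 percent and 9 percent rates from the accuracy analysis. Unlike the worked exercise, the counting version does not round 89.1 false positives down to 89, so it lands on about 0.092 rather than exactly 9/98, which is still roughly 9%.

```python
def posterior_by_counting(population, base_rate, true_pos_rate, false_pos_rate):
    """P(L|+) found by splitting a population into whole groups."""
    sick = population * base_rate
    healthy = population - sick
    true_positives = sick * true_pos_rate
    false_positives = healthy * false_pos_rate
    return true_positives / (true_positives + false_positives)


def posterior_by_formula(base_rate, true_pos_rate, false_pos_rate):
    """P(L|+) = P(L)P(+|L) / (P(L)P(+|L) + P(~L)P(+|~L))."""
    numerator = base_rate * true_pos_rate
    return numerator / (numerator + (1 - base_rate) * false_pos_rate)


counted = posterior_by_counting(1000, 0.01, 0.90, 0.09)
formula = posterior_by_formula(0.01, 0.90, 0.09)

print(f"By counting 1,000 people: {counted:.3f}")
print(f"By Bayes' rule:           {formula:.3f}")
```

The population size cancels out of the counting version, which is why any convenient round number such as 1,000 works.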

In Bayes’ formula, P(L|+) is the probability of lizard flu given a positive test result, P(L) is the base rate (people who have the disease), P(~L) is the base rate (people who don’t have the disease), P(+|L) is the true positive rate, and P(+|~L) is the false positive rate.

You can use Bayes’ rule over and over

Bayes’ rule is an important tool in data analysis, because it provides a precise way of incorporating new information into your analyses. Bayes’ rule lets you add more information over time:

My analysis = base rate
My analysis = base rate + test results
My analysis = base rate + test results + more test results

“So the test isn’t that accurate. You’re still nine times more likely to have lizard flu than other people. Shouldn’t you get another test?”

Yep, with a base rate of 1%, you’re 9x more likely to have lizard flu than the regular population. Your doctor took the suggestion and ordered another test. Let’s see what it said…

Your second test result is negative

The doctor didn’t order you the more powerful, advanced lizard flu test the first time because it’s kind of expensive, but now that you tested positively on the first (cheaper, less accurate) test, it’s time to bring out the big guns… The doctor ordered a slightly different test: the “advanced” lizard flu diagnostic test.

That’s a relief! But you got these probabilities wrong before. Better run the numbers again. By now, you know that responding to the test result (or even the test accuracy statistics) without looking at base rates is a recipe for confusion.

ADVANCED Lizard flu test results
Date: Today
Name: Head First Data Analyst
Diagnosis: Negative

Here’s some information on lizard flu: Lizard flu is a tropical disease first observed among lizard researchers in South America. The disease is highly contagious, and affected patients need to be quarantined in their homes for no fewer than six weeks.
Patients diagnosed with lizard flu have been known to “taste the air” and in extreme cases have developed temporary chromatophores and zygodactylous feet.

This new test is more expensive but more powerful.

The new test has different accuracy statistics

Using your base rate, you can use the new test’s statistics to calculate the new probability that you have lizard flu.

Lizard flu diagnostic test (this is the first test you took)
Accuracy analysis
If someone has lizard flu, the probability that the test returns positive for it is 90 percent.
If someone doesn’t have lizard flu, the probability that the test returns positive for it is 9 percent.

Advanced lizard flu diagnostic test (these accuracy figures are a lot stronger)
Accuracy analysis
If someone has lizard flu, the probability that the test returns positive for it is 99 percent.
If someone doesn’t have lizard flu, the probability that the test returns positive for it is 1 percent.

Should we use the same base rate as before? You tested positive. It seems like that should count for something.

What do you think the base rate should be?

“1% can’t be the base rate. The new base rate is the 9% we just calculated, because that figure is my own probability of having the disease.”

New information can change your base rate

When you got your first test results back, you used as your base rate the incidence of lizard flu in the population of everybody. That was the old base rate: 1% of everybody has lizard flu. But you learned from the test that your probability of having lizard flu is higher than the base rate. That probability is your new base rate, because now you’re part of the group of people who’ve tested positively.

Your new base rate: 9% of people who tested positively have lizard flu. You used to be part of the group of everybody, just a regular person, nothing remarkable. Now you’re part of the group of people who’ve tested positively once.

Let’s hurry up and run Bayes’ rule again…

Using the new test and your revised base rate, let’s calculate the probability that you have lizard flu given your results. Remember, 9% of people like you will have lizard flu, and 91% of people who’ve tested positively don’t have lizard flu.

Advanced lizard flu diagnostic test
Accuracy analysis
If someone has lizard flu, the probability that the test returns positive for it is 99 percent.
If someone doesn’t have lizard flu, the probability that the test returns positive for it is 1 percent.

1,000 people
    The number of people who have it: ____
        The number who test positive: ____
        The number who test negative: ____
    The number of people who don’t have it: ____
        The number who test positive: ____
        The number who test negative: ____

The probability that you have it, given that you tested negative
    = (# of people who have it and test negative) / ((# of people who have it and test negative) + (# of people who don’t have it and test negative))
    = ____

What do you calculate your new probability of having lizard flu to be?

1,000 people
    The number of people who have it: 90
        The number who test positive: 89
        The number who test negative: 1
    The number of people who don’t have it: 910
        The number who test positive: 9
        The number who test negative: 901

The probability that you have it, given that you tested negative
    = (# of people who have it and test negative) / ((# of people who have it and test negative) + (# of people who don’t have it and test negative))
    = 1 / (1 + 901)
    = 0.001

There’s a 0.1% chance that I have lizard flu. (The base rate going in: 9% of people who’ve tested positively have lizard flu.)
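
The two tests can also be chained in code. The sketch below is one way to do it in Python, with a hypothetical helper named `update` that applies Bayes’ rule once: the result of the first test becomes the base rate for the second. Because the second result is negative, the rates passed in are the probabilities of a negative result, P(-|L) = 1 - 0.99 and P(-|~L) = 1 - 0.01. The code carries the unrounded 9.2% forward instead of the rounded 9%, so its final figure differs slightly from 1/902 but still comes out at about 0.1%.

```python
def update(prior, p_result_given_flu, p_result_given_no_flu):
    """Apply Bayes' rule once for a single observed test result."""
    numerator = prior * p_result_given_flu
    return numerator / (numerator + (1 - prior) * p_result_given_no_flu)


base_rate = 0.01  # 1% of everybody has lizard flu

# First test came back positive: use P(+|L) = 0.90 and P(+|~L) = 0.09.
after_first = update(base_rate, 0.90, 0.09)

# Advanced test came back negative: use P(-|L) and P(-|~L),
# the complements of its 99% and 1% positive rates.
after_second = update(after_first, 1 - 0.99, 1 - 0.01)

print(f"After the positive first test:  {after_first * 100:.1f}%")
print(f"After the negative second test: {after_second * 100:.1f}%")
```

Each call treats the previous answer as the new prior, which is the “over and over” use of Bayes’ rule described earlier.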