---start biostat 1.15.97--
Handout #2 was distributed: "A first experiment: the null and alternative hypothesis".
the so-called pizza experiment, which our instructor once actually conducted: people in the building where he lived came up to him and said "i have a problem you might be able to help with. my roommate and i keep tossing a coin to see who's going to get the pizza. it's cold in State College in the winter. the place takes forever to deliver. and every time we flip the coin, I lose!"
our instructor decided to test the coin. assume the coin is fair. if you propose the coin is fair, there's always a 50-50 chance of heads or tails. so they tossed the coin 12 times. if you get heads 12 times out of 12, you might be suspicious...but this doesn't prove the coin isn't fair. it could be random, though there's only a small chance (0.5^12, i.e. 1/4096) that a fair coin would do that. so this would be fairly strong evidence that the coin is not fair. what if the coin was heads 10 out of 12 times? 9? 8? if you toss 6-6, it's probably fair. when does it become unfair?
why are clinical trials using these methodologies for testing hypotheses?
Paired puppy expt. a proposed kennel cough vaccine should prevent KC in puppies. so the experiment was designed as laid out in handout 2. usually in clinical trials, meds are tested against a placebo. a placebo is hopefully not discernible from the actual drug by sight, taste, smell, etc, but does not have any biological properties.
placebo effect: when testing malate as a muscle relaxant in the 1960s they got 400 folks w/back pain, gave the real stuff to 200, placebo to 200. we expect there to be some placebo effect, so you subtract the effect of the placebo from the effect of the real drug, and you get the "real" effect. a MASH unit in Korea gave out salt pills when they ran out of oral morphine - it worked very well, actually.
so back to puppies. we select a single pair of puppies from each of a bunch of litters.
see, if we're trying to test something, we want the basis of each evaluation to be uncontaminated, genetically or otherwise - so only one puppy pair from each litter, so as not to have environmental or genetic contamination. we want each comparison of drug to placebo to be INDEPENDENT. if two pairs come from the same litter, they are not independent.
so we pick the puppy pairs out.
if vaccine is successful and placebo isn't, outcome = +
if placebo is successful and vaccine isn't, outcome = -
TIES ARE NOT COUNTED
          vaccine   placebo   outcome (+/-)
pair 1                        + or -
pair 2
pair 3
  .
  .
  .
pair n
we can't use ties in the probability process. if both puppies of a pair get kennel cough, or both do not get KC, they don't count. so any puppy pair with no difference between the results isn't used in the probability calculations.
regardless of # of pairs, what would suggest that the vaccine is no better or worse than placebo? an equal number of + and - signs in that right hand column. but, if the vax is working, there should be a much larger # of + than -.
so, ok, if in fact the drug is no better or worse than its vehicle placebo, is there not a chance that even IF it is useless, from time to time when doing experiments we might see, by chance, a large # of + even though the vax doesn't work? YES, as with the pizza coin above. also, if it DOES work, we could find in a single experiment that the - signs win out. see, you can take an unfair coin and still get a seemingly fair result! it is *possible*, just not probable. if you weight a coin to throw heads 80% of the time, it COULD throw 8 tails in a row.
this puppy pair experiment is a microcosm of an actual clinical trial. see pg 2 - "tests of significance." just like in the pizza exp't, we need a probability that something is likely or not likely, because we need to make a decision about whether our results were real or the product of chance.
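the coin questions so far (12 heads out of 12? 10? 9? 8? and the weighted coin throwing 8 tails in a row) can be checked with a little arithmetic. this is only a rough sketch, assuming the tosses are independent; the function name `prob_at_least` is made up for illustration and is not from the handout.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Chance of k or more heads in n independent tosses with P(heads) = p."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

# fair coin, 12 tosses: how surprising is each head count (or anything higher)?
for k in (12, 10, 9, 8):
    print(f"{k} or more heads out of 12: {prob_at_least(k, 12):.4f}")

# coin weighted to throw heads 80% of the time: chance of 8 tails in a row
print(f"8 tails in a row from the weighted coin: {0.2**8:.7f}")
```

12 out of 12 comes out at 1/4096, and 10 or more heads at 79/4096 (about 0.019), so the evidence gets weaker quickly as the head count drops toward 6.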
[side note: what if during the first clinical trial of penicillin, the first 5 subjects all had a bad allergy to penicillin??] [question: wouldn't they have had to have had a primary exposure??]
so. puppies. you have n=3 matched pairs, no ties.
outcome      ways   joint prob of sequence   p(outcome)
3 -          1      1/27                     1/27
2 -, 1 +     3      2/27                     6/27
1 -, 2 +     3      4/27                     12/27
3 +          1      8/27                     8/27
total        8                               27/27
here μ+ (the actual percentage, or PARAMETER, of cases in which the vaccine is better than placebo) = 2/3
p+ = experimental estimate of μ+
you can never really know μ+, only p+ is observable.
so, given μ+ = 2/3, and given 3 untied puppy pairs, the third row in the chart above is the most probable outcome - p = 12/27. 3 losses is not likely. you can project the probability model in advance of doing the experiment, and then you use the experimental results as evidence one way or the other.
ok, well, what if the real value of the vaccine isn't μ+ = 2/3? what if it's 1/5 or 3/4 or something? then this model is useless. it's incorrect. this probability assessment only works if your projection is correct.
so in the early part of the century some ideas came together - Neyman, Pearson, and Fisher (not together) ....produced procedures for formulating hypotheses and testing their likelihood under uncertainty...the Null and Alternative hypotheses. these guys said look, it makes no sense to propose a specific hypothesis, because you could be wrong, you could be very inaccurate, and you can't afford mistakes. so what they said is that in order to make this whole thing work, we should propose at the outset that the drug is useless. if you propose at the beginning of an experiment that the thing being tested has no value over placebo or control, then you can always project a probability distribution and be right.
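to see where the numbers in the 3-pair chart come from, here is a small sketch that rebuilds it from the assumed 2/3 chance of a + and then redoes it with 1/2, the "drug is useless" projection. `outcome_table` is a hypothetical helper name, not something from the handout, and the pairs are assumed independent.

```python
from fractions import Fraction
from math import comb

def outcome_table(n, p_plus):
    """Rows of (number of +, ways, prob of one sequence, prob of the outcome)."""
    p_minus = 1 - p_plus
    rows = []
    for x in range(n + 1):
        ways = comb(n, x)                       # orderings with x pluses
        seq = p_plus**x * p_minus**(n - x)      # prob of any one such ordering
        rows.append((x, ways, seq, ways * seq))
    return rows

# Fractions print in lowest terms, so 6/27 shows as 2/9 and 12/27 as 4/9
for label, p_plus in (("assumed 2/3", Fraction(2, 3)), ("null 1/2", Fraction(1, 2))):
    print(label)
    for x, ways, seq, total in outcome_table(3, p_plus):
        print(f"  {x} +, {3 - x} -: ways={ways}  sequence={seq}  outcome={total}")
```

the 2/3 version only matches reality if 2/3 happens to be right; the 1/2 version is the projection you get by supposing the vaccine has no value.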
once you make the supposition that the drug has no value above or below the placebo, or that procedure a differs in no way from procedure b, etc etc, you write down that probability projection. then you do the experiment and gather evidence which either supports or destroys the null hypothesis.
from the Sunday Inquirer, August 8 1976 - life on Mars, how the data streamed back - the Viking mission. from the paper's discussion of this: "unusual data continued to come home from Mars this week, and while none of the data showed life, yada yada...scientists used a traditional conservative scientific process known as the null hypothesis to disprove the existence of life on Mars, so faced with data....yadayada". the Inquirer provided this definition:
null hypothesis: supposition that what you are trying to prove is not so; to prove you are right you must fail to prove the opposite
tee hee.
PBS series: The Ascent of Man. Jacob Bronowski. Galileo is described as the founder of the modern scientific method, who paid a terrible price for that finding, because the times in which he tested the work of Copernicus re: the universe did not look kindly on him. doctrine at that time, during the Inquisition, was that you did what you were told and nothing else; Galileo thought that if he showed the truth he would be believed, but he was naive in believing that. he didn't consider politics, and he ended up under house arrest for the rest of his life as a heretic or something.
what did G do? he was a scientist in Padua, Italy, and not unlike today they had a collection of scientists, and G was the local math/physics dude who was on a yearly stipend. he got a telescope from Holland and showed some folks that he could tell which ships were coming in 4-5 hrs in advance, and made bucks off that. but he also, somehow, got hold of the work of Copernicus, who had published in Poland a document about how the solar system worked, with the sun in the center and the planets going around in a circle around the sun.
politically, the current thinking was that Aristotle and Ptolemy were correct - that the universe was fixed, earth was at the center, end of discussion, thank you for flying greek philosophy airlines. well, G proposed that Copernicus was in fact correct, and he was chastised by the Vatican for going against current teaching.
Bronowski attributes to G the finding of the null hypothesis/sci method. G asked his friend the bishop whether, instead of trying to prove Copernicus right, the church would mind if he, G, supposed that Aristotle and Ptolemy were right, and gathered evidence to prove it...and they didn't mind. so he set about doing this, gathering evidence about the Aristotelian theory (because he could not hold or defend or try to show that the works of Copernicus were correct) - so he stated the null hypothesis that Aristotle and Ptolemy were correct - earth was at the center. he built a telescope (stepped his toy one up to 30 power), and gathered data. he saw Jupiter and he saw 4 moons around it!! moving around it! he published his results in "The Starry Messenger". hey, not HIS fault that while he was trying to prove the church's stupid theory the evidence disproved it, right ?? *grin*
church guy trying to prevent a Medici from looking through Galileo's telescope - this is in a lithograph. G didn't get money from the Medicis, but he did prove Aristotle/Ptolemy wrong, by gathering evidence AGAINST the null hypothesis. some years later he discussed at great length why he was right...and that got him in even more trouble.
bottom line: to prove a drug works, propose that it does NOT work and prove otherwise. write down the probability distribution as if it does NOT work, do probability report, do experiment - does the evidence support the null, or show that it's probably not true?
ok, now let's say we have 6 untied puppy pairs. see p 3 of handout, histogram. the X+ indicates the number of times the vaccine won. could be 0 through 6 wins for the vaccine. the question is how to judge what is likely.
the null would be
Null: μ+ = μ- = 1/2
so with 6 untied results you would most likely see 3 wins. now look at the binomial equation, where p(+) + p(-) = 1:
p(x + signs | n pairs) = [n! / (x! (n-x)!)] * p(+)^x * p(-)^(n-x)
see handout for math, i can't write these equations :)
if we got six wins for the vaccine, that would be good - but that COULD have happened even with a useless vaccine, just like a fair coin can turn up six heads. but there's a very low probability of that happening. 6 wins is strong evidence against the null hypothesis. now, 3 wins is a GOOD piece of evidence for the null hypothesis.
now we need a decision rule. you're supposed to set up what will happen based on results before you do the experiment, but this isn't always done or even usually done, for good reasons to be discussed later.
One sided rule: let's say that if 5 or 6 + occur, i.e. if X = 5 or 6, we will reject H0 in favor of the alternative H1, where
H1: μ+ > μ-, or μ+ > 1/2
so if there are five or six wins for the vaccine, we reject the null. now, the probability of X being 5 or more if the vaccine has no value is .1094, or about 11%, which isn't a high probability. scientific results are published WITH the probability that chance alone would produce evidence that strong - known as a p level. so here, we'd say we think the vaccine is useful, but we could be wrong: a useless vaccine would still give 5 or 6 wins about 11% of the time. now, usually, published results carry a 5% or smaller chance of this kind of error, but that is by convention. p=.05 however is what FDA, NIH, etc generally consider significant. but is 1/20 that much better than 1/19? probably not, really. now you could argue academically that this vaccine works, and that's fine, but regulatory bodies may well say fine, but you have to show .05. so...
the REJECTION region: aka alpha level. in this given exp't the rejection level is about 11% ==> .1094. this is the level at which you would reject the null.
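the .1094 figure, and what the 5-or-6 rule costs you, can be reproduced from the binomial equation with a 50-50 chance per untied pair. a minimal sketch only: the helper names (`null_distribution`, `alpha_for_cutoff`, `smallest_cutoff`) are invented here for illustration and the handout's own histogram may be laid out differently.

```python
from math import comb

def null_distribution(n):
    """P(X = x) for x = 0..n wins when each untied pair is a 50-50 toss."""
    return [comb(n, x) * 0.5**n for x in range(n + 1)]

def alpha_for_cutoff(n, cutoff):
    """Chance of landing in the rejection region X >= cutoff under the null."""
    return sum(null_distribution(n)[cutoff:])

def smallest_cutoff(n, level):
    """Smallest cutoff whose rejection region has probability <= level."""
    for cutoff in range(n + 1):
        if alpha_for_cutoff(n, cutoff) <= level:
            return cutoff
    return None  # not even n wins out of n would be rare enough

n = 6
# crude text histogram of the null distribution: one '#' per 1/64
for x, p in enumerate(null_distribution(n)):
    print(f"{x} wins: {p:.4f} {'#' * round(p * 64)}")

print(f"alpha for 'reject if X >= 5': {alpha_for_cutoff(n, 5):.4f}")
print(f"alpha for 'reject if X >= 6': {alpha_for_cutoff(n, 6):.4f}")
print(f"cutoff needed for the .05 convention: {smallest_cutoff(n, 0.05)}")
```

with only 6 untied pairs, the 5-or-6 rule has an alpha of 7/64, and the only rule that gets under .05 is "reject on 6 out of 6" (1/64).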
the p level is tied to the rejection or alpha level and is the probability of a specific expt'l outcome: as extreme a value as we see, or more extreme, arising from chance.
p level for 6 wins for the vaccine = 0.0156
p level for 5 OR 6 wins = 0.1094
acceptance region: X less than or equal to 4 wins.
alpha level: that area under the null which would lead us to reject the null
p level: i'm not exactly sure.
what ties these things together is the sample size needed, given what's of interest to find. how do you pick the sample size? a statistician will help you pick the probabilities, but to give the sample size attached to those probabilities, they need us to tell them what's worth finding. you can't say "anything's worth finding" - you can't DO that.
---end---