---start----
epi 2/16/98 galligan

two handouts: test information on heart rate and survival of colic surgery, and practice problems (the one that says "disease Q" on top).

the homework assignment was: a cow is presented with a positive test from a herd with an underlying conception rate of 40%. what's her probability of being pregnant?

before going over that, he's showing us a trick. neat-o.

ok, so the sample question related to the Kamar test, which had 85% sensitivity and 95% specificity in a herd with a 50% pregnancy rate. basic test info (per 100 pregnant and per 100 open cows):

          preg      open
  +       85 (sn)    5
  -       15        95 (sp)
  total   100       100

now, these numbers are based on a 50% pregnancy rate. we have only a 40% prevalence of pregnancy, so we multiply the pregnant column by .40 and the open column by .60. we end up with:

          preg   open   total
  +       34      3      37
  -        6     57      63
  total   40     60     100

this is the joint probability table. PPV = 34/37 = 92%, so that's the answer: the cow has a 92% chance of being pregnant when she tests positive and comes from a herd with 40% prevalence of pregnancy.

now, as we use a test, the predictive value will change as prevalence changes: as prevalence increases, the positive predictive value increases (and the negative predictive value decreases). the art of diagnosing is constantly moving animals into different prevalence zones.

stuff we're talking about today: yet another example with a slightly different twist on it. this will show us a different method of looking at test information.

Test information on HR and survival of colic surgery: many times, you deal with measurements that are continuous in nature - you look at a degree of HR as opposed to a positive or negative. so, Dr Jim Orsini collected some data on 125 horses presented for colic surgery. during the PE he took their HR in bpm. surgery was then performed, and the survival of the animals was followed. the study asked: what is the relationship of preoperative HR to postsurgical survival?

data (number of horses):

  preop HR   lived   died   total
  <50         82       6     88
  >50         20      17     37
  total      102      23    125

what can we calculate from this table? well, sensitivity and specificity of the "HR test".
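going back to the homework for a second: the joint probability trick can be sketched in a few lines of python. the function names are mine, not from the handout, and it assumes (as he said) that sensitivity and specificity stay fixed while prevalence changes:

```python
# Joint probability table for a test with known sensitivity/specificity,
# re-weighted to a herd's prevalence. Function names are illustrative only.

def joint_table(sens, spec, prev):
    """Cell probabilities of the 2x2 table (rows: test +/-, cols: preg/open)."""
    return {
        "pos_preg": sens * prev,               # true positives
        "pos_open": (1 - spec) * (1 - prev),   # false positives
        "neg_preg": (1 - sens) * prev,         # false negatives
        "neg_open": spec * (1 - prev),         # true negatives
    }


def predictive_values(sens, spec, prev):
    """Return (PPV, NPV) read straight off the joint table."""
    t = joint_table(sens, spec, prev)
    ppv = t["pos_preg"] / (t["pos_preg"] + t["pos_open"])
    npv = t["neg_open"] / (t["neg_open"] + t["neg_preg"])
    return ppv, npv


# Kamar test from the notes: 85% sensitivity, 95% specificity
for prev in (0.40, 0.50, 0.70):
    ppv, npv = predictive_values(0.85, 0.95, prev)
    print(f"prevalence {prev:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

at 40% prevalence this prints a PPV of 0.919, the same 34/37 as the table; as prevalence goes up, PPV climbs and NPV falls.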
sensitivity = 82/102 = about 80%; specificity = 17/23 = about 74% (here a "positive" test is HR < 50, i.e. predicting survival). we can also calculate the prevalence of living and dying, etc. so this is a joint probability table: we started with 125 horses and let them migrate into the various cells. because we collected the data this way, the underlying prevalence is built in.

data (frequency of horses) - take everything in the table above and divide all numbers by 125:

  preop HR   lived   died   total
  <50         .66     .05    .70
  >50         .16     .14    .30
  total       .82     .18   1.00

sensitivity = 82/102 = .66/.82 = 80%
specificity = 17/23 = 74%
prevalence of living = 102/125 = .82

probability of a horse with HR > 50 surviving? of the HR > 50 horses, 20/125 lived and 17/125 died, 37 total, so 20/37 = 54% chance of living (the frequency table gives the same answer, .16/.30, give or take rounding).

ok. Cornell study: they did a study with a bit of a different design. they looked at animals in terms of HR with a more balanced population of survivors and deaths.

  preop HR   lived   died   total
  <50         41      14     55
  >50         10      38     48
  total       51      52    103

** sensitivity and specificity are independent of underlying prevalence, so they'll be the same for a test regardless of prevalence. but predictive values vary with prevalence.

prevalence: 51/103 = 50%
sensitivity: 41/51 = 80%
specificity: 38/52 = 73%

(practically the same sensitivity and specificity as the Orsini data, even though the prevalence of living is 50% instead of 82%.)

cows: assume a 5% chance that any cow in a group is in heat on a given day. if you see a cow showing a sign of estrus, then the chance of that cow being in heat is greater. when you palpate that cow, you are trying to figure out if the cow is in a group with a higher prevalence of estrus. so this is how diagnostics work.

two cows showing identical signs of heat. one was injected on Friday with prostaglandin; today is Monday. do they have the same probability of being in heat? the test results are the same, but they are in different prevalence groups - one animal should be near 5%, the other near 70%. that's the whole trick: the prostaglandin artificially inflates the underlying prevalence before you apply the heat-detection test, so the predictive values are higher.

multiple levels: well, >50 and <50 are broad categories, right? so we should break this down.
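an aside before breaking HR into more levels: a small python sketch comparing the Orsini and Cornell 2x2 tables. the `summarize` helper is hypothetical (my own), and "low" means HR < 50, treated as the test result that predicts survival:

```python
def summarize(lived_low, died_low, lived_high, died_high):
    """2x2 summary; 'low' = HR < 50 (test predicts survival), 'high' = HR > 50."""
    lived = lived_low + lived_high
    died = died_low + died_high
    return {
        "sensitivity": lived_low / lived,    # survivors correctly called by HR < 50
        "specificity": died_high / died,     # deaths correctly called by HR > 50
        "prev_living": lived / (lived + died),
        "p_live_if_low": lived_low / (lived_low + died_low),      # predictive value of HR < 50
        "p_live_if_high": lived_high / (lived_high + died_high),  # survival given HR > 50
    }


orsini = summarize(82, 6, 20, 17)    # 125 horses, natural case mix
cornell = summarize(41, 14, 10, 38)  # 103 horses, balanced survivors/deaths

for name, s in (("Orsini", orsini), ("Cornell", cornell)):
    print(name, {k: round(v, 2) for k, v in s.items()})
```

sensitivity comes out at 0.80 in both studies and specificity at 0.74 vs 0.73, but the probability of living with HR < 50 drops from 0.93 to 0.75 (and with HR > 50 from 0.54 to 0.21), because the prevalence of living fell from 0.82 to about 0.50.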
  HR        lived   died
  <30         50      3
  31-50       32      3
  51-70       10      9
  71-90        8      5
  91-110       2      3
  total      102     23

now we can vary the cutoff point and calculate sensitivity and specificity for each point (see bottom left p 2). these are different at each level.

what happens when you choose a cutoff point? you're changing the number of animals you misclassify: the number predicted to die that end up living, and the number predicted to live that end up dying. predicting life and ending up with death makes you look stupid, so you might want to minimize the risk of that happening by manipulating the cutoff level.

manufacturers will report sensitivity and specificity at the cutoff that minimizes type I and type II error at 50% prevalence. they can do this by calculating an ROC curve: plot sensitivity vs 1-specificity (see bottom of p 2 of handout, right side). the points on the curve are the heart rates. the point closest to the upper left corner will minimize the frequency of type I and II error at 50% prevalence. note that the frequency of error IS dependent on prevalence.

ROC = receiver operating characteristic curves. this came from radar/sonar signal detection - what are the consequences of claiming there's a ship there when there isn't, vs claiming there's no ship when there is. heh. they use this methodology in selecting products to use in dairy herds.

type I error - you detect a difference when there is no difference. this is alpha error; you usually keep it below 5%. type II (beta) error - failing to detect a difference that is really there.

basically you're looking for the bend in the curve, the point that's closest to the upper left hand corner. this is where the tradeoffs of misclassification are minimized. it's a convenient way to identify a cutoff point that keeps both types of error to a minimum. the problem you run into is that some errors cost more than others.
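a sketch of how the ROC points and the "closest to the upper left corner" cutoff fall out of the multi-level table. assumptions (mine, not the handout's): HR at or below the cutoff counts as "positive" (predicts living), the "<30" band is treated as 30 and under, and the criterion is straight-line distance to the corner:

```python
import math

# HR bands from the notes: (upper bound of band, lived, died)
BANDS = [(30, 50, 3), (50, 32, 3), (70, 10, 9), (90, 8, 5), (110, 2, 3)]


def roc_points(bands):
    """For each cutoff, call HR <= cutoff 'positive'; return (cutoff, sens, 1 - spec)."""
    total_lived = sum(lived for _, lived, _ in bands)
    total_died = sum(died for _, _, died in bands)
    points = []
    lived_below = died_below = 0
    for upper, lived, died in bands:
        lived_below += lived            # survivors at or below this cutoff
        died_below += died              # deaths at or below this cutoff
        sens = lived_below / total_lived
        spec = (total_died - died_below) / total_died
        points.append((upper, sens, 1 - spec))
    return points


def closest_to_corner(points):
    """Cutoff whose (1 - spec, sens) point is nearest the corner (0, 1)."""
    return min(points, key=lambda p: math.hypot(p[2], 1 - p[1]))


pts = roc_points(BANDS)
for cutoff, sens, fpr in pts:
    print(f"HR <= {cutoff:3d}: sens {sens:.2f}, 1-spec {fpr:.2f}")
print("closest to upper left corner:", closest_to_corner(pts)[0])
```

with these counts the closest point is the HR 50 cutoff, the same split used in the two-category table. other criteria (or unequal error costs) can pick a different cutoff.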
likelihood ratio method: this is a method of accounting for the metric of the test - the degree of the measure, how high the heart rate is. this method does not lump. it also accounts for the underlying prevalence of the condition of interest, which varies from herd to herd.

this works as follows: look at the conditional probability table in the handout (columns: HR, lived, lived prob, died, died prob). the lived probs sum to one, and so do the died probs. these are conditional probabilities - given that the animal has a certain outcome, what's the probability of it having a HR in this range? (e.g., for HR < 30: lived prob = 50/102 = .49, died prob = 3/23 = .13.) then you take the ratio lived prob/died prob, and that is called a likelihood ratio (see table in handout). so this table says that a HR < 30 is 3.77 x more likely to show up in a horse that lives than in one that dies. you can use this info to come up with a predictive value.

example: horse HR = 80. what is the probability of living?
- prevalence of living is 82% (if you don't know the HR, 82% survival is our overall rate)
- HR = 80 -> how does this change the prevalence? what's the prevalence of living in horses with a HR of 80?
- likelihood ratio for HR 71-90 is 0.36
- pre-test odds = underlying prevalence divided by (1 - underlying prevalence): 0.816/(1 - 0.816) = 4.43
- pre-test odds x likelihood ratio = post-test odds -> this means that taking the test changes the odds: 4.43 x 0.36 = 1.6
- post-test probability of living: 1.6/(1 + 1.6) = .615 -> so the animal has about a 61.5% chance of living.

so with this system we can adjust for the underlying prevalence of the condition (using pre-test odds), account for the metric of the test (using the likelihood ratio), and combine them into post-test odds to calculate a predictive value/post-test probability of living.

---break----

a question came up during the break: where do you get the prevalence of living in the last example? that's based on history: our overall survival rate for all animals presenting with colic is 82%. then we do a test - we take the HR - and now we look at our data.
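the likelihood ratio table and the HR 80 example as a python sketch. names are mine; the only inputs are the counts from the multi-level HR table and the overall survival of 102/125:

```python
# (HR band, lived, died) from the multi-level table in the notes
BANDS = [("<30", 50, 3), ("31-50", 32, 3), ("51-70", 10, 9),
         ("71-90", 8, 5), ("91-110", 2, 3)]


def likelihood_ratios(bands):
    """LR for each band = P(band | lived) / P(band | died)."""
    total_lived = sum(lived for _, lived, _ in bands)
    total_died = sum(died for _, _, died in bands)
    return {label: (lived / total_lived) / (died / total_died)
            for label, lived, died in bands}


def post_test_probability(pretest_prob, lr):
    """Pre-test odds x LR = post-test odds, then convert back to a probability."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    post_odds = pretest_odds * lr
    return post_odds / (1 + post_odds)


lrs = likelihood_ratios(BANDS)
for label, lr in lrs.items():
    print(f"HR {label:>6}: LR {lr:.2f}")

# horse with HR 80, overall survival 102/125 (about 82%)
print(f"P(live | HR 80) = {post_test_probability(102 / 125, lrs['71-90']):.3f}")
```

this prints 0.615 for the HR 80 horse (exactly 8/13, since 102/23 x 0.36... collapses to 8/5 post-test odds). the HR < 30 ratio comes out at 3.76 here without intermediate rounding, vs 3.77 in the handout.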
HR 80 has a likelihood ratio of 0.36. pre-test odds are the probability of living divided by the probability of dying: 102/23 (0.816/0.184), and that's 4.43. you multiply that by the likelihood ratio to get the post-test odds, which calculate out to 1.6. now we take the post-test odds and calculate the probability of living as post-test odds/(1 + post-test odds).

why is that the formula? his explanation makes no sense to me. (it follows from the definition of odds: odds = p/(1 - p), so odds - odds x p = p, and solving for p gives p = odds/(1 + odds).) he says, though, that given a HR of 80, the animal has a probability of living of about 61%.

- now he is telling us not to take notes on this section he's talking about now -

interpretation of data on leptospira in dairy cattle and abortion: leptospiral antibodies in sera of 305 aborting cows. for any given lepto titer, what's the infection status? as the titer gets higher, the number of infected cows is larger than the number of uninfected cows. you can make an ROC curve out of this data. then we take the point closest to the upper left corner as the cutoff point. calculate the PPV using the formula:

  PPV = (sens x prev) / [(sens x prev) + (1 - spec)(1 - prev)]

we end up with a 19% chance of abortion due to lepto. this is the same as the two-by-two table method.

likelihood ratio approach - keep things on a continuous basis. we calculate the percent of infected and uninfected cows at each titer level, and calculate likelihood ratios. now we can calculate pre-test odds: .05/(1 - .05) = 0.053. then we run a titer - if it's 300, our LR is 1.3, so the post-test odds are 0.053 x 1.3 = 0.068. the post-test probability is 0.068/(1 + 0.068) = .064. so that's about a 6% chance instead of a 19% chance. yeah, whatever.

point: tests are only good if they have a high degree of accuracy, and that depends on the threshold we choose. the somatic cell count at which you consider a cow to have mastitis is another threshold that would vary from herd to herd depending on underlying prevalence.
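a sketch of the odds/probability conversion and the lepto numbers. the helper names are mine. the lepto sensitivity and specificity aren't in these notes, so the check that the one-line PPV formula and the odds route agree borrows the Kamar numbers from the homework instead:

```python
def odds(p):
    """Probability -> odds."""
    return p / (1 - p)


def prob(o):
    """Odds -> probability. From o = p/(1-p): o - o*p = p, so p = o/(1+o)."""
    return o / (1 + o)


def ppv_formula(sens, spec, prev):
    """The one-line PPV formula from the notes."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))


def ppv_via_odds(sens, spec, prev):
    """Same thing with odds: a positive result has LR+ = sens / (1 - spec)."""
    return prob(odds(prev) * sens / (1 - spec))


# lepto example: 5% pretest probability, titer of 300 with LR 1.3
pre_odds = odds(0.05)
post_odds = pre_odds * 1.3
post = prob(post_odds)
print(f"pre-test odds {pre_odds:.3f}, post-test odds {post_odds:.3f}, post-test prob {post:.3f}")

# both PPV routes agree (Kamar numbers: sens .85, spec .95, prevalence .40)
print(round(ppv_formula(0.85, 0.95, 0.40), 3), round(ppv_via_odds(0.85, 0.95, 0.40), 3))
```

the first line reproduces the 0.053 / 0.068 / 0.064 from lecture; the second prints 0.919 twice.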
- we started off with sensitivity and specificity
- used that in the context of a decision tree, where the frequency of + and - results and the predictive values were important (not sens and spec)
- then we got into the likelihood ratio area, where we keep tabs on a spectrum of test results without lumping them into two categories, improving our ability to estimate the probability of an animal having the condition of interest.

handout w/ two examples: work on them.

moving on.

Survival analysis: basic epidemiology of reproduction

this method is used with increasing frequency in reproduction. for some background: repro is important in dairy b/c of the curvilinear milk production curve - cows make a lot of milk (and money) early on, and in late lactation they don't make so much. we want her to spend as much time as possible in early lactation, which is why we rebreed ASAP.

pitfalls of parameters - the danger of averages: days open, days to first breeding. consider an event (pregnancy) where we measure the time to that event. it's like considering time to death as 85 years for someone, right? also, we measure percentages of animals that are pregnant - but if you make the study long enough, everyone gets pregnant. (if you make it long enough they all die, actually.) you need to account for the time element as well as the number of animals having the event.

distribution of days open: not a bell-shaped curve, highly skewed. an average isn't going to measure the central tendency - but people use that average as an index anyway. we need a better system.

other measures: p 5 bottom left.
- top line - we don't know if the cow is pregnant right now. we know she was open before she was bred. now she's "censored".
- second line - is open post breeding.
- third line - this cow is pregnant; she was open until the date of breeding.
- fourth line - calved, then wasn't bred.
F: calved, bred, open, culled
G: calved, bred, pregnant, culled

we want reproductive records so we can see what's going on in the herd.

event time curves: Kaplan-Meier calculation. this is based on arranging all animals by time to event or to censoring. say you have 100 cows:

  DIM (days in milk)   # at risk   pregnant   culled   open (%)   cum open (%)
    0                   100          0          5
   20                    95          0          1
   40                    94          0          1
   60                    93         33          0        65          65
   80                    60         21          0        65          42
  100                    39         14          0

number at risk is those you start with minus those that are culled and minus those that have the event of interest. you could do the same thing for living vs dying or whatever.

event time curves, cont.: now we calculate a pregnancy rate - the number that got pregnant divided by the number at risk. so 33/93 = 35% pregnancy rate at day 60. this is called the instantaneous pregnancy rate. also, for each level of DIM we have a probability of being open: 100 minus the preg rate, so 100 - 35 = 65. then we calculate a cumulative probability of being open: the probability that she was open before times the current probability of being open, e.g. 0.65 x 0.65 = 42%.

now, plot the cumulative probability of being open (see bottom right p 6). at some point, percent open is no longer 100%. if the repro program is perfect, it will drop right off. the slope of the line is the pregnancy rate. we're plotting percent open vs time.

endometritis study: cows were injected w/ PCN or oxytet. starting with a whole lot of cows, he injected them with one or the other and tracked how long it took to get them pregnant (see p 7). the controls had endometritis but weren't treated, and the unaffecteds didn't have endometritis. the unaffected cows got pregnant readily. the control cows and the treated cows had similar pregnancy rates, so using penicillin didn't affect reproductive efficiency. for oxytet, all three curves were similar. (??)

p 8: herds that went on a PG program. the lines show pretreatment repro efficiency and post-treatment efficiency (white squares).
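going back to the Kaplan-Meier table: the pregnancy rate and cumulative-open arithmetic as a python sketch (names are mine). it also checks that each at-risk count equals the previous one minus pregnant minus culled, which is how culled (censored) cows drop out of the calculation without being counted as open or pregnant:

```python
# (days in milk, number at risk, pregnant, culled) from the table in the notes
ROWS = [(0, 100, 0, 5), (20, 95, 0, 1), (40, 94, 0, 1),
        (60, 93, 33, 0), (80, 60, 21, 0), (100, 39, 14, 0)]


def kaplan_meier_open(rows):
    """Return (DIM, interval pregnancy rate, cumulative probability open)."""
    out = []
    cum_open = 1.0
    for i, (dim, at_risk, pregnant, culled) in enumerate(rows):
        if i > 0:
            _, prev_risk, prev_preg, prev_cull = rows[i - 1]
            # cows leave the risk set by getting pregnant or being culled
            if at_risk != prev_risk - prev_preg - prev_cull:
                raise ValueError(f"at-risk count at DIM {dim} doesn't add up")
        rate = pregnant / at_risk
        cum_open *= 1 - rate      # product of interval "still open" probabilities
        out.append((dim, rate, cum_open))
    return out


for dim, rate, cum in kaplan_meier_open(ROWS):
    print(f"DIM {dim:3d}: preg rate {rate:.2f}, cumulative open {cum:.2f}")
```

this reproduces the 65% and 42% cumulative open at days 60 and 80 (0.645 and 0.419 unrounded) and carries the same calculation on to about 27% at day 100.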
after treatment the curves come down more - they are more efficient. it's also more incremental: there are steps in the curves, because more animals get pregnant on the same day. often the second step is bigger. the third and fourth steps are cows that didn't take on their first breeding and got pregnant at their second breeding.

you'll see this more and more in repro in some other courses. there are other ways of doing this, but this one, the Kaplan-Meier, is the easiest.
---end---