----start epidem.lec.01.07.98-----
epidemiology 1/7/98 Gary Smith 1-2

He's discussing Havelock Ellis' opinion of dancing. Also the development of ballet, en pointe dancing, the introduction of women into ballet dancing, and the introduction of pink tights and short skirts. Tolstoy thought ballet was just a nude display. Chekhov didn't understand the ballet, but thought the dancers smelled bad. Later, a congressional session was cancelled because too many congressmen wanted to go to the ballet. Classical dance is not alone in shocking people. The waltz in triple time also shocked people. The idea of grasping a woman's waist or pressing the hand to the small of her back was intense. Byron talked about dancing too. The effect dancing has had on people is fascinating. Almost as intense as the effect of risk factors on the incidence of disease, which we're going to discuss today.

Measures of Disease Occurrence: Prevalence, Incidence, and Cumulative Incidence

Epidemiology is the study of disease occurrence...this is how we measure it.

Prevalence: the proportion of the population that has the disease at one specific point in time. Assume everyone is here - 110 students. How many have colds? 4? Say there are 100 people here. The prevalence is 4/100 = 4%. How many have had a cold over the past week and are better now? More. We'll get back to that. Prevalence can be expressed as a number between 0 and 1 or as a percent. This is also sometimes called "point prevalence."

Our point prevalence was easy to calculate since we were all here in the room. Sometimes you have to sample/survey a population over weeks to get a good sample size. When you spend all this time measuring prevalence, it's different. 8 of us have had colds in the past week and are better now, so they would go in with the four and give us 12/100 = 12%, because we've altered the prevalence by taking a long time to measure it. This is called "period prevalence" because you measure the prevalence over a period of time.

Prevalence is a proportion.
It has a numerator and a denominator, so it has a value between zero and one. Often you see numbers purporting to be estimates of prevalence that are not prevalence - e.g., someone might count cases and say there is a prevalence of four. That isn't a prevalence, that's just the number of cases. There are problems with that.

slide: number of cases of Lyme dz in humans in PA from 1986-92. It shows increasing numbers of cases through 91, then a small drop in 92 back to around the level of 1990.

slide: number of cases of Lyme dz in humans in PA in 1990, showing a bell-like curve with a peak in June.

That doesn't mean the prevalence is necessarily increasing, though....PA gets a lot of tourists in the summer, so there is also an increased population here then. Chester county is risky - lots of Lyme dz there. Fastest growing county in the state - 85,000 new homes being built in one school district alone. Lots of transitional tick habitat. We're not really seeing an increase in the force of infection, but we're seeing more people in the area where the disease occurs. So, if you're reporting case numbers, you have to know if the population is remaining constant or not.

slide: trichinosis in PA 1980-1990 - shows a big decrease in total number of cases over this time. But it doesn't mention the population at risk of disease - is it remaining steady or is it also declining?

slide: brucellosis cases in the US - number of quarantined herds from 1989-1991. This shows a decline from about 2000 to about 800 herds needing to be quarantined over this time. Note that here, the term "cases" refers not to an animal but to a herd. It's quite reasonable to report the number of herds that have even one case of the disease - as a proportion of all herds this would be called "herd prevalence" as opposed to "animal prevalence." Now, in fact, the actual number of herds is also decreasing. Individual herds are getting bigger, total number of herds is smaller. So maybe this slide isn't really as good as it looks.
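A minimal sketch of why a falling case count is hard to read when the denominator is also changing. The quarantined-herd counts (2000 and 800) are the approximate figures from the brucellosis slide; the total herd numbers are made up purely for illustration, since the slide did not give them, and the function name `prevalence` is just a label chosen here.

```python
def prevalence(cases, population_at_risk):
    """Prevalence is a proportion: cases / population at risk."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    if not 0 <= cases <= population_at_risk:
        raise ValueError("cases must be between 0 and the population at risk")
    return cases / population_at_risk


# Quarantined-herd counts are the approximate figures from the slide.
quarantined_herds = {1989: 2000, 1991: 800}
# Total herd counts are HYPOTHETICAL, chosen only to show a shrinking denominator.
total_herds = {1989: 200_000, 1991: 100_000}

herd_prevalence = {
    year: prevalence(quarantined_herds[year], total_herds[year])
    for year in quarantined_herds
}

for year in sorted(herd_prevalence):
    print(f"{year}: {quarantined_herds[year]} quarantined herds, "
          f"herd prevalence = {herd_prevalence[year]:.3f}")
```

With these assumed denominators the case count falls by 60% while herd prevalence only moves from 0.010 to 0.008, which is the sense in which the raw count can look better than the situation really is.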
slide: map of PA showing cases of rabies in skunks and raccoons in 1992. In eastern PA, 64 rabid raccoons and 23 skunks, and in western PA, 18 raccoons and 3 skunks. It looks like there is more rabies in the east than the west - but we're not accounting for raccoon and skunk density. We don't know how that varies over the state.

It should be obvious that prevalence has advantages over case counts. Only use case counts when you know the at-risk population is constant. Because prevalence is a proportion, it can be used comparatively over time, in different places at the same time, or to compare populations maintained under different conditions.

Comparing Disease Occurrence:

Remember in the US, the USDA used the decline in # of quarantined cattle herds over three years as proof of success of the brucellosis eradication program. The Australians used another measure: prevalence. They used vaccination, test/removal, and depopulation for their program. Their herd prevalence measurement was the proportion of herds where at least one case of disease occurred. They also measured the proportion of cattle nationally that had the disease. Herd prevalence as well as individual prevalence dropped steeply between 1976 and 1983. They didn't just count cases, they actually measured prevalence.

This sounds straightforward, but it is important to realize prevalence isn't that straightforward a measure of disease occurrence. It depends on two things **write this down**
1. the rate of onset of the disease, and
2. the duration of the disease once you have it.

In other words, prevalence reflects not only the rate at which new cases occur, but the rate at which those cases recover. We're really after the rate of new occurrences. Epidemiologists always want to know that, but don't usually care about the rate of recovery. Prevalence is therefore contaminated by a measure of the recovery rate. Lots of things affect recovery rate, too.
So that may make prevalence seem to vary - well nourished animals may recover faster than undernourished animals, while the rate of new occurrence is actually the same - and then prevalence will be different.

Suppose you think of some long lasting disease - you get it and you have it for a long time - herpesvirus for example. In that case, if you take people who are middle aged and measure the prevalence of infection, it will be high - because the cases have had time to accumulate, and they aren't being gotten rid of. If you imagine the same rate of onset with a one week recovery, the prevalence would be very low in the same age group.

In fact, comparing populations on the basis of prevalence is fraught with difficulty - two major problems. The first also applies to incidence and to cumulative incidence, and has to do with how we define a case - what do we accept as a case? We can't always recognize a case definitively, without error. Not all individuals show all the signs of a disease. Take the PA Lyme dz example - not all cases have the erythema migrans bullseye rash - less than 50% of the people with Lyme dz have the rash. So you might be looking for the wrong things.

slide: case series of Lyme dz cases in PA showing frequencies of different signs - rash more common in young people, neuro signs and cardiac signs pretty constant, arthritis more common in older people.

So, sometimes you misdiagnose someone as healthy when they're sick, or as sick when they're healthy. These mistakes almost never cancel each other out. You can't say the number you misdiagnose as diseased cancels out those you misdiagnose as healthy - so your measured prevalence is often totally wrong. Now, if you have prior knowledge of the sensitivity and specificity of your criteria, you may be able to correct this error. But know that measuring prevalence isn't as easy as it sounds. It can involve a lot of understanding about diagnostic test efficacy and so forth.
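The lecture didn't say how the correction is done. One standard way, shown here as a sketch and not as the lecturer's method, is the Rogan-Gladen estimator: true prevalence = (apparent prevalence + specificity - 1) / (sensitivity + specificity - 1). The function name and all input numbers below are hypothetical, picked only to show the direction of the correction.

```python
def corrected_prevalence(apparent, sensitivity, specificity):
    """Rogan-Gladen estimate of true prevalence.

    apparent:     proportion classified as diseased by the case definition
    sensitivity:  P(classified diseased | truly diseased)
    specificity:  P(classified healthy  | truly healthy)
    """
    for name, value in (("apparent", apparent),
                        ("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("need sensitivity + specificity > 1")
    estimate = (apparent + specificity - 1.0) / denominator
    # Sampling error can push the estimate outside [0, 1]; clamp it.
    return min(1.0, max(0.0, estimate))


# HYPOTHETICAL inputs, not from the lecture.
apparent = 0.12
estimate = corrected_prevalence(apparent, sensitivity=0.80, specificity=0.95)
print(f"apparent = {apparent:.3f}, corrected = {estimate:.3f}")
```

With these made-up values the corrected figure comes out a bit above 0.09, lower than the apparent 0.12, because some of the apparent cases are false positives. With a perfect test (sensitivity = specificity = 1) the function returns the apparent prevalence unchanged.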
The other problem with prevalence arises when we measure prevalence in a subset of the population...no one in the first two rows has a cold right now. Prevalence = 0. This is a nonrepresentative sample. The true prevalence is 4%.

Problems with prevalence: sexually transmitted dz among colony baboons - herpesvirus. Many tests were used to diagnose, so misdiagnosis was not really an issue.

1. overall:  62/682 infected   P = 0.09
2. by sex:   35/242 females    P = 0.14
             27/440 males      P = 0.06
3. by age:
   age in yrs    0-3    3-6    6-9    9+
   prevalence    0.03   0.17   0.05   0.08

Obviously, it's important to choose a representative subset! Just measuring females or just measuring males would be wrong.

Consider trichomoniasis in cattle - cows recover quickly, bulls remain infected for a long time. Therefore prevalence increases with the age of bulls. Prevalence of Strongyloides felis in cats increases with body weight. That's because weight is a surrogate for age. FIV prevalence in Canada also increases with age.

If you wish to get a good measure of prevalence you need to make sure you measure in the population in which you are interested. If you are interested in comparing prevalence in young cats vs old cats, it's going to be different, but it doesn't mean much.

Those who read the journals will see lots of ads from drug companies saying nematodes in dogs are very frequent and you should use their product to get rid of them. Those estimates of prevalence were taken from pound dogs. Are they representative of the dog population at large? No! We want to know the prevalence of infection in well cared for dogs.

New study - prevalence of T. vulpis, Ancylostoma, and Toxocara was MUCH higher in shelter dogs than in dogs seen at clinics in DE/MD. In Philadelphia/NJ the Toxocara rates were identical in well cared for dogs and shelter dogs; otherwise the rates were like the other study. Why? Well, the prevalence of Toxocara canis is much higher in younger dogs than older dogs.
In the DE/MD shelters, there may be more young dogs, and this may be why the Toxocara canis number is higher there. So it was easy to measure prevalence, but it is not so easy to interpret it.

There are two techniques for dealing with this. One is to record age/gender/breed and divide the population into age/gender/breed/use classes, and use these specific prevalence estimates to compare the populations - so you look at dogs under 6 mos old, then 1-2 yrs, etc. This is called estimating specific prevalence. The second thing you can do is to statistically transform the overall crude measures so they apply to a hypothetical population that's standardized with respect to age or whatever. This is called the method of adjusting rates. It's important to use one of these methods before comparing prevalences across populations or drawing conclusions.

INCIDENCE

Incidence is the instantaneous force of disease occurrence. Incidence is often used interchangeably with prevalence but it isn't the same thing. The big differences are:
- prevalence is a proportion, incidence is a rate.
- prevalence can't be used to predict disease status in the future - incidence can. If you know prevalence, you only know what's happening now, but you can't tell what's going to happen next week.

Incidence is a very special kind of rate. It is the per capita instantaneous rate of disease occurrence. **important** Now, if you haven't done calculus and stuff you won't know too much about this, but that's ok. Assuming you know the instantaneous rate, you should be able to predict how many animals will get sick in a period of time. If incidence is large, the probability of an animal in the population acquiring the disease is also large. If incidence is low, the probability of acquiring disease is low. Incidence is the purest measure of disease occurrence we have.

Incidence is often not constant with infectious disease.
It may be greater during cool damp weather, or whatever. It will change during an epidemic. We often assume that incidence is constant when we talk about chronic noninfectious diseases, or infectious diseases that are kind of sputtering along at an endemic level. But seasonal factors and epidemics don't involve constant incidence.

Measuring constant incidence (i):

i = 1/average time to disease onset

Incidence of porcine parvovirus - average age of infection is 6 mos, so i = 1/0.5 (per year). That's 2/yr. That doesn't mean there are two cases/year! This is a theoretical concept, ok? Just know if incidence is high, the probability of acquiring the disease is high, and vice versa. If there is a high probability of acquiring it, the average age of onset will be lower. If on average you get a disease by 2 yrs old, i = 1/2 per year.

---break---
2-3

So, we measure constant incidence by measuring the average time to disease onset. But how do you do that? You can't usually do that in real life. Who knows the average time of disease onset? Not us. So you use a derived formula:

i = # of cases during the interval of interest / cumulative animal*months at risk

What are "animal*months"? Think of "man*hours". Cumulative # of animal*months is like that. You observe the population from time a to time b, and you work out the length of time each animal was free of disease, and you add it all up, and that's the denominator. Say you watch 5 animals for 1 month. During that month, one animal gets sick after the first week. That animal is free of disease for one week. The others are free of disease for 4 weeks. So the denominator would be 1 + 4*4 = 17 animal*weeks, or 4.25 animal*months. It's hard to measure the denominator, so we use all kinds of approximate methods to estimate a constant incidence. To illustrate that we'll see an overhead :)

Incidence, again - the instantaneous per capita change in the # of animals at risk of disease.
i = 1/avg time to onset
i = # new cases / animal*months free of disease
or more simply:
i = A/(R*t)
where A = # of new cases in interval t, and R = # at risk during t. This is good for a large population with a relatively rare disease (low incidence). This is an approximation.

Incidence is the "purest" measurement of the rate of disease onset, but it is really hard to estimate, because counting the number at risk is often very difficult. Therefore we resort to other methods. The most commonly used measurement/approximation is called CUMULATIVE INCIDENCE. Cumulative incidence is so often used instead of true incidence that elementary texts often define it as incidence without explaining that it is an approximation. This is because, in practice, most people use approximations. Cumulative incidence can only be used as a good approximation when the actual value of the true incidence is very small. If this is the case, then it is a very good approximate estimate. So this is just a happy numerical accident (see handout for details).

The advantage of using cumulative incidence is that it is much easier to measure and has a very concrete reference. It is the proportion of animals which become diseased during some specified time. It is sometimes called "the risk of disease".

month    # diseased    cum. incidence
1        2             0.4
2        3             0.6
3        4             0.8
4        5             1.0

Watching 5 pigs over four months, after the first month 2 are sick for a c.i. of 2/5 = 0.4, and so forth. By the end of 4 mos, we have 5/5 sick pigs for a c.i. of 1.0. Clearly c.i. increases with time, as more of the disease free cohort becomes diseased. We often use c.i. as an approximate measure of true incidence. If true incidence is constant and c.i. increases with time, how do we get a true incidence? Divide by time: c.i./time (remember this only works as an approximation when the incidence is very small).

Summary of cumulative incidence - note that cumulative incidence is meaningless without a statement of the time interval over which it was measured, because it increases with time.
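The two worked examples from the lecture (5 animals watched for a month, and 5 pigs watched for four months) can be checked with a short sketch. The function names are just labels chosen here, and a month is treated as 4 weeks, as the lecture's 17 animal*weeks = 4.25 animal*months implies.

```python
def incidence_rate(new_cases, time_at_risk):
    """Incidence as new cases per unit of disease-free animal-time."""
    if time_at_risk <= 0:
        raise ValueError("time at risk must be positive")
    return new_cases / time_at_risk


def cumulative_incidence(diseased_so_far, initial_at_risk):
    """Proportion of the initially disease-free cohort that has become diseased."""
    if initial_at_risk <= 0:
        raise ValueError("cohort size must be positive")
    return diseased_so_far / initial_at_risk


# Five animals watched for one month (4 weeks); one gets sick after week 1.
weeks_disease_free = [1, 4, 4, 4, 4]
animal_weeks = sum(weeks_disease_free)
animal_months = animal_weeks / 4
rate = incidence_rate(new_cases=1, time_at_risk=animal_months)
print(f"{animal_weeks} animal*weeks = {animal_months} animal*months, "
      f"i = {rate:.3f} per animal*month")

# Five pigs: running total of diseased animals at the end of each month.
diseased_by_month = {1: 2, 2: 3, 3: 4, 4: 5}
ci_by_month = {
    month: cumulative_incidence(diseased, initial_at_risk=5)
    for month, diseased in diseased_by_month.items()
}
for month, ci in ci_by_month.items():
    print(f"month {month}: c.i. = {ci:.1f}")
```

The denominator for the rate is the summed disease-free time of each animal, not the head count. The denominator for cumulative incidence is the size of the cohort at the start, which is why c.i. can only go up as the months pass.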
Recall it is often called the risk of disease - it is the probability of getting the dz over the interval of interest. If we start with Ro animals at risk and A of them become diseased in the interval t, then

ci = A/Ro

Cumulative incidence relates to incidence as:

ci = 1 - e^(-i*t)

That's 1 minus (e to the negative i*t). So, if i is very small, less than 0.01, then i*t is almost identical to cumulative incidence. Having said that, ci is a perfectly good measure of disease occurrence. It's just not the same as true incidence and you shouldn't confuse the two.

So those are the three measures of disease occurrence - prevalence, incidence, and cumulative incidence. How do we use these measures to look at the effect?

Prevalence is a much less fundamental measure of disease occurrence than incidence, because prevalence includes the duration of dz. It can be shown that

prevalence = incidence * duration of disease / (1 + incidence * duration)

Suppose duration D is two weeks, and incidence i is 0.001 (per week, to match D). Then 1 + 2*0.001 = 1.002 is very close to one. That would make prevalence very close to i*D = 0.002, assuming constant incidence and a steady state and a very small incidence. See p 5 of handout.

Why is incidence useful? It's hard to measure. Why bother? It's a more fundamental measure of disease occurrence, first of all. But also, it's an instantaneous rate, and you can add instantaneous rates together or subtract them. If you want to concatenate finite rates you have to multiply. So instantaneous rates are easier to handle. Skipping the part about logarithms - just realize it's easier to add/subtract instantaneous rates.

Remember - disease is always multicausal. Consider bovine parasitic gastroenteritis (PGE). PGE arises when calves are infected with nematodes. Some of the nematodes are more pathogenic than others. Whether a calf suffers from PGE depends on worm burden, worm species, nutritional status of calf, age of calf, etc. So disease is multicausal. Whether it occurs or not depends on many things. Immune status.
Genetics. Yada yada. It is the job of the epidemiologist to identify all the causes of disease.

He had us all sketch a graph. One group of calves is on the 'white' (top) pasture where there is only one species of nematode. The other group is on the 'yellow' pasture with two spp of nematodes. Incidence of disease on the white pasture is 0.05 and few calves get disease. On the yellow pasture, incidence is 0.3, and by the end of the time almost all calves are diseased. Now, to find the contribution of the second parasite, you subtract the incidence on the white pasture from the incidence on the yellow pasture: 0.3 - 0.05 = 0.25. So we find that the second species makes a large contribution to disease incidence - parasite spp two is a cause of PGE.

Measures of Effect:

In epidemiology we define effect as the difference in dz occurrence (however measured) between two groups who differ wrt some putative causal characteristic. We talk about populations being exposed to some factor, or to "the exposure", where that is doublespeak for the factor itself. Example - many people with lung cancer are exposed to cigarettes - and the exposure is the cigarette. The usefulness of the concept of effect is that we can ID factors that play a causal role in disease onset - and then we can get rid of them and reduce risk of dz.

There are various measures of effect. Simplest is the absolute effect - the incidence in the exposed group minus the incidence in the unexposed group, as shown in the calf graph:

absolute effect = i(E) - i(0)

If this equals zero, then the factor has no effect, plays no causal role. However, absolute effect isn't that useful, and here's why. Suppose you invest $1000 and get $2000 back. Absolute difference is $1000. Relative increase is 100%. If you invest $10,000 and get $11,000 back, it's the same absolute difference but only a 10% relative increase. It's the same thing with effect - you want relative effect.
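A small sketch of the absolute vs relative comparison, using the lecture's own numbers. The function names are labels chosen here. The investment figures and the pasture incidences come from the lecture, but applying the same relative calculation to the pastures is an extension for illustration; the lecture itself only did the subtraction for the calves.

```python
def absolute_difference(exposed, unexposed):
    """Exposed value minus unexposed (baseline) value."""
    return exposed - unexposed


def relative_difference(exposed, unexposed):
    """Absolute difference expressed as a fraction of the baseline."""
    if unexposed == 0:
        raise ValueError("baseline must be non-zero")
    return (exposed - unexposed) / unexposed


# (value with the "exposure", baseline value)
examples = {
    "invest $1000, get $2000 back": (2000, 1000),
    "invest $10,000, get $11,000 back": (11000, 10000),
    "yellow vs white pasture incidence": (0.3, 0.05),
}

for label, (exposed, unexposed) in examples.items():
    abs_diff = absolute_difference(exposed, unexposed)
    rel_diff = relative_difference(exposed, unexposed)
    print(f"{label}: absolute = {abs_diff:g}, relative = {rel_diff:.0%}")
```

The two investments share an absolute difference of 1000 but differ tenfold in relative terms, which is the lecture's point. For the pastures the absolute difference is 0.25, and relative to the white-pasture baseline of 0.05 that is roughly a fivefold increase.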
relative effect = [i(E) - i(0)] / i(0)
                = [i(E)/i(0)] - 1
relative effect = relative risk - 1

Relative risk is some measure of incidence in the exposed group divided by some measure of incidence in the unexposed group. So the incidence ratio is called relative risk. How is this usually measured? Via cumulative incidence, which is often called the risk of acquiring the disease.

relative effect vs relative risk:

When exposure has no effect on disease occurrence, incidence in the exposed group equals that in the unexposed group, so relative effect is 0 and relative risk is 1. When there is no effect, relative risk = 1. Remember that. When the exposure increases disease incidence, relative risk is greater than 1.

What happens if relative risk is less than 1? The relative risk of dental caries in permanent teeth in American cities with highly fluoridated water is less than one. So, fluoridated water prevents disease. If you see relative risks less than one, your exposure is protecting against a disease. The relative risk of heart disease among middle aged men exposed to exercise is less than one.

Relative risk is one of the most common measures of effect. You rarely see absolute or relative effect used as measures of effect.

---end---