5.2 Area data: Probability mapping
One problem in identifying areas of high morbidity or mortality arises where disease counts are very small. In these areas, estimates of the relative risk of morbidity or mortality are uncertain because the numbers are so low. In a district with a large population, a relative risk of 3 might mean 1,500 observed cases against an expected 500. In a small district, the same relative risk might reflect only 3 observed cases against 1 expected case. The estimate for the small district is far less certain: just one case more or fewer would change its relative risk to 4 or 2.
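To make the difference in uncertainty concrete, the following minimal Python sketch computes the relative risk (observed / expected) for the two districts described above, together with an approximate 95% confidence interval. It assumes the common large-sample approximation that the standard error of log(RR) is about 1/√O, where O is the observed count. This approximation is an addition for illustration, not part of the original material, and it is rough when O is very small.

```python
import math

def relative_risk_ci(observed, expected, z=1.96):
    """Return (rr, lower, upper) for RR = observed / expected.

    Assumption: SE(log RR) is approximated by 1 / sqrt(observed),
    a large-sample approximation that is crude for very small counts.
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected must both be positive")
    rr = observed / expected
    se_log = 1.0 / math.sqrt(observed)
    lower = rr * math.exp(-z * se_log)
    upper = rr * math.exp(z * se_log)
    return rr, lower, upper

# The two districts from the text: both have a relative risk of 3.
large = relative_risk_ci(1500, 500)
small = relative_risk_ci(3, 1)
print("large district: RR=%.2f, 95%% CI %.2f to %.2f" % large)
print("small district: RR=%.2f, 95%% CI %.2f to %.2f" % small)
```

Both districts have a relative risk of exactly 3, but the large district's interval runs from roughly 2.85 to 3.16, while the small district's stretches from just under 1 to about 9.3.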
It is important to understand whether very high or low relative risks in such areas are occurring by chance, or whether the observed geographical patterns have some other cause. For a given district, it is possible to test whether the observed number of cases is significantly greater than would be expected by chance. The probability of observing at least that many cases by chance alone is often calculated using the Poisson distribution, and a map of these probabilities is referred to as a ‘probability map’. The probability is calculated from the number of observed cases in a region, the population at risk, and the overall disease rate for the whole study area; the population at risk multiplied by the overall rate gives the number of cases expected in the region. The Poisson distribution is commonly used to model the number of events occurring in a given time interval, such as disease cases diagnosed in a given year. In fact, one of the earliest applications of the Poisson distribution, published in 1898, examined the annual number of soldiers in Prussian army corps accidentally kicked to death by horses over a 20-year period.
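The calculation described above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not the method used in the accompanying Excel file. The function names and the district figures (2,000 people at risk, an overall rate of 5 cases per 10,000, 3 observed cases) are hypothetical. The sketch also assumes one common convention for probability maps: use the upper tail P(X ≥ observed) when the observed count is at least the expected count, and the lower tail P(X ≤ observed) otherwise, so that both unusually high and unusually low counts receive small probabilities.

```python
import math

def poisson_cdf(n, mu):
    """P(X <= n) for X ~ Poisson(mu), summed in log space to avoid underflow."""
    if n < 0:
        return 0.0
    log_mu = math.log(mu)
    total = 0.0
    for k in range(n + 1):
        total += math.exp(k * log_mu - mu - math.lgamma(k + 1))
    return min(total, 1.0)

def expected_cases(population, overall_rate):
    """Expected count if the region experienced the study area's overall rate."""
    return population * overall_rate

def poisson_probability(observed, expected):
    """Assumed convention: upper tail P(X >= observed) if observed >= expected,
    otherwise lower tail P(X <= observed). Small values flag unusual regions."""
    if observed >= expected:
        return max(0.0, 1.0 - poisson_cdf(observed - 1, expected))
    return poisson_cdf(observed, expected)

# Hypothetical small district: 2,000 people at risk, overall rate of
# 5 cases per 10,000 for the whole study area, and 3 observed cases.
e = expected_cases(2_000, 5 / 10_000)
p = poisson_probability(3, e)
print(f"expected = {e:.2f}, P(X >= 3) = {p:.4f}")
```

Here the expected count is 1, and the probability of seeing 3 or more cases by chance is about 0.08, which is not significant at the 5% level even though the relative risk is 3. For the large district (1,500 observed, 500 expected), the same function returns a probability that is effectively zero. In Excel, the same upper-tail value could be obtained with =1-POISSON(2,1,TRUE), or with POISSON.DIST in newer versions.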
Probability maps have an advantage over maps of disease rates in that they take account of the ‘small numbers’ problem in regions with low populations: a high rate based on only a few cases will not necessarily produce a small probability. For relatively small populations and numbers of disease cases, Poisson probabilities can be calculated quite easily using software such as the POISSON function in Microsoft Excel. However, calculating probabilities for the different regions of a map using the Poisson distribution has some weaknesses. For example, the technique assumes that the number of disease cases in a given region is independent of the number of cases in neighbouring regions. For many infectious diseases, this is unlikely to be true. Another difficulty is that, in producing a probability map, Poisson probabilities are calculated separately for many different regions. In a map with 500 regions, on average 5 regions (1% of 500) will appear to have significantly high or low disease rates at the 1% significance level purely by chance. This difficulty of carrying out many statistical tests over different parts of a map is known as the multiple testing problem, and it affects many analyses in spatial epidemiology.
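The multiple testing problem can be illustrated with a small simulation. The sketch below is a hypothetical setup, not taken from the original material. It generates 500 regions in which there is no real excess risk (every region has the same assumed expected count of 20) and counts how many are nevertheless flagged as significantly high or low. It treats "significant at the 1% level" as two-sided, with 0.5% in each tail, and draws Poisson counts with Knuth's multiplication method because the Python standard library has no Poisson sampler.

```python
import math
import random

def poisson_cdf(n, mu):
    """P(X <= n) for X ~ Poisson(mu), summed in log space."""
    if n < 0:
        return 0.0
    log_mu = math.log(mu)
    return min(1.0, sum(math.exp(k * log_mu - mu - math.lgamma(k + 1))
                        for k in range(n + 1)))

def sample_poisson(mu, rng):
    """Knuth's multiplication method; adequate for moderate mu such as 20."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def count_chance_flags(n_regions=500, expected=20.0, alpha=0.01, seed=42):
    """Simulate regions with NO real excess risk and count how many are
    flagged as significantly high or low (alpha / 2 in each tail)."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(n_regions):
        observed = sample_poisson(expected, rng)
        upper = 1.0 - poisson_cdf(observed - 1, expected)  # P(X >= observed)
        lower = poisson_cdf(observed, expected)            # P(X <= observed)
        if upper <= alpha / 2 or lower <= alpha / 2:
            flagged += 1
    return flagged

n, alpha = 500, 0.01
print("nominal expectation:", n * alpha)
print("flagged by chance in one simulated map:", count_chance_flags(n, alpha=alpha))
```

Every flagged region in this simulation is a false positive. Because the Poisson distribution is discrete, the actual chance of a false flag in each region is at most 1%, so the count varies from one simulated map to another around a value at or somewhat below the nominal 5.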
In this learning object, you will explore how probability maps can be created for health data and calculate Poisson probabilities from U.S. cancer mortality data.
Activity
Download the attached Excel file. The first worksheet contains a set of questions and instructions. You should work through these, using the data provided in the other sheets.
References (Essential reading for this learning object indicated by *)
An example of this technique being used to assess patterns of Hantavirus Pulmonary Syndrome, a viral disease spread by rodents, may be found here:
Busch, M., Cavia, R., Carbajo, A. E., Bellomo, C., Gonzalez, C. S., and Padula, P. (2004) Spatial and temporal analysis of the distribution of hantavirus pulmonary syndrome in Buenos Aires Province, and its relation to rodent distribution, agricultural and demographic variables. Tropical Medicine and International Health 9, 508-519. http://www.ncbi.nlm.nih.gov/pubmed/15078270
Note that you do not need to read this reference in order to complete the activity for this object.