4.3 Small number problems and smoothing

One problem in identifying areas of high morbidity or mortality arises where the number of people diagnosed with a disease is low. In such areas, the estimated disease rate is more uncertain because the population is small. In a district with a large population, an incidence rate of 5% might mean that there are 100 cases of disease among a population of 2,000. In a small district, the same incidence rate of 5% might be derived from 3 cases among a population of 60. The estimated rate in the small district is much more uncertain: a single additional case would raise it from 5% to 6.7%, whereas one extra case in the large district would barely change the rate. These ‘small number’ problems are more pronounced for diseases that are rare.
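The difference in uncertainty between the two districts can be made concrete with an approximate 95% interval around each rate. The sketch below is illustrative only: it uses the Wilson score interval, which is one common choice and is not a method named in this text, and the function name `rate_interval` is hypothetical.

```python
import math

def rate_interval(cases, population, z=1.96):
    """Return (rate, lower, upper) using the Wilson score interval.

    z=1.96 corresponds to an approximate 95% interval.
    """
    p = cases / population
    denom = 1 + z**2 / population
    centre = (p + z**2 / (2 * population)) / denom
    half = z * math.sqrt(
        p * (1 - p) / population + z**2 / (4 * population**2)
    ) / denom
    return p, centre - half, centre + half

# The two districts from the text: same 5% rate, very different populations.
large = rate_interval(100, 2000)
small = rate_interval(3, 60)

for label, (p, lo, hi) in (("large district", large), ("small district", small)):
    print(f"{label}: rate={p:.3f}, 95% interval=({lo:.3f}, {hi:.3f})")
```

Under these assumptions, the interval for the large district runs from roughly 4% to 6%, while the interval for the small district runs from roughly 2% to 14%, even though both districts report the same 5% rate.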

In mapping diseases, a trade-off exists between using geographic units that are large enough to give stable morbidity and mortality rates and small enough to enable the reader to discern geographical patterns. Sometimes, disease data are available for geographic units with small populations, such as unit postcodes in the UK. In such circumstances, instead of mapping the raw disease rates directly, ‘smoothing’ may be used to reduce small number problems. Two types of ‘smoothing’ are often used. Global smoothing techniques (such as empirical Bayes smoothing) modify sub-district disease rates so that they are closer to an overall mean rate for the whole study area. In local smoothing, a local average disease rate is calculated, based on the rate for a given sub-district and its neighbours. This learning object considers the latter technique: local smoothing.
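As a minimal sketch of the neighbour-based local average described above, the example below pools cases and population for each sub-district and its neighbours. The sub-district names, counts and adjacency list are all hypothetical, and pooling counts (instead of averaging the individual rates) is an assumption made here so that sparsely populated neighbours do not dominate the result.

```python
# Hypothetical sub-districts: name -> (cases, population)
counts = {"A": (3, 60), "B": (10, 400), "C": (1, 40), "D": (25, 500)}

# Hypothetical adjacency: name -> list of neighbouring sub-districts
neighbours = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}

def local_rate(district):
    """Pooled rate for a sub-district together with its neighbours."""
    group = [district] + neighbours[district]
    cases = sum(counts[d][0] for d in group)
    population = sum(counts[d][1] for d in group)
    return cases / population

raw = {d: c / n for d, (c, n) in counts.items()}
smoothed = {d: local_rate(d) for d in counts}

for d in counts:
    print(f"{d}: raw={raw[d]:.3f}, smoothed={smoothed[d]:.3f}")
```

In this toy example, sub-district A has a raw rate of 5% from only 60 people, but its locally smoothed rate of 2.8% is based on a pooled population of 500.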

In local smoothing, a ‘window’ of constant area or population size is moved across the map and the number of disease cases lying within the window is calculated for successive locations. The number of healthy individuals lying within the window is also identified and then a local disease rate is calculated. This process is also known as spatial filtering. Examples of this process may be seen in Ali et al (2002) and Rushton (2003).
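The moving-window calculation can be sketched as follows for a window of constant area. This is an illustration under stated assumptions, not the procedure used by Ali et al (2002) or Rushton (2003): the point coordinates, the circular window, the radius and the four-point grid are all hypothetical.

```python
import math

# Hypothetical point locations (x, y) of disease cases and healthy individuals.
cases = [(1.0, 1.0), (1.2, 0.8), (4.0, 4.0)]
healthy = [(0.5, 1.5), (1.5, 1.5), (3.5, 4.5), (4.5, 3.5), (4.2, 4.4), (3.8, 3.6)]

def count_within(points, centre, radius):
    """Number of points lying inside a circular window."""
    return sum(1 for p in points if math.dist(p, centre) <= radius)

def window_rate(centre, radius):
    """Local disease rate inside the window, or None if the window is empty."""
    n_cases = count_within(cases, centre, radius)
    n_healthy = count_within(healthy, centre, radius)
    total = n_cases + n_healthy
    return n_cases / total if total else None

# Move the window across a coarse grid of successive locations.
grid = [(x, y) for y in (1.0, 4.0) for x in (1.0, 4.0)]
rates = {centre: window_rate(centre, radius=1.0) for centre in grid}

for centre, rate in rates.items():
    print(centre, rate)
```

Windows that contain no individuals return `None` here instead of a rate of zero, so that empty areas are not mistaken for areas of low disease.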

How large (in terms of distance or population) does a ‘window’ need to be to prevent small number problems? There is no straightforward answer to this question: if the ‘window’ is too small, small number problems will persist, but if the ‘window’ is too large, real variation in disease rates may be obscured. However, Talbot et al (2000) provide some guidance on using local smoothing. They found that a ‘window’ with a population of 5,000 smoothed out real variation in birthweight data, whereas a ‘window’ with a population in the range 250 – 2,500 maintained real variation in disease rates while helping to reduce small number problems.
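A window of constant population size can be sketched by adding the nearest areas one at a time until a population threshold is reached. The code below is a simplified illustration and not the implementation evaluated by Talbot et al (2000): the area centroids, counts and the function name `adaptive_rate` are hypothetical, and whole areas are added in order of centroid distance.

```python
import math

# Hypothetical area centroids: (x, y, cases, population)
areas = [
    (0.0, 0.0, 2, 100),
    (1.0, 0.0, 5, 300),
    (0.0, 2.0, 1, 200),
    (3.0, 0.0, 20, 1000),
    (5.0, 5.0, 40, 2000),
]

def adaptive_rate(x, y, min_population):
    """Grow the window outwards from (x, y) until it holds min_population people.

    Returns (rate, population actually covered by the window).
    """
    ordered = sorted(areas, key=lambda a: math.dist((x, y), (a[0], a[1])))
    cases = population = 0
    for _, _, c, n in ordered:
        cases += c
        population += n
        if population >= min_population:
            break
    return cases / population, population

# Thresholds taken from the 250 - 2,500 range mentioned in the text.
for threshold in (250, 2500):
    rate, covered = adaptive_rate(0.0, 0.0, threshold)
    print(f"threshold={threshold}: rate={rate:.4f}, population={covered}")
```

Comparing the output for the two thresholds shows the trade-off described above: the larger threshold draws in areas that are much further away, which stabilises the rate but blends in conditions from distant locations.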

Smoothing can also be employed on point data sets of cases and controls as well as areal data. An alternative to displaying disease cases as a dot map is to produce a map of disease intensity in different areas or grid squares. This can prevent disclosure of confidential material about individuals and aid visual interpretation of very large point data sets.
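Converting a dot map into counts per grid square can be sketched as below. The case coordinates and the cell size are hypothetical, and intensity is taken here simply as cases per unit area of each square, which is one possible definition.

```python
import math
from collections import Counter

# Hypothetical case locations (x, y) in map units.
case_points = [(0.2, 0.3), (0.8, 0.9), (1.5, 0.2), (1.7, 1.9), (1.1, 1.2), (0.4, 0.6)]

def grid_counts(points, cell_size):
    """Count points per grid square, keyed by (column, row) index."""
    return Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points
    )

cell_size = 1.0
squares = grid_counts(case_points, cell_size)

# Intensity: cases per unit area of each grid square.
intensity = {square: n / cell_size**2 for square, n in squares.items()}

for square, value in sorted(intensity.items()):
    print(square, value)
```

Only the grid-square totals are mapped, so the exact locations of individual cases are not shown.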


Activity

Download the zip file set of morbidity, mortality and population data for UK districts. Produce a graph of the rate of consultations in 1998 for cancer among males under 16, plotted against the total number of health consultations in this age group. Note that feature attributes in a shapefile are stored in dBase (.dbf) format. It is possible to open such files in Excel by first converting them to Excel format from within ArcMap. To do this, open ArcToolbox, then choose Conversion Tools > Excel > Table To Excel, where you should be able to select the attribute table (stored in dBase format) and convert it to Excel format.

What do you notice about the shape of this graph? Why do you think the graph takes this particular shape?

Answer 1

The graph should form a ‘funnel’ shape, with the wide part of the funnel occurring in areas with low numbers of overall consultations and the narrow part occurring in areas with high numbers of overall consultations. When the number of overall consultations is low, the estimated rate of cancer consultations is much more uncertain, so there is a wider range of rates on the graph. This uncertainty accounts for the wide end of the funnel. When the number of overall consultations is high, the estimated rate of cancer consultations is much less uncertain, giving a narrower range of rates and the ‘thin end’ of the funnel. Such a graph indicates the approximate minimum population necessary to achieve a stable disease rate estimate, namely the point at which the funnel narrows. This minimum population size can then be used to design an appropriate spatial filter.
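The funnel shape can be reproduced with simulated data. The sketch below does not use the activity data set: it assumes every district shares the same underlying rate of 5% (a hypothetical value) and shows that the range of estimated rates still narrows as the number of consultations grows.

```python
import random

rng = random.Random(0)  # fixed seed so the simulation is repeatable
TRUE_RATE = 0.05        # assumed identical underlying rate in every district

def simulate_rate(total):
    """Estimated rate for one district with `total` consultations."""
    cases = sum(1 for _ in range(total) if rng.random() < TRUE_RATE)
    return cases / total

def spread(total, n_districts=200):
    """Lowest and highest estimated rate across simulated districts."""
    rates = [simulate_rate(total) for _ in range(n_districts)]
    return min(rates), max(rates)

for total in (20, 200, 2000):
    lo, hi = spread(total)
    print(f"consultations={total}: rates range from {lo:.3f} to {hi:.3f}")
```

Plotting each simulated rate against its number of consultations would give the funnel described above, with the spread arising purely from chance because the underlying rate never changes.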



One problem with using ‘smoothed’ disease rates to produce maps is that map users may be suspicious of the way that the raw data have been manipulated to produce the map. Imagine that you are a GIS analyst working for a primary healthcare authority (with responsibility for doctors’ surgeries in a given region). You produce a series of maps showing disease rates for the various doctors’ surgeries within your authority using local smoothing. Can you write a short, one-paragraph caption for these maps that briefly explains what local smoothing entails, why it has been used, and how the figures on the maps should be interpreted?

Answer 2

A sample caption for the map might be: For statistical reasons, figures on disease rates will not be reliable where the population in a surgery catchment is small (this is particularly true of disease rates for population sub-groups, such as pre-school children). Consequently, disease rates shown on these maps have been calculated for neighbourhoods of surgeries using a ‘smoothing’ technique. The figures on the map therefore show disease rates for neighbourhoods, not individual surgeries.



References (Essential reading for this learning object indicated by *)

* The early part of this article also describes some examples of ‘smoothing’ in disease mapping:

Rushton, G. (2003) Public Health, GIS and spatial analytic tools. Annual Review of Public Health 24, 43-56. http://www.ncbi.nlm.nih.gov/pubmed/12471269

An example of ‘smoothing’ being used to map cholera is available in:

Ali, M., Emch, M., Donnay, J. P., Yunus, M., and Sack, R. B. (2002) Identifying environmental risk factors for endemic cholera: a raster GIS approach. Health and Place 8, 201-210.

More detailed guidance on ‘smoothing’ data for health maps using spatial filters is available in this article. The article makes recommendations about minimum population sizes and appropriate radii for filters:

Talbot, T. O., Kulldorff, M., Forand, S. P., and Haley, V. B. (2000) Evaluation of spatial filters to create smoothed maps of health data. Statistics in Medicine 19 (17-18), 2399-2408
