Introduction to Unit 5: Spatial Cluster Detection

This is the fifth unit in the module ‘GIS for Analysis of Health’. This unit introduces the identification and interpretation of geographical clusters of disease. Disease clustering is particularly relevant to GIS use because it is an inherently spatial phenomenon. Cluster identification is complicated by the need to establish whether an observed pattern simply reflects the characteristics of the underlying population, and whether it could have been expected to occur by chance. It is important both for causal analysis and for directing health care interventions.

There are 7 subsections in this unit:

  1. Overview of clustering issues
  2. Area data: Probability mapping
  3. Area data: Empirical Bayes
  4. Area data: Local measures of spatial autocorrelation
  5. Point data: Tests for clustering
  6. Point data: Identifying cluster locations
  7. Assignment for Unit 5

 
This subsection sets out the structure for the remaining materials.

Next, ‘overview of clustering issues’ explains how we might expect to find clusters in both area- and point-referenced disease data, and the issues that must be considered in their interpretation.

The second and third subsections are concerned with the need to assess whether observed clusters actually represent unusual occurrences, with reference to denominator populations and probability concepts. ‘Probability mapping’ involves assessing how likely an observed rate is to have occurred in a population of a given size, while ‘empirical Bayes’ methods provide a framework for adjusting observed values to make them less sensitive to small number problems.
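To make these two ideas concrete, the following Python sketch computes a Poisson probability for each area and then shrinks the raw rates towards the overall rate. It is only an illustration under stated assumptions: the case counts and populations are hypothetical, the function names are invented for this example, the expected counts are taken from the overall rate, and the smoothing step is one simple global method-of-moments form of empirical Bayes. The unit materials may present different formulations.

```python
import math

def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # Sum P(X = k) for k = 0 .. observed - 1, then take the complement.
    term = math.exp(-expected)
    cdf = term
    for k in range(1, observed):
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)

def eb_smooth(cases, populations):
    """Shrink raw rates towards the overall rate (global method of moments)."""
    n_areas = len(cases)
    total_pop = sum(populations)
    m = sum(cases) / total_pop  # overall rate, used as the prior mean
    rates = [c / p for c, p in zip(cases, populations)]
    # Population-weighted variance of the raw rates around the overall rate.
    s2 = sum(p * (r - m) ** 2 for p, r in zip(populations, rates)) / total_pop
    # Remove the variance expected from chance alone; floor at zero.
    a = max(0.0, s2 - m / (total_pop / n_areas))
    smoothed = []
    for p, r in zip(populations, rates):
        denom = a + m / p
        w = a / denom if denom > 0 else 0.0  # weight given to the raw rate
        smoothed.append(m + w * (r - m))
    return smoothed

# Hypothetical data: four areas with very different population sizes.
cases = [6, 30, 0, 20]
populations = [500, 10000, 800, 4000]

overall_rate = sum(cases) / sum(populations)
smoothed_rates = eb_smooth(cases, populations)

for c, p, s in zip(cases, populations, smoothed_rates):
    expected = p * overall_rate
    prob = poisson_upper_tail(c, expected)
    print(f"pop={p:6d} raw={c / p:.4f} smoothed={s:.4f} "
          f"P(X>={c} | expected={expected:.2f})={prob:.4f}")
```

In this sketch the weight given to each raw rate grows with the area's population, so rates from small areas are pulled further towards the overall rate than rates from large areas.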

The fourth subsection, ‘local measures of spatial autocorrelation’, considers statistics for the identification of local clusters in areal data, taking the particular example of Moran’s I statistic. Consideration of the issues at a local level helps us to understand why global (whole-map) statistics may not be the most appropriate way of identifying clusters. Assessment of cluster significance also draws attention to the fact that, when statistical significance tests are applied to spatial datasets with large numbers of regions, a surprising number of apparently significant values will arise by chance.

The fifth subsection is concerned with statistical tests for global (whole-map) clustering, and introduces three tests from the large range of methods available. This part of the unit involves the use of a childhood leukaemia dataset to explore some of the clustering concepts that are introduced in this module.

The sixth part of the unit, ‘identifying cluster locations’, considers how local clusters can be identified in point data where the locations of disease cases are available.
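As a rough illustration of the local form of Moran's I mentioned above, and of the multiple-testing issue, the Python sketch below computes a local statistic for each area and then shows how many nominally significant results would be expected by chance alone. The values, the neighbour lists, the function name and the figure of 400 regions are all hypothetical. The code assumes row-standardised weights (each neighbour weighted equally), and the Bonferroni threshold is shown only as one simple and conservative adjustment, not necessarily the approach taken in the unit.

```python
def local_morans_i(values, neighbours):
    """Local Moran's I per area, using row-standardised neighbour weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]       # deviations from the mean
    m2 = sum(d * d for d in z) / n       # variance of the values
    result = []
    for i in range(n):
        nbrs = neighbours[i]
        # Spatial lag: average deviation among the area's neighbours.
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        result.append(z[i] * lag / m2)
    return result

# Hypothetical rates for five areas arranged in a line (0-1-2-3-4).
values = [1.0, 2.0, 2.0, 8.0, 9.0]
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

local_i = local_morans_i(values, neighbours)
for area, stat in enumerate(local_i):
    print(f"area {area}: local I = {stat:.3f}")

# With row-standardised weights, the mean of the local values
# equals the global Moran's I for the whole map.
global_i = sum(local_i) / len(local_i)
print(f"global I = {global_i:.3f}")

# Multiple testing: if no real clustering exists, testing every region
# at level alpha still yields about alpha * n "significant" regions.
alpha = 0.05
n_regions = 400                          # hypothetical map size
expected_false_positives = alpha * n_regions
bonferroni_alpha = alpha / n_regions     # stricter per-region threshold
print(f"expected by chance: {expected_false_positives:.0f} regions")
print(f"Bonferroni per-region alpha: {bonferroni_alpha}")
```

Positive local values mark areas whose neighbours are similar to them (both high or both low), while negative values mark areas that differ from their neighbours. A significance assessment, for example by permutation, would still be needed before calling any of these a cluster.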

The final part of the unit is the assignment. The assignment is literature-based and involves a review of studies that make use of different methodologies for cluster detection in health datasets. Deadlines and further information are provided in the syllabus and your calendar.

Expect to spend about 2 weeks working through these materials. The subsections can be completed in any order, but the assignment should be left as the final activity.


Activity

In preparation for studying this unit, read Chapter 5 of the course textbook by Cromley and McLafferty.
