2.3 Aggregated health data and the modifiable areal unit problem

A key feature of many spatial analyses involving health data is that data originally relating to individual patients (or even individual health events) have been aggregated over geographical areas (for example, the Province of Gauteng) or population sub-groups (for example, counts of deaths among males aged 5-14).

The modifiable areal unit problem (MAUP) is the term for the effect that aggregating data on individuals into a set of areas (e.g. districts, counties, census tracts or regions) has on the results of an analysis.

This type of data aggregation may be necessary for various reasons. One common reason is that confidentiality constraints prevent the agency collecting the data from publishing individual-level information. This is particularly likely to be the case with sensitive health-related events. There is generally a trade-off between the amount of attribute detail that can be provided and the size of the aggregation group, which is often a geographical area. The problems associated with the specific size and definition of geographical areas of this type are known as the modifiable areal unit problem.

The modifiable areal unit problem comprises two related aspects, known as (a) scale effects and (b) aggregation effects. The scale effect is that patterns and relationships in aggregated data vary according to the scale at which the data have been aggregated: the patterns we see in small units may well differ from those we see in larger units. The aggregation effect is that, at any given scale, these patterns and relationships also vary simply according to where we place the boundaries between the zones. A closely related issue is the ecological fallacy: relationships between variables that we observe at one level of aggregation may not apply if the data were observed at the individual level, or indeed at any other level of aggregation.

In a classic study of the modifiable areal unit problem, Openshaw (1983) showed how the aggregation effect could shift the correlation coefficient from -0.99 to +0.99 in an analysis of voting data for the state of Iowa. By reviewing evidence on the scale effect from many studies, Openshaw (1983) also showed that a relationship between two variables often appears stronger when it is investigated using larger areal units rather than smaller ones.
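The scale and aggregation effects can be reproduced with a small simulation. The sketch below is purely illustrative and uses hypothetical data: 20,000 simulated individuals in a unit square, each with a deprivation score and an ill-health score that share a smooth east-west trend but are dominated by independent individual-level noise. It computes the Pearson correlation at the individual level, for two grid sizes (scale effect), and for two ways of dividing the area into ten equal strips (aggregation effect). The function names, noise level and trend are assumptions made for this example only.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_individuals(n=20000, seed=1):
    """Hypothetical individuals in a unit square.

    Deprivation and ill health share a smooth east-west trend, but each
    individual's scores are dominated by independent noise.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        east, north = rng.random(), rng.random()
        trend = math.sin(2 * math.pi * east)  # shared area-level component
        deprivation = trend + rng.gauss(0, 3)
        ill_health = trend + rng.gauss(0, 3)
        people.append((east, north, deprivation, ill_health))
    return people


def zonal_correlation(people, zone_of):
    """Aggregate individuals into zones, then correlate the zone means.

    zone_of maps (east, north) to a zone identifier.
    """
    totals = {}
    for east, north, dep, ill in people:
        t = totals.setdefault(zone_of(east, north), [0.0, 0.0, 0])
        t[0] += dep
        t[1] += ill
        t[2] += 1
    dep_means = [t[0] / t[2] for t in totals.values()]
    ill_means = [t[1] / t[2] for t in totals.values()]
    return pearson(dep_means, ill_means)


people = simulate_individuals()

# Individual level: no aggregation at all.
r_individual = pearson([p[2] for p in people], [p[3] for p in people])

# Scale effect: the same data on a 20 x 20 grid and on a 5 x 5 grid.
r_fine = zonal_correlation(people, lambda e, n: (int(e * 20), int(n * 20)))
r_coarse = zonal_correlation(people, lambda e, n: (int(e * 5), int(n * 5)))

# Aggregation effect: ten equal zones in both cases, but different boundaries.
r_split_by_east = zonal_correlation(people, lambda e, n: int(e * 10))
r_split_by_north = zonal_correlation(people, lambda e, n: int(n * 10))

print(f"individuals:             r = {r_individual:.2f}")
print(f"400 small grid cells:    r = {r_fine:.2f}")
print(f"25 large grid cells:     r = {r_coarse:.2f}")
print(f"10 strips (by easting):  r = {r_split_by_east:.2f}")
print(f"10 strips (by northing): r = {r_split_by_north:.2f}")
```

Run as written, the individual-level correlation should be weak, the 25 large grid cells should show a much stronger correlation than the 400 small ones, and splitting the area by easting should give a far stronger correlation than splitting it by northing, even though both zonings use ten equal-sized zones. Reading any of the zone-level correlations as a statement about individuals would be an instance of the ecological fallacy.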

Even when we have data on the location and health of individuals, we frequently wish to relate health event data to environmental, behavioural and predisposing factors that are only available to us in aggregate form. This occurs even when event data are referenced as individual points, because those points must then be assigned to polygons representing census zones or areas with particular environmental values. Even more complex is the situation in which event data aggregated to one set of areal units must be overlaid on a data layer for a different set of areal units, so that some form of areal interpolation is required. In each of these cases, it is particularly important to understand the characteristics and limitations of areally aggregated data.
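The simplest form of areal interpolation is area weighting: it assumes that events are spread evenly within each source zone and allocates each zone's count to the target zones in proportion to the area of overlap. The sketch below assumes the overlap areas have already been calculated (in practice by a polygon overlay in a GIS); the function name, zone names, counts and areas are all hypothetical.

```python
def area_weighted_interpolation(source_counts, overlaps):
    """Reallocate counts from source zones to target zones by area weighting.

    source_counts: {source_zone: event count}
    overlaps: {(source_zone, target_zone): area of their intersection}
    Assumes events are spread uniformly within each source zone and that the
    listed overlaps together cover the whole of each source zone.
    """
    # Total area of each source zone, rebuilt from its overlapping pieces.
    source_area = {}
    for (src, _tgt), area in overlaps.items():
        source_area[src] = source_area.get(src, 0.0) + area

    # Each piece receives the share of the source count equal to its area share.
    target_counts = {}
    for (src, tgt), area in overlaps.items():
        share = area / source_area[src]
        target_counts[tgt] = target_counts.get(tgt, 0.0) + source_counts[src] * share
    return target_counts


# Hypothetical example: deaths counted in source zones A and B, needed for
# target zones X and Y (areas in any consistent unit, e.g. square kilometres).
deaths = {"A": 100, "B": 40}
overlaps = {("A", "X"): 6.0, ("A", "Y"): 4.0, ("B", "X"): 1.0, ("B", "Y"): 4.0}
estimates = area_weighted_interpolation(deaths, overlaps)
print(estimates)
```

Here zone X receives 60 deaths from A and 8 from B (68 in total) and zone Y receives 72, so the overall total of 140 is preserved. Whether those estimates are realistic depends entirely on the uniformity assumption, which is exactly the kind of limitation of areally aggregated data that the paragraph above warns about.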

More recently, techniques have begun to emerge that tackle some of the problems raised by the modifiable areal unit problem. One such technique is multilevel modelling; another, related technique is geographically weighted regression (GWR). GWR in particular is now available in ArcGIS, in the Spatial Statistics toolbox of ArcToolbox.
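GWR fits a separate regression at each location, weighting nearby observations more heavily than distant ones, so that the estimated relationship is allowed to vary across space. The sketch below shows that core idea with a single predictor and a Gaussian distance-decay kernel, w = exp(-0.5 (d/b)^2), using a fixed bandwidth b chosen arbitrarily for illustration. It is not the ArcGIS implementation: full implementations typically also choose the bandwidth from the data and allow several predictors. The data are synthetic and hypothetical: on a 10 x 10 grid, the true slope rises from 1.0 in the west to 2.8 in the east.

```python
import math


def gwr_local_fits(points, bandwidth):
    """Minimal geographically weighted regression with one predictor.

    points: list of (easting, northing, x, y) tuples.
    Returns one (intercept, slope) pair per point, from weighted least
    squares with Gaussian weights w = exp(-0.5 * (d / bandwidth) ** 2).
    """
    fits = []
    for ui, vi, _, _ in points:
        # Distance-decay weights of every observation relative to this point.
        w = [math.exp(-0.5 * ((uj - ui) ** 2 + (vj - vi) ** 2) / bandwidth ** 2)
             for uj, vj, _, _ in points]
        sw = sum(w)
        x_bar = sum(wj * p[2] for wj, p in zip(w, points)) / sw
        y_bar = sum(wj * p[3] for wj, p in zip(w, points)) / sw
        sxy = sum(wj * (p[2] - x_bar) * (p[3] - y_bar) for wj, p in zip(w, points))
        sxx = sum(wj * (p[2] - x_bar) ** 2 for wj, p in zip(w, points))
        slope = sxy / sxx
        fits.append((y_bar - slope * x_bar, slope))
    return fits


# Synthetic data on a 10 x 10 grid: the true slope is 1 + 0.2 * easting.
points = []
for i in range(10):
    for j in range(10):
        x = (7 * i + 3 * j) % 10  # a predictor that varies from place to place
        points.append((i, j, x, (1 + 0.2 * i) * x))

fits = gwr_local_fits(points, bandwidth=1.5)
west_slopes = [s for p, (_, s) in zip(points, fits) if p[0] == 0]
east_slopes = [s for p, (_, s) in zip(points, fits) if p[0] == 9]
print(f"mean local slope, west edge: {sum(west_slopes) / len(west_slopes):.2f}")
print(f"mean local slope, east edge: {sum(east_slopes) / len(east_slopes):.2f}")
```

A single global regression would report one average slope for the whole area; the local estimates instead recover the east-west drift in the relationship (smoothed by the kernel), which is the kind of spatial variation that GWR is designed to reveal.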



Activity

A number of censuses around the world collect a measure of self-reported health. In England and Wales, for example, the 2011 census asked about limiting long-term illness (any health condition that limits someone's daily activities or ability to work). To preserve confidentiality, census data are normally aggregated into areas of different sizes (e.g. census blocks, census block groups and census tracts in the USA). If you were looking at a census-derived measure of ill health in relation to a deprivation measure (e.g. a composite measure of unemployment, home ownership, car ownership and overcrowding of accommodation), what, if anything, could you do to limit the impact of the modifiable areal unit problem? Post your thoughts to the course discussion board.


References (Essential reading for this learning object indicated by *)

This paper demonstrates some of the impacts of the geographical referencing of individual point events (births and deaths) on the quality of aggregated data for areas.

McVey, E. and Baker, A. (2002) Improving ONS spatial referencing: the impact on 2000 births and deaths data. Population Trends 107, 14-22. http://www.ons.gov.uk/ons/rel/population-trends-rd/population-trends/no–107–spring-2002/improving-ons-spatial-referencing.pdf

The second paper provides a highly relevant reflection on how MAUP effects, particularly spatial scale, influence a range of population health indices.

Schuurman, N., Bell, N., Dunn, J. R. and Oliver, L. (2007) Deprivation indices, population health and geography: an evaluation of the spatial effectiveness of indices at multiple scales. Journal of Urban Health 84, 591-603. Available online at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2219571/

There are several blogs providing further background on the modifiable areal unit problem, including this one: https://www.gislounge.com/modifiable-areal-unit-problem-gis/
