3.1 Individual vs aggregate data

This learning object considers the critical differences between health data at the individual and aggregate levels. A full understanding of this distinction and its implications is necessary in order to make appropriate choices about the GIS concepts and techniques that can be used with health data.

Fundamentally, health and disease are experienced at the individual level, and much health-related information is also captured at the individual level as people use the health care system or respond to censuses and surveys. However, there are at least two very important reasons why much analysis of health is undertaken at the aggregate level. Firstly, it may be necessary to aggregate in order to perform the desired analysis: for example, estimating disease prevalence rates in a population or identifying possible spatial clustering requires aggregation of individuals over geographical space. Secondly, the information of interest has often already been aggregated at source and is not available to researchers or health care planners in disaggregated form. In these situations care is required to ensure that the analysis methods employed are appropriate to the aggregate nature of the data.

Aggregation may take place across social groups or geographical space. Sadler et al. (2003) illustrate some of the difficulties faced when aggregating across social groups that actually embody important differences in health behaviour. Geographical referencing may itself involve aggregation when we are unable to assign unique spatial coordinates to each individual health record. Demissie et al. (2000) use a Canadian dataset to demonstrate the divergence between socioeconomic status measured at the small census area level and at the individual level. In these situations it is essential to understand whether the process of aggregation may mask significant within-class or within-area variations, which will have an important impact on the analysis techniques used.
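The following minimal sketch illustrates both points: collapsing individual records into area-level prevalence rates, and checking how well an area-based socioeconomic label agrees with each individual's own status. The records, the field names and the 0.5 threshold are invented for illustration and are not taken from any of the studies cited here.

```python
from collections import defaultdict

# Hypothetical individual records: (area, has_condition, low_income).
# All values are invented for illustration.
RECORDS = [
    ("A", 1, 1), ("A", 0, 1), ("A", 0, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 1), ("B", 0, 1), ("B", 0, 0),
]


def aggregate_by_area(records):
    """Collapse individual records into per-area counts and proportions."""
    totals = defaultdict(lambda: {"n": 0, "cases": 0, "low_income": 0})
    for area, has_condition, low_income in records:
        totals[area]["n"] += 1
        totals[area]["cases"] += has_condition
        totals[area]["low_income"] += low_income
    return {
        area: {
            "n": t["n"],
            "prevalence": t["cases"] / t["n"],
            "low_income_share": t["low_income"] / t["n"],
        }
        for area, t in totals.items()
    }


def area_vs_individual_agreement(records, area_table, threshold=0.5):
    """Share of individuals whose own low-income flag matches the label
    given to their area (an area is labelled low income when its
    low-income share exceeds the assumed threshold)."""
    matches = 0
    for area, _, low_income in records:
        area_label = int(area_table[area]["low_income_share"] > threshold)
        matches += int(area_label == low_income)
    return matches / len(records)


area_table = aggregate_by_area(RECORDS)
agreement = area_vs_individual_agreement(RECORDS, area_table)

for area, row in sorted(area_table.items()):
    print(area, row)
print("agreement between area label and individual flag:", agreement)
```

In this toy dataset the area label matches the individual flag for only five of the eight people, even though the area-level table itself is perfectly accurate. The mismatch comes entirely from within-area variation that the aggregate figures hide.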


The statistical method of multilevel modelling (see references and animation below) was developed in response to the challenges of disentangling data which incorporate both individual and aggregated components, and has been widely used in the health context. Multilevel regression allows effects at different levels in the aggregation hierarchy to be investigated in a way that is not possible with conventional regression models. For example, we may suspect that health outcomes are influenced both by individual characteristics and by features of the health care system which operate at the district level. Gleave et al. (1998) describe the application of multilevel modelling to data from UK censuses of population using a longitudinal dataset known as the ONS Longitudinal Study. This contains records for individuals that have been linked between successive censuses, vital events data and cancer registrations, allowing the relationship between individual and aggregated data to be explored in ways that are not usually possible. Their study suggests that men’s individual experience of unemployment, low social class and other disadvantage only partially explains the wide variations in health observed: geographical differences are not entirely explained by the distribution of individual characteristics. This type of analysis could not be undertaken using either individual or aggregated datasets alone.
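The district example can be written as a two-level random-intercept model. This is a common textbook form, given here as a sketch, and is not the specification used by Gleave et al. (1998). The symbols are assumptions introduced for illustration: y_ij is the health outcome for individual i in district j, x_ij is an individual characteristic, z_j is a district-level feature, u_j is a district-level random effect and e_ij is the individual-level residual.

```latex
\begin{align}
y_{ij} &= \beta_0 + \beta_1 x_{ij} + \beta_2 z_j + u_j + e_{ij}, \\
u_j &\sim N(0, \sigma_u^2), \qquad e_{ij} \sim N(0, \sigma_e^2), \\
\rho &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}.
\end{align}
```

Here \rho is the share of the unexplained variance that lies between districts rather than between individuals within a district. A conventional single-level regression has no u_j term and so treats individuals in the same district as independent, which is why it cannot separate the two levels.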


Activity

Review the references by Demissie et al. (2000), Fone et al. (2007) and Sadler et al. (2003) and answer the following questions based on these studies:

In the datasets considered here, which types of information would you consider to be most sensitive to aggregation effects?

Are you able to identify different potential impacts on analysis that would result from aggregation over geographical space and aggregation over attribute characteristics?

Post your thoughts on these two issues to the discussion forum on the course web site.


References (Essential reading for this learning object indicated by *)

* Demissie, K., Hanley, J. A., Menzies, D., Joseph, L. and Ernst, P. (2000) Agreement in measuring socio-economic status: area-based versus individual measures. Chronic Diseases in Canada 21 (1). http://www.medicine.mcgill.ca/epidemiology/joseph/publications/Methodological/ses_agreement.pdf

* Fone, D., Dunstan, F., Williams, G., Lloyd, K. and Palmer, S. (2007) Places, people and mental health: a multi-level analysis of economic inactivity. Social Science and Medicine 64 (3): 633-645. https://www.sciencedirect.com/science/article/abs/pii/S0277953606004722

The MLwiN software home page provides a useful summary of what multilevel modelling is: http://www.cmm.bristol.ac.uk/

* Sadler, G. R., Ryujin, L., Nguyen, T., Oh, G., Paik, G. and Kustin, B. (2003) Heterogeneity within the Asian American community. International Journal for Equity in Health 2, 12. http://www.equityhealthj.com/content/2/1/12
