Drawing parallels – the processing of data about children in education and social care

By Sarah Gorin, Ros Edwards and Val Gillies

During our research, we have been learning more about the ways that Government agencies such as health, social care and education collect, process and join up information about families. Schools, like other Government agencies, collect and process an increasing volume of information about children. Data is collected for administrative purposes, such as monitoring attendance, attainment, progress and performance; for safeguarding children; and to promote and support education and learning.

Information about children is captured not only by schools, for their own purposes and for purposes determined by the Government, but also by private educational technology (EdTech) companies, which gather data on children via their use of apps that may be free to download and recommended by teachers as promoting learning. These companies may sell on information for marketing or research purposes. Since the pandemic, the use of EdTech has grown exponentially, meaning the data being gathered on children, both through schools and by EdTech providers, is greater still, raising the stakes in terms of the protection of children’s personal data.

A new report by The Digital Futures Commission (DFC), ‘Education Data Reality: The challenges for schools in managing children’s education data’, examines the views of professionals who work in or with schools on the procurement of, data protection for, and uses of digital technologies in schools. The report describes the range of EdTech used in schools and the complex issues that managing it presents.

In a blog about the report, the main author Sarah Turner highlights four key issues that constrain children’s best interests:

  • The benefits of EdTech and the data processed from children in schools are currently not discernible or in children’s best interests. Nor are they proportionate to the scope, scale and sensitivity of data currently processed from children in schools.
  • Schools have limited control or oversight over data processed from children through their uses of EdTech. The power imbalance between EdTech providers and schools is structured into the terms of use they signed up to and exacerbated by external pressure to use some EdTech services.
  • There is a distinct lack of comprehensive guidance for schools on how to manage EdTech providers’ data practices. Nor is there a minimum standard for acceptable features, data practices and evidence-based benefits for schools to navigate the currently fragmented EdTech market and select appropriate EdTech that offers educational benefits proportionate to the data it processes.
  • Patchy access to and security of digital devices at school and home due to cost and resource barriers means that access to digital technologies to deliver and receive education remains inequitable.

The report is focused on the processing of education data about families; however, there are many interesting parallels with the findings from our project on the way data about families is collected, processed and used by local authorities:

  • Firstly, there is a lack of evidence about the benefits of the use of digital technologies in both schools and local authorities, and a lack of understanding about the risks to children’s data privacy.
  • There is a lack of government guidance for schools, as there is for local authorities, about the digital technologies they employ, meaning that organisations are left individually responsible for ensuring that they comply with the General Data Protection Regulation (GDPR).
  • Schools, like local authorities, are time-, resource- and expertise-poor. Often neither has the data protection expertise to understand and weigh the risks versus the benefits of data processing for children’s best interests.
  • There is a lack of transparency in how data is collected, handled and processed by Government agencies as well as third parties who gain access to data about families, either through children using their apps for educational purposes or through local authorities employing them for the development of predictive analytics systems.
  • Public awareness and understanding of how data is collected and processed, and of the risks that data sharing poses to children’s privacy, are low among both parents and children.

We welcome this new report by the Digital Futures Commission and hope that it stimulates more discussion and awareness amongst professionals and families.

Children’s visibility, vulnerability and voice in official statistics and their use

By Sarah Gorin, Ros Edwards and Val Gillies

Throughout our project we have been looking at parental social licence for the linking together of Government data about families’ lives across areas such as health, education and social care. Whilst our research focus has been on parents, it is also important that we listen to children’s views. A vast amount of data is collected about children across Government and non-Government agencies, yet it would seem children and young people are rarely asked what they consider to be acceptable uses of their personal information. It is important that children are given this opportunity under Article 12 of the UN Convention on the Rights of the Child, which requires that children’s views be heard and considered in all matters that affect them.

A recent report, ‘Visibility, Vulnerability and Voice’, by the Office for Statistics Regulation (an independent body that regulates the use of official statistics) has drawn attention to the importance of including children and young people in official statistics.

The report provides a framework for considering the needs of children and young people in the development of official statistics, named the ‘3Vs’ framework, and suggests viewing statistics about children and young people through three lenses: ‘Visibility’, making statistics on children and young people available; ‘Vulnerability’, ensuring the collection and analysis of data about children who are vulnerable to poorer outcomes; and ‘Voice’, ensuring statistics reflect the views of children and young people and that they are given a voice in how their data is used.

In considering children’s ‘Voice’, the Office for Statistics Regulation reflects that all official statistics producers should:

  • Seek the views of children and young people themselves rather than relying on proxies from adults.
  • Consider, and respond to, the data needs of children and young people.
  • Involve children and young people in the development of statistics for and about them.
  • Ensure children and young people have a voice around how their data are used in official statistics and in research using the data underpinning them.

Whilst the report focuses on the need to involve children and young people in the development of official statistics, the same also applies more broadly to the development of policy around the use of data. A report by defenddigitalme, ‘The Words We Use in Data Policy’, considers the way children are framed in data policy and the lack of representation of, or engagement with, children about their views. We welcome these reports and their focus on, and commitment to, improving opportunities for children and young people to be involved in developments in the way their data is linked together and used.

Question marks over data analytics for family intervention

by Ros Edwards, Sarah Gorin and Val Gillies

The National Data Strategy encourages the UK’s central and local government to team up with the private sector to digitally share and join up records to inform and improve services. One example of this is the area of troublesome families, where it’s thought that the use of merged records and algorithms can help spot or pre-empt issues by intervening early. But there are questions over this approach, and this is something our project has been looking into. In our first published journal article, we have been examining the rationales presented by the parties behind data analytics used in this context, to see if they really do present solutions.

The application of algorithmic tools is a form of technological solution: based on indicators in routinely collected data, it seeks to draw out profiles, patterns and predictions that enable services to target and fix troublesome families. But local authorities often need to turn to commercial data analytic companies to build the required digital systems and algorithms.

In our paper we analysed national and local government reports and statements, and the websites of data analytic companies, addressing data linkage and analytics in the family intervention field. We looked in particular at rationales for and against data integration and analytics. We used a ‘problem-solving’ analytic approach, which focuses on how issues are produced as particular sorts of problems that demand certain sorts of solutions to fix them. This helped us to identify a double-faceted chain of problems and solutions.

Seeking and targeting families

Families in need of intervention and costing public money are identified as a social problem, and local authorities are given the responsibility of fixing that problem. Local authorities need to seek out and target these families for intervention, and it is experts in data analytics who, in turn, will solve that identification problem for them. The companies, for their part, rely on citizens being turned into data (datafied) by local authorities and other public services.

We identified three main sorts of rationale in data analytic companies’ promotion of their products as solutions to local authorities’ problems: the power of superior knowledge, harnessing time, and economic efficiency.

Companies promote their automated data analytics products as powerful and transformational. They hand local authorities control of superior, objective and accurate knowledge, so that they can use profiling criteria to identify families with hidden risks for intervention. Their systems also help local authority services such as social care and education collaborate with other services like health and the police, through data sharing and integration.

Data analytics is presented as harnessing time in the service of local authorities: an early warning system that enables them to identify families quickly as problems arise. It provides a holistic view based on the past records that local authorities already hold about families, combined with the inputting of ‘real time’ administrative data on families as it comes in. In turn, this provides foresight, helping local authorities look into the future – predicting which families are likely to become risks and acting to pre-empt this, planning ahead using accurate information.

Another key selling point for data analytic companies is that their products deliver economic efficiency. Local authorities will know how much families cost them, and can make assured decisions about where to allocate or withdraw financial and staffing resources. Data analytic products produce data trails that enable local authorities to prepare Government returns and respond to future central Government payment-by-results initiatives, maximising the income that can be secured for their constrained budgets.

Questions to be asked

But there are questions to be asked about whether or not data linkage and analytics do provide powerful and efficient solutions, which we consider in our article. Concerns have been raised about errors and bias in administrative records, resulting in unfair targeting of certain families.

Particular groups of parents and families are disproportionately represented in social security, social care and criminal justice systems, meaning that existing social divisions of class, race and gender are built into the data sets. For example, there is evidence that racial and gender profiling discriminations are built into the data, such as the inclusion of young Black men who have never been in trouble in the Metropolitan Police Gangs Matrix. And automated modelling equates socio-economic disadvantage with risk of child maltreatment, meaning that families are more likely to be identified for early intervention just because they are poor. On top of that, studies drawing on longitudinal data are showing that the success rates of predictive systems are worryingly low.
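One reason those success rates can be so low is the base-rate problem: when the outcome being predicted is rare, even a model that looks accurate on paper will flag far more families wrongly than rightly. The short sketch below illustrates the arithmetic with purely hypothetical figures – the prevalence, sensitivity and specificity are assumptions chosen for illustration, not results from any study or deployed system.

# Illustrative arithmetic only: all figures are hypothetical, not taken from any real system or study.
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of flagged families who actually experience the outcome (Bayes' rule)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)

# Suppose, hypothetically, 2% of families experience the outcome being predicted,
# and a model flags 80% of them correctly while wrongly flagging 10% of everyone else.
ppv = positive_predictive_value(prevalence=0.02, sensitivity=0.80, specificity=0.90)
print(f"Share of flagged families correctly identified: {ppv:.0%}")  # roughly 14%

On these made-up numbers, roughly six out of every seven flagged families would be flagged in error – a gap between headline accuracy and practical usefulness.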

All of which raises a more fundamental question of whether or not algorithms should be built and implemented for services that intervene in families’ lives. In the next stage of our research, we will be asking parents about their views on this and on the way that information about families is collected and used by policy-makers and service providers.

A murky picture – who uses data linkage and predictive analytics to intervene in families’ lives?

In the first of a series of blogs on key issues and challenges arising from our project, Dr Sarah Gorin discusses the problems encountered by our team as we try to find out which local authorities in the UK are using data linkage and predictive analytics to help them make decisions about whether to intervene in the lives of families.

As background context to our project, it seemed important to establish how many local authorities are using data linkage and predictive analytics with personal data about families, and in what ways. To us this seemed a straightforward question, and yet it has been surprisingly hard to gain an accurate picture. Six months into our project, we are still struggling to find out.

In order to get some answers, we have been reaching out to other interested parties and have had numerous people get in touch with us too: from academic research centres, local authorities and independent foundation research bodies, to government-initiated research and evaluation centres. Even government-linked initiatives are finding this difficult, not just us academic outsiders!

So what are the issues that have been making this so difficult for us and others?

No centralised system of recording

One of the biggest problems is finding information. There is currently no centralised way in which local authorities routinely record their use of personal data about families for data linkage or predictive analytics. In 2018, the Guardian highlighted the growing use of predictive analytics in child safeguarding and the associated concerns about ethics and data privacy. They wrote:

“There is no national oversight of predictive analytics systems by central government, resulting in vastly different approaches to transparency by different authorities.”

This means that it is very difficult for anyone to find out what is being done in their own or other local authorities. Not only does this have ethical implications for the transparency, openness and accountability of local authorities; more importantly, it means that families who experience interventions by services are unlikely to know how their data has been handled and what criteria have been used to identify them.

Several European cities are trialling public registers for mandatory reporting of the use of algorithmic decision-making systems. The best way to take this forward is being discussed here and in other countries.

Pace of change

Another issue is the pace of change. Searching the internet for information about which local authorities are linking families’ personal data and using it for predictive analytics is complicated by the lack of one common language to describe the issues. A myriad of terms are being used, and they change over time: ‘data linkage’; ‘data warehousing’; ‘risk or predictive analytics’; ‘artificial intelligence’ (AI); ‘machine learning’; ‘predictive algorithms’; ‘algorithmic or automated decision-making’, to name but a few.
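One way to cope with this shifting vocabulary is to generate search queries systematically, combining each term with the name of a local authority of interest. Here is a minimal sketch of that idea – the council names are placeholders, and the query format is just one plausible phrasing, not a description of our actual search protocol.

# A sketch of generating systematic search queries from the terminology above.
# The council names are placeholders; substitute the authorities of interest.
from itertools import product

terms = [
    "data linkage", "data warehousing", "risk analytics", "predictive analytics",
    "artificial intelligence", "machine learning", "predictive algorithms",
    "algorithmic decision-making", "automated decision-making",
]
councils = ["Anytown Borough Council", "Othershire County Council"]  # placeholders

queries = [f'"{council}" "{term}" children families' for council, term in product(councils, terms)]
for query in queries[:3]:  # preview the first few combinations
    print(query)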

The speed of change also means that whilst some local authorities that were developing systems several years ago may have cancelled or paused their use of predictive analytics, others may have started to develop it.

The Cardiff University Data Justice Lab, in partnership with the Carnegie UK Trust, is undertaking a project to map where and why government departments and agencies in Europe, Australia, Canada, New Zealand and the United States have decided to pause or cancel their use of algorithmic and automated decision support systems.

General Data Protection Regulation (GDPR)

GDPR, and the variation in the way it is being interpreted, may be another significant problem preventing us from getting to grips with what is going on. Under GDPR, individuals have the right to be informed about:

  • the collection and use of their personal data
  • the purposes for processing their personal data
  • the retention periods for the data held
  • with whom their personal data will be shared

As part of their responsibilities under GDPR, local authorities should publish a privacy notice which includes the lawful basis for processing data as well as the purposes of the processing. However, the way that local authorities interpret this seems to vary, as does the quality, the amount of detail given and the level of transparency of the information in privacy notices. Local authorities may provide only general statements about the deployment of predictive analytics and can lack transparency about exactly what data is being used and for what purpose.

Lack of transparency

This lack of transparency was identified in a review by the Committee on Standards in Public Life, which published a report in February 2020 on Artificial Intelligence and Public Standards. The report highlighted that Government and public sector organisations are failing to be sufficiently open. It stated:

“Evidence submitted to this review suggests that at present the government and public bodies are not sufficiently transparent about their use of AI. Many contributors, including a number of academics, civil society groups and public officials said that it was too difficult to find out where the government is currently using AI. Even those working closely with the UK government on the development of AI policy, including staff at the Alan Turing Institute and the Centre for Data Ethics and Innovation, expressed frustration at their inability to find out which government departments were using these systems and how.” (p.18)

Whilst some local authorities seem less than forthcoming in divulging information, this is not the case for all. For example, in Essex, a Centre for Data Analytics has been formed as a partnership between Essex County Council, Essex Police and the University of Essex. They have developed a website and associated media that provide information about the predictive analytics projects they are undertaking using families’ data from a range of partners, including the police and health services.

So what are we doing?

As part of our project on parental social licence for data linkage and analytics, our team is gathering information through internet searching and snowballing, putting together as much information as we can find, and will continue to do so throughout the course of the project. So far, the most useful sources of information have included:

  • the Cardiff University Data Justice Lab report that examines the uses of data analytics in public services in the UK, through both Freedom of Information requests to all local authorities and interviews/workshops with stakeholders
  • the WhatDoTheyKnow website, which allows you to search previous FOI requests (see the sketch after this list)
  • internet searches for relevant local authority documents, such as commissioning plans, community safety strategies and Local Government Association Digital Transformation Strategy reports
  • media reports
  • individual local authority and project websites
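As a flavour of how the WhatDoTheyKnow source above can be worked through programmatically rather than page by page, here is a rough sketch. It assumes the site exposes Atom feeds for searches at a /feed/search/ URL – an assumption about the site's feed layout that should be checked before relying on it – and it is an illustration rather than our actual tooling.

# A rough sketch of listing previous FOI requests matching a search term on WhatDoTheyKnow.
# Assumption: the site offers an Atom feed of search results at /feed/search/<query>.
import urllib.parse

import feedparser  # third-party Atom/RSS parser: pip install feedparser

def search_foi_requests(query, max_results=20):
    url = "https://www.whatdotheyknow.com/feed/search/" + urllib.parse.quote(query)
    feed = feedparser.parse(url)
    return [(entry.title, entry.link) for entry in feed.entries[:max_results]]

for title, link in search_foi_requests("predictive analytics children's services"):
    print(title, "-", link)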

It would seem we have some way to go yet, but it is a work in progress!

If you are interested in this area, we’d be pleased to know of others’ experiences, or if you’d like to contribute a blog on this or a related topic, do get in touch via our email datalinkingproject@gmail.com