Data Driven Synthesis

T1:      Data Driven Synthesis.


Aim: To change synthesis into a data-led discipline, embracing new technologies to capture and use better-quality data for effective synthetic route and process design.


We have so far identified several sub-themes:


1.1       Facilitate the adoption of Electronic Laboratory Notebooks (ELNs) in academia and promote the development of efficient ways to capture process and analytical information into them.

1.2       Support the development of standards for reporting / exchange of reaction data (particularly from ELNs and automated systems) amenable to an open access database of reactions.

1.3       Promote the use of statistical methods (DoE, PCA, MVA, etc.) in chemistry.

1.4       Promote the use of physical organic and chemical engineering concepts in synthesis to better understand and develop reactions.

1.5       Develop a funded cross-disciplinary community directed at the goal of predicting the outcome of unknown reactions, including application to planning of synthetic routes.

1.6       Identify current constraints (methodology / instrumentation) in reaction engineering and kinetic analysis and encourage research proposals to close the gaps, including the use of data to fast-track discovery to manufacture.

We need to bring together chemists with chemical engineers, mathematicians/statisticians, and computer and data scientists so that reaction data, ab initio modeling, and mechanistic and process understanding can be applied effectively in synthetic route design and application. A key enabling step is to make detailed reaction data (on all reactions, not just the most successful) available. We will continue to promote the uptake of ELNs by coordinating trials and writing up the results to provide guidance on selection. We will work with willing ELN providers to help develop their products to better suit academic use, including pricing structures and training materials. Finally, we will work to identify suitable ‘free’ ELN solutions and facilitate evaluation, adoption, and identification of sustainability mechanisms.

There is an opportunity to engage with experts in data capture (e.g. vision systems) and analysis to reduce the burden, and increase the quality, of data capture in the laboratory, and we will run an open meeting to promote interest in the area.


We will continue to work with our partners towards a set of data standards to allow effective exchange and automated mining of ELN data. We will also work with our partners to develop a mechanism to make reaction data openly available.


Statistical methods allow maximum value to be obtained from the data generated and provide effective experimental designs that deliver higher quality solutions at lower cost. Whilst the methodologies and analytical tools exist, the skills to apply the appropriate techniques are largely lacking in synthesis. An excellent start was made during phase II in promoting the use of statistical methods in synthesis, and we will sustain that support. We hope to promote the establishment of a small number of centres with deep expertise in the application of statistical methods to chemistry research and to embed statistical methods into all mainstream academic chemistry courses. We will leverage the case studies and educational modules funded in D-a-M phase II by running ‘roadshows’ to promote the applicability of these techniques, and workshops to disseminate practical knowledge. We will continue to develop a support community where skills are recognized and shared.
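
As a minimal illustration of the kind of experimental design (DoE) meant here, the Python sketch below builds a two-level full factorial design for three reaction factors and estimates the main effect of each factor on yield. The factor names, levels, and yield values are hypothetical and chosen only for illustration; they are not data from any D-a-M case study.

```python
from itertools import product

# Hypothetical factors with (low, high) levels; illustrative only.
FACTORS = {
    "temperature_C": (40, 80),
    "catalyst_mol_percent": (1, 5),
    "time_h": (2, 6),
}


def full_factorial(factors):
    """Return every low/high combination as coded (-1/+1) runs."""
    names = list(factors)
    return [dict(zip(names, levels))
            for levels in product((-1, 1), repeat=len(names))]


def decode(run, factors):
    """Translate a coded run into real factor settings."""
    return {name: factors[name][0 if code == -1 else 1]
            for name, code in run.items()}


def main_effects(runs, responses):
    """Main effect = mean response at +1 minus mean response at -1."""
    half = len(runs) / 2
    effects = {}
    for name in runs[0]:
        high = sum(y for run, y in zip(runs, responses) if run[name] == 1)
        low = sum(y for run, y in zip(runs, responses) if run[name] == -1)
        effects[name] = (high - low) / half
    return effects


runs = full_factorial(FACTORS)
# Made-up yields (%) in the same order as `runs`; not real measurements.
yields = [52, 58, 60, 66, 70, 78, 77, 85]
effects = main_effects(runs, yields)

for run, y in zip(runs, yields):
    print(decode(run, FACTORS), "yield:", y)
print("Main effects:", effects)
```

Eight runs cover every combination of the three factors, so each main effect is estimated from all of the data rather than from a single pair of experiments, which is the efficiency argument usually made for designed experiments over one-factor-at-a-time variation.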

Although 1.4 is at a much earlier stage of development than 1.3, we recognize that a similar approach is needed. A first step will be to run the same type of open meeting, which was very successful for 1.3, to identify needs and mechanisms.

A follow-up to the Sept 2014 Leeds meeting in area 1.5 and an open meeting covering 1.6 will be early events of D-a-M phase III, and are expected to lead directly to other focused activities. For 1.5 we will make the case to EPSRC and industry for a specific call in this critical, and commercially valuable, area in order to attract leading figures from outside chemistry.