From Big Data to Chemical Information

Large and complex data sets, often referred to as “Big Data”, can be difficult to manage and analyse using conventional processes and software tools. Such data sets exist in many disciplines, and chemistry is no exception. Most chemists are familiar with the huge growth in the number of compounds registered in databases in the public domain, the majority of which have data associated with them. Many will also have struggled with increasingly large data sets generated through their work, often stored in spreadsheets with limited analytical capabilities. The CICAG, in partnership with the EPSRC’s Dial-a-Molecule Grand Challenge Network, is therefore organising a scientific meeting to explore the challenges presented by big data in chemistry.

What will be covered?

In this meeting we will explore the problems of data overload, opportunities that large data sets can present, and potential IT solutions to help chemists obtain information and knowledge from data repositories. Topics will include:

  • Managing large data sets
  • Differentiating between relevant and irrelevant data
  • The importance of data quality and integrity
  • Contextualisation and classification of data
  • The role of data standards and metadata
  • Data integration

Who should attend?
Anyone with an interest in the management of chemical data and information, in the efficient exchange of ideas, and in the ways in which computers, the web and the internet can best advance chemical understanding and innovation. The event will be relevant to industrial, commercial, publishing, governmental and educational organisations.

How can you contribute?
As well as talks from expert speakers, there will be plenty of opportunity for discussion and networking. A record of the meeting, including the discussion, will be made available initially to those attending.

Meeting report and presentations: please visit the RSC-CICAG homepage or download here 
We thank Dr Colin Bird (University of Southampton) for kindly providing this report.

Program:  Download as a PDF here

10:00 Registration and tea/coffee
The Rise and Impact of Big Data
10:30 Welcome and introduction
Helen Cooke (CICAG Committee Chair)
10:40 Big Data and the Dial-a-Molecule Grand Challenge
Prof. Richard Whitby (University of Southampton and PI, Dial-a-Molecule Grand Challenge)
11:00 Big, broad and blighted data
Prof. Jeremy Frey (University of Southampton)
11:20 Digital disruption in the laboratory: joined up science?
John Trigg (Automation and Analytical Management Group, RSC)
11:40 Big data chemistry
Prof. Jonathan Goodman (University of Cambridge)
12:00 Discussion
12:20 Lunch
Approaches to Managing Big Data and Maximising Opportunities
13:20 Managing and searching large chemical structure data resources
Mark J Forster (Syngenta R&D)
13:40 Data-Rich Organic Chemistry
Prof. Donna Blackmond (Scripps Research Institute)
14:00 Use of data standards and metadata in information exchange
Rachel Uphill (GSK)
14:20 100 million compounds, 100K protein structures, 2 million reactions, 4 million journal articles, 20 million patents and 15 billion substructures: is 20TB really big data?
Noel O’Boyle (NextMove Software)
14:40 Dealing with the wealth of open source data
John Holliday (University of Sheffield)
15:00 Discussion
15:20 Tea/coffee
15:40 Keynote: Activities at the Royal Society of Chemistry to gather, extract and analyse big datasets in chemistry
Tony Williams (Royal Society of Chemistry)
16:20 Meeting close

