From Big Data to Chemical Information
Large and complex data sets, often referred to as “Big Data”, can be difficult to manage and analyse using conventional processes and software tools. Such data sets exist in many disciplines, and chemistry is no exception. Most chemists are familiar with the huge growth in the number of compounds registered in databases in the public domain, the majority of which have data associated with them. Many will also have struggled with increasingly large data sets generated through their work, often stored in spreadsheets with limited analytical capabilities. The CICAG, in partnership with the EPSRC’s Dial-a-Molecule Grand Challenge Network, is therefore organising a scientific meeting to explore the challenges presented by big data in chemistry.
What will be covered?
In this meeting we will explore the problems of data overload, opportunities that large data sets can present, and potential IT solutions to help chemists obtain information and knowledge from data repositories. Topics will include:
- Managing large data sets
- Differentiating between relevant and irrelevant data
- The importance of data quality and integrity
- Contextualisation and classification of data
- The role of data standards and metadata
- Data integration
Who should attend?
Anyone with an interest in the management of chemical data and information, in the efficient exchange of ideas, and in the way in which computers, the web and the internet can most appropriately advance chemical understanding and innovation. The event will be relevant to industrial, commercial, publishing, governmental and educational organisations.
How can you contribute?
As well as the talks from expert speakers, there will be plenty of opportunity for discussion and networking. A record of the meeting, including the discussion, will be made available initially to those attending.
Meeting report and presentations: please visit the RSC-CICAG homepage or download here
We thank Dr Colin Bird (University of Southampton) for kindly providing this report
Program: Download as a PDF here
10:00 | Registration and tea/coffee |
The Rise and Impact of Big Data | |
10:30 | Welcome and introduction |
Helen Cooke (CICAG Committee Chair) | |
10:40 | Big Data and the Dial-a-Molecule Grand Challenge |
Prof. Richard Whitby (University of Southampton and PI, Dial-a-Molecule Grand Challenge) | |
11:00 | Big, broad and blighted data |
Prof. Jeremy Frey (University of Southampton) | |
11:20 | Digital disruption in the laboratory: joined up science? |
John Trigg (Automation and Analytical Management Group, RSC) | |
11:40 | Big data chemistry |
Prof. Jonathan Goodman (University of Cambridge) | |
12:00 | Discussion |
12:20 | Lunch |
Approaches to Managing Big Data and Maximising Opportunities | |
13:20 | Managing and searching large chemical structure data resources |
Mark J Forster (Syngenta R&D) | |
13:40 | Data-Rich Organic Chemistry |
Prof. Donna Blackmond (Scripps Research Institute) | |
14:00 | Use of data standards and metadata in information exchange |
Rachel Uphill (GSK) | |
14:20 | 100 million compounds, 100K protein structures, 2 million reactions, 4 million journal articles, 20 million patents and 15 billion substructures: is 20TB really big data? |
Noel O’Boyle (NextMove Software) | |
14:40 | Dealing with the wealth of open source data |
John Holliday (University of Sheffield) | |
15:00 | Discussion |
15:20 | Tea/coffee |
15:40 | Keynote: Activities at the Royal Society of Chemistry to gather, extract and analyse big datasets in chemistry |
Tony Williams (Royal Society of Chemistry) | |
16:20 | Meeting close |