
Predicting New Chemistry: Impact of High-Quality Training Data on Prediction of Reaction Outcomes

Machine learning models supporting synthesis planning applications are largely limited to the chemistry seen in training, and the accuracy and diversity of their predictions are often diminished in sparsely populated chemical subspaces. By measuring how different datasets affect the performance of trained models, we can make stronger assertions regarding the expected coverage and novelty of synthesis planning solutions, and design datasets that will open up previously difficult areas of science. 

In this study, scientists at Bayer demonstrate the significant impact that scientist-curated reactions from the CAS Content Collection have on the predictive power of a synthesis planning model. Prediction accuracy for outcomes in rare reaction classes rose by 32 percentage points, extending the model's coverage to new, useful chemistry.


Request the white paper or contact our Custom Services Team to design a dataset to open up challenging areas of science.

This white paper is published in collaboration with scientists from Bayer.


  • Miriam Wollenhaupt, Ph.D., Computational Chemist, Bayer AG
  • Martín Villalba, Ph.D., Expert Applied Mathematics, Bayer AG
  • Orr Ravitz, Ph.D., Synthesis Planning Solutions, CAS

