March 29, 2019

Accelerating innovation in materials science with machine learning

Machine learning (ML) is already being used in ways that affect our everyday lives, often in ways we may not even realize. Amazon, for example, pioneered the adoption of ML to drive its product recommendations. The company has organized around its AI and ML efforts to great success.

Given the wide range of applications and benefits offered by ML, there is a current push to implement it in the materials science sector, with many R&D-based organizations investing heavily in the development of digital strategies. However, one challenge these teams face is that scientific data is often complex and disconnected. This is a problem because ML systems rely on well-organized, high-quality data. So, how can you effectively apply ML to accelerate innovation and growth in your materials science company?

Here we explore the opportunities ML could bring to an organization and look at three of the best strategies to overcome common implementation challenges.

Improve materials science research and development with machine learning

In the materials science sector, implementation of ML is in its early stages and we have yet to see the technology achieve its full potential. However, it won't be long before ML helps scientists produce new materials with specific target properties faster and more efficiently than traditional tools (e.g., prediction modeling) currently allow.

Consider a scientist who needs to develop a material that maintains its elasticity at extreme temperatures. In the near future, ML will be able to predict which chemical reactions and experimental conditions are most likely to succeed. This will save time in two ways: by cutting the number of papers and data sets that must be searched to find the best starting point, and by cutting the number of experiments required to optimize the material. It will also reduce costs.

Such benefits will be especially useful to those working in the polymers sector, where accurate predictions are particularly difficult to make and require extensive training and the expertise of an experienced chemist. What’s more, even the most seasoned chemists can only make predictions based on their own experiences and the data they have available. By using predictive tools that leverage ML algorithms and big data, scientists can narrow down specific chemicals and conditions far faster and more accurately than they could manually.

In this way, ML tools will ultimately help to accelerate innovation in the materials science industry while improving efficiencies and reducing costs. In the plastics manufacturing industry, we've already seen companies that actively use big data and ML grow 50% faster than non-users. Be sure your company is one of the early adopters, so you gain the competitive edge and don't get left behind.

How did a leading materials organization learn to create a robust data framework to support applications, such as machine learning and artificial intelligence (AI)? Read our Case Study.

How to successfully implement machine learning

ML-based tools can clearly help drive growth in organizations, so why aren't they already widely used in materials science? Simply put, establishing a successful ML algorithm that delivers the results you need is not easy. There are three main areas to carefully consider when embarking on the ML journey: your approach to the project, your data foundation and your handling of multi-dimensional data.

1. Monitor the machine learning project as a whole

ML offers an opportunity to enhance a company in many ways. When starting out, it's important to have a clear outcome in mind and a commitment to the ongoing investment that will be required. For ML projects to succeed, the company's decision makers must understand the importance of managing expectations within the organization and be willing to change processes as required to ensure alignment at all levels of the company. If you were introducing a prediction modeling tool, for example, it would be important for the scientists to be on board with its introduction. Otherwise, they would be more likely to continue making manual predictions, leaving the company without the benefits of having ML in place.

Once you've established your ML program's objectives, it's important to look at the project as a whole. Getting caught up in the technology itself, or in comparisons with how other organizations use it, only adds noise to the ML journey. This is your project, and how the technology is implemented will depend on your organization's unique requirements.

Likewise, extensive data is your friend, but don't get stuck in its details. Focus on the bigger picture and keep the project on track. Be sure to have access to strong expertise in data curation and modeling so that the project stays on the best path forward.

2. Take the time to build a strong data foundation

Like all data systems, with ML, what you get out is dependent on what you put in, so it is best to establish ML algorithms on solid, high-quality data to get reliable results and predictions. Materials science data is often very complex, so creating a high-quality database is not a simple undertaking. If time is spent building this all-important foundation for the project, chances of success will significantly increase when implementing ML.

Consider how much data you have and whether it is complete. Many companies fall at this hurdle, particularly because scientific data is not always documented consistently and can be prone to gaps, meaning it can't be used for ML training processes. If lack of relevant data is a problem, it may be possible to acquire, license or borrow additional data sets from public repositories, government sources and commercial partners to fill the gaps. Accessing previously curated data in this way will greatly expedite the collection process, potentially saving the company millions of dollars and months of effort.
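
As a rough illustration of what a completeness check might look like, here is a minimal Python sketch. The field names (material_id, temperature_k, elastic_modulus_gpa) and the sample records are hypothetical and not taken from any real data set; a real schema would be defined by your own scientists and curators.

```python
# Hypothetical required fields for one experiment record; adjust to your own schema.
REQUIRED_FIELDS = ("material_id", "temperature_k", "elastic_modulus_gpa")

def missing_fields(record):
    """Return the required fields that are absent, None, or empty in a record."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]

def completeness_report(records):
    """Split records into complete ones and a map of record index -> missing fields."""
    complete, gaps = [], {}
    for i, rec in enumerate(records):
        missing = missing_fields(rec)
        if missing:
            gaps[i] = missing
        else:
            complete.append(rec)
    return complete, gaps

# Made-up sample records for illustration only.
records = [
    {"material_id": "P-001", "temperature_k": 298.0, "elastic_modulus_gpa": 2.1},
    {"material_id": "P-002", "temperature_k": None, "elastic_modulus_gpa": 1.8},
    {"material_id": "P-003", "temperature_k": 350.0},
]
complete, gaps = completeness_report(records)
print(len(complete), "complete record(s); gaps:", gaps)
```

A report like this shows which records are ready for training and which gaps would need to be filled, for example from an external curated source.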

If using company-owned data as the basis for ML training, it's vital that it is high-quality and normalized. The reporting of scientific data can vary and the information itself may be captured in multiple formats, such as text, chemical structures, graphs and charts. Therefore, it is essential that human curation is part of your data collection and governance process. Materials scientists and technicians are able to review and interpret information elements that ML cannot. This type of intellectual indexing requires greater investment, but it also results in far more valuable and useful data for years to come. If you don't have the resources to undertake manual curation, partnering with an organization, such as CAS, that can provide human expertise and specialized technologies for indexing and curating scientific data, can often allow this process to be completed more quickly and cost-effectively.
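
To make "normalized" concrete, the sketch below converts temperatures reported in different units to a single unit. It is only an assumed example: the unit labels and sample values are hypothetical, and real scientific data would involve many more properties and formats.

```python
def to_kelvin(value, unit):
    """Convert a temperature to kelvin; unit labels are matched case-insensitively."""
    unit = unit.strip().lower()
    if unit in ("k", "kelvin"):
        return value
    if unit in ("c", "°c", "celsius"):
        return value + 273.15
    if unit in ("f", "°f", "fahrenheit"):
        return (value - 32.0) * 5.0 / 9.0 + 273.15
    # Refuse to guess: unknown units are left for a human curator to resolve.
    raise ValueError(f"unrecognized temperature unit: {unit!r}")

# The same temperature as it might be reported by three different sources.
raw = [(25.0, "C"), (298.15, "K"), (77.0, "°F")]
normalized = [round(to_kelvin(v, u), 2) for v, u in raw]
print(normalized)
```

Raising an error on an unrecognized unit, rather than guessing, is a deliberate choice here: it routes ambiguous entries to human review, in line with the curation step described above.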

Finally, once you are confident that the data is high quality, develop a simple data structure to support your database. Well thought-out data structures combined with high-quality data sets are the best tools for effective ML training. They help the technology identify and analyze patterns, trends and relationships, which leads to more accurate predictions.
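
One possible shape for such a simple data structure is sketched below in Python. The class name, the fields and the sample values are all hypothetical and purely illustrative; the point is only that inputs (chemicals and conditions) and outputs (measured properties) are named, typed and clearly separated.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExperimentRecord:
    """One curated experiment: inputs (chemicals, conditions) and a measured output."""
    material_id: str
    monomer: str                 # input: starting chemical
    initiator: str               # input: reaction condition
    temperature_k: float         # input: reaction temperature in kelvin
    elastic_modulus_gpa: float   # output: measured property

    def inputs(self):
        """Values a model would learn from."""
        return (self.monomer, self.initiator, self.temperature_k)

    def output(self):
        """Value a model would learn to predict."""
        return self.elastic_modulus_gpa

# Illustrative values only, not real measurements.
record = ExperimentRecord("P-001", "styrene", "AIBN", 343.15, 3.2)
print(record.inputs(), record.output())
```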

3. Succeed at machine learning with multi-dimensional data

Materials science data is unavoidably multi-dimensional, with numerous inputs and outputs. This poses a challenge to the implementation of ML in the industry. Even with a well-structured, high-quality database, the predictive ability and effectiveness of an ML algorithm decrease as the dimensionality of the data increases.
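
One commonly cited reason for this effect is that distances between data points become harder to tell apart as dimensions are added, which weakens any method that relies on "similar" examples. The Python sketch below illustrates this with random points. It is a toy demonstration under assumed settings (uniform random data, 200 points, a fixed random seed), not a model of real materials data.

```python
import math
import random

def relative_contrast(n_points, n_dims, rng):
    """Return (farthest - nearest) / nearest distance from a random query point."""
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

rng = random.Random(0)  # fixed seed so the run is repeatable
low = relative_contrast(200, 2, rng)     # few dimensions
high = relative_contrast(200, 500, rng)  # many dimensions
print(f"2 dimensions: {low:.2f}, 500 dimensions: {high:.2f}")
```

In the low-dimensional case the nearest point is much closer than the farthest one, so the contrast is large. In the high-dimensional case all points sit at nearly the same distance, so the contrast shrinks.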

However, not all is lost. Currently, investments are being made in algorithms that can process increasingly complex, high-dimensional data. These algorithms fall into two broad categories: supervised learning and unsupervised learning. Briefly, supervised learning is when outcomes are already known, so the system learns to map an input to an output based on example input-output pairs. Unsupervised learning is when the outcomes are not known, and the system discovers patterns within the data itself.
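
The difference between the two categories can be sketched in a few lines of Python. The two functions below are deliberately tiny stand-ins chosen for illustration (a nearest-neighbor lookup for supervised learning, and a two-cluster k-means for unsupervised learning); the measurement values and the "soft"/"hard" labels are made up.

```python
def nearest_neighbor_predict(examples, x):
    """Supervised: return the known label of the training example closest to x."""
    _, label = min(examples, key=lambda pair: abs(pair[0] - x))
    return label

def two_means(values, iterations=10):
    """Unsupervised: split unlabeled 1-D values into two clusters (k-means, k=2)."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups

# Supervised: outcomes (labels) are known for the training examples.
labeled = [(1.0, "soft"), (1.2, "soft"), (3.9, "hard"), (4.1, "hard")]
print(nearest_neighbor_predict(labeled, 1.4), nearest_neighbor_predict(labeled, 3.5))

# Unsupervised: no labels; the grouping is discovered from the values alone.
unlabeled = [1.0, 1.2, 0.9, 3.9, 4.1, 4.3]
clusters = two_means(unlabeled)
print(clusters)
```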

Principal component analysis (PCA) is one example of an unsupervised learning method within ML that simplifies the complexity of multi-dimensional data while retaining trends and patterns. By transforming the data into fewer dimensions, PCA makes it easier for an algorithm to find patterns without reference to prior knowledge. From this, it is easy to see how PCA could potentially be used as part of a prediction modeling tool, because it would be able to simplify complex scientific data for analysis. Such a tool could then suggest which chemicals and conditions are needed to produce a material with a specific characteristic.

The approach taken to overcome challenges around multi-dimensional data will ultimately depend on the goal of the company's ML program. It may be beneficial to seek advice on whether an existing algorithm would be appropriate for the company's ML program, or whether a new one should be developed.
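
To show what "transforming the data into fewer dimensions" means in practice, here is a minimal, standard-library Python sketch that finds only the first principal component using power iteration on the covariance matrix. It is a teaching sketch under stated assumptions, not a production implementation: the data is made up (three features, where the second is roughly twice the first and the third is small noise), and a real project would normally rely on an established numerical library.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (column means, unit vector of the first principal component)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    vec = [1.0] * d
    for _ in range(iterations):
        vec = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return means, vec

def project(rows, means, component):
    """Reduce each row to a single score along the principal component."""
    return [sum((r[j] - means[j]) * component[j] for j in range(len(means)))
            for r in rows]

# Made-up data: three measured features per sample.
rows = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.4],
    [3.0, 6.2, 0.6],
    [4.0, 7.8, 0.5],
    [5.0, 10.1, 0.4],
]
means, component = first_principal_component(rows)
scores = project(rows, means, component)
print([round(c, 3) for c in component])
print([round(s, 3) for s in scores])
```

Each three-feature sample is reduced to one score that captures the dominant trend in the data. Note that PCA on its own only simplifies the data; any prediction would come from a model built on top of the reduced representation.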

Turn to CAS to start your machine learning journey

ML has the potential to accelerate innovation and growth while improving efficiency and reducing costs. As such, you should aim to fully leverage the power of the technology to gain and maintain a competitive edge. Implementation of ML is a complex investment, but one that is well worth it.

At CAS, we are already using ML systems to interpret our expertly curated data. Contact us today to discover how you can use our databases as your data foundation and talk to a member of our team to find out how they could help you on your journey to successful ML implementation. With more than 100 years of experience, no one knows more about managing sci-tech information than CAS.
