CASE STUDY

Curated training sets for innovative AI predictions

Industry

Diversified chemicals & electronics

Solution

CAS Custom Services℠

KEY RESULT

2x

faster than comparable projects

KEY RESULT

1st

novel device on track to be 1st to market

KEY RESULT

improved prediction accuracy at each validation stage

The Challenge

Implementing a machine learning approach to innovation

A large diversified chemical company sought a first-mover advantage by developing new electronic devices with specific, novel properties. Historically, new materials and devices were developed by a small group of senior scientists through trial and error, relying on their accumulated expertise and intuition. This approach created a significant institutional knowledge risk: when key individuals left the organization, momentum on critical projects stalled.

As the company explored machine learning to augment its R&D efforts, a fundamental gap quickly emerged. The internal data it had collected was a reasonable starting point, but it wasn't sufficient to support full model development. Every available external source offered only generalized material properties, lacking the specific parameters the team had identified as essential. To make meaningful AI-driven predictions about how a material's properties relate to device performance, the company needed high-quality, precisely targeted training sets that simply didn't exist off the shelf.

The Solution

Training a machine learning model with custom-curated data

The company turned to CAS, whose team reviewed what was readily available in the comprehensive CAS Content Collection™ and then worked with the company's key stakeholders to scope a custom-curation effort. CAS scientists selected the relevant materials and device properties, drawing from previously indexed content and sourcing additional data from published literature and patents. The result was highly structured, human-curated content designed to capture the nuanced connections within the scientific information, the kind of context that automated extraction alone cannot reliably produce.

Once the model was trained on this focused dataset, it generated precise recommendations covering the substances, process conditions, layout, and assembly of the target devices. The results were compelling enough that additional custom-curated training sets were commissioned to further refine prediction accuracy and transferability. At each stage, the informatics team validated the model's outputs before proceeding. This disciplined, staged validation, aligned with quarterly reporting cycles, kept risk low and leadership confidence high.

This systematic methodology dramatically accelerated discovery. What had historically been a slow, intuition-driven process was now producing results roughly twice as fast as comparable projects; the company is on track to be the first to market with the novel, optimized device.

The Outcome

From siloed expertise to scalable, data-driven discovery

Beyond the speed gains, the project fundamentally changed how R&D innovation works at this organization. Knowledge that once lived only in the minds of a few senior scientists is now encoded in a validated, reproducible machine learning framework. The risk of losing critical institutional knowledge when people depart has been reduced, and the path from early-stage idea to testable prediction has been compressed in ways that were impossible with the company's original internal data.
