To answer the most vexing questions in science, researchers first need access to the wealth of knowledge that already exists in previously published materials. To paraphrase Isaac Newton: in order to see further, we need to stand on the shoulders of giants. However, the information tools at scientists’ disposal do not always deliver the relevant insights efficiently, which hinders researchers’ progress by taking them away from the laboratory.
In today’s digital environment, search engines are the primary access point to information, and for general questions they are usually very effective at producing relevant results. After all, how often do you really need to venture further than the first page of results to find a good recipe or movie review? It has been reported that fewer than 10% of users do. However, the same cannot usually be said for a scientific query, even in specialized scientific search tools.
The traditional focus of scientific search engines has been to recall the most complete set of results possible that relate to the question, regardless of how relevant they are. The most recently published answers have often been listed first, rather than those that most closely match the question, leaving the researcher to trawl through the full list to find the results most relevant to their needs.
The problem with this approach is that it is time-consuming, and research progress can be compromised if key papers and scientific data cannot be found quickly. A scientist’s time is one of an R&D organization’s most valuable resources! Yet in a recent user survey, we found that scientific researchers spend on average around 7 hours per week looking for information and that 60% of that time is spent sorting through results to find the needed data. In fact, data shows that less than half of a researcher’s time is actually spent in the laboratory. (See this infographic for more detailed information and insights as to how researchers spend their time.)
Imagine the impact on R&D productivity if researchers were able to redirect even half of the nearly one full day per week they spend finding information back into the laboratory to advance discovery. Addressing this need to find highly relevant information fast was the key driver for our team in developing SciFindern.
Relevant search results start with an understanding of the science
To design a scientific search engine capable of retrieving and presenting the most useful results, a firm understanding of the nature of relevance is needed. But what makes a search result relevant? As it turns out, this is a deceptively tricky question.
First, it depends on who you ask. Science is multifaceted, so researchers from different fields may approach a topic from many angles. For a single query, a result that is highly relevant to a medicinal chemist might not be appropriate for a process chemist. Furthermore, other user-specific factors and preferences, such as date of publication, geographic origin and original publication language may be important to consider.
Second, it depends on how one weights the wide variety of factors that could impact relevance. For example, does the title of the paper match the search question? Has a paper been cited multiple times by others? Does a paper contain synonyms for the search question? Is the paper from a reputable source? How long ago was it published? These are examples of syntactic, semantic and metadata clues that must be considered when deciding if a result is relevant.
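One way to picture this weighting is as a score that combines several clues into a single number. The sketch below is purely illustrative: the weights, the toy synonym table, and the names `Paper` and `relevance_score` are assumptions made for this example, not a description of how any particular search engine works.

```python
from dataclasses import dataclass

# Illustrative weights only (assumed for this sketch); they sum to 1.0.
WEIGHTS = {
    "title_match": 0.35,    # syntactic clue: query text appears in the title
    "synonym_match": 0.25,  # semantic clue: a synonym of the query appears
    "citations": 0.20,      # metadata clue: how often the paper is cited
    "reputable": 0.10,      # metadata clue: source reputation
    "recency": 0.10,        # metadata clue: how recently it was published
}

# Toy synonym table (hypothetical).
SYNONYMS = {"aspirin": {"acetylsalicylic acid"}}


@dataclass
class Paper:
    title: str
    citations: int
    age_years: int
    reputable_source: bool


def relevance_score(query: str, paper: Paper) -> float:
    """Combine simple signals, each scaled to [0, 1], into a weighted sum."""
    q = query.lower()
    title = paper.title.lower()
    signals = {
        "title_match": 1.0 if q in title else 0.0,
        "synonym_match": 1.0 if any(s in title for s in SYNONYMS.get(q, ())) else 0.0,
        "citations": min(paper.citations, 100) / 100,  # cap at 100 citations
        "reputable": 1.0 if paper.reputable_source else 0.0,
        "recency": 1.0 / (1 + paper.age_years),
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())
```

Changing the weights changes which paper comes out on top, which is why the same result can look highly relevant to one researcher and marginal to another.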
Building relevance into scientific search engines
To produce the most relevant results, a scientific search solution must accommodate the needs of each individual user, the diversity of which must be considered when designing search algorithms.
There is no magic recipe behind creating a scientific search tool that gets relevance right, but there is a set of core principles. First, scientific search engine developers must really understand user needs and the difference between sorting and ranking results.
Second, the high-value information available needs to be fully leveraged. For SciFindern, that meant fully incorporating all of the human-curated data elements present in the CAS content collection, such as reactions, chemical structures, properties and specific roles, and key concepts that detail the subject matter discussed in each paper.
Third, each answer must be considered in view of all other possible answers and how it interconnects with the rest of the available scientific knowledge. This is essential when assessing how much an answer does and does not match the question. Finally, the algorithm needs to appropriately balance these key parameters so the correct and most relevant information is selected and presented to the user.
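The difference between sorting and ranking named in the first principle can be seen in a very small sketch. The result list, its relevance scores, and the function names are all hypothetical; the point is only that ordering by a single attribute such as date and ordering by an estimate of match quality can put different results first.

```python
from datetime import date

# Hypothetical results: (title, publication date, relevance score in [0, 1]).
results = [
    ("Review of coupling catalysts", date(2019, 5, 1), 0.92),
    ("Conference abstract mentioning catalysts", date(2023, 2, 10), 0.31),
    ("Catalyst scale-up case study", date(2021, 8, 20), 0.77),
]


def sort_by_date(items):
    """Sorting: order by one attribute (newest first), ignoring match quality."""
    return sorted(items, key=lambda r: r[1], reverse=True)


def rank_by_relevance(items):
    """Ranking: order by how well each result is estimated to match the question."""
    return sorted(items, key=lambda r: r[2], reverse=True)


newest_first = sort_by_date(results)
best_first = rank_by_relevance(results)
```

Here the newest item is the weakest match, so a date-sorted list would show it first while a relevance-ranked list would show it last.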
These core principles, combined with the latest developments in user analytics and information retrieval, form a good basis for a relevance-optimized scientific information tool, but there is another factor to consider: the user experience. We could design the most powerful scientific search engine in the world, but if it is not user-friendly and intuitive, only very limited efficiencies could be gained. A key part of usability is allowing users to form their question in the same way that they think about the problem. For example, many chemists naturally frame their questions with both textual and chemical structure elements combined. An efficient solution needs to support this, even though these data types are processed very differently.
The two-pronged approach of accommodating both user needs and the software algorithm has a synergistic effect on scientific search engines. The outcome is that relevant data can be found more quickly than before, and scientists can spend more time conducting research and less time searching for information. When multiplied across all of the scientists within an organization, the potential for increasing research productivity is staggering. For this reason, ensuring that researchers have the most efficient solutions available to search published content and internal data repositories can deliver a strategic advantage to your organization.
For more information on how CAS is combining unmatched expertise and advanced algorithms for scientific search with human content curation to revolutionize scientific information retrieval, watch this video on SciFindern – the most advanced scientific search engine today.