This document discusses how CAS maximizes the value of scientific information to accelerate innovation. It describes CAS's history in developing technologies for storing and searching chemical information. CAS scientists curate data by extracting, connecting, and providing context for published scientific information. CAS uses knowledge graphs to leverage this high-quality data for unique insights like literature discovery, prior art search, and decision support. The document emphasizes that CAS's unparalleled scientific content collection and human expertise are crucial for transforming raw data into actionable insights.
MAXIMIZING THE VALUE OF SCIENTIFIC INFORMATION TO ACCELERATE INNOVATION
1. Copyright 2020, American Chemical Society.
All rights reserved.
Mark R. Grabau
Chief Analytics Officer, CAS
2. Copyright 2020, American Chemical Society.
All rights reserved.
Mark R. Grabau – Chief Analytics Officer
• 28+ year professional background in Advanced Analytics/Data
Science across several industries
• Previous Employers
• Publications
– 16 peer-reviewed papers on the application of
advanced analytics to real-world business problems
• Education
• Patents
– Systems and Methods for Performing a Computer-
Implemented Prior Art Search
– Adaptive Forecasting of Time Series
– Data Collection for Usage-Based Insurance
• Awards
– 2019 George E. P. Box Excellence in Advanced
Analytics Award
– 2013 Runner-up for the Institute for Operations
Research and the Management Sciences Innovation
in Analytics Award
– 2000 Department of Defense and United States Air
Force Awards for Modeling and Simulation
Industries: Retail, Telecom, Travel, Utilities, Healthcare, Insurance, Industrial Chemicals, Government, Banking, Financial Markets, Defense
3. Copyright 2020, American Chemical Society.
All rights reserved.
CAS IS A DIVISION OF THE AMERICAN CHEMICAL
SOCIETY
The American Chemical Society (ACS) is a not-for-profit organization chartered by the United
States Congress for the purposes of:
• encouraging the advancement of chemistry
• promoting research in chemical science and industry
• increasing and diffusing chemical knowledge
• promoting scientific interests and inquiry
ACS is the world’s largest scientific society with more than 150,000 members in 140+
countries
4. Copyright 2020, American Chemical Society.
All rights reserved.
CAS IS A SPECIALIST IN SCIENTIFIC INFORMATION
SOLUTIONS
We provide products and services that power discovery to solve our world's biggest
challenges by helping organizations plan, innovate and protect their innovations
With over 110 years of experience, no one knows more about scientific information
and related technology than CAS
• SPECIALIZED TECHNOLOGY
• UNPARALLELED SCIENTIFIC CONTENT
• UNMATCHED HUMAN EXPERTISE
5. Copyright 2020, American Chemical Society.
All rights reserved.
CAS HAS BEEN A LEADER IN SCIENTIFIC
INFORMATION TECHNOLOGY SINCE THE BEGINNING
• We were “big data” before big data was
popular
• CAS developed many of the fundamental
technologies for storing and searching
chemical information
• CAS solutions, STN and SciFinder,
revolutionized access to published
information for R&D
6. Copyright 2020, American Chemical Society.
All rights reserved.
CAS REGISTRY LED THE WAY IN BIG CHEMISTRY
DATA
• Early researchers, regulators and safety managers
struggled to clearly identify and differentiate chemical
substances
• An algorithm was developed to turn chemical structures
into a unique tabular form that could be databased
• CAS scientists pioneered the concept of using unique
identifying numbers for each chemical
• Launched in 1965, CAS Registry Numbers are now the
global standard for chemical identification
CAS Registry Number
1298016-92-8
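The final digit of a CAS Registry Number is a check digit, so the identifier on this slide can be validated mechanically. The sketch below assumes the publicly documented format (two to seven digits, two digits, one check digit) and checksum rule; the function name `is_valid_cas_rn` is hypothetical and is not taken from this presentation.

```python
import re

# Assumed format: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_RN_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas_rn(cas_rn: str) -> bool:
    """Return True if the string is well formed and its check digit matches."""
    match = CAS_RN_PATTERN.fullmatch(cas_rn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the body digits 1, 2, 3, ... starting from the rightmost one.
    weighted_sum = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return weighted_sum % 10 == check_digit


print(is_valid_cas_rn("1298016-92-8"))  # the number shown on the slide
print(is_valid_cas_rn("1298016-92-7"))  # same body, altered check digit
```

For 1298016-92-8 the weighted sum is 178, whose last digit matches the check digit 8, so a single mistyped digit in the identifier is likely to be caught.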
7. Copyright 2020, American Chemical Society.
All rights reserved.
WE ARE A TRUSTED PARTNER POWERING
INNOVATION ACROSS INDUSTRIES
8. Copyright 2020, American Chemical Society.
All rights reserved.
THE COST OF MISSING RELEVANT INFORMATION IS
HIGH FOR INNOVATION-FOCUSED ORGANIZATIONS
INNOVATION, RISK, PRODUCTIVITY
• >$1B Lost Market Leadership (MARKET SHARE LAG: >6%) [1]
• >$31M R&D Spend Inefficiency (TIME LOSS: >3 months) [2]
• >$4B Reputational Risk (LOSS OF EXCLUSIVITY: >6 years) [3]
[1] https://www.mckinsey.com/industries/pharmaceuticals-and-medical-products/our-insights/pharmas-first-to-market-advantage
[2] Drug Discovery World Fall 2004, Failure rates in drug discovery and development: will we ever get any better?
[3] https://www.hhrjournal.org/2017/11/patent-fighters-taking-on-big-pharma
9. Copyright 2020, American Chemical Society.
All rights reserved.
THE VOLUME AND COMPLEXITY OF THE SCIENTIFIC
DATA LANDSCAPE IS GROWING EXPONENTIALLY
10. Copyright 2020, American Chemical Society.
All rights reserved.
DATA MUST BE TRANSFORMED INTO ACTIONABLE
INSIGHTS BEFORE IT CAN ADD VALUE TO R&D
Data → Information → Knowledge → Insight → Action
• Data Governance
• Indexing & Linking
• Search & Analytics
• Actionable Insights
11. Copyright 2020, American Chemical Society.
All rights reserved.
ROBUST, HIGH-QUALITY DATA REQUIRE IN-DEPTH
CURATION BY SCIENTISTS WITH EXPERTISE IN THE
FIELD AND THE LANGUAGE
• Semantically connected titles, abstracts and claims
• Substances, reactions, sequences and properties connected across publications
• Key concepts and inventions globally translated and indexed
12. Copyright 2020, American Chemical Society.
All rights reserved.
CAS CAPTURES CRITICAL DETAILS THAT ALGORITHMS
ALONE MISS
Patent Number: WO 2012135049 A1
Title: Compounds and Methods for Chemical and Chemo-Enzymatic Synthesis of Complex Glycans
- 7 Concepts
- 138 Substances
- 4,614 Reactions
- 4 Patent Family Members
- 3 Cited Documents
- 1 Markush Structure
13. Copyright 2020, American Chemical Society.
All rights reserved.
THE CAS CONTENT
COLLECTION IS
UNPARALLELED IN
BREADTH, DEPTH AND
QUALITY
14. Copyright 2020, American Chemical Society.
All rights reserved.
CAS SCIENTISTS EXTRACT, CONNECT AND
PROVIDE CONTEXT FOR PUBLISHED DATA
Global Translation
Document Indexing
Lexicon Development
Semantic Indexing
Reaction Indexing
Markush Indexing
15. Copyright 2020, American Chemical Society.
All rights reserved.
CAS TECHNOLOGISTS STRUCTURE THAT DATA
AND BUILD SOLUTIONS TO DELIVER INSIGHT
Data Modeling
Search Architecture
Application Development
Analytics
Global Translation
Document Indexing
Lexicon Development
Semantic Indexing
Reaction Indexing
Markush Indexing
16. Copyright 2020, American Chemical Society.
All rights reserved.
WE CONTINUE TO EVOLVE HOW WE PROCESS, MANAGE
AND DELIVER DATA TO MEET EMERGING NEEDS
Index size: 7.5 TB
Knowledge Graph >1.4 Billion Nodes
17. Copyright 2020, American Chemical Society.
All rights reserved.
WHAT IS THE MEASURABLE IMPACT OF CLEAN,
HUMAN-CURATED DATA ON PREDICTIVE OUTCOMES?
CHALLENGE: A recently published paper used Morgan fingerprints and a support vector machine
model to classify almost 10,000 chemical entities by predicted biological activity against five
different targets
OUR QUESTION: Does substituting Morgan fingerprints with CAS proprietary fingerprints have a
measurable impact on prediction accuracy?
RESULTS (classification accuracy by fingerprint type):
Targets: MOR1, 5-HT2B, ADRA2A, Histamine H1, hERG
Morgan: 70%, 67%, 61%, 65%, 60%
CAS: 93%, 87%, 85%, 94%, 66%
The classification
accuracy increased
by over 30% when
using higher-quality
CAS data
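The "over 30%" figure can be checked against the percentages listed in the results. The arithmetic below is only a sketch: it assumes the first five percentages belong to the Morgan series and the next five to the CAS series, which is how the extracted chart text reads but is not stated explicitly. The averages do not depend on which value belongs to which target.

```python
from statistics import mean

# Assumption: the values are split into series in the order they appear in
# the chart text (first five = Morgan, next five = CAS).
morgan_accuracy = [70, 67, 61, 65, 60]
cas_accuracy = [93, 87, 85, 94, 66]

morgan_mean = mean(morgan_accuracy)
cas_mean = mean(cas_accuracy)

# Absolute gain in percentage points and gain relative to the Morgan baseline.
point_gain = cas_mean - morgan_mean
relative_gain = point_gain / morgan_mean * 100

print(f"Mean accuracy: Morgan {morgan_mean:.1f}%, CAS {cas_mean:.1f}%")
print(f"Gain: {point_gain:.1f} points, {relative_gain:.1f}% relative")
```

Under that reading, the average accuracy rises from 64.6% to 85.0%, about 20 percentage points or roughly 32% relative to the Morgan baseline, which is consistent with a relative increase of over 30%.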
18. Copyright 2020, American Chemical Society.
All rights reserved.
CAS USES KNOWLEDGE GRAPHS TO LEVERAGE
THIS DATA FOR UNIQUE INSIGHTS
• Approach: Text-based knowledge graphs. Outcome: Literature Discovery
• Approach: AI + knowledge graphs. Outcome: Prior Art Search
• Approach: IAP + real-time analytics. Outcome: Decision Support
19. Copyright 2020, American Chemical Society.
All rights reserved.
LITERATURE-BASED DISCOVERY VIA
KNOWLEDGE GRAPHS ALLOWS FOR OPEN
EXPLORATION
Literature Discovery
• Begin with a known curated concept
• Source Reference Exploration
• Explore other related concepts extracted from content
• Concept and Substance Details
20. Copyright 2020, American Chemical Society.
All rights reserved.
AND CLOSED DISCOVERY
Literature Discovery
• Begin by choosing several concepts
• Source Reference Exploration
• Find meaningful connections
• Concept and Substance Details
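The difference between open and closed discovery can be sketched on a toy concept graph. Everything below is illustrative: the edges are hand-written, loosely modeled on a classic literature-based discovery example, and the function names `open_discovery` and `closed_discovery` are hypothetical. It does not represent the CAS knowledge graph or its query interface.

```python
# Toy concept graph: each pair means the two concepts are linked by at least
# one (hypothetical) indexed document.
EDGES = [
    ("fish oil", "blood viscosity"),
    ("fish oil", "platelet aggregation"),
    ("fish oil", "triglycerides"),
    ("blood viscosity", "Raynaud's syndrome"),
    ("platelet aggregation", "Raynaud's syndrome"),
]


def build_graph(edges):
    """Build an undirected adjacency map from concept pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def open_discovery(graph, start):
    """Open discovery: begin with one concept and map each indirectly
    related concept to the intermediate concepts that bridge to it."""
    direct = graph.get(start, set())
    found = {}
    for bridge in direct:
        for target in graph.get(bridge, set()):
            if target != start and target not in direct:
                found.setdefault(target, set()).add(bridge)
    return found


def closed_discovery(graph, concept_a, concept_c):
    """Closed discovery: begin with two chosen concepts and return the
    concepts directly linked to both."""
    return graph.get(concept_a, set()) & graph.get(concept_c, set())


graph = build_graph(EDGES)
for target, bridges in open_discovery(graph, "fish oil").items():
    print("open:", target, "via", sorted(bridges))
print("closed:", sorted(closed_discovery(graph, "fish oil", "Raynaud's syndrome")))
```

Open discovery starts from a single concept and surfaces concepts that are not linked to it directly, while closed discovery starts from chosen concepts and looks for the connections between them.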
21. Copyright 2020, American Chemical Society.
All rights reserved.
PATENTS HAVE INHERENT CHALLENGES
COMPARED TO TYPICAL TEXT INGESTION AND
SIMILARITY SEARCHES
Prior Art Search
• Syntactic Similarity vs Semantic Similarity
• Similarity vs Prior-Art
Office, Inventor, Abstract, Description, Claim 1, Claim 2, Claim n
Similarity: High, Low, Low
Prior Art?: No, Yes, Yes
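The mismatch between similarity and prior art on this slide can be illustrated with a purely syntactic measure. The sketch below uses Jaccard overlap of word tokens, a generic technique chosen for illustration rather than the method used by CAS, and the three text snippets are invented examples, not real patent claims.

```python
import re


def tokens(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(text_a: str, text_b: str) -> float:
    """Syntactic similarity: shared tokens divided by all distinct tokens."""
    a, b = tokens(text_a), tokens(text_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Invented snippets for illustration only.
query = "a method for reducing blood viscosity using fish oil"
lookalike = "a method for reducing paint viscosity using linseed oil"
paraphrase = "dietary omega-3 fatty acids make plasma flow more easily"

print(f"lookalike:  {jaccard(query, lookalike):.2f}")
print(f"paraphrase: {jaccard(query, paraphrase):.2f}")
```

The lookalike shares most of its words with the query yet describes a different invention, while the paraphrase shares none of them yet describes closely related subject matter. Word overlap alone would rank them the wrong way round for a prior art search.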
22. Copyright 2020, American Chemical Society.
All rights reserved.
VISUAL REPRESENTATION OF SEARCH
REPORTS SUPPORTS FASTER PATENT ANALYSIS
AND EXAMINATION
Prior Art Search
23. Copyright 2020, American Chemical Society.
All rights reserved.
CAN EXTEND BEYOND SEARCH REPORTS TO
INCLUDE OTHER CONNECTED PATENTS AND
NON-PATENT LITERATURE
Prior Art Search
24. Copyright 2020, American Chemical Society.
All rights reserved.
CONNECT
www.cas.org/resources
WHITE PAPERS
BLOGS
CASE STUDIES
EMAIL
Mark R. Grabau
mgrabau@cas.org