How to improve trust in advanced analytics, AI, and machine learning
With the volume, velocity, and variety of data coming into the enterprise, IT teams are turning to artificial intelligence and machine learning to improve the efficiency and accuracy of their data management processes. But if you have underlying data integrity challenges, and you’re using that faulty data to train your machine learning algorithms, your machine learning is now fueled by faulty data. How does that impact your business decisions?
In this on-demand webinar, Dr. Tendü Yoğurtçu, CTO of Precisely, examines various use cases for machine learning and advanced analytics. The session also explores the root causes of data integrity challenges, including:
- Poor data quality
- Data silos
- Lack of context that enriches the understanding of your data
Do You Trust Your Machine Learning Outcomes?
1. Do You Trust Your
Machine Learning
Outcomes?
How to improve trust in advanced
analytics, AI, and machine learning
Dr. Tendü Yoğurtçu | Chief Technology Officer, Precisely
2. Housekeeping
Webinar Audio
• Today’s webinar audio is streamed through your computer
speakers
• If you need technical assistance with the web interface
or audio, please reach out to us using the Q&A box
Questions Welcome
• Submit your questions at any time during the presentation
using the Q&A box
Recording and slides
• This webinar is being recorded. You will receive an email
following the webinar with a link to the recording and slides
3. Agenda
• Trends in data and the growth of AI
• Common industry use cases for
machine learning & data challenges
• Real-world stories of ML success
• Strategies for improving trust in ML
outcomes
Today’s Speaker
Tendü Yoğurtçu, PhD
Chief Technology Officer, Precisely
4. The rising tide of data
HYBRID CLOUD
ARTIFICIAL INTELLIGENCE
DATA GOVERNANCE
DATA STREAMING
• 45 ZB: data created in 2019
• 19%: new data created in 2019 in real time
• 143 ZB: data that will be created by 2024
• 61%: new data created on endpoints
• 20%: new data created in the cloud
• 88%: share of the 45 ZB generated by replication and distribution, creating data liabilities
Source: IDC, Worldwide Global DataSphere Forecast, 2020–2024
5. Why AI and ML?
AUTOMATE
• Automate workflows,
common processes,
and decision making
SCALE
Scale processing
across massive
volumes of data
PREDICT
Predict outcomes and
recommend actions to
support business
planning
COMPETE
Obtain competitive
advantage through
greater insight and
operational efficiency
ML can also be applied to improve the accuracy and consistency
of data you use for business processes.
6. Hybrid Cloud
Streaming
Location
AI
• 84% of CEOs are concerned about the integrity of the data they’re basing decisions on [1]
• 68% of organizations said disparate data negatively impacted their organization [2]
• 47% of newly created data records have at least one critical error [3]
• 54% of enterprises challenged by lack of data location intelligence [4]
• 92% of firms agree they need to increase use of outside data [5]
Sources: 1. Forbes, 2. Data Trends Survey 2019, 3. Harvard Business Review, 4. IDC, 5. Forrester
8. Real-world example
Business challenge: could not
capitalize on demographic and
experience trends
Technical challenge: data
scientists spent weeks on getting
clean, consolidated data to feed
AI initiatives
Solution: Using ML-powered
entity resolution led to more
accurate results in less than 4
hours rather than 4+ weeks
Insurance and ML
Business challenges
• Making pricing policy decisions
• Analyzing risk
• Assessing business impact as a
catastrophe develops
• Optimally allocating resources
after an event occurs
• Growing business through
highly-targeted marketing
programs for new and existing
policyholders
Data challenges
• Entity resolution at scale
• Lack of access to siloed data
• Inconsistency of data across
multiple sources
• Freshness of third-party data
for understanding risks
associated with weather and
natural disasters
9. Business Challenge: More quickly
and accurately predict a
property's market value
Technical Challenge: Joining
thousands of variables from
disparate sources and ensuring
data accuracy & consistency for
predictive ML models
Solution: Cloud-native location
intelligence with curated datasets
reduced time to build trusted
data from 13+ hours to 3.2 hours
Real-world example
Banking & loans and ML
Business challenges
• Reducing risk by understanding
variables that most impact
home valuation
• Informing loan activity by
producing scores for mortgage
bankers
• Making intelligent, risk-based
decisions using standardized
location information
• Growing new business and
expanding current business
with highly-targeted marketing
programs
Data challenges
• Incomplete data
• Verifying accuracy &
standardizing the data
• Linking 3rd-party data to
customer reference sets
• Marrying location information
from multiple sources; e.g.,
satellite, drone map/plot info
10. Business challenge: analyze
global business trends to help
investors make sound decisions
Technical challenge: data
scientists needed to accurately
join datasets from various sources
to feed trusted data into ML
models
Solution: 30 data scientists in AI
lab geocoded and enriched their
data with PreciselyID and Points
of Interest to improve trust in the
models they were building
Real-world example
Financial services and ML
Business challenges
• Processing millions of data
points for risk & AML analysis
• Improving the accuracy of real-
time approvals & reducing the
number of false rejections
• Increasing profitability by
mining customer data for better
insights
• Helping investors make sound
decisions by analyzing global
business trends
Data challenges
• Standardizing data coming
from different sources
• Verifying accuracy of the data
• Feeding data to ML models
with maximum accuracy and
consistency
• Enriching in-house data with
accurate third-party data to
feed models and provide lift
11. Business challenge: rising
marketing costs and a poor
customer experience due to
duplicate customer records
Technical challenge: data was
siloed, and duplicate data records
prevented a single view of the customer
Solution: deployed a context
graph and ML-powered
Customer 360 solution for a
trusted, unified view of its
customers; reduced deduplication
time from 3 hours to under 5 mins
Real-world example
Retail and ML
Business challenges
• Understanding consumer patterns
• Predicting retail growth at scale
• Delivering a personalized
customer experience that
maximizes customer loyalty
• Performing site planning
Data challenges
• Siloed data
• Data standardization and
validation
• Duplicate customer information
across CRM and ERP systems –
and time required to de-dup
large quantities of data
• Obtaining a single view of a
customer’s data
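As a rough illustration of the de-duplication challenge above, the sketch below groups records that appear to describe the same customer. It is not the ML-powered approach described in the slide; it swaps in plain string similarity from Python's standard library as a stand-in. The field names (`source`, `name`, `address`), the sample records, and the 0.85 threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Lowercase the matching fields and collapse repeated whitespace."""
    return " ".join(f"{record['name']} {record['address']}".lower().split())


def similarity(a, b):
    """Return a 0..1 similarity score between two normalized records."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def deduplicate(records, threshold=0.85):
    """Greedy clustering: a record joins the first cluster whose first
    member is similar enough, otherwise it starts a new cluster."""
    clusters = []
    for record in records:
        for cluster in clusters:
            if similarity(cluster[0], record) >= threshold:
                cluster.append(record)
                break
        else:
            clusters.append([record])
    return clusters


# Hypothetical records from two systems (illustrative data only).
records = [
    {"source": "CRM", "name": "Jane Smith", "address": "12 Oak Street"},
    {"source": "ERP", "name": "jane  smith", "address": "12 Oak Street"},
    {"source": "CRM", "name": "Robert Jones", "address": "400 Pine Avenue"},
]

clusters = deduplicate(records)
for cluster in clusters:
    print([r["source"] for r in cluster], normalize(cluster[0]))
```

Each record is compared against every existing cluster, so the work grows quickly with volume; that cost is one reason the slide lists the time required to de-dup large quantities of data as a challenge in its own right.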
13. Improve trust in your data
to improve trust in your ML outcomes
INTEGRATE
Break down data silos
to bring all your
enterprise data to your
ML models
VERIFY
Ensure the data used
to build, train, & feed
ML models is
accurate & consistent
LOCATE
Apply the consistent
element of location to
organize, manage, &
enrich your data for
greater insights
ENRICH
Enrich your data with
expertly curated, up-to-
date consumer insights,
business, and
demographic
information
Trust your data. Build your possibilities.
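The VERIFY step above can be pictured as a gate that records must pass before they reach a model. The sketch below is a minimal example under stated assumptions: the required fields (`customer_id`, `postal_code`, `annual_spend`) and the rule that spend cannot be negative are hypothetical and are not taken from the webinar or from any Precisely product.

```python
# Hypothetical required fields for a training record (assumption).
REQUIRED_FIELDS = ("customer_id", "postal_code", "annual_spend")


def validate(record):
    """Return a list of problems found in one record (empty if clean)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    spend = record.get("annual_spend")
    if isinstance(spend, (int, float)) and spend < 0:
        problems.append("negative annual_spend")
    return problems


def split_records(records):
    """Separate records that pass every check from those that do not."""
    clean, rejected = [], []
    for record in records:
        problems = validate(record)
        if problems:
            rejected.append((record, problems))
        else:
            clean.append(record)
    return clean, rejected


# Illustrative data only.
records = [
    {"customer_id": "C1", "postal_code": "10001", "annual_spend": 1200.0},
    {"customer_id": "C2", "postal_code": "", "annual_spend": 300.0},
    {"customer_id": "C3", "postal_code": "94105", "annual_spend": -50.0},
]

clean, rejected = split_records(records)
print(len(clean), "clean;", len(rejected), "rejected")
for record, problems in rejected:
    print(record["customer_id"], problems)
```

Keeping the rejected records together with the reasons, rather than silently dropping them, makes it easier to trace a data quality problem back to the source system.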
14. The Precisely Data Integrity Suite
• Delivers the essential elements of data integrity –
accuracy, consistency, and context
• Built on data integration, data quality, location
intelligence, and data enrichment trusted by over
12,000 enterprise customers
• Modular architecture allows you to choose just the
capabilities you need – and implement them
alongside your current infrastructure at scale
• Empowers faster, confident decision-making
with trusted data
Data
Integration
Data
Enrichment
Location
Intelligence
Data
Quality
Users of real-time and streaming data architectures increasingly realize that real-time data quality is an operational concern
Need to automate decision making
Need to scale
Need to predict
Need to plan
Need a competitive advantage
Rise of the use of AI to improve existing data pipelines and processes – such as smart rules, automatic data classification, and intelligent automated rule application
Rise of Data Quality for AI and the emergence of MLOps - the need for “good data” instead of just “big data”
Call out data challenges associated with each of these statistics:
Cannot build real-time data pipelines to feed business applications and analytics
Time consuming and manual effort to standardize, verify, and validate data across entities
Difficult to make address data fit for purpose – this requires significant expertise, time, and resources
Manually tracking and incorporating up-to-date location, business, and demographic information
Leverage hyper-accurate geocoding to inform pricing policy decisions and risk
This can speak to risk that exists due to a policy location and variables that could cause a claim such as flooding, hurricanes, or wildfires
But it can also speak to adjacent risk: understanding whether there is a nearby business that could cause a problem (what if they are near a fireworks store?), or whether you have too many policies located in a single building, such as a high-rise, which could cause a large loss across many policies if an event such as a fire happened.
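The "too many policies in a single building" check in the note above can be sketched as a simple aggregation once each policy has been geocoded to a building. The example assumes geocoding has already happened and has attached a hypothetical `building_id` to each policy; the field names, sample values, and the limit of two policies per building are illustrative assumptions only.

```python
from collections import defaultdict


def exposure_by_building(policies):
    """Total the policy count and insured value for each building."""
    totals = defaultdict(lambda: {"count": 0, "insured_value": 0})
    for policy in policies:
        entry = totals[policy["building_id"]]
        entry["count"] += 1
        entry["insured_value"] += policy["insured_value"]
    return dict(totals)


def flag_concentrations(policies, max_policies=2):
    """Return building IDs holding more policies than the chosen limit."""
    exposure = exposure_by_building(policies)
    return sorted(b for b, e in exposure.items() if e["count"] > max_policies)


# Illustrative data only; building_id is assumed to come from geocoding.
policies = [
    {"policy_id": "P1", "building_id": "B-100", "insured_value": 250_000},
    {"policy_id": "P2", "building_id": "B-100", "insured_value": 300_000},
    {"policy_id": "P3", "building_id": "B-100", "insured_value": 150_000},
    {"policy_id": "P4", "building_id": "B-200", "insured_value": 500_000},
]

exposure = exposure_by_building(policies)
flagged = flag_concentrations(policies)
print(flagged, exposure["B-100"])
```

The aggregation is only as good as the building assignment: if two policies at the same address are geocoded to different locations, the concentration is missed.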
Business Challenge: improve speed and accuracy of valuation models by joining thousands of variables to effectively predict a property’s market value.
Technical Challenge: Connecting volumes of data from disparate sources and ensuring a consistent and accurate approach to feed trusted data into ML models
Solution: Deployed cloud-native location intelligence with expertly curated datasets to connect and build trusted data feeding ML to predict property market values. As a result, the time to build trusted data fell from 13+ hours to 3.2 hours (and it is getting faster all the time!).
Right data is connected to the right property, trusted data
Others say they do that but don’t do it well and produce false positives – we do better matching and accurate location, make it easier to use with PreciselyID, and are super fast
When I say, “build trusted data,” I am referring to the processing that joins and brings together the data used to feed the property value ML model.
Unlike traditional methods, ML can analyze significant volumes of personal information to reduce risk.
AI workflows can analyze data sources like consumer mobility and purchase patterns
Location: not just about enriching, we enrich it correctly so people do not get false positives, false information
Importance of data being done correctly, building trust is critical
And that is why Precisely has introduced the Precisely Data Integrity Suite.
It delivers the essential elements of data integrity – accuracy, consistency, and context – to give your business the confidence to make better, faster decisions based on trusted data.
Built on proven data integration, data quality, location intelligence, and data enrichment capabilities trusted by more than 12,000 global organizations, the Precisely Data Integrity Suite delivers unmatched value for any data integrity initiative.
And with a modular architecture, you can pick just the capabilities you need, implement them alongside your current infrastructure, and add on new capabilities as your needs grow.