Healthcare organizations increasingly rely on data to inform strategic decisions. This growing dependence makes it more critical than ever to ensure that data across the organization are fit for purpose. Decision-making challenges associated with pandemic-driven urgency, the variety of data, and a lack of resources have further highlighted the importance of healthcare data quality and prompted more focus and investment. However, many data quality initiatives are too narrow in focus and reactive in nature, or take longer than expected to demonstrate value. This leaves organizations unprepared for future events, like COVID-19, that require a rapid enterprise-wide analytic response.
What are some actionable ways you can help your organization guard against the data quality challenges uncovered this past year and better prepare to respond in the future? Join Taylor Larsen, Director of Data Quality for Health Catalyst, to learn more.
What You’ll Learn
- How data profiling and data quality assessments, in combination with your data catalog, can increase data quality transparency, expedite root cause analysis, and close data quality monitoring gaps.
- How to leverage AI to reduce data quality monitoring configuration and maintenance time and improve accuracy.
- How defining data quality based on its measurable utility (i.e., data represents information that supports better decisions) can provide a scalable way to ensure data are fit for purpose and avoid cost outstripping return.
2. Introduction
Taylor
Larsen
Health Catalyst :: Data Business Unit :: Director of Data Quality
Economist > CO Medicaid > Analytic Engineer > Data Scientist > Data Quality
Our focus is on improving time to value through data quality.
How quickly can our clients begin making better data-informed decisions?
How can we:
• More proactively ensure data quality from the start?
• Help our clients feel more confident?
• Be more reliable, day after day?
• Do this consistently across our products and clients?
3. https://www.gartner.com/en/documents/3986583/cost-optimization-is-crucial-for-modern-data-management-
Why is it so costly?
• Lost revenue
• Bad business decisions
• Duplicate efforts
• Employee turnover
The cost is amplified in healthcare:
• Patient safety
• Care management
• Patient experience
• Provider burnout
Problem
A recent Gartner survey found that organizations estimate the “average cost of poor data quality at $12.8 million per year.”
Poor data quality: missing, incorrect, or otherwise flawed data that cannot serve its purpose.
4. Four Levels of Data Quality
1. Structural: Database constraints are enforced, including data types, NULLs, primary keys, and referential integrity. (“There is no Feb. 30, and encounter ID is unique across encounters.”)
2. Content: Single Subject Area: Values are reasonable within the context of a subject area. (“Respiratory rate is not negative, and temperature makes sense given the unit of measure.”)
3. Content: Multiple Subject Areas: Values are reasonable across subject areas. (“Heart rate is appropriate for child vs. adult, and flow sheet recorded date is during the encounter.”)
4. Utility: Values represent information empirically demonstrated to support better decisions. (“Respiratory rate predicts short-term mortality and drives inpatient admission decisions.”)
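The first three levels can be expressed as concrete, automatable checks. A minimal sketch in Python with pandas, using hypothetical encounters and vitals tables (all table and column names are illustrative assumptions, not the presenter's actual schema):

```python
import pandas as pd

# Hypothetical encounters and vitals tables (names are illustrative).
encounters = pd.DataFrame({
    "encounter_id": [1, 2, 3],
    "admit_date": pd.to_datetime(["2021-01-05", "2021-01-07", "2021-01-09"]),
    "discharge_date": pd.to_datetime(["2021-01-06", "2021-01-08", "2021-01-10"]),
})
vitals = pd.DataFrame({
    "encounter_id": [1, 2, 3],
    "respiratory_rate": [18, 22, 16],
    "recorded_date": pd.to_datetime(["2021-01-05", "2021-01-07", "2021-01-09"]),
})

# Level 1 (Structural): keys are unique and non-null.
assert encounters["encounter_id"].is_unique
assert encounters["encounter_id"].notna().all()

# Level 2 (Content, single subject area): values reasonable in context.
assert (vitals["respiratory_rate"] >= 0).all()

# Level 3 (Content, multiple subject areas): values reasonable across areas --
# the recorded date must fall within the encounter it belongs to.
joined = vitals.merge(encounters, on="encounter_id")
in_window = (joined["recorded_date"] >= joined["admit_date"]) & (
    joined["recorded_date"] <= joined["discharge_date"]
)
assert in_window.all()

# Level 4 (Utility) needs an outcome to measure against; it is treated
# separately later in the presentation.
```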
5. What is your organization’s process for ensuring data quality?
a) Business users notify engineers or IT if they find a problem – 17%
b) Business users review before analysis is considered “done” – 13%
c) We review during design on an ongoing basis – 21%
d) Our process varies on a project-by-project basis – 49%
Poll Question #1
6. Problem
Data quality initiatives:
• Are too narrow in focus.
• Are reactive in nature.
• Take longer than expected.
This is not a strategy! Many organizations are investing in data quality but are still unprepared for future events that require a rapid enterprise-wide analytic response.
7. Agenda
Increase Data Quality Transparency
How data profiling and data quality assessments, in
combination with your data catalog, can increase data quality
transparency, expedite root cause analysis, and close data
quality monitoring gaps.
Reduce Configuration and Maintenance
How to leverage AI to reduce data quality monitoring
configuration and maintenance time and improve accuracy.
Data Quality Defined by Measurable Utility
How defining data quality based on its measurable utility (i.e.,
data represents information that supports better decisions)
can provide a scalable way to ensure data are fit for purpose
and avoid cost outstripping return.
8. Increase Data Quality Transparency
Standard data profiles increase understanding of the structure and content of tables and columns.
[Screenshots: standard profiles of a Lab Results table and its Result Date column]
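A minimal sketch of what a standard profile might compute for a single column, assuming a pandas DataFrame with hypothetical `result_date` and `result_value` columns (names and data are illustrative):

```python
import pandas as pd

# Hypothetical lab results table (names and values are illustrative).
lab_results = pd.DataFrame({
    "result_date": pd.to_datetime(
        ["2021-03-01", "2021-03-02", None, "2021-03-04", "2021-03-04"]
    ),
    "result_value": [7.1, 6.8, 7.4, None, 6.9],
})

# A standard profile for one column: row count, null rate,
# distinct count, and the observed min/max.
col = lab_results["result_date"]
profile = {
    "rows": len(col),
    "null_pct": round(col.isna().mean() * 100, 1),
    "distinct": col.nunique(),
    "min": col.min(),
    "max": col.max(),
}
print(profile)
```

Profiles like this are cheap to generate for every table and column, which is what makes them useful for increasing transparency at scale.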
9. Increase Data Quality Transparency
Data quality assessments further characterize the data, surface hidden issues and unexpected
changes, codify data knowledge, communicate expectations, and make explicit what is being checked.
[Screenshot: data quality assessments for the Result Date column]
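An assessment goes beyond a profile by making an explicit assertion about the data and comparing the failure rate against a tolerance. A minimal sketch, using a hypothetical `result_date` column and an assumed 1% tolerance (both are illustrative choices, not the presenter's configuration):

```python
import pandas as pd

# Hypothetical lab results (values are illustrative; one date is in the
# future and one is missing).
lab_results = pd.DataFrame({
    "result_date": pd.to_datetime(
        ["2021-03-01", "2021-03-02", "2031-03-03", None]
    ),
})

today = pd.Timestamp("2021-03-05")

# Assertion: result dates should be present and not in the future.
flagged = lab_results["result_date"].isna() | (lab_results["result_date"] > today)
failure_rate = flagged.mean()

# Threshold: tolerate up to 1% flagged rows before raising an issue.
THRESHOLD = 0.01
if failure_rate > THRESHOLD:
    print(f"Data quality issue: {failure_rate:.0%} of result dates "
          "are null or in the future")
```

Codifying checks this way also communicates expectations: the assertion itself documents what "good" looks like for the column.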
10. Increase Data Quality Transparency
Data catalogs provide a centralized index of metadata that accelerates exploration of data
assets and associated information about structure, content, relationships, and quality.
[Screenshot: data catalog entries for the Lab Results table and its Result Date column]
11. Which of the following features are available in your organization’s data
catalog?
Check all that apply:
a) Multi-level navigation of data assets like tables and fields – 40%
b) Standard data profiles – 44%
c) Data quality assessment results – 32%
d) Ability to define new data quality assessments – 26%
e) We don’t have a data catalog that I’m aware of – 45%
Poll Question #2
12. How has your organization approached developing data quality assessments?
a) We are just getting started and haven’t developed many yet – 31%
b) We develop them one at a time and maintain them separately – 29%
c) The tools or approach that we use are scalable and automated – 25%
d) I’m still not sure what you mean by data quality assessments – 15%
Poll Question #3
13. Reduce Configuration and Maintenance
Data quality assessments require an assertion about the valid or expected state of the data, which is then compared against a threshold to determine if there is an issue.
Monitoring for unexpected changes in data quality using manual thresholds quickly becomes time- and cost-prohibitive and can produce too many false alerts.
We can instead feed any number of data quality indicators to AI and allow it to determine expected ranges and alert us when something unexpected happens.
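To illustrate the idea of letting the computer learn the expected range rather than hand-setting a threshold, here is a minimal sketch. A simple mean ± 3 standard deviations over a history of a daily null-rate indicator stands in for the AI model; the indicator values are made up for illustration:

```python
import statistics

# Daily null-rate indicator for one column over three weeks
# (hypothetical values; the last day spikes).
null_rates = [0.010, 0.012, 0.011, 0.009, 0.010, 0.011, 0.012,
              0.010, 0.011, 0.009, 0.010, 0.012, 0.011, 0.010,
              0.011, 0.010, 0.009, 0.011, 0.010, 0.012, 0.090]

# Learn the expected range from history instead of configuring a
# threshold by hand; mean +/- 3 standard deviations is the simplest
# possible stand-in for a learned model.
history, latest = null_rates[:-1], null_rates[-1]
mu = statistics.mean(history)
sigma = statistics.stdev(history)
upper = mu + 3 * sigma

if latest > upper:
    print(f"Alert: null rate {latest:.3f} exceeds learned bound {upper:.3f}")
```

Because the bound is derived from the indicator's own history, the same code monitors any number of indicators with no per-check configuration, which is the maintenance saving the slide describes.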
14. Data Quality Defined by Measurable Utility
How does measurable utility help us?
• Objectivity: Based on evidence or experience, we expect the quantifiable relationship between some predictor variable (e.g., age) and a known outcome (e.g., readmission) to be within a given range; when it’s not, we know something is off.
• Scalability: We can allow the computer to automatically quantify objective relationships and then determine whether they’re within expected ranges, which eliminates the need to manually define data quality checks or thresholds.
• Prioritization: We can use measurable utility to prioritize where to focus our data quality improvement efforts because it highlights impactful data elements that are not performing as expected due to data quality issues (and likely for reasons we couldn’t spot with other approaches).
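A minimal sketch of quantifying utility and checking it against an expected range. It measures how well a predictor separates outcomes using AUC (the probability that a readmitted patient has a higher respiratory rate than a non-readmitted one, ties counting half); the data and the expected range are illustrative assumptions:

```python
# Hypothetical: respiratory rate as predictor, 30-day readmission as outcome.
resp_rate = [14, 16, 18, 22, 24, 28, 30, 34]
readmitted = [0, 0, 0, 0, 1, 0, 1, 1]

# Utility as AUC: fraction of (positive, negative) pairs where the
# positive case has the higher predictor value (ties count half).
pos = [x for x, y in zip(resp_rate, readmitted) if y == 1]
neg = [x for x, y in zip(resp_rate, readmitted) if y == 0]
wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
auc = wins / (len(pos) * len(neg))

# Check the measured utility against an expected range learned from
# history; a drop outside it signals an upstream data quality issue.
EXPECTED_RANGE = (0.70, 0.95)  # illustrative bounds
in_range = EXPECTED_RANGE[0] <= auc <= EXPECTED_RANGE[1]
print(f"AUC={auc:.2f}, within expected range: {in_range}")
```

The same loop can run over every predictor-outcome pair of interest, which is what makes utility-based checks both objective and scalable.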
15. Data Quality Defined by Measurable Utility
What does it look like in practice?
[Chart legend: more predictive than expected; within expected range; less predictive than expected; relatively helpful in combination with other data; not helpful in combination with other data]
Root Cause:
1. Readmissions were not correctly attributed to index admissions.
2. ED admits were misclassified.
16. Data Quality Defined by Measurable Utility
What does it look like in practice?
We can then use AI to detect unexpected changes in measurable utility.
17. Summary
To reduce the risk of your data quality initiative being too narrow in focus, reactive in nature, and taking longer than expected, look for opportunities to:
1. Increase data quality transparency.
• Use standard data profiles, data quality assessments, and your data catalog to help make sure that you understand the data and have good testing coverage.
2. Reduce configuration and maintenance.
• Leverage AI to reduce manual configuration and maintenance of testing thresholds and to alert you to unexpected changes sooner and more accurately.
3. Objectively prioritize data that facilitate better decisions.
• Define data quality based on its measurable utility being within an expected range to objectively validate key data elements in a scalable way.
18. If you’d like to learn more about Health Catalyst products or services, please
answer this question:
A. Yes, I’d like to learn more.
B. No, thank you.
Final Poll Question