A Better Architecture for Data:
Adaptable, Scalable, and Smart
Paul Boal &
Adam Doyle
June 8, 2018, St. Louis
Agenda
1. Modern Data Architecture Myths
2. Characteristics of Modern Data Architecture
a. Governed, Secure
b. Adaptable, Customer Centric, Collaborative
c. Flexible, Elastic, Simple, Resilient
d. Smart, Automated
3. Reference Data Architecture
4. How do I get there?
5. Recap
Myths
MYTH #1: Just install Hadoop
A modern data architecture is not a single technology or single-vendor solution. Modern data architectures combine a portfolio of technologies to create an ecosystem with certain characteristics.
MYTH #2: Modern must mean NoSQL
NoSQL technologies provide an efficient way to manage and access data under certain circumstances, but traditional relational databases and SQL continue to provide the most powerful way to organize and query well-known data.
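To illustrate why SQL remains strong for well-known, structured data, here is a minimal sketch using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their rows are hypothetical; the point is that a single declarative query expresses a join, a grouping, and a sort, and the engine chooses how to execute it.

```python
import sqlite3

# In-memory database with two small, well-understood tables (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'East'), (2, 'West'), (3, 'East');
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 75.0), (4, 3, 25.0);
""")

# One declarative query joins, groups, and sorts; the engine picks the execution plan.
rows = conn.execute("""
    SELECT c.region, COUNT(o.id) AS order_count, SUM(o.amount) AS revenue
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.region
    ORDER BY revenue DESC
""").fetchall()

for region, order_count, revenue in rows:
    print(f"{region}: {order_count} orders, {revenue:.2f}")
```

The same result over a schemaless key-value store would typically need application code to perform the join and aggregation by hand.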
MYTH #3: Big data is magical pixie dust
We talk a lot about the accelerating growth of data, the decreasing cost of storage and compute power, and the power of data science. It's convenient to believe that throwing all of this into a pot and letting it simmer will produce results while we wait. The truth is that applying data, technology, and analytics still requires planning, analysis, and careful execution.
MYTH #4: More data is always better
Not all data is created equal. Sometimes you might have unreliable or invalid data that will obscure results if used inappropriately.
Using extraneous data can make analysis more complicated by adding time to filter the data set and select features. Sometimes more just means more work.
MYTH #5: I have to replace everything I have right now
One of the characteristics of a modern data architecture is flexibility, meaning that your modernization should be developed incrementally, implementing new capabilities in a way that integrates with and slowly supplants existing, limited technologies.
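One widely used way to supplant existing systems gradually is the strangler fig pattern: a thin routing layer sits in front of the old system, and capabilities move behind it to new implementations one at a time. The sketch below is a minimal illustration under that assumption; `legacy_report`, `modern_report`, and `IncrementalRouter` are hypothetical names, not part of any product mentioned here.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]

def legacy_report(name: str) -> str:
    # Hypothetical stand-in for an existing, limited system.
    return f"legacy:{name}"

def modern_report(name: str) -> str:
    # Hypothetical stand-in for a newly built capability.
    return f"modern:{name}"

class IncrementalRouter:
    """Send each capability to its new implementation once it has been migrated;
    everything not yet migrated keeps using the legacy path."""

    def __init__(self, legacy: Handler) -> None:
        self.legacy = legacy
        self.migrated: Dict[str, Handler] = {}

    def migrate(self, capability: str, handler: Handler) -> None:
        # Register a capability as migrated; callers do not change.
        self.migrated[capability] = handler

    def handle(self, capability: str, name: str) -> str:
        handler = self.migrated.get(capability, self.legacy)
        return handler(name)

router = IncrementalRouter(legacy_report)
router.migrate("sales_report", modern_report)
print(router.handle("sales_report", "q1"))      # modern:q1
print(router.handle("inventory_report", "q1"))  # legacy:q1
```

Because callers only talk to the router, each migration step is small and reversible, which is what makes an incremental path practical.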
Characteristics
Governed, Secure
Governed
The architecture and its components have to evolve and adapt in ways that are intentional and informed by enterprise strategy.
Make collaboration the default.
Communicate and then communicate some more.
Treat every component as if another team may want to use it, too.

Secure
Accessing information should be easy and should effortlessly ensure that users are knowingly using the right information for the right purpose.
Security as an enabler of usage, not a denier of access.
Track and log access for audit purposes and for learning.
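Logging access "for audit purposes and for learning" can be built into the access path itself rather than bolted on. The following is a minimal sketch, assuming a Python data-access layer: the hypothetical `audited` decorator records who read which dataset and for what stated purpose, without blocking the read. `read_orders`, the `sales_orders` dataset name, and the in-memory `AUDIT_TRAIL` are illustrative assumptions.

```python
import logging
from datetime import datetime, timezone
from functools import wraps

logging.basicConfig(level=logging.INFO, format="%(message)s")
audit_log = logging.getLogger("audit")
AUDIT_TRAIL = []  # hypothetical in-memory copy of audit records, kept for later analysis

def audited(dataset: str):
    """Decorator that records who read which dataset and for what stated purpose."""
    def decorator(func):
        @wraps(func)
        def wrapper(user: str, purpose: str, *args, **kwargs):
            record = {
                "ts": datetime.now(timezone.utc).isoformat(),
                "user": user,
                "dataset": dataset,
                "purpose": purpose,
            }
            AUDIT_TRAIL.append(record)         # for learning how data is used
            audit_log.info("access %s", record)  # for audit
            return func(user, purpose, *args, **kwargs)  # access is not denied
        return wrapper
    return decorator

@audited("sales_orders")
def read_orders(user: str, purpose: str, limit: int = 10):
    # Hypothetical data source; a real one would query a governed store.
    return [{"order_id": i} for i in range(limit)]

rows = read_orders("analyst1", "quarterly forecast", limit=3)
```

Capturing the purpose alongside the user is what lets the trail answer "is the right information being used for the right purpose?", not just "who touched it?".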
Governed, Secure: ING
Apache Atlas
Open Metadata and Governance - APIs, notification systems, integration of metadata, security, and governance-related tools
https://www.slideshare.net/Hadoop_Summit/open-metadata-and-governance-with-apache-atlas?qid=6ea30d4f-15af-46ad-b580-349f78bb7752&v=&b=&from_search=9
Governed, Secure: Frameworks and Tools
Open Source Core
Apache Atlas - Open Metadata Management
Apache NiFi - Data Provenance
Apache Sentry/Ranger - Fine-grained Access Control
Vendor Participants
Adaptable, Customer Centric, Collaborative
"It is not the strongest of the species that survives, nor the most intelligent. It is the one that is most adaptable to change."
~Often attributed to Charles Darwin
Adaptable
The more you deliver, the more you will learn about what is really needed, so be prepared to change and build solutions that can change easily.
Agile data modeling.
Agile analytics.

Customer Centric
Focus on delivering solutions that make sense to the people who will use them rather than following standards and rules above all else.
The DBMS is not your user.
Ralph Kimball and Edgar Codd are not your users.
The Architecture Review Board is not your user.

Collaborative
Solutions that are interactively designed and built by a team with diverse capabilities and backgrounds can produce a better result than any one individual would have.
Collaboration is more than requirements gathering.
Collaboration is something that has to happen every day.
Communicate, communicate, communicate. And then communicate.
Adaptable, Customer Centric, Collaborative: Agile Data
http://agiledata.org/

Adaptable, Customer Centric, Collaborative: Tools and Techniques
Model Storming
Rapid experimentation
Data science environments
WhereScape, Snowflake, ThoughtSpot
Simple, Elastic, Resilient, Flexible
"Notice that the stiffest tree is most easily cracked, while the bamboo or willow survives by bending with the wind."
-Bruce Lee
Simple
Individual components should only be as complex as necessary.
Reduce interdependencies.
Use shared components.

Elastic
The system can easily handle an increase in data volume, users, or complexity.
Distributed computing.
Cloud.
DevOps.

Resilient
Errors in data or processing don't cause large parts of the system to fail.
Isolate components.
Tolerate, isolate, and report bad data.
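"Tolerate, isolate, and report bad data" can be sketched as a quarantine step in a load pipeline: invalid records are set aside with a reason instead of failing the whole batch. This is a minimal illustration, assuming records arrive as dictionaries; the validation rules in `validate` and the `LoadResult` structure are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

@dataclass
class LoadResult:
    good: List[Dict[str, Any]] = field(default_factory=list)
    quarantined: List[Tuple[Dict[str, Any], str]] = field(default_factory=list)

def validate(record: Dict[str, Any]) -> None:
    """Raise ValueError if the record breaks a (hypothetical) data rule."""
    if not record.get("id"):
        raise ValueError("missing id")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        raise ValueError(f"invalid amount: {amount!r}")

def load(records: List[Dict[str, Any]]) -> LoadResult:
    result = LoadResult()
    for record in records:
        try:
            validate(record)
            result.good.append(record)                      # tolerate: keep processing
        except ValueError as err:
            result.quarantined.append((record, str(err)))  # isolate the bad row
    return result

batch = [
    {"id": "a1", "amount": 10.0},
    {"id": "", "amount": 5.0},
    {"id": "a3", "amount": -2},
    {"id": "a4", "amount": 7},
]
result = load(batch)

# Report: a summary that could feed alerting or a data quality dashboard.
print(f"loaded={len(result.good)} quarantined={len(result.quarantined)}")
for record, reason in result.quarantined:
    print(f"  {record.get('id') or '<no id>'}: {reason}")
```

The good rows flow on to downstream consumers, so one bad record does not take down the rest of the pipeline.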

Flexible
Change to the system is easy to accommodate and doesn't break other components.
Microservices.
Versioned interfaces.
Backward compatibility.
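Versioned interfaces and backward compatibility can be as simple as a reader that accepts every supported message version and normalizes it to one internal shape. The sketch below assumes JSON-like messages carrying a `version` field; the v1 and v2 layouts and `parse_customer` are hypothetical.

```python
from typing import Any, Dict

def parse_customer(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize versioned customer messages into the current internal shape.

    Hypothetical versions:
      v1: {"version": 1, "name": "Ada Lovelace"}
      v2: {"version": 2, "first_name": "Ada", "last_name": "Lovelace", "email": ...}
    Consumers only see the normalized shape, so producers can move from
    v1 to v2 on their own schedule without breaking anyone.
    """
    version = payload.get("version", 1)  # unversioned messages are treated as v1
    if version == 1:
        first, _, last = payload["name"].partition(" ")
        return {"first_name": first, "last_name": last, "email": None}
    if version == 2:
        return {
            "first_name": payload["first_name"],
            "last_name": payload["last_name"],
            "email": payload.get("email"),  # new optional field in v2
        }
    raise ValueError(f"unsupported version: {version}")

old = parse_customer({"version": 1, "name": "Ada Lovelace"})
new = parse_customer(
    {"version": 2, "first_name": "Ada", "last_name": "Lovelace", "email": "ada@example.com"}
)
```

Adding only optional fields in a new version, and keeping the old reader path until every producer has upgraded, is what keeps the change from breaking other components.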
Simple, Elastic, Resilient, Flexible: EarEcstasy
Data staging and the data lake contain only needed data.
Each data pipeline is only as complex as it needs to be to deliver on a narrow scope.
Data is integrated only as needed, keeping processes simple.
https://www.slideshare.net/AmazonWebServices/aws-summit-singapore-get-to-know-your-customers-modern-data-architecture-93784711
Simple, Elastic, Resilient, Flexible: Tools and Technologies
Cloud-based Infrastructure
Cloud-native Services
DevOps
Containers
Open Source
Automated, Smart
"I'm afraid I can't make that into a star schema, Dave."
"We are going through the process where software will automate software, automation will automate automation."
-Mark Cuban
Automated
Automate tasks needed to optimize the function of the system, to detect significant changes, and to alert users when attention is needed.
Metadata injection.
Schema change detection.
Anomaly detection.
Alerting.
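Schema change detection can start with comparing the schema observed in an incoming batch against the expected one and alerting on any difference. This is a minimal sketch, assuming records arrive as Python dictionaries and that "type" means the Python type of the first value seen for each field; `infer_schema`, `diff_schema`, and the sample data are hypothetical.

```python
from typing import Any, Dict, Iterable

def infer_schema(records: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    """Map each field name to the type name of the first value seen for it."""
    schema: Dict[str, str] = {}
    for record in records:
        for key, value in record.items():
            schema.setdefault(key, type(value).__name__)
    return schema

def diff_schema(expected: Dict[str, str], observed: Dict[str, str]) -> Dict[str, Any]:
    """List fields that were added, removed, or changed type."""
    added = sorted(observed.keys() - expected.keys())
    removed = sorted(expected.keys() - observed.keys())
    retyped = sorted(
        k for k in expected.keys() & observed.keys() if expected[k] != observed[k]
    )
    return {"added": added, "removed": removed, "retyped": retyped}

expected = {"id": "str", "amount": "float", "region": "str"}
incoming = [
    {"id": "a1", "amount": "10.00", "channel": "web"},
    {"id": "a2", "amount": "7.50", "channel": "store"},
]
changes = diff_schema(expected, infer_schema(incoming))
if any(changes.values()):
    # A real pipeline might page someone or open a ticket instead of printing.
    print(f"ALERT schema change detected: {changes}")
```

Here the check would flag a new `channel` field, a missing `region` field, and `amount` arriving as text instead of a number.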

Smart
Schema detection. Self-tuning databases. Jeopardy champion.
Data shaping, data quality recommendations.
Natural Language Processing.
Machine Learning.
Recommender systems.
Deep Learning.
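Anomaly detection and alerting do not have to begin with machine learning; a statistical baseline is a reasonable first step before the techniques listed under Smart. The sketch below flags a daily load whose row count is more than three standard deviations from recent history (a simple z-score rule). The threshold and the `daily_row_counts` values are assumptions for illustration.

```python
from statistics import mean, stdev
from typing import List

def is_anomalous(history: List[int], today: int, threshold: float = 3.0) -> bool:
    """Flag today's value if it lies more than `threshold` standard deviations
    from the mean of recent history (a simple z-score rule)."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu  # any change from a perfectly flat history is unusual
    return abs(today - mu) / sigma > threshold

daily_row_counts = [10_120, 9_980, 10_050, 10_210, 9_940, 10_070, 10_030]
for today in (10_100, 4_200):
    status = "ALERT" if is_anomalous(daily_row_counts, today) else "ok"
    print(f"{today}: {status}")
```

A rule this simple misses seasonality (for example, weekend dips), which is where the smarter, learned approaches become worth their added complexity.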
Automated, Smart: Example
83% reduction in workload matching complex, low-quality data with contextual analysis
Automated, Smart: Tools
Integrated Machine Learning
Integrated Search
Intelligent Data Classification
Natural Language Processing
Reference Architecture

Modern Data Architecture

"Everything should be made as simple as possible, but not simpler."
- A. Einstein
Next steps
How do I get there from here?
Start with something you understand well from a business perspective.
Select specific, valuable, measurable business cases.
Add simple machine learning use cases.
Identify use cases to move from a batch processing system to a streaming solution.
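The difference between a batch process and a streaming one can be seen in miniature: batch waits for the complete data set and computes once, while streaming updates state per event and emits results immediately. This is a minimal, framework-free sketch; the `(customer_id, amount)` event shape and both function names are hypothetical, and a production system would use a streaming platform rather than a Python generator.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[str, float]  # hypothetical event shape: (customer_id, amount)

def batch_totals(events: Iterable[Event]) -> Dict[str, float]:
    """Batch style: wait for the whole data set, then compute totals once."""
    totals: Dict[str, float] = defaultdict(float)
    for customer, amount in events:
        totals[customer] += amount
    return dict(totals)

def streaming_totals(events: Iterable[Event]) -> Iterator[Tuple[str, float]]:
    """Streaming style: update state per event and emit the new total immediately."""
    totals: Dict[str, float] = defaultdict(float)
    for customer, amount in events:
        totals[customer] += amount
        yield customer, totals[customer]

events = [("c1", 10.0), ("c2", 5.0), ("c1", 2.5)]
print(batch_totals(events))  # {'c1': 12.5, 'c2': 5.0}
for update in streaming_totals(events):
    print(update)  # prints ('c1', 10.0), then ('c2', 5.0), then ('c1', 12.5)
```

Both produce the same final totals; a good candidate use case is one where acting on the intermediate updates, rather than waiting for the end of the batch, has business value.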
Recap
The Myths are Just Myths
● You don't "just need Hadoop." You may not even need Hadoop at all!
● NoSQL has a place, but it isn't the entire solution either.
● There's no magical pixie dust here. This transformation will take real work.
● More data is not necessarily better, no matter how much we data hoarders want it to be.
● By definition, you have to create your modern data architecture incrementally, because it also has to continue to evolve.
Governed, Secure
Maintain data and the data architecture in
a way that makes governance and security
a natural and easy part of doing work.
Adaptable, Customer Centric, Collaborative
Apply data toward real
challenges and opportunities that
focus on customers and be willing
and able to pivot as needed.
Simple, Elastic, Resilient, Flexible
Build your data architecture, your teams,
and your processes in a way that creates a
high capacity for change.
Automated, Smart
Create systems that can do more of
the work of ingestion, storage, and
integration without your intervention.
Thank You!
 
Onemonitar Android Spy App Features: Explore Advanced Monitoring Capabilities
Onemonitar Android Spy App Features: Explore Advanced Monitoring CapabilitiesOnemonitar Android Spy App Features: Explore Advanced Monitoring Capabilities
Onemonitar Android Spy App Features: Explore Advanced Monitoring Capabilities
 
business environment micro environment macro environment.pptx
business environment micro environment macro environment.pptxbusiness environment micro environment macro environment.pptx
business environment micro environment macro environment.pptx
 
Introducing the Analogic framework for business planning applications
Introducing the Analogic framework for business planning applicationsIntroducing the Analogic framework for business planning applications
Introducing the Analogic framework for business planning applications
 
WAM Corporate Presentation April 12 2024.pdf
WAM Corporate Presentation April 12 2024.pdfWAM Corporate Presentation April 12 2024.pdf
WAM Corporate Presentation April 12 2024.pdf
 
GUIDELINES ON USEFUL FORMS IN FREIGHT FORWARDING (F) Danny Diep Toh MBA.pdf
GUIDELINES ON USEFUL FORMS IN FREIGHT FORWARDING (F) Danny Diep Toh MBA.pdfGUIDELINES ON USEFUL FORMS IN FREIGHT FORWARDING (F) Danny Diep Toh MBA.pdf
GUIDELINES ON USEFUL FORMS IN FREIGHT FORWARDING (F) Danny Diep Toh MBA.pdf
 
How Generative AI Is Transforming Your Business | Byond Growth Insights | Apr...
How Generative AI Is Transforming Your Business | Byond Growth Insights | Apr...How Generative AI Is Transforming Your Business | Byond Growth Insights | Apr...
How Generative AI Is Transforming Your Business | Byond Growth Insights | Apr...
 
Pitch Deck Teardown: Xpanceo's $40M Seed deck
Pitch Deck Teardown: Xpanceo's $40M Seed deckPitch Deck Teardown: Xpanceo's $40M Seed deck
Pitch Deck Teardown: Xpanceo's $40M Seed deck
 
Jewish Resources in the Family Resource Centre
Jewish Resources in the Family Resource CentreJewish Resources in the Family Resource Centre
Jewish Resources in the Family Resource Centre
 
Appkodes Tinder Clone Script with Customisable Solutions.pptx
Appkodes Tinder Clone Script with Customisable Solutions.pptxAppkodes Tinder Clone Script with Customisable Solutions.pptx
Appkodes Tinder Clone Script with Customisable Solutions.pptx
 
Fordham -How effective decision-making is within the IT department - Analysis...
Fordham -How effective decision-making is within the IT department - Analysis...Fordham -How effective decision-making is within the IT department - Analysis...
Fordham -How effective decision-making is within the IT department - Analysis...
 
TriStar Gold Corporate Presentation - April 2024
TriStar Gold Corporate Presentation - April 2024TriStar Gold Corporate Presentation - April 2024
TriStar Gold Corporate Presentation - April 2024
 
The-Ethical-issues-ghhhhhhhhjof-Byjus.pptx
The-Ethical-issues-ghhhhhhhhjof-Byjus.pptxThe-Ethical-issues-ghhhhhhhhjof-Byjus.pptx
The-Ethical-issues-ghhhhhhhhjof-Byjus.pptx
 

A Better Architecture for Data: Adaptable, Scalable, and Smart

  • 1. A Better Architecture for Data: Adaptable, Scalable, and Smart. Paul Boal & Adam Doyle, June 8, 2018, St. Louis
  • 2. Agenda
      1. Modern Data Architecture Myths
      2. Characteristics of Modern Data Architecture
          a. Governed, Secure
          b. Adaptable, Customer Centric, Collaborative
          c. Flexible, Elastic, Simple, Resilient
          d. Smart, Automated
      3. Reference Data Architecture
      4. How do I get there?
      5. Recap
  • 4. MYTH #1: "Just install Hadoop." A modern data architecture is not a single technology or a single-vendor solution. Modern data architectures combine a portfolio of technologies to create an ecosystem with certain characteristics.
  • 5. MYTH #2: "Modern must mean NoSQL." NoSQL technologies provide an efficient way to manage and access data under certain circumstances, but traditional relational databases and SQL continue to provide the most powerful way to organize and query well-known data.
  • 6. MYTH #3: "Big data is magical pixie dust." We talk a lot about the accelerating growth of data, the decreasing cost of storage and compute power, and the power of data science. It's convenient to believe that throwing all of this into a pot and letting it simmer will produce results while we wait. The truth is that applying data, technology, and analytics still requires planning, analysis, and careful execution.
  • 7. MYTH #4: "More data is always better." Not all data is created equal. Sometimes you might have unreliable or invalid data that will obscure results if used inappropriately. Using extraneous data can make analysis more complicated by adding time to filter the data set and select features. Sometimes more just means more work.
  • 8. MYTH #5: "I have to replace everything I have right now." One of the characteristics of a modern data architecture is flexibility, meaning that your modernization should be developed incrementally, implementing new capabilities in a way that integrates with, and gradually supplants, your existing, limited technologies.
  • 11. Governed, Secure. Governed: The architecture and its components have to evolve and adapt in ways that are intentional and informed by enterprise strategy. Make collaboration the default. Communicate, and then communicate some more. Treat every component as if another team may want to use it, too. Secure: Accessing information should be easy and should effortlessly ensure that users are knowingly using the right information for the right purpose. Treat security as an enabler of usage, not a denier of access. Track and log access for audit purposes and for learning.
  • 12. ING: Apache Atlas Open Metadata and Governance (APIs, notification systems, integration of metadata, security, and governance-related tools). Governed, Secure. Source: https://www.slideshare.net/Hadoop_Summit/open-metadata-and-governance-with-apache-atlas?qid=6ea30d4f-15af-46ad-b580-349f78bb7752&v=&b=&from_search=9
  • 13. Frameworks and Tools (Governed, Secure). Open-source core: Apache Atlas (open metadata management); Apache NiFi (data provenance); Apache Sentry/Ranger (fine-grained access control). Vendor participants: [logos]
  • 14. Adaptable, Customer Centric, Collaborative. "It is not the strongest of the species that survives, nor the most intelligent. It is the one that is most adaptable to change." (commonly attributed to Charles Darwin)
  • 15. Adaptable, Customer Centric, Collaborative. Adaptable: The more you deliver, the more you will learn about what is really needed, so be prepared to change, and build solutions that can change easily. Agile data modeling. Agile analytics. Customer Centric: Focus on delivering solutions that make sense to the people who will use them rather than following standards and rules above all else. The DBMS is not your user. Ralph Kimball and Edgar Codd are not your users. The Architecture Review Board is not your user. Collaborative: Solutions that are interactively designed and built by a team with diverse capabilities and backgrounds can produce a better result than any one individual would have. Collaboration is more than requirements gathering; it has to happen every day. Communicate, communicate, communicate. And then communicate.
  • 16. Agile Data (Adaptable, Customer Centric, Collaborative): http://agiledata.org/
  • 17. Tools and Techniques (Adaptable, Customer Centric, Collaborative): model storming; rapid experimentation; data science environments; WhereScape, Snowflake, ThoughtSpot.
  • 18. Simple, Elastic, Resilient, Flexible. "Notice that the stiffest tree is most easily cracked, while the bamboo or willow survives by bending with the wind." (Bruce Lee)
  • 19. Simple, Elastic, Resilient, Flexible. Simple: Individual components should be only as complex as necessary. Reduce interdependencies. Use shared components. Elastic: The system can easily handle an increase in data volume, users, or complexity. Distributed computing. Cloud. DevOps. Resilient: Errors in data or processing don't cause large parts of the system to fail. Isolate components. Tolerate, isolate, and report bad data. Flexible: Change to the system is easy to accommodate and doesn't break other components. Microservices. Versioned interfaces. Backward compatibility.
  • 20. EarEcstasy (Simple, Elastic, Resilient, Flexible): The data staging area and the data lake contain only needed data. Each data pipeline is only as complex as it needs to be to deliver on a narrow scope. Data is integrated only as needed, keeping processes simple. Source: https://www.slideshare.net/AmazonWebServices/aws-summit-singapore-get-to-know-your-customers-modern-data-architecture-93784711
  • 21. Tools and Technologies (Simple, Elastic, Resilient, Flexible): cloud-based infrastructure; cloud-native services; DevOps; containers; open source.
  • 22. Automated, Smart. "I'm afraid I can't make that into a star schema, Dave." "We are going through the process where software will automate software, automation will automate automation." (Mark Cuban)
  • 23. Automated, Smart. Automated: Automate tasks needed to optimize the function of the system, to detect significant changes, and to alert users when attention is needed. Metadata injection. Schema change detection. Anomaly detection. Alerting. Smart: Schema detection. Self-tuning databases. Jeopardy champion. Data shaping and data quality recommendations. Natural language processing. Machine learning. Recommender systems. Deep learning.
  • 24. Example (Automated, Smart): an 83% reduction in workload for matching complex, low-quality data with contextual analysis.
  • 25. Tools (Automated, Smart): integrated machine learning; integrated search; intelligent data classification; natural language processing.
  • 27. Modern Data Architecture. "Everything should be made as simple as possible, but not simpler." (A. Einstein)
  • 30. How do I get there from here? Start with something you understand well from a business perspective. Select specific, valuable, measurable business cases. Add simple machine learning use cases. Identify use cases to move from a batch processing system to a streaming solution.
  • 32. The Myths Are Just Myths
      ● You don't "just need Hadoop." You may not even need Hadoop at all!
      ● NoSQL has a place, but it isn't the entire solution either.
      ● There's no magical pixie dust here. This transformation will take real work.
      ● More data is not necessarily better, no matter how much we data hoarders want it to be.
      ● By definition, you have to create your modern data architecture incrementally, because it also has to continue to evolve.
  • 33. Governed, Secure: Maintain data and the data architecture in a way that makes governance and security a natural and easy part of doing work.
  • 34. Adaptable, Customer Centric, Collaborative: Apply data toward real challenges and opportunities that focus on customers, and be willing and able to pivot as needed.
  • 35. Simple, Elastic, Resilient, Flexible: Build your data architecture, your teams, and your processes in a way that creates a high capacity for change.
  • 36. Automated, Smart: Create systems that can do more of the work of ingestion, storage, and integration without your intervention.
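
Slide 23 lists schema change detection and anomaly detection among the tasks worth automating. The following is a rough sketch under stated assumptions, not a description of any product mentioned in the deck: schemas are modeled as simple column-to-type dictionaries, and the anomaly check is a plain z-score test on a hypothetical daily row count with an assumed threshold of 3 standard deviations.

```python
import statistics


def schema_changes(old, new):
    """Compare two {column: type} schemas and describe each difference."""
    changes = []
    for col in sorted(old.keys() - new.keys()):
        changes.append(f"removed column {col}")
    for col in sorted(new.keys() - old.keys()):
        changes.append(f"added column {col} ({new[col]})")
    for col in sorted(old.keys() & new.keys()):
        if old[col] != new[col]:
            changes.append(f"type of {col} changed {old[col]} -> {new[col]}")
    return changes


def is_anomalous(history, value, threshold=3.0):
    """Flag `value` if it lies more than `threshold` sample standard
    deviations from the mean of `history` (a simple z-score test)."""
    if len(history) < 2:
        return False  # too little history to judge
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return value != mean  # any deviation from a constant series stands out
    return abs(value - mean) / stdev > threshold


if __name__ == "__main__":
    # Hypothetical schemas for two loads of the same feed.
    v1 = {"order_id": "int", "amount": "float", "email": "str"}
    v2 = {"order_id": "int", "amount": "str", "channel": "str"}
    for change in schema_changes(v1, v2):
        print("ALERT:", change)

    # Hypothetical daily row counts for the same feed.
    daily_rows = [1000, 1020, 980, 1010, 995]
    print(is_anomalous(daily_rows, 1005), is_anomalous(daily_rows, 200))  # False True
```

In a pipeline, both checks would run on every load and route their findings to whatever alerting channel the team already uses; the threshold is a tuning choice, not a recommended value.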

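Slide 19's advice to "tolerate, isolate, and report bad data" is often implemented as a quarantine (sometimes called dead-letter) step: records that fail validation are set aside with a reason instead of failing the whole load. A minimal sketch, assuming a hypothetical record layout with `id` and `amount` fields and made-up validation rules:

```python
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]


def validate(record: Record) -> Optional[str]:
    """Return a reason the record is bad, or None if it passes.

    The rules here are illustrative assumptions, not a standard.
    """
    if "id" not in record:
        return "missing id"
    amount = record.get("amount")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        return "amount is not numeric"
    if amount < 0:
        return "negative amount"
    return None


def process_batch(records: List[Record]) -> Tuple[List[Record], List[Tuple[Record, str]]]:
    """Split a batch into good records and quarantined (record, reason) pairs.

    One bad record never stops the batch: it is isolated and reported.
    """
    good: List[Record] = []
    quarantined: List[Tuple[Record, str]] = []
    for record in records:
        reason = validate(record)
        if reason is None:
            good.append(record)
        else:
            quarantined.append((record, reason))
    return good, quarantined


if __name__ == "__main__":
    batch = [
        {"id": 1, "amount": 10.0},
        {"amount": 5},
        {"id": 3, "amount": "n/a"},
        {"id": 4, "amount": -2},
    ]
    good, bad = process_batch(batch)
    print(len(good), [reason for _, reason in bad])
    # 1 ['missing id', 'amount is not numeric', 'negative amount']
```

The quarantined pairs could be written to a separate table and surfaced in a report, so bad data stays visible without blocking the good records.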
Speaker Notes

  1. Intro and Myths: Paul. Characteristics A, B: Paul. Characteristics C, D: Adam. Reference Architecture: Adam. How Do I Get There: Adam or Paul, or back-and-forth. Recap: Paul.
  2. These characteristics describe the processes by which your data is maintained. Maybe here we want to tell stories about companies that didn’t secure their data (Target, Equifax, Schnucks)
  3. These characteristics describe the processes by which your data is maintained. Maybe here we want to tell stories about companies that didn’t secure their data (Target, Equifax, Schnucks)
  4. These characteristics describe the processes by which your data is maintained.
  5. These characteristics describe the processes by which your data is maintained.
  6. These characteristics describe the way in which you use your data. Built for purpose
  7. These characteristics describe the way in which you use your data. Built for purpose
  8. These characteristics describe the way in which you use your data.
  9. These characteristics describe the way in which you use your data.
  10. These characteristics describe the architecture and its capacity to change.
  11. These characteristics describe the architecture and its capacity to change.
  12. These characteristics describe the architecture and its capacity to change.
  13. These characteristics describe the way in which your data is integrated. Informatica ClAIre
  14. These characteristics describe the way in which your data is integrated. Informatica ClAIre
  15. These characteristics describe the way in which your data is integrated.
  16. These characteristics describe the way in which your data is integrated.
  17. These characteristics describe the architecture and its capacity to change.
  18. Processing data: mastering, integration, de-identification, data warehouse/data mart for reporting with rigor. Provisioning: pie in the sky; “I’d like some ‘Net Sales’.”
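
Note 18 lists de-identification as part of processing data. One common technique is pseudonymization with a keyed hash: the same identifier always maps to the same token, so records can still be joined, but the raw value is not exposed. A minimal sketch using Python's standard `hmac` and `hashlib` modules; the key, the list of identifier fields, and the 16-character token length are assumptions for illustration, and pseudonymizing direct identifiers alone may not make a dataset fully de-identified.

```python
import hashlib
import hmac

# Assumption: in a real system this key comes from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"

# Hypothetical list of direct-identifier fields.
DIRECT_IDENTIFIERS = {"name", "email", "ssn"}


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable token using HMAC-SHA256."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability


def deidentify(record: dict) -> dict:
    """Replace direct identifiers with tokens; keep all other fields."""
    return {
        field: pseudonymize(str(value)) if field in DIRECT_IDENTIFIERS else value
        for field, value in record.items()
    }


if __name__ == "__main__":
    customer = {"name": "Jane Doe", "email": "jane@example.com", "state": "MO", "net_sales": 120.0}
    print(deidentify(customer))
```

Because the mapping is deterministic for a given key, the key must be protected as carefully as the raw data; rotating it changes every token.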