#datastack
Shaun
What you’re going to learn
1. How top engineering organizations are building their data infrastructure
2. The 7 core challenges of data integration
3. Why companies like Asana, Buffer, and SeatGeek choose Redshift for their analytics warehouse
...and much more!
Shaun
Data Infrastructure: Then and Now
Dillon
The traditional approach: ETL
Dillon
[Diagram: ETL team, EDW team, BI team, and end user. Labels: "ELT - Heavy Transformation", "OLAP / Silos", "Restricted Q&A", "Summary".]
How companies are doing it today: ELT
Dillon
[Diagram: Extract and Load into the Database; a Modeling Layer that transforms at query time; Analytics for visualization and exploration. Caption: Transform (and Explore!)]
Modeling Layer snippet:
- name: first_purchasers
  type: single_value
  base_view: orders
  measures: [orders.customer.all]
Benefits of this approach
1. Redshift is performant enough to handle most transformations
2. Users prefer performing transformations in a language they already use (SQL) or with a UI
3. Transformations are much simpler and more transparent
4. Performing transformations alongside raw data is great for auditability
Dillon
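The ELT pattern described above can be sketched in a few lines. This is only an illustration: SQLite stands in for Redshift so the example runs anywhere, and the `raw_orders` table, its columns, and the `customer_revenue` view are hypothetical names, not anything from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse (assumption)

# Extract + Load: raw rows land in the warehouse untouched (hypothetical schema).
conn.execute(
    "CREATE TABLE raw_orders ("
    "order_id INTEGER, customer_id INTEGER, amount_cents INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
    [(1, 10, 2500, "complete"), (2, 10, 1000, "refunded"), (3, 11, 4000, "complete")],
)

# Transform: a view written in SQL next to the raw data, evaluated at query time.
conn.execute(
    """
    CREATE VIEW customer_revenue AS
    SELECT customer_id, SUM(amount_cents) / 100.0 AS revenue_usd
    FROM raw_orders
    WHERE status = 'complete'
    GROUP BY customer_id
    """
)

revenue = dict(conn.execute("SELECT customer_id, revenue_usd FROM customer_revenue"))
raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
print(revenue, raw_count)
```

Because the refunded order is still present in `raw_orders`, anyone auditing the revenue figure can query the raw rows and see exactly what the view filtered out.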
Data infrastructure has geek cred
Shaun
What the stack looks like
Shaun
Data Integration
Data Warehouse
BI/Analytics
Data Integration
Shaun
Why consolidation matters
internal analytics
Shaun
Quick poll
Shaun
Which five data sources are your top priority to integrate or keep integrated?
● production databases
● events
● error logs
● billing
● email marketing
● CRM
● advertising
● ERP
● A/B testing
● support
“A year ago, we were facing a lot of stability problems with our data processing.
When there was a major shift in a graph, people immediately questioned the
data integrity. It was hard to distinguish interesting insights from
bugs. Data science is already an art so you need the infrastructure to give you
trustworthy answers to the questions you ask. 99% correctness is not
good enough. And on the data infrastructure team, we were spending a lot of
time churning on fighting urgent fires, and that prevented us from
making much long-term progress. It was painful.”
- Marco Gallotta, Asana, How to Build Stable, Accessible Data Infrastructure at a Startup
“Our story would end here if real-time processing were perfect. But it’s not: some
events can come in days late, some time ranges need to be re-processed
after initial ingestion due to code changes or data revisions, various
components of the real-time pipeline can fail, and so on.”
- Gian Merlino, MetaMarkets, Building a Data Pipeline That Handles Billions of Events in Real-Time
7 core challenges of data integration
Connections: Every API is a unique and special snowflake
Accuracy: Ordering data on a distributed system
Latency: Large object data stores (Amazon S3, Redshift) are optimized for batches, not streams
Scale: Data will grow exponentially as your company grows
Flexibility: You’re interacting with systems you don’t control
Monitoring: Notifications for expired credentials, errors, and disruptions
Maintenance: Justifying investment in ongoing maintenance/improvement
Shaun
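To make the accuracy challenge concrete, here is a minimal sketch of one common way to cope with late, duplicated, or out-of-order records: keep only the newest version of each key, so replaying a batch gives the same result. The `Change` record and its `key`, `updated_at`, and `value` fields are hypothetical, and this is not presented as how any particular pipeline product works.

```python
from typing import Dict, Iterable, NamedTuple


class Change(NamedTuple):
    key: str         # primary key of the source row (hypothetical)
    updated_at: int  # source-side version or timestamp (hypothetical)
    value: str


def apply_changes(state: Dict[str, Change], batch: Iterable[Change]) -> Dict[str, Change]:
    """Merge a batch into state, keeping only the newest version of each key."""
    for change in batch:
        current = state.get(change.key)
        if current is None or change.updated_at > current.updated_at:
            state[change.key] = change
    return state


# The same changes delivered in order, and then shuffled with one duplicate.
in_order = [Change("order-1", 1, "placed"), Change("order-1", 2, "shipped")]
shuffled = [
    Change("order-1", 2, "shipped"),
    Change("order-1", 1, "placed"),
    Change("order-1", 2, "shipped"),
]

result_in_order = apply_changes({}, in_order)
result_shuffled = apply_changes({}, shuffled)
print(result_in_order == result_shuffled)
```

The design choice is that arrival order never decides the outcome; only the source-side version does. That also makes reprocessing a time range safe, at the cost of requiring a trustworthy version field from the source.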
Or...try Pipeline
Shaun
Ad Platforms, Customer Support, Web Data, Marketing Automation, CRM, Payments, Ecommerce
Warehousing Infrastructure
Shaun
Analytics warehouse
Shaun
Redshift is the most common analytics warehouse.
Chosen by: Asana, Braintree, Looker, SeatGeek, VigLink, Buffer
awesome
Shaun
Airbnb experiment
Shaun
                                          Hive               Redshift
Test 1: 3 billion rows of data            28 minutes         <6 minutes
Test 2: two joins with millions of rows   182 seconds        8 seconds
Cost                                      $1.29/hour/node    $0.85/hour/node
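The ratios implied by these figures can be worked out directly. The sketch below uses only the numbers quoted on the slide and treats "<6 minutes" as an upper bound, so the Test 1 speedup is a minimum. Node counts are not given, so only the per-node hourly price is compared, not total cost.

```python
# Figures as quoted on the slide; "<6 minutes" is treated as an upper bound.
hive_test1_s = 28 * 60
redshift_test1_max_s = 6 * 60
hive_test2_s = 182
redshift_test2_s = 8
hive_usd_per_node_hour = 1.29
redshift_usd_per_node_hour = 0.85

test1_min_speedup = hive_test1_s / redshift_test1_max_s
test2_speedup = hive_test2_s / redshift_test2_s
per_node_saving = 1 - redshift_usd_per_node_hour / hive_usd_per_node_hour

print(f"Test 1: more than {test1_min_speedup:.1f}x faster")
print(f"Test 2: about {test2_speedup:.2f}x faster")
print(f"Per-node hourly price: about {per_node_saving:.0%} lower")
```

On these numbers Redshift is more than 4.6 times faster on Test 1, 22.75 times faster on Test 2, and roughly a third cheaper per node-hour.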
Periscope research
Shaun
DiamondStream’s dashboard query performance
Shaun
Business Intelligence & Analytics
Dillon
A broken model
Dillon
● Feedback loop is broken
● Disparate reporting
● Non-unified decision making
● Versioning
● Reusability is lost
[Diagram: Marketing, Finance, AM]
Constraints of SQL
Dillon
SQL is versatile, but shares the same flavor as write-only languages such as Perl
Can write but not read
Promotes one-off, piecemeal analysis
Disparate interpretation
The critical multiplier: modeling
Dillon
[Diagram: Any SQL Data Warehouse, Modeling Layer]
What’s our most successful marketing campaign?
How does our Q4 pipeline look?
Who are our healthiest / happiest customers?
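One way to picture a modeling layer is as a shared set of definitions that gets compiled to SQL, so every question about "revenue" uses the same formula. The sketch below is a toy under stated assumptions: the `MODEL` dictionary, the `compile_query` helper, and the `orders` schema are all hypothetical, SQLite stands in for the warehouse, and none of this is LookML or any vendor's actual implementation.

```python
import sqlite3

# Hypothetical model: one shared definition per dimension and measure.
MODEL = {
    "table": "orders",
    "dimensions": {"campaign": "campaign"},
    "measures": {
        "revenue": "SUM(amount)",
        "order_count": "COUNT(*)",
    },
}


def compile_query(model, dimension, measure):
    """Build SQL for one dimension and one measure from the shared model."""
    dim_sql = model["dimensions"][dimension]
    measure_sql = model["measures"][measure]
    return (
        f"SELECT {dim_sql} AS {dimension}, {measure_sql} AS {measure} "
        f"FROM {model['table']} GROUP BY {dim_sql} ORDER BY {measure} DESC"
    )


conn = sqlite3.connect(":memory:")  # stand-in for any SQL warehouse (assumption)
conn.execute("CREATE TABLE orders (campaign TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("email", 120.0), ("search", 300.0), ("email", 80.0)],
)

# "What's our most successful marketing campaign?" answered from the model.
top_campaigns = conn.execute(compile_query(MODEL, "campaign", "revenue")).fetchall()
print(top_campaigns)
```

Changing the definition of `revenue` in one place changes it for every query built from the model, which is the reuse and uniform-definition benefit the slide points at.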
analytics
Dillon
● Data access
● Uniform definitions
● A shared view
● Collaboration
● Analytical speed
What You Can Do
Dillon
analytics tools
Dillon
[Timeline: Week 1, Week 2-3. Labels: RJMetrics Pipeline, BLOCKS]
marketing
analytics
Thank you!

Contenu connexe

Tendances

Data mining using sql server v1.2
Data mining using sql server v1.2Data mining using sql server v1.2
Data mining using sql server v1.2Koray Kocabas
 
Using Google Analytics for Better WordPress Website Data and Analysis
Using Google Analytics for Better WordPress Website Data and AnalysisUsing Google Analytics for Better WordPress Website Data and Analysis
Using Google Analytics for Better WordPress Website Data and AnalysisAndrew Stitt, MBA
 
The analytics journey at Viewbix - how they came to use Snowplow and the setu...
The analytics journey at Viewbix - how they came to use Snowplow and the setu...The analytics journey at Viewbix - how they came to use Snowplow and the setu...
The analytics journey at Viewbix - how they came to use Snowplow and the setu...yalisassoon
 
Data Science Resources for Project and Product Managers
Data Science Resources for Project and Product ManagersData Science Resources for Project and Product Managers
Data Science Resources for Project and Product ManagersBrian Lynch
 
customTask: Your New Google Analytics BFF
customTask: Your New Google Analytics BFFcustomTask: Your New Google Analytics BFF
customTask: Your New Google Analytics BFFDana DiTomaso
 
Martijn Scheijbeler @ All Things DATA 2016
Martijn Scheijbeler @ All Things DATA 2016Martijn Scheijbeler @ All Things DATA 2016
Martijn Scheijbeler @ All Things DATA 2016Shuki Mann
 
Building the Ideal Stack for Machine Learning
Building the Ideal Stack for Machine LearningBuilding the Ideal Stack for Machine Learning
Building the Ideal Stack for Machine LearningSingleStore
 
Grow your Mobile App with Kamo
Grow your Mobile App with KamoGrow your Mobile App with Kamo
Grow your Mobile App with KamoTapstream
 
Spark Summit Keynote by Seshu Adunuthula
Spark Summit Keynote by Seshu AdunuthulaSpark Summit Keynote by Seshu Adunuthula
Spark Summit Keynote by Seshu AdunuthulaSpark Summit
 
5 Google Analytics Features You Should Be Using
5 Google Analytics Features You Should Be Using5 Google Analytics Features You Should Be Using
5 Google Analytics Features You Should Be UsingMatchCraft
 
Overcoming Technical SEO Challenges for Enterprise Sites - LearnInbound 2019
Overcoming Technical SEO Challenges for Enterprise Sites  - LearnInbound 2019Overcoming Technical SEO Challenges for Enterprise Sites  - LearnInbound 2019
Overcoming Technical SEO Challenges for Enterprise Sites - LearnInbound 2019Sam Marsden
 
How Incuda builds user journey models with Snowplow
How Incuda builds user journey models with SnowplowHow Incuda builds user journey models with Snowplow
How Incuda builds user journey models with SnowplowGiuseppe Gaviani
 
Building a Just in Time Data Warehouse by Dan Morris and Jason Pohl
Building a Just in Time Data Warehouse by Dan Morris and Jason PohlBuilding a Just in Time Data Warehouse by Dan Morris and Jason Pohl
Building a Just in Time Data Warehouse by Dan Morris and Jason PohlSpark Summit
 
Achieving Massive Concurrency & Sub-second Query Latency on Cloud Warehouses ...
Achieving Massive Concurrency & Sub-second Query Latency on Cloud Warehouses ...Achieving Massive Concurrency & Sub-second Query Latency on Cloud Warehouses ...
Achieving Massive Concurrency & Sub-second Query Latency on Cloud Warehouses ...Alluxio, Inc.
 
5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri
 5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri 5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri
5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike GualtieriSpark Summit
 
SMX London 2019 - Automating Reporting - Data Studio for Search Marketers
SMX London 2019 - Automating Reporting - Data Studio for Search MarketersSMX London 2019 - Automating Reporting - Data Studio for Search Marketers
SMX London 2019 - Automating Reporting - Data Studio for Search MarketersSam Marsden
 
UX Analytics for Data-driven Product Development
UX Analytics for Data-driven Product DevelopmentUX Analytics for Data-driven Product Development
UX Analytics for Data-driven Product DevelopmentTrieu Nguyen
 
From Spark to Ignition: Fueling Your Business on Real-Time Analytics
From Spark to Ignition: Fueling Your Business on Real-Time AnalyticsFrom Spark to Ignition: Fueling Your Business on Real-Time Analytics
From Spark to Ignition: Fueling Your Business on Real-Time AnalyticsSingleStore
 
Amazon Neptune - visually more options
Amazon Neptune - visually more optionsAmazon Neptune - visually more options
Amazon Neptune - visually more optionsLCloud
 
Tools and Hacks for B2b Sales People
Tools and Hacks for B2b Sales People Tools and Hacks for B2b Sales People
Tools and Hacks for B2b Sales People Sales Hacker
 

Tendances (20)

Data mining using sql server v1.2
Data mining using sql server v1.2Data mining using sql server v1.2
Data mining using sql server v1.2
 
Using Google Analytics for Better WordPress Website Data and Analysis
Using Google Analytics for Better WordPress Website Data and AnalysisUsing Google Analytics for Better WordPress Website Data and Analysis
Using Google Analytics for Better WordPress Website Data and Analysis
 
The analytics journey at Viewbix - how they came to use Snowplow and the setu...
The analytics journey at Viewbix - how they came to use Snowplow and the setu...The analytics journey at Viewbix - how they came to use Snowplow and the setu...
The analytics journey at Viewbix - how they came to use Snowplow and the setu...
 
Data Science Resources for Project and Product Managers
Data Science Resources for Project and Product ManagersData Science Resources for Project and Product Managers
Data Science Resources for Project and Product Managers
 
customTask: Your New Google Analytics BFF
customTask: Your New Google Analytics BFFcustomTask: Your New Google Analytics BFF
customTask: Your New Google Analytics BFF
 
Martijn Scheijbeler @ All Things DATA 2016
Martijn Scheijbeler @ All Things DATA 2016Martijn Scheijbeler @ All Things DATA 2016
Martijn Scheijbeler @ All Things DATA 2016
 
Building the Ideal Stack for Machine Learning
Building the Ideal Stack for Machine LearningBuilding the Ideal Stack for Machine Learning
Building the Ideal Stack for Machine Learning
 
Grow your Mobile App with Kamo
Grow your Mobile App with KamoGrow your Mobile App with Kamo
Grow your Mobile App with Kamo
 
Spark Summit Keynote by Seshu Adunuthula
Spark Summit Keynote by Seshu AdunuthulaSpark Summit Keynote by Seshu Adunuthula
Spark Summit Keynote by Seshu Adunuthula
 
5 Google Analytics Features You Should Be Using
5 Google Analytics Features You Should Be Using5 Google Analytics Features You Should Be Using
5 Google Analytics Features You Should Be Using
 
Overcoming Technical SEO Challenges for Enterprise Sites - LearnInbound 2019
Overcoming Technical SEO Challenges for Enterprise Sites  - LearnInbound 2019Overcoming Technical SEO Challenges for Enterprise Sites  - LearnInbound 2019
Overcoming Technical SEO Challenges for Enterprise Sites - LearnInbound 2019
 
How Incuda builds user journey models with Snowplow
How Incuda builds user journey models with SnowplowHow Incuda builds user journey models with Snowplow
How Incuda builds user journey models with Snowplow
 
Building a Just in Time Data Warehouse by Dan Morris and Jason Pohl
Building a Just in Time Data Warehouse by Dan Morris and Jason PohlBuilding a Just in Time Data Warehouse by Dan Morris and Jason Pohl
Building a Just in Time Data Warehouse by Dan Morris and Jason Pohl
 
Achieving Massive Concurrency & Sub-second Query Latency on Cloud Warehouses ...
Achieving Massive Concurrency & Sub-second Query Latency on Cloud Warehouses ...Achieving Massive Concurrency & Sub-second Query Latency on Cloud Warehouses ...
Achieving Massive Concurrency & Sub-second Query Latency on Cloud Warehouses ...
 
5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri
 5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri 5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri
5 Reasons Enterprise Adoption of Spark is Unstoppable by Mike Gualtieri
 
SMX London 2019 - Automating Reporting - Data Studio for Search Marketers
SMX London 2019 - Automating Reporting - Data Studio for Search MarketersSMX London 2019 - Automating Reporting - Data Studio for Search Marketers
SMX London 2019 - Automating Reporting - Data Studio for Search Marketers
 
UX Analytics for Data-driven Product Development
UX Analytics for Data-driven Product DevelopmentUX Analytics for Data-driven Product Development
UX Analytics for Data-driven Product Development
 
From Spark to Ignition: Fueling Your Business on Real-Time Analytics
From Spark to Ignition: Fueling Your Business on Real-Time AnalyticsFrom Spark to Ignition: Fueling Your Business on Real-Time Analytics
From Spark to Ignition: Fueling Your Business on Real-Time Analytics
 
Amazon Neptune - visually more options
Amazon Neptune - visually more optionsAmazon Neptune - visually more options
Amazon Neptune - visually more options
 
Tools and Hacks for B2b Sales People
Tools and Hacks for B2b Sales People Tools and Hacks for B2b Sales People
Tools and Hacks for B2b Sales People
 

En vedette

How to Build a $24 Million Ecommerce Company in 2 Years
How to Build a $24 Million Ecommerce Company in 2 YearsHow to Build a $24 Million Ecommerce Company in 2 Years
How to Build a $24 Million Ecommerce Company in 2 YearsJanessa Lantz
 
Driving a data-centric culture: a bottom-up opportunity
Driving a data-centric culture: a bottom-up opportunityDriving a data-centric culture: a bottom-up opportunity
Driving a data-centric culture: a bottom-up opportunityThe Economist Media Businesses
 
2012 Online User Behavior and Engagement Study - Harris Interactive
2012 Online User Behavior and Engagement Study - Harris Interactive2012 Online User Behavior and Engagement Study - Harris Interactive
2012 Online User Behavior and Engagement Study - Harris InteractiveHemant Charya
 
Compendium constructieve veiligheid_2011[1]
Compendium constructieve veiligheid_2011[1]Compendium constructieve veiligheid_2011[1]
Compendium constructieve veiligheid_2011[1]jbvrieling
 
present_svarka1
present_svarka1present_svarka1
present_svarka1marinella
 
Dự thảo cân container vận tải biển quốc tế theo SOLAS (Bộ GTVT)
Dự thảo cân container vận tải biển quốc tế theo SOLAS (Bộ GTVT)Dự thảo cân container vận tải biển quốc tế theo SOLAS (Bộ GTVT)
Dự thảo cân container vận tải biển quốc tế theo SOLAS (Bộ GTVT)Trung Tâm Kiến Tập
 
Understanding User Behavior Online
Understanding User Behavior OnlineUnderstanding User Behavior Online
Understanding User Behavior OnlineKaren McGrane
 
Operational elastic
Operational elasticOperational elastic
Operational elasticEd Anderson
 
Big Data Infrastructure
Big Data InfrastructureBig Data Infrastructure
Big Data InfrastructureTrivadis
 
Monitor Engineered Systems from a Single Pane of Glass: Oracle Enterprise Man...
Monitor Engineered Systems from a Single Pane of Glass: Oracle Enterprise Man...Monitor Engineered Systems from a Single Pane of Glass: Oracle Enterprise Man...
Monitor Engineered Systems from a Single Pane of Glass: Oracle Enterprise Man...Alfredo Krieg
 
Hiring Hacks: Under Armour’s Formula for Data-Centric & Personalized Recruiting
Hiring Hacks: Under Armour’s Formula for Data-Centric & Personalized RecruitingHiring Hacks: Under Armour’s Formula for Data-Centric & Personalized Recruiting
Hiring Hacks: Under Armour’s Formula for Data-Centric & Personalized RecruitingGreenhouseSoftware
 
Using Elastic to Monitor Anything
Using Elastic to Monitor Anything Using Elastic to Monitor Anything
Using Elastic to Monitor Anything Idan Tohami
 
The "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInThe "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInSam Shah
 
Using Elastic to Monitor Everything - Christoph Wurm, Elastic - DevOpsDays Te...
Using Elastic to Monitor Everything - Christoph Wurm, Elastic - DevOpsDays Te...Using Elastic to Monitor Everything - Christoph Wurm, Elastic - DevOpsDays Te...
Using Elastic to Monitor Everything - Christoph Wurm, Elastic - DevOpsDays Te...DevOpsDays Tel Aviv
 
How TERN Data Infrastructure works
How TERN Data Infrastructure worksHow TERN Data Infrastructure works
How TERN Data Infrastructure worksTERN Australia
 

En vedette (17)

How to Build a $24 Million Ecommerce Company in 2 Years
How to Build a $24 Million Ecommerce Company in 2 YearsHow to Build a $24 Million Ecommerce Company in 2 Years
How to Build a $24 Million Ecommerce Company in 2 Years
 
Driving a data-centric culture: a bottom-up opportunity
Driving a data-centric culture: a bottom-up opportunityDriving a data-centric culture: a bottom-up opportunity
Driving a data-centric culture: a bottom-up opportunity
 
2012 Online User Behavior and Engagement Study - Harris Interactive
2012 Online User Behavior and Engagement Study - Harris Interactive2012 Online User Behavior and Engagement Study - Harris Interactive
2012 Online User Behavior and Engagement Study - Harris Interactive
 
Compendium constructieve veiligheid_2011[1]
Compendium constructieve veiligheid_2011[1]Compendium constructieve veiligheid_2011[1]
Compendium constructieve veiligheid_2011[1]
 
香港六合彩
香港六合彩香港六合彩
香港六合彩
 
WEBSITES B2B XUẤT NHẬP KHẨU
WEBSITES B2B XUẤT NHẬP KHẨUWEBSITES B2B XUẤT NHẬP KHẨU
WEBSITES B2B XUẤT NHẬP KHẨU
 
present_svarka1
present_svarka1present_svarka1
present_svarka1
 
Dự thảo cân container vận tải biển quốc tế theo SOLAS (Bộ GTVT)
Dự thảo cân container vận tải biển quốc tế theo SOLAS (Bộ GTVT)Dự thảo cân container vận tải biển quốc tế theo SOLAS (Bộ GTVT)
Dự thảo cân container vận tải biển quốc tế theo SOLAS (Bộ GTVT)
 
Understanding User Behavior Online
Understanding User Behavior OnlineUnderstanding User Behavior Online
Understanding User Behavior Online
 
Operational elastic
Operational elasticOperational elastic
Operational elastic
 
Big Data Infrastructure
Big Data InfrastructureBig Data Infrastructure
Big Data Infrastructure
 
Monitor Engineered Systems from a Single Pane of Glass: Oracle Enterprise Man...
Monitor Engineered Systems from a Single Pane of Glass: Oracle Enterprise Man...Monitor Engineered Systems from a Single Pane of Glass: Oracle Enterprise Man...
Monitor Engineered Systems from a Single Pane of Glass: Oracle Enterprise Man...
 
Hiring Hacks: Under Armour’s Formula for Data-Centric & Personalized Recruiting
Hiring Hacks: Under Armour’s Formula for Data-Centric & Personalized RecruitingHiring Hacks: Under Armour’s Formula for Data-Centric & Personalized Recruiting
Hiring Hacks: Under Armour’s Formula for Data-Centric & Personalized Recruiting
 
Using Elastic to Monitor Anything
Using Elastic to Monitor Anything Using Elastic to Monitor Anything
Using Elastic to Monitor Anything
 
The "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInThe "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedIn
 
Using Elastic to Monitor Everything - Christoph Wurm, Elastic - DevOpsDays Te...
Using Elastic to Monitor Everything - Christoph Wurm, Elastic - DevOpsDays Te...Using Elastic to Monitor Everything - Christoph Wurm, Elastic - DevOpsDays Te...
Using Elastic to Monitor Everything - Christoph Wurm, Elastic - DevOpsDays Te...
 
How TERN Data Infrastructure works
How TERN Data Infrastructure worksHow TERN Data Infrastructure works
How TERN Data Infrastructure works
 

Similaire à How to Build a Data-Driven Company: From Infrastructure to Insights

FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudFSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudAmazon Web Services
 
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?Denodo
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
클라우드에서의 데이터 웨어하우징 & 비즈니스 인텔리전스
클라우드에서의 데이터 웨어하우징 & 비즈니스 인텔리전스클라우드에서의 데이터 웨어하우징 & 비즈니스 인텔리전스
클라우드에서의 데이터 웨어하우징 & 비즈니스 인텔리전스Amazon Web Services Korea
 
Take Action: The New Reality of Data-Driven Business
Take Action: The New Reality of Data-Driven BusinessTake Action: The New Reality of Data-Driven Business
Take Action: The New Reality of Data-Driven BusinessInside Analysis
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data LakeMetroStar
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...DATAVERSITY
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An IntroductionDenodo
 
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...Databricks
 
ETL VS ELT.pdf
ETL VS ELT.pdfETL VS ELT.pdf
ETL VS ELT.pdfBOSupport
 
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessAnant Corporation
 
Managing Large Amounts of Data with Salesforce
Managing Large Amounts of Data with SalesforceManaging Large Amounts of Data with Salesforce
Managing Large Amounts of Data with SalesforceSense Corp
 
Big Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseBig Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseCaserta
 
LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchSheetal Pratik
 
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Hortonworks
 
Informatica Interview Questions & Answers
Informatica Interview Questions & AnswersInformatica Interview Questions & Answers
Informatica Interview Questions & AnswersZaranTech LLC
 
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...Denodo
 
The Evolution of a Scrappy Startup to a Successful Web Service
The Evolution of a Scrappy Startup to a Successful Web ServiceThe Evolution of a Scrappy Startup to a Successful Web Service
The Evolution of a Scrappy Startup to a Successful Web ServicePoornima Vijayashanker
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 

Similaire à How to Build a Data-Driven Company: From Infrastructure to Insights (20)

FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudFSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
 
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
클라우드에서의 데이터 웨어하우징 & 비즈니스 인텔리전스
클라우드에서의 데이터 웨어하우징 & 비즈니스 인텔리전스클라우드에서의 데이터 웨어하우징 & 비즈니스 인텔리전스
클라우드에서의 데이터 웨어하우징 & 비즈니스 인텔리전스
 
Take Action: The New Reality of Data-Driven Business
Take Action: The New Reality of Data-Driven BusinessTake Action: The New Reality of Data-Driven Business
Take Action: The New Reality of Data-Driven Business
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
 
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
Transforming Devon’s Data Pipeline with an Open Source Data Hub—Built on Data...
 
ETL VS ELT.pdf
ETL VS ELT.pdfETL VS ELT.pdf
ETL VS ELT.pdf
 
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
 
Managing Large Amounts of Data with Salesforce
Managing Large Amounts of Data with SalesforceManaging Large Amounts of Data with Salesforce
Managing Large Amounts of Data with Salesforce
 
Big Data's Impact on the Enterprise
Big Data's Impact on the EnterpriseBig Data's Impact on the Enterprise
Big Data's Impact on the Enterprise
 
Gowthami_Resume
Gowthami_ResumeGowthami_Resume
Gowthami_Resume
 
LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbench
 
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
 
Informatica Interview Questions & Answers
Informatica Interview Questions & AnswersInformatica Interview Questions & Answers
Informatica Interview Questions & Answers
 
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
Building a Single Logical Data Lake: For Advanced Analytics, Data Science, an...
 
The Evolution of a Scrappy Startup to a Successful Web Service
The Evolution of a Scrappy Startup to a Successful Web ServiceThe Evolution of a Scrappy Startup to a Successful Web Service
The Evolution of a Scrappy Startup to a Successful Web Service
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 

Plus de Janessa Lantz

From Question to Action
From Question to ActionFrom Question to Action
From Question to ActionJanessa Lantz
 
Optimizing Customer Support
Optimizing Customer SupportOptimizing Customer Support
Optimizing Customer SupportJanessa Lantz
 
Analyzing ROI Using Your Facebook and Adwords Data
Analyzing ROI Using Your Facebook and Adwords DataAnalyzing ROI Using Your Facebook and Adwords Data
Analyzing ROI Using Your Facebook and Adwords DataJanessa Lantz
 
How to Find the Customer Retention Secrets Hiding in Your Data
How to Build a Data-Driven Company: From Infrastructure to Insights

Editor’s notes

  1. Good afternoon, everyone! Thanks so much for joining us today. I’m going to introduce you to my co-host in just a second, but first, let me run through just a few housekeeping details.
  2. We have a lot on the agenda for today. The core of our presentation is going to focus on how companies like yours are solving their data infrastructure challenges. We’re going to cover the challenges engineers should expect around data integration, why Amazon Redshift is quickly becoming the data warehouse of choice, cultural barriers to building a data-driven company, and a lot more.
  3. The first thing we’re going to cover is data infrastructure, or the actual architecture of legacy and modern data pipelines.
  4. For the last 30 years or so, really since the inception of modern databases, data warehousing has been the standard model to aggregate data and provide business-directed analytics. Data is extracted from various sources (databases, third-party applications, flat files, etc.) and transformed into a predefined model, then loaded into the data warehouse. This ETL process results in data cubes and data silos, where analytics are separated by key groupings for various departments, such as marketing, product, sales, etc. This results in a few issues that are fundamentally prohibitive to creating a data-driven organization. First, it’s very resource intensive (and expensive) to manage all of the transformations and data loading. Second, it results in latency in the analytics process. End users only have access to pre-defined metrics, which are typically too broad or inflexible to guide nimble decision making. This means that end users aren’t really getting any actionable insights from these metrics; they’re just looking at high-level analysis. Third, it restricts drilling. If an end user finds an interesting piece of information, say sales accelerated drastically for a certain user age group, and wants to know why, that end user needs to make another data request to the ETL or IT team, who will then take some time to return the request. This latency constrains end users from making data-driven decisions. These were commonly recognized problems. So nowadays, as Shaun was mentioning, modern tech companies have reworked this process.
  5. Nowadays, companies are collecting more data than ever before. Additionally, database technology has witnessed significant advances in the last several years. Databases themselves are now capable of performing sophisticated analysis very quickly. This removes the need for data silos and data cubes: all analytics can be performed directly on the central database. What this means is that it now makes sense to shift the burden of complex transformations to the front of the pipeline, to the BI tool, where transformations can be performed on the fly, at query time.
  6. There are several benefits to this approach, some of which I mentioned a minute ago but are worth repeating. First, you no longer require a huge, resource-intensive engineering or ETL team to move all of your data, so it’s much cheaper on the resource side. Second, technical users can pull data in a language they’re used to, SQL, and if you have a modeling layer, like Looker provides, then users can actually query the data directly from the UI, without any technical knowledge. Transformations aren’t being done by engineers on the backend; they’re being performed as the user pulls the data, so they’re much easier to repeat and easier to understand. Lastly, this allows you to audit transformations, so your users understand the components behind the analysis: they’ll understand how a metric is defined. And Shaun has a few examples of this in practice.
  7. Somewhere in the process of data engineering moving on from being a clumsy, multi-year project, it’s gained some geek cred. Over the past year we’ve watched as one company after the next shared their “how we built our data infrastructure” blog posts. Yes, even Looker. We were really interested in the details behind all these projects, so we did a “meta-analysis” where we looked at how these companies solved core data engineering challenges.
  8. We looked at Zulily,
  9. Spotify
  10. SeatGeek, Buffer, Asana, and many more.
  11. Some of these companies (like Netflix and Spotify) are building data products, such as recommendation engines. That stack can look slightly different. For this event, we’re going to focus on companies that are building data infrastructure for analytics. And for these companies, what we saw is that the process looks very much like what Dillon was just describing. First, they extract data from a variety of sources. Then they load it into the data warehouse. Then they do transformations on top of that.
  12. Let’s start at the first part of the conversation. Extract & Load, or more simply, data integration.
  13. And just to clarify, the reason this step is so important is because all future insights depend on it. Here are some of the use cases that the Asana team laid out. “It’s difficult work – but an absolute requirement of great intelligence.”
  14. Here are the most common data sources that we saw companies connecting to. Our analysis of how companies built their data infrastructure was based largely on blog posts (and some conversations) on the topic. One limitation there is that engineers tend to write these pieces fairly soon after completion of the project and there’s often the understanding that more data sources will be added on later. Asana built data connections to the most sources, but there’s an enormous amount of data that can be derived just from connecting ad spend to purchase history living in your production databases.
  15. Now, for some audience participation, could you grab your mouse and fill in this poll? What top five data sources are a top priority for you to integrate and keep integrated? While you’re filling in your answers, let me just say that data consolidation comes with its own special challenges. When Asana first started building their data infrastructure, they did it using Python scripts and MySQL. And if you’re just starting out, this can work for you too, but you will outgrow it eventually. And I’m going to say more on that in a second, but first let’s take a look at the results.
  16. So in the Asana team’s own words, here are some of the challenges they faced during consolidation: doubts about data integrity due to a lack of monitoring and logging, trouble telling insights from bugs, and urgent fires when systems went down.
  17. And this is from MetaMarkets. Braintree’s team said: deletes are nearly impossible to keep track of; you have to keep track of data that changed; and batch updates are slow, and it’s difficult to know how long they’ll take.
  18. A big part of my job involves talking to people every day about their data infrastructure. These posts touch on some of the problems you can expect, but keep in mind -- these people are the successful ones. I’ve been on calls with many a frustrated engineer throwing in the towel on their data infrastructure projects after 1 year at the task. Data consolidation is hard. Here are 7 of the core challenges.
  19. Early last month we released a SaaS product designed to solve this problem, called Pipeline. It takes data from any number of integrations, and that data flows into a data warehouse with super low latency. We’re aggressively releasing new integrations each month, so if you need an integration you don’t see here today, let us know! If you want to learn more about this, stick around at the end for a demo.
  20. The next step in the process is data warehousing. Hands down the top pick for warehousing was Redshift.
  21. Among the companies that we looked at, Redshift was the most popular choice for an analytics warehouse.
  22. The most common reason? Speed. People are seeing dramatic improvements in query time using Redshift. Asana said that queries that were taking hours now take a few seconds. Similarly, SeatGeek had a critical query that took 20 minutes and now takes half a minute in Redshift.
  23. Here are the results of Airbnb’s tests, which show performance in both query time and cost. Source: http://nerds.airbnb.com/redshift-performance-cost/
  24. Here’s some research from Periscope comparing Redshift and Postgres that shows similar performance gains.
  25. And here is research from DiamondStream showing how much better their internal dashboards performed when built on Redshift vs. MS SQL. I think it’s this final reason why Looker is such a big fan of Redshift and recommends it to their clients. source: http://www.datasciencecentral.com/profiles/blogs/why-5-companies-chose-amazon-redshift
  26. Right, thanks Shaun. So earlier I talked a bit about the structural differences between old data architecture and modern data architecture. Now I’m going to elaborate a bit on how that architecture impacts business intelligence and analytics workflows.
  27. This slide shows workflows with the legacy architecture I described earlier. As a reminder, with legacy architecture, each department is working in silos, all serviced by a central IT or analyst team. This is fundamentally prohibitive to a data-driven culture for a few reasons. First, it’s extremely resource-intensive for the central data team to service the needs of their business users. Second, it creates a bottleneck in the analytics process. You’ll see that the arrows are flowing away from the central data team, and that’s for a specific reason. The data team will provide pre-determined metrics for various departments, then rerun and distribute those metrics periodically. These metrics are typically overly broad and not actionable. If a user has further questions about the analysis, and that is often the case (how do you know what questions to ask about the data unless you’ve seen the data already?), they need to submit a request to the data team, who may take a few days to turn it around. This latency restricts end users from making quick, informed business decisions based on their data. Plus, in most companies, there is typically a hierarchy to who receives data. The executive team can get all the data they want, while requests from sales reps, marketing managers, etc. are pushed to the back of the line. These groups rarely have the ability to make strategic decisions based on the analysis they request. Lastly, this model results in disparate reporting. If 5 different departments request the same metric from 5 different database analysts, it’s highly likely that those analysts will have differing ideas about the appropriate way to calculate a metric. Especially when you get into the more sophisticated stuff, things like affinity analysis (if I buy X, what is the likelihood I buy Y?), there are a few statistically defensible ways to calculate that metric. In practice, it’s very common for large organizations to have non-unified definitions, which leads to headaches, data chaos, and an inability to make decisions based on data.
  28. One of the factors that contributes to these workflow issues, which is sort of the last point I touched on, is the difficulty in consistently defining metrics across a company. Part of this is because of the nature of SQL, the de facto language for querying databases. SQL can be easy to write, but difficult to read and audit. If you give 10 analysts the same metrics, you’ll very likely get 10 different queries, some of which may yield the same results, some of which may not. In practice, this often results in data analysts recycling and slightly modifying old queries, without ever really understanding the inner workings of the query. This then jeopardizes the integrity of the data, which makes it difficult to consistently interpret results.
  29. How do we solve this issue of one-off queries and siloed reporting? We create a data model as an intermediary. All definitions of metrics, and data transformations, are defined in one place, where all users can access and understand them. Now, you don’t need those 10 analysts; you only need 1-2 who monitor the modeling layer, and you can be confident all users are working off of the same definitions and interpretations of the results. You can also link together data from different sources, so you can link Salesforce, Marketo, and Zendesk data together to get a comprehensive view of your customer. This allows us to maintain “data governance”, which is a term that you probably hear a lot lately. So, how does this modeling layer impact workflows?
  30. This slide depicts BI and analysis workflows with modern architecture, which creates a truly data-driven environment. All users have equal access to the data through a UI; they don’t need to know SQL. So now sales, marketing, finance, customer success, teams that previously could not directly access data, have the ability to explore their database in full detail. Since everyone is looking at the same numbers and reports, business users can collaborate and facilitate meaningful conversations, based on shared insights. Business users can make informed strategic decisions on the fly, which results in tangible, significant competitive advantages. I think a good example of this is one of our customers, Infectious Media, who offer digital advertising for a myriad of Fortune companies. With Looker, their sales optimization team has the ability to see, in real time, how various advertising campaigns are performing across every website and publisher. If a certain type of website is driving the most clicks or conversions, the optimization team can immediately determine why, then redirect future campaign efforts towards those specific websites or publishers, and perhaps new, similar ones. In a world where advertisements sometimes only last a week or two, the ability to constantly iterate on and refine campaign strategy results in tangible differences in top-line sales. This represents the most significant competitive advantage a company in this space can possess. This model is required for a company to survive. So, how do you set up this kind of architecture?
  31. Now that we understand the benefits, I’ll explain how the set-up of these modern infrastructures is easier than ever. And I’ll illustrate this with an example using RJ Pipeline.
  32. Say you’re a company that collects data from a number of various sources, such as 3rd party applications. Rather than needing to perform complex transformations (like with legacy architecture), you can dump all of your data directly into a centralized location using a middleware tool such as RJ Pipeline. This completely centralizes all of your data, and prepares it for analytics, with a few clicks. No need for heavy engineering resources and workloads. Once the data is centralized, you can quickly add a tool with modeling layers to help distribute data to all of your end users (again, the modeling layer is key here). Working with a tool like Looker, for example, we have an offering called Looker Blocks, which is essentially pre-templated code for your modeling layer for all sorts of third-party applications and types of analysis. These Blocks can be copied into your data model, so now even most of the actual data model development is initially taken care of for you. The result is going from having siloed data in several disparate applications with unequal access for users, to having data centralized in a modern database, with a full analytics suite on top, that can be accessed by any user. What would have taken, quite literally, months of intensive engineering effort is now accomplished in 1, 2, or 3 weeks, which is pretty astounding. That time-to-value from your data is something we’ve never really seen before in the data space.
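
To make the ELT flow from note 5 a little more concrete, here is a minimal sketch in Python. It is an illustration only, not anything shown in the webinar: it uses an in-memory SQLite database as a stand-in for a warehouse like Redshift, and the table `raw_orders`, the view `first_purchases`, and the sample rows are all hypothetical. The point is just that raw rows are loaded untouched and the transformation lives in SQL that runs when someone asks a question.

```python
import sqlite3

# Hypothetical raw rows, loaded exactly as extracted (the "E" and "L" of ELT).
RAW_ORDERS = [
    (1, "alice", "2015-01-03", 1250),
    (2, "bob", "2015-01-05", 4000),
    (3, "alice", "2015-02-10", 700),
]

# SQLite stands in for the warehouse here; this is an assumption for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_orders "
    "(id INTEGER, customer TEXT, ordered_on TEXT, amount_cents INTEGER)"
)
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", RAW_ORDERS)

# The "T" is a view over the raw data, so it is evaluated at query time
# and the raw rows stay available for auditing.
conn.execute(
    """
    CREATE VIEW first_purchases AS
    SELECT customer, MIN(ordered_on) AS first_ordered_on
    FROM raw_orders
    GROUP BY customer
    """
)


def first_purchasers_by_month(connection):
    """Count customers by the month of their first order."""
    return connection.execute(
        """
        SELECT substr(first_ordered_on, 1, 7) AS month,
               COUNT(*) AS first_purchasers
        FROM first_purchases
        GROUP BY month
        ORDER BY month
        """
    ).fetchall()


result = first_purchasers_by_month(conn)
raw_row_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
```

Because the view is only a definition, changing how "first purchase" is computed means editing one SQL statement rather than reloading data.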
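
The idea in note 29 of defining every metric in one place can be sketched in a few lines of Python. This is a toy under stated assumptions, not Looker's modeling language or API: the dictionary `ORDERS_MODEL`, its dimension and measure names, and the helper `build_query` are all hypothetical. It shows how a shared model can generate the SQL, so two people asking for "revenue" get the same expression.

```python
# Hypothetical central model: the only place a metric's SQL is written down.
ORDERS_MODEL = {
    "table": "orders",
    "dimensions": {
        "customer": "customer",
        "month": "substr(ordered_on, 1, 7)",
    },
    "measures": {
        "order_count": "COUNT(*)",
        "revenue": "SUM(amount_cents) / 100.0",
    },
}


def build_query(model, dimensions, measures):
    """Generate a SELECT from named dimensions and measures.

    Unknown names raise KeyError, so nobody can quietly invent
    a private definition of a metric.
    """
    select_parts = [f"{model['dimensions'][d]} AS {d}" for d in dimensions]
    select_parts += [f"{model['measures'][m]} AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select_parts)} FROM {model['table']}"
    if dimensions:
        # Group by the positions of the dimension columns in the SELECT list.
        positions = ", ".join(str(i + 1) for i in range(len(dimensions)))
        sql += f" GROUP BY {positions}"
    return sql


monthly_revenue_sql = build_query(ORDERS_MODEL, ["month"], ["revenue"])
total_orders_sql = build_query(ORDERS_MODEL, [], ["order_count"])
```

A real modeling layer does far more (joins, permissions, caching), but the design choice is the same: the definition is data that can be reviewed and audited, rather than ten hand-written queries.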
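
Note 27 mentions that affinity analysis ("if I buy X, what is the likelihood I buy Y?") can be calculated in more than one defensible way. As a small illustration, the Python sketch below computes two standard measures, confidence and lift, on made-up baskets. The basket data and the function name `affinity` are hypothetical; the two formulas are the usual association-rule definitions.

```python
# Hypothetical purchase baskets, one set of products per order.
BASKETS = [
    {"tent", "stove"},
    {"tent", "stove"},
    {"tent", "stove"},
    {"tent"},
    {"stove"},
    {"stove"},
    {"stove"},
    {"lantern"},
]


def affinity(baskets, x, y):
    """Return (confidence, lift) for the rule "buys x, then buys y".

    confidence = P(y | x): share of baskets with x that also contain y.
    lift = P(y | x) / P(y): how much more likely y is given x than overall.
    """
    total = len(baskets)
    with_x = sum(1 for basket in baskets if x in basket)
    with_y = sum(1 for basket in baskets if y in basket)
    with_both = sum(1 for basket in baskets if x in basket and y in basket)
    if with_x == 0 or with_y == 0:
        raise ValueError("both products must appear in at least one basket")
    confidence = with_both / with_x
    lift = confidence / (with_y / total)
    return confidence, lift


confidence, lift = affinity(BASKETS, "tent", "stove")
```

On this sample, an analyst reporting confidence would say 75% of tent buyers also buy a stove, while an analyst reporting lift would say tents make no difference, because 75% of all baskets contain a stove anyway. Both numbers are correct, which is why the definition needs to be agreed on in one place.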