Best Practices for Unleashing the Power of Data Lakes

#TalendConnect
Isabelle Nuage & Christophe Toum, Big Data Products, Talend

Self-service data lake, cafeteria style
Using sensor data collected in real time to improve gas turbine reliability and operational performance and to extend lifetime value.

Why Do We Need a Data Lake?
“Data lakes are enterprise-wide data management platforms for analyzing disparate sources of data in its native format.” (Gartner)
Business Value
Reducing cost
• ETL offload
• EDW offload/optimization
• Data archiving
Generating new opportunities
• Customer acquisition, retention…
• Real-time engagement
• Pricing optimization
• Demand forecasting
• Risk and fraud
• Predictive maintenance
• Smart products…

But Data Lakes Bring New Challenges
High-end users and the rest of us
Complexity, poor governance and control, no reuse

Data Lake – Conceptual Architecture
Acquire
Ingest
Understand & Improve
Curate & Govern
Deliver
Self-service
SCALE

Best Practices to a Successful Data Lake
• Accelerate Data Ingestion
• Understand & Govern Your Data
• Remove Silos, Unify Data Management
• Deliver Data to a Wide Audience
Continuously refreshed data
Continuous data delivery and data processes

Accelerate Data Ingestion
• Wide connectivity
• Batch & streaming ubiquity
• Scale with volume and variety
Pitfalls:
o Hand coding
o Fragmented tools

Understand & Govern Your Data
• Add context on data (provenance, semantics…)
• Optimize data with curation, stewardship, preparation…
• Use a collaborative process
Pitfalls:
o Authoritarian governance
o Inconsistent framework

Remove Silos, Unify Data Management
• Pervasive data quality (DQ), masking…
• Consistent operationalization
• Single platform for all use cases & personas
Pitfalls:
o Fragmented tools
o Hand coding
o Shadow IT
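
One way to read "pervasive DQ, masking" is that the same small quality check and masking rule are reused by every pipeline instead of being hand coded per tool. The sketch below is only an illustration under that assumption; the record layout, the `prepare` helper, the email pattern, and the salt value are all hypothetical and are not taken from any Talend product.

```python
import hashlib
import re

# Deliberately simple pattern for the sketch; real validation rules vary.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_valid_email(value):
    """Data quality rule: the value must look like name@domain.tld."""
    return bool(EMAIL_PATTERN.match(value))


def mask_email(value, salt="demo-salt"):
    """Replace the local part with a short salted SHA-256 digest, keep the domain."""
    local, _, domain = value.partition("@")
    digest = hashlib.sha256((salt + local).encode("utf-8")).hexdigest()[:8]
    return f"{digest}@{domain}"


def prepare(records):
    """Split records into masked clean rows and rejected rows."""
    clean, rejected = [], []
    for record in records:
        if is_valid_email(record["email"]):
            clean.append({**record, "email": mask_email(record["email"])})
        else:
            rejected.append(record)
    return clean, rejected


rows = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": "not-an-email"},
]
clean, rejected = prepare(rows)
print(len(clean), "clean row(s),", len(rejected), "rejected row(s)")
```

Because the masking is deterministic for a given salt, the masked value can still be used as a join key across datasets without exposing the original address.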

Deliver Data to a Wide Audience
• Make data accessible
• Governed self-service
• Scalable operationalization
Pitfalls:
o Unmanaged autonomy
o Self-service tools for the tech-savvy

GET READY FOR CHANGE

Ingestion Best Practices
Sources: Transactions, Messages & Events, Logs, Sensors
Steps: Collect - Distribute, Track, Streaming, Windowing, Alert
Uses: Data Analytics & Data Science, Real-time Data Visualization, Real-time Indicators / Scorecard
Demo: NYC Taxi Data Streaming
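
The windowing and alert steps can be illustrated with a small, self-contained sketch. It assumes fixed (tumbling) windows over taxi pickup events and a simple count threshold. The `Pickup` event shape, the zone names, and the threshold are hypothetical and are not taken from the demo itself; a real pipeline would typically run on a streaming engine rather than an in-memory list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pickup:
    """Hypothetical taxi event: epoch seconds and a pickup zone name."""
    timestamp: int
    zone: str


def tumbling_window_counts(events, window_seconds):
    """Count events per (window start, zone) in fixed, non-overlapping windows."""
    counts = defaultdict(int)
    for event in events:
        # Align each timestamp to the start of its window.
        window_start = event.timestamp - (event.timestamp % window_seconds)
        counts[(window_start, event.zone)] += 1
    return dict(counts)


def alerts(counts, threshold):
    """Return the (window start, zone) keys whose count reaches the threshold."""
    return sorted(key for key, count in counts.items() if count >= threshold)


events = [
    Pickup(0, "Midtown"), Pickup(20, "Midtown"), Pickup(59, "Midtown"),
    Pickup(61, "Midtown"), Pickup(75, "JFK"),
]
counts = tumbling_window_counts(events, window_seconds=60)
print(alerts(counts, threshold=3))  # prints [(0, 'Midtown')]
```

Tumbling windows were chosen here only because they are the simplest case; sliding or session windows follow the same pattern with a different key computation.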

Disclaimer
• The future features described in this presentation are under consideration by Talend and are not commitments for future products, technologies, or services.
• The roadmap is subject to change and Talend does not guarantee the features or release dates.

Roadmap 2017
Addressing the needs of large enterprises
Big Data
1st on Spark 2.0 & Data Prep on Big Data
Data Prep & Data Ingestion
Cloud Self-service
Data Stewardship & Self-service connectors
Governance
Apache Atlas

Key Takeaways
• Analyze far more data to find more opportunities for innovation and transformation
• Real-time data streaming brings increased agility
• To unleash data lakes, data governance is essential

Free Trial: Talend Big Data Sandbox
• A ready-to-run Docker environment
• A step-by-step expert guide
• Real-world scenarios using Spark, Kafka, MapReduce & NoSQL
www.talend.com/BigDataSandbox
Hit the Easy Button for Hadoop, Spark and Machine Learning

Thank You
