#TalendConnect
Best practices for unleashing the power of data lakes
Isabelle Nuage & Christophe Toum, Big Data Products, Talend

Self-service data lake, cafeteria style
Using sensor data collected in real time to improve gas turbine reliability and operational performance, and to extend their lifetime value.

Why Do We Need a Data Lake?
“Data lakes are enterprise-wide data management platforms for analyzing disparate sources of data in its native format.” (Gartner)
Business Value
Reducing cost:
• ETL offload
• EDW offload/optimization
• Data archiving
Generating new opportunities:
• Customer acquisition, retention…
• Real-time engagement
• Pricing optimization
• Demand forecasting
• Risk and fraud
• Predictive maintenance
• Smart products…

But Data Lakes Bring New Challenges
High-end users vs. the rest of us
Complexity, poor governance and control, no reuse

Data Lake – Conceptual Architecture
Acquire → Ingest → Understand & Improve → Curate & Govern → Deliver → Self-service
SCALE

Best Practices for a Successful Data Lake
• Accelerate Data Ingestion
• Understand & Govern Your Data
• Remove Silos: Unify Data Management
• Deliver Data to a Wide Audience
Continuously refreshed data
Continuous data delivery and data processes

Best Practices for a Successful Data Lake: Accelerate Data Ingestion
• Wide connectivity
• Batch & streaming ubiquity
• Scale with volume and variety
Pitfalls:
◦ Hand coding
◦ Fragmented tools
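
"Batch & streaming ubiquity" is easiest to see when a transformation is written once and reused in both modes, instead of being hand-coded twice. The sketch below is a plain-Python illustration, not how Talend generates jobs: it assumes records are dicts with hypothetical `fare` and `distance_km` fields, and `trip_stream` is a hypothetical stand-in for a message-queue consumer.

```python
from typing import Dict, Iterable, Iterator

Record = Dict[str, float]


def enrich(records: Iterable[Record]) -> Iterator[Record]:
    """Drop invalid trips and add a fare-per-km field.

    The function is lazy, so the same code serves a finite batch
    (e.g. a list loaded from files) or an unbounded stream (a generator).
    """
    for r in records:
        if r.get("distance_km", 0.0) <= 0:
            continue  # skip records that would divide by zero
        yield {**r, "fare_per_km": r["fare"] / r["distance_km"]}


def trip_stream() -> Iterator[Record]:
    """Hypothetical stand-in for a message-queue consumer."""
    for fare, dist in [(10.0, 2.0), (5.0, 0.0), (30.0, 10.0)]:
        yield {"fare": fare, "distance_km": dist}


if __name__ == "__main__":
    batch = [{"fare": 12.0, "distance_km": 4.0}]
    print(list(enrich(batch)))          # batch mode: one pass over a list
    for row in enrich(trip_stream()):   # streaming mode: the same function
        print(row)
```

Because `enrich` only assumes an iterable, the business rule lives in one place; swapping the source (files, a queue, a socket) does not require rewriting it.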

Best Practices for a Successful Data Lake: Understand & Govern Your Data
• Add context to data (provenance, semantics…)
• Optimize data with curation, stewardship, preparation…
• Use a collaborative process
Pitfalls:
◦ Authoritative governance
◦ Inconsistent framework
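
Adding context to data can start with attaching provenance and semantic tags to every dataset registered in the lake, and propagating lineage whenever one dataset is derived from another. The `DatasetEntry` class and `derive` helper below are hypothetical, a minimal in-memory sketch; a production lake would typically delegate this to a metadata catalog such as Apache Atlas, whose API is not modeled here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Set


@dataclass
class DatasetEntry:
    """Hypothetical catalog entry holding provenance and semantic context."""
    name: str
    source: str                                        # where the data came from
    tags: Set[str] = field(default_factory=set)        # semantic labels, e.g. "pii"
    lineage: List[str] = field(default_factory=list)   # upstream dataset names
    registered_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))


def derive(name: str, parents: List[DatasetEntry], transform: str) -> DatasetEntry:
    """Register a derived dataset that inherits its parents' tags and lineage.

    Propagating tags means a "pii" label on an input stays visible downstream.
    """
    lineage: List[str] = []
    tags: Set[str] = set()
    for p in parents:
        lineage.extend(p.lineage + [p.name])
        tags |= p.tags
    lineage = list(dict.fromkeys(lineage))  # de-duplicate, keep first-seen order
    return DatasetEntry(name=name, source=f"derived:{transform}",
                        tags=tags, lineage=lineage)


if __name__ == "__main__":
    raw = DatasetEntry("sensor_raw", source="turbine-gateway", tags={"telemetry"})
    crm = DatasetEntry("customers", source="crm", tags={"pii"})
    report = derive("maintenance_report", [raw, crm], transform="join")
    print(report.lineage, sorted(report.tags))
```

Inheriting tags by default is a deliberate choice: it is safer for a derived dataset to be over-labelled as sensitive than to silently lose that label.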

Best Practices for a Successful Data Lake: Remove Silos, Unify Data Management
• Pervasive data quality (DQ), masking…
• Consistent operationalization
• Single platform for all use cases & personas
Pitfalls:
◦ Fragmented tools
◦ Hand coding
◦ Shadow IT
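
Pervasive masking means sensitive values are protected wherever data flows, not only inside one tool. A minimal sketch, assuming PII can be spotted with simple rules (an email regex and a digit-count heuristic); real semantic discovery is far more robust. The `mask_value` and `mask_record` helpers and the 9-digit threshold are illustrative assumptions.

```python
import re
from typing import Dict

# Illustrative pattern only; real semantic discovery is far more thorough.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")
MIN_DIGITS = 9  # assumption: 9+ digits looks like a phone or account number


def mask_value(value: str) -> str:
    """Mask a string that looks like PII; return anything else unchanged."""
    if EMAIL.match(value):
        local, domain = value.split("@", 1)
        return local[0] + "***@" + domain  # keep first char and domain
    n_digits = sum(ch.isdigit() for ch in value)
    if n_digits >= MIN_DIGITS:
        keep_from = n_digits - 4  # reveal only the last four digits
        out, seen = [], 0
        for ch in value:
            if ch.isdigit():
                out.append(ch if seen >= keep_from else "*")
                seen += 1
            else:
                out.append(ch)  # keep separators such as '+', '-', ' '
        return "".join(out)
    return value


def mask_record(record: Dict[str, str]) -> Dict[str, str]:
    """Apply masking to every field, so no column is missed by accident."""
    return {key: mask_value(val) for key, val in record.items()}


if __name__ == "__main__":
    print(mask_record({"email": "jane.doe@example.com",
                       "phone": "+1 555-123-4567",
                       "site": "Turbine 42"}))
```

Masking by value rather than by column name is what makes it "pervasive" here: a phone number hidden in an unexpected column is still caught.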

Best Practices for a Successful Data Lake: Deliver Data to a Wide Audience
• Make data accessible
• Governed self-service
• Scalable operationalization
Pitfalls:
◦ Unmanaged autonomy
◦ Self-service tools only for the tech-savvy
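
Governed self-service can be pictured as a policy layer between users and the lake: everyone gets access, but what each role sees is decided centrally rather than per tool. The sketch below assumes a hypothetical in-memory `POLICY` mapping roles to visible columns; in practice this is enforced by tools such as Apache Ranger or Sentry, and nothing here mirrors their APIs.

```python
from typing import Dict, List, Set

Row = Dict[str, object]

# Hypothetical central policy: the columns each role is allowed to see.
POLICY: Dict[str, Set[str]] = {
    "analyst": {"turbine_id", "site", "output_mw"},
    "steward": {"turbine_id", "site", "output_mw", "owner_email"},
}


def governed_view(rows: List[Row], role: str) -> List[Row]:
    """Project rows onto the columns the role may see.

    Roles missing from the policy see no columns (deny by default),
    so autonomy never turns into unmanaged access.
    """
    allowed = POLICY.get(role, set())
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]


if __name__ == "__main__":
    rows: List[Row] = [{"turbine_id": 7, "site": "site-A",
                        "output_mw": 250.0, "owner_email": "ops@example.com"}]
    print(governed_view(rows, "analyst"))
```

Keeping the policy in one place, rather than inside each self-service tool, is what separates governed self-service from the "unmanaged autonomy" pitfall.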

Best Practices for a Successful Data Lake: GET READY FOR CHANGE

Ingestion Best Practices
Sources: Transactions · Messages & Events · Logs · Sensors
Processing: Collect - Distribute · Track · Streaming · Windowing · Alert
Outputs: Data Analytics & Data Science · Real-time Data Visualization · Real-time Indicators / Scorecard
NYC Taxi Data Streaming
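
The streaming, windowing and alerting steps named on this slide can be sketched as tumbling-window aggregation: group events into fixed, non-overlapping time windows, count them, and raise an alert when a count reaches a threshold. This is a plain-Python illustration under stated assumptions: events are hypothetical `(epoch seconds, pickup zone)` pairs loosely modeled on taxi pickups, they arrive in timestamp order, and the window size and threshold are arbitrary. A real pipeline would run this logic on a streaming engine.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple

Event = Tuple[int, str]  # (epoch seconds, pickup zone)


def tumbling_counts(events: Iterable[Event],
                    window_s: int) -> Iterator[Tuple[int, Counter]]:
    """Yield (window_start, pickups_per_zone) for each completed window.

    Assumes events arrive in timestamp order; late or out-of-order
    events, which real streaming engines must handle, are not handled here.
    """
    current: Optional[int] = None
    counts: Counter = Counter()
    for ts, zone in events:
        start = ts - ts % window_s  # align the event to its window boundary
        if current is not None and start != current:
            yield current, counts  # the previous window is complete
            counts = Counter()
        current = start
        counts[zone] += 1
    if current is not None:
        yield current, counts  # flush the last, possibly partial, window


def alerts(events: Iterable[Event], window_s: int,
           threshold: int) -> Iterator[str]:
    """Emit a message for every zone whose count in a window reaches threshold."""
    for start, counts in tumbling_counts(events, window_s):
        for zone, n in sorted(counts.items()):
            if n >= threshold:
                yield f"window {start}: {zone} has {n} pickups"


if __name__ == "__main__":
    pickups = [(0, "JFK"), (10, "JFK"), (30, "Midtown"),
               (59, "JFK"), (61, "JFK"), (70, "Midtown")]
    for message in alerts(pickups, window_s=60, threshold=3):
        print(message)
```

Tumbling windows are the simplest choice; sliding windows or watermarks for late data would be the next refinement in a production stream.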

NYC Taxi Data Streaming

Disclaimer
• The future features described in this presentation are under consideration by Talend and are not commitments for future products, technologies, or services.
• The roadmap is subject to change and Talend does not guarantee the features or release dates.

Roadmap 2017
Addressing the needs of large enterprises
• Big Data: first on Spark 2.0 & Data Prep on Big Data
• Cloud: Data Prep & Data Ingestion
• Self-service: Data Stewardship & self-service connectors
• Governance: Apache Atlas

Key Takeaways
• Analyze far more data to find more opportunities for innovation and transformation
• Real-time data streaming brings increased agility
• To unleash data lakes, data governance is essential

Free Trial: Talend Big Data Sandbox
• A ready-to-run Docker environment
• A step-by-step expert guide
• Real-world scenarios using Spark, Kafka, MapReduce & NoSQL
www.talend.com/BigDataSandbox
Hit the Easy Button for Hadoop, Spark and Machine Learning

Thank You


Speaker Notes

  1. Event theme: Unlock your data for unlimited possibilities.
     Often, an enterprise data lake is viewed as a panacea for all data ills, even as the ‘holy grail’ for those trying to spur digital transformation. Yet many IT teams are still struggling to see the payoff from their data lake investments. In this session we will present best practices and a reference architecture for building a data lake with Talend Big Data. We will also talk about some new big data streaming capabilities coming in Talend Winter.
     For reference, see JM’s session, to avoid too much overlap: "Data Prep: How to enable sustainable IT & business collaboration around self-service data." Data is everywhere and everyone needs it. Only a modern data platform that combines self-service data access with Big Data and the data lake can turn data into a "liquid asset" that anyone can consume and use. But major roadblocks exist, particularly when it comes to data governance and security. To tackle these challenges, traditional authoritative governance approaches are being morphed into collaborative practices that turn existing business users into "data workers". Come to this session to learn how Talend Data Fabric addresses these issues, and get a sneak peek at our new self-service data capabilities coming in Talend Winter.
  2. Unleashing the data lake to 22K people in 80+ countries to perform machine and equipment health and reliability management, and maintenance optimization. Convey that a data warehouse (DW) is different from a data lake (DL). Refer to: http://www.ge.com/digital/industries/power-utility/power-generation
     Real time: revenue generation. Prolong the lifetime value of gas turbines: keep them running for the next 10 years, depreciate them in a better way, and sell the energy they create. Batch meets the real-time world. Changing the GE culture: before the data lake, GE was able to analyze only 2% of its gas turbine data. GE is a massive company; 7 of its departments use Talend, and GE Power is one of them.
     The Challenge: GE Power needed to operationalize and optimize its business, operations, and asset performance management (machine & equipment health, reliability management, and maintenance optimization). Only 2% of the turbine data was used for analytics; 98% of the data was never tapped. It needed a new data strategy, as the cost of traditional data approaches was rising.
     Why Talend: Provide data as a service, cafeteria style. Integrate diverse data sets and compute at Big Data scale. Lower cost to operate and reduced development effort.
     The Result: 130+ applications feeding the system, 7 ERPs, >12M transactions/day. 68 change data capture (CDC) real-time streaming systems (from sales and ERP systems) for real-time analytics. 22,000 users on Big Data in 86+ countries, in self-service mode.
  3. Convey the meaning: you built the lake and you can get value from it, but you’re struggling.
     Value of the DL: The data lake metaphor arose because ‘lakes’ are a great way to explain one of the basic tenets of Big Data: the need to collect all of the data in the ecosystem, ready to analyze it for pertinent patterns using all kinds of analysis, including autonomous machine learning. This is because one of the basic tenets of data science is that the more data you can get, the better your analysis will ultimately be.
     Data lake vs. data warehouse: less structure vs. more structure. Cost reduction: EDW 100 versus 1 for Hadoop. For the same cost, organizations can now store 50 times as much data in a Hadoop data lake as in a data warehouse. The IT team can only do so much (data ingestion, security, DQ…); the true value comes when the business can access the data (self-service, improvements in DQ, lineage, governance). Hadoop vs. data lake (less structure, freedom to store anything, more volume, history, velocity).
     Provide a definition: “Data lakes typically begin as ungoverned data stores. Meeting the needs of wider audiences requires curated repositories with governance, semantic consistency, and access controls.” (Gartner) Left idle and overgrown, the data lake will quickly become a stagnant data swamp. But organizations can avoid data swamps by adding semantics to a data lake: semantics provides a highly usable and consistent taxonomy model for data lakes.
  4. But the data lake brings new challenges. This has led to a new set of problems focused on trustworthiness and ubiquity: an overwhelming amount of data; concerns about data being accessed by individuals who shouldn’t have access, due to a lack of tools; confusion about what data lies where; a limited number of people able to access the data; limited understanding of where the data came from or what has been done to it; and limited data quality. Together these lead to data gridlock, causing data lakes to fail to deliver on their true potential.
  5. Example: Betvictor (real-time customer engagement).
     MUST HAVES: wide connectivity; ubiquity of batch and streaming; uncomplicated management of a wide range of data types. For discovery & prep we have Data Preparation. For curation we will add Data Stewardship in Winter. We can handle all formats, including the most complex hierarchical ones, with the Talend Data Mapper, which runs on Spark.
     PITFALLS: Hand coding: development cost 20%, maintenance 40%, support 40% (blog by Ashley, based on “Does Custom-Coded Data Integration Stack Up to Tools?”[1], Sept 5). Also: 20% initial, but a 200% increase in maintenance cost. Standalone, “specialized”, disconnected tools.
  6. Air France: DQ + Talend Metadata Manager.
     MUST HAVES: capture metadata, provenance & lineage; automate data tagging (data semantics & ML); collaborative stewardship & curation; preparation & improvement of the data; control of data accessibility.
     PITFALLS: top-down governance; fragmented tooling, which means an inconsistent governance framework. Must have integration with the distros & Apache. Top-down governance is hard, slow, and impractical. Overly fragmented tooling leads to an inconsistent governance framework. Major regulatory obligations, e.g. GDPR.
     Data accessibility = security, beyond Kerberos: Kerberos is a given, and we of course support Kerberos everywhere. But Kerberos is not enough. You must plan for different granularity, such as with Sentry; policy-based rules, such as with Ranger; data encryption, such as HDFS encryption; or masking of PII and other sensitive data. TALEND SUPPORTS ALL OF THE ABOVE! The Profiler and Data Prep both use semantic analysis to understand the ‘meaning’ of the data and help identify sensitive data. For auditability and lineage: 1) our Studio is, and always has been, metadata driven; 2) we are fully integrated with Navigator and Atlas; 3) we can do enterprise metadata management beyond Hadoop with TMM.
  7. MUST HAVES: a unified framework for all data management tasks; a single point of operationalization; a scalable business model. PITFALLS: fragmented tools & hand coding; isolated initiatives and shadow IT; unpredictable and exponential costs.
  8. Ring Central: they don’t use Talend Data Prep (yet), but they deliver data in self-service mode.
     MUST HAVES: data accessibility for everyone; self-service tools for everyone; scalable operationalization. PITFALLS: isolated, unmanaged tools; self-service tools only for the tech-savvy.
     GET READY FOR CHANGE: use future-proof frameworks; continuous delivery; hybrid cloud. Pitfalls: hand coding (again!) tied to a particular language or framework; fragmented, tactical approaches.
     MUST HAVES: an abstraction layer to manage and leverage diversity, evolutions, and innovations; continuous delivery of data and processes; hybrid on-prem/cloud. PITFALLS: hand coding tied to a particular technology or language; fragmented approaches to leveraging the diverse big data frameworks.
  9. Big Data: big data and cloud innovations, including Spark 2.0, are moving toward the unification of batch & streaming. Staying on the cutting edge of big data innovation, processing big data at the fastest speeds possible. Operationalize ML on Spark. Data Preparation for Big Data lets anyone access and improve data: it enables the information worker to turn data into insight at scale, and the entire organization to access “trusted” data in the lake.
     Cloud: Data Prep as a Service. Democratize data ingestion via tools accessible to data scientists, with a UX similar to what we presented earlier.
     Self-Service: the new Data Stewardship app helps users make decisions on data and orchestrates data governance between IT and business. It empowers the business to ensure data integrity at the source. Data Prep self-service connectors: Big Data, Cloud, but application connectors too (SFDC, MKTO, …).
     Governance: we talked a lot about it. In Big Data, you have a lot of data about which you have very little knowledge. Atlas integration provides traceability & lineage.
  10. A free, zero-risk environment to evaluate the pros & cons of the various technologies (Spark Batch & Spark Streaming, Talend vs. hand coding), work through real-world scenarios, and help plan your data lake.