Data Organization & Big Data Architecture
Agenda
• Data Organization
• Big Data Architecture
• Recruitment
Data Organization
Line Of Business
HR Finance Sales Customers
Competitors Markets Products Supply
Traffic Acquisition
Communication Security Prospects
* If you read this text, work in the data field and are interested in joining us, please go to: https://www.ovh.com/fr/careers/
Use Line Of Business
LOB 1 (Customer)
BI Team
Data Science Team
LOB 2 (Support)
BI Team
Data Science Team
LOB 3
…
BI Team
Data Science Team
Data Office
Data Centralization
Datalake
Cleansing
Data Integration
Data Office
CRM
BI Team
Data Science Team
• Extracts
Data Analyst
• Events
• Actions
Customer Animation
• Product Analysis
• Global Analysis
BUS
• Country Analysis
SUBS
• PAC
• Ad hoc analysis
Digital
• Onsite
• Partner
BIZDEV
• Campaigns
• Text mining
Traffic Acquisition
• Segmentation
• Normalisation
Targeting Channel
In case you missed it on the previous slide, if you work in the data field, we are interested in your profile!
Data Maturity
Level 1: POC
Data are manually created or extracted once
Data are modified by one data scientist
Data are assessed by a data analyst and manually sent to a business analyst post control
Level 2: Manual
Data are manually created on a regular basis
Data are manually added to the enterprise model with an automated process
Data can be used by all data scientists, data analysts or business analysts
Level 3: Automatic
Data are created through a controlled business process
Data are automatically added to the enterprise model
Data can be used by all data scientists, data analysts or business analysts
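The three maturity levels can be read as a small decision rule. The sketch below is one possible encoding, not something taken from the slides: the `Dataset` fields, the `maturity_of` function and the `shared_with_all_analysts` helper are hypothetical names chosen for illustration, and the boolean criteria are a simplification of the slide text.

```python
from dataclasses import dataclass
from enum import IntEnum


class Maturity(IntEnum):
    """Maturity levels as named on the slide."""
    POC = 1
    MANUAL = 2
    AUTOMATIC = 3


@dataclass(frozen=True)
class Dataset:
    # Hypothetical descriptor; the fields paraphrase the slide's criteria.
    name: str
    produced_regularly: bool           # created on a regular basis, not just once
    in_enterprise_model: bool          # added to the shared enterprise model
    controlled_business_process: bool  # created through a controlled business process


def maturity_of(ds: Dataset) -> Maturity:
    """Return the highest level whose criteria the dataset satisfies."""
    if ds.in_enterprise_model and ds.controlled_business_process:
        return Maturity.AUTOMATIC
    if ds.in_enterprise_model and ds.produced_regularly:
        return Maturity.MANUAL
    return Maturity.POC


def shared_with_all_analysts(ds: Dataset) -> bool:
    """At level 2 and above, the data can be used by all analysts."""
    return maturity_of(ds) >= Maturity.MANUAL


if __name__ == "__main__":
    one_off = Dataset("one-off extract", False, False, False)
    print(one_off.name, maturity_of(one_off).name)
```

Using an ordered enum keeps comparisons such as "level 2 or higher" explicit, which matches the "Level 2 & 3 mode" wording used later in the deck.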
Data Maturity Matrix
Customers Competitors Products
Advanced 5 Potential Strategy
4 Attrition New Product
3 Churn Rank
2 Adds Event
Basic 1 NIC Pricing …
Exploration: Code first
Industrialisation: Model first
Data preparation: 80%
Data Scientists
Data Analysts
Business Analysts
Technical model
Machine Learning: 20%
Analyse
Test
Validation
Data Analysis / Creation
Data Analysis
Data Committee
Data Management Team (Architect + Data Integrator)
DataViz
Model
Enterprise Model Building
Datamart and report building
Business Intelligence Team
DTM
Data Prepare: industrialise
Build Datamart and Dashboard
POC
Datastore 360
Expose
POC
Enterprise model
POC Mode
Level 2 & 3 mode
Data Lake Team
Tool / Infrastructure
Data Committee
Objectives
• Define data that needs to be added to enterprise data
• Define priority and owners by subject
• Industrialise new data production: from Excel to full business process
• Validate enterprise model
– Common vocabulary
– Business and/or functional model
• Be informed of evolution
Participants
• Data Scientist
• Data Analyst
• Business Analyst
• Data Management Team
Periodicity
• Every month
Datastore 360
EDS 360
• Gets all data from
– Front office applications
– Back office applications
– External data
• Stores data in a business-oriented model
• Responsible for historizing data when this makes sense for the business
– What data do we want to keep? What will I need in 20 years?
• Exposes data to all applications that require it
– Business Intelligence: reporting or datamart
– Front office applications
History
Client Product Activity …
Current
Client Product Activity …
Data Scientist
Data Analyst
Business Analyst
DataViz
User APPs (CRM, Support
api
api
Direct read
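The split between a "Current" view and a "History" view, with history kept only where it makes sense for the business, can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the actual Datastore 360: the class, its method names and the validity-interval layout are all hypothetical.

```python
from datetime import datetime


class Datastore360:
    """Toy store: one current record per key, plus optional history."""

    def __init__(self, historized_entities):
        # Entities for which the business decided to keep past versions.
        self.historized = set(historized_entities)
        self.current = {}   # (entity, business_id) -> (valid_from, record)
        self.history = []   # (key, valid_from, valid_to, record)

    def upsert(self, entity, business_id, record, at):
        """Store a new version; archive the previous one if historized."""
        key = (entity, business_id)
        previous = self.current.get(key)
        if previous is not None and entity in self.historized:
            valid_from, old_record = previous
            self.history.append((key, valid_from, at, old_record))
        self.current[key] = (at, dict(record))

    def read_current(self, entity, business_id):
        """Return the latest version, or None if the key is unknown."""
        entry = self.current.get((entity, business_id))
        return None if entry is None else dict(entry[1])

    def read_as_of(self, entity, business_id, when):
        """Return the version that was valid at `when`, if it was kept."""
        key = (entity, business_id)
        entry = self.current.get(key)
        if entry is not None and entry[0] <= when:
            return dict(entry[1])
        for k, valid_from, valid_to, record in self.history:
            if k == key and valid_from <= when < valid_to:
                return dict(record)
        return None


if __name__ == "__main__":
    store = Datastore360({"client"})
    store.upsert("client", "c1", {"country": "FR"}, datetime(2017, 1, 1))
    store.upsert("client", "c1", {"country": "DE"}, datetime(2017, 2, 1))
    print(store.read_current("client", "c1"))
    print(store.read_as_of("client", "c1", datetime(2017, 1, 15)))
```

Entities that are not in the historized set simply overwrite their current record, which reflects the slide's question of what data is worth keeping for the long term.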
Big Data Architecture
Context
~ 50 SQL replicas
~ 700 DB
~ 300K tables
~ 100TB
~ 500K events/s
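To give a feel for these orders of magnitude, here is a rough back-of-the-envelope calculation. It assumes the figures are sustained averages, that "TB" means decimal terabytes, and that the 100 TB is spread across all 300K tables; none of these assumptions is stated on the slide.

```python
# Approximate figures from the slide (all marked "~" in the source).
EVENTS_PER_SECOND = 500_000
DATABASES = 700
TABLES = 300_000
VOLUME_BYTES = 100 * 10**12  # assumption: decimal terabytes

SECONDS_PER_DAY = 24 * 60 * 60

# Derived orders of magnitude, only as good as the assumptions above.
events_per_day = EVENTS_PER_SECOND * SECONDS_PER_DAY
tables_per_db = TABLES / DATABASES
avg_bytes_per_table = VOLUME_BYTES / TABLES

if __name__ == "__main__":
    print(f"events per day:         {events_per_day:,}")
    print(f"tables per database:    {tables_per_db:.0f}")
    print(f"average MB per table:   {avg_bytes_per_table / 1e6:.0f}")
```

Under these assumptions the event stream amounts to tens of billions of events per day, while the average table is small, which suggests many small tables rather than a few very large ones.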
Datalake Hardware view
Private network
OVH Dedicated server
OVH Public Cloud
High scalability
Security
Performance
Reliability
Lille Grand Palais – 28 February 2017
Datalake software view
Pig
Flink
Spark
HDFS
HBase
Phoenix
Kafka (Queue)
Couchbase
Jobs
Job              Skills                              Output
Data Analyst     Excel; Dataviz: Tableau, PowerBI    Data strategy
Data Scientist   Scala, Java, R, Python, Cube        Datasets, Flows, Patterns, Models
Data Integrator  Flink, HBase, Pig, Spark            Data preparation
Data Dev Ops     Kafka, HBase, Go, Apache Beam, …    Datalake
Thank you!
Join us: ovh.com/fr/careers

 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 

Meetup Data-science OVH

  • 1. Data Organization & Big Data Architecture
  • 2. Agenda: Data Organization, Big Data Architecture, Recruitment
  • 4. Line Of Business HR Finance Sales Customers Competitors Markets Products Supply Traffic Acquisition Communication Security Prospects * If you read this text, work in the data field and are interested in joining us, please go to: https://www.ovh.com/fr/careers/
  • 5. Use Line Of Business: LOB 1 (Customer): BI Team, Data Science Team; LOB 2 (Support): BI Team, Data Science Team; LOB 3 …: BI Team, Data Science Team
  • 6. Data Office Data Centralization Datalake Cleansing Data Integration Data Office CRM BI Team Data Science Team • Extracts Data Analyst • Events • Actions Customer Animation • Product Analysis • Global Analysis BUS • Country Analysis SUBS • PAC • Ad hoc Analysis Digital • Onsite • Partner BIZDEV • Campaigns • Text mining Traffic Acquisition • Segmentation • Normalisation Targeting Channel. In case you missed it on the previous slide: if you work in the data field, we are interested in your profile!
  • 7. Data Maturity Level 1: POC Data are manually created or extracted once Data are modified by one data scientist Data are assessed by a data analyst and manually sent to a business analyst post control
  • 8. Data Maturity Level 1: POC Data are manually created or extracted once Data are modified by one data scientist Data are assessed by a data analyst and manually sent to a business analyst post control Level 2: Manual Data are manually created on a regular basis Data are manually added to the enterprise model with an automated process Data can be used by all data scientists, data analysts or business analysts
  • 9. Data Maturity Level 1: POC Data are manually created or extracted once Data are modified by one data scientist Data are assessed by a data analyst and manually sent to a business analyst post control Level 2: Manual Data are manually created on a regular basis Data are manually added to the enterprise model with an automated process Data can be used by all data scientists, data analysts or business analysts Level 3: Automatic Data are created through a controlled business process Data are automatically added to the enterprise model Data can be used by all data scientists, data analysts or business analysts
  • 10. Data Maturity Matrix Customers Competitors Products Advanced 5 Potential Strategy 4 Attrition New Product 3 Churn Rank 2 Adds Event Basic 1 NIC Pricing …
  • 11. Exploration : Code First Industrialisation : Model first Data Scientists Data Analysts Business Analysts Analyse Test Validation Data Management Team ( Architect + Data Integrator ) Business Intelligence Team Data Lake Team
  • 12. Data Lake Team Tool / Infrastructure Exploration : Code First Industrialisation : Model first Data Scientists Data Analysts Business Analysts Technical model Analyse Test Validation Data Management Team ( Architect + Data Integrator ) Business Intelligence Team
  • 13. Tool / Infrastructure Exploration : Code First Industrialisation : Model first Data preparation : 80% Data Scientists Data Analysts Business Analysts Technical model Machine Learning : 20% Analyse Test Validation Data Management Team ( Architect + Data Integrator ) Business Intelligence Team Data Lake Team
  • 14. Tool / Infrastructure Exploration : Code First Industrialisation : Model first Data preparation : 80% Data Scientists Data Analysts Business Analysts Technical model Machine Learning : 20% Analyse Test Validation Data Analysis / Creation Data Analysis Data Management Team ( Architect + Data Integrator ) DataViz Model Business Intelligence Team POC Expose POC POC Mode Data Lake Team
  • 15. Tool / Infrastructure Exploration : Code First Industrialisation : Model first Data preparation : 80% Data Scientists Data Analysts Business Analysts Technical model Machine Learning : 20% Analyse Test Validation Data Analysis / Creation Data Analysis Data Committee Data Management Team ( Architect + Data Integrator ) DataViz Model Enterprise Model Building Datamart and report building Business Intelligence Team DTM Data Prepare: industrialise POC Datastore 360 Level 2 & 3 mode Expose POC Enterprise model POC Mode Data Lake Team
  • 16. Tool / Infrastructure Exploration : Code First Industrialisation : Model first Data preparation : 80% Data Scientists Data Analysts Business Analysts Technical model Machine Learning : 20% Analyse Test Validation Data Analysis / Creation Data Analysis Data Committee Data Management Team ( Architect + Data Integrator ) DataViz Model Enterprise Model Building Datamart and report building Business Intelligence Team DTM Data Prepare: industrialise Build Datamart and Dashboard POC Datastore 360 Expose POC Enterprise model POC Mode Level 2 & 3 mode Data Lake Team
  • 17. Data Committee. Objectives: define the data that needs to be added to enterprise data; define priority and owners by subject; industrialise new data production, from Excel to a full business process; validate the enterprise model (common vocabulary, business and/or functional model); be informed of evolutions. Participants: Data Scientist, Data Analyst, Business Analyst, Data Management Team. Periodicity: every month.
  • 18. Datastore 360 EDS 360 History • Gets all data from – Front office applications – Back office applications – External data • Stores data in a business-oriented model • Responsible for keeping data history when this makes sense for the business – What data do we want to keep? What will I need in 20 years? • Exposes data to all applications that require it – Business Intelligence: reporting or datamart – Front office applications Current Client Product Activity Client Product Activity … … Data Scientist Data Analyst Business Analyst DataViz User APPs (CRM, Support api api Direct read
  • 20. Context ~ 50 Replicas SQL ~ 700 DB ~ 300K tables ~ 100TB ~ 500K events/s
  • 21. Datalake Hardware view Private network OVH Dedicated server OVH Public Cloud High scalability Security Performance Reliability
  • 22. Lille Grand Palais – 28 February 2017
  • 24. Jobs (Job | Skills | Output): Data Analyst | Excel; Dataviz: Tableau, PowerBI | Data strategy. Data Scientist | Scala, Java, R, Python, Cube | Datasets, Flows, Patterns, Models. Data Integrator | Flink, HBase, Pig, Spark | Data preparation. Data Dev Ops | Kafka, HBase, Go, Apache Beam, … | Datalake.
  • 25. Thank you ! Join us : ovh.com/fr/careers
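
The three Data Maturity levels on slides 7 to 9 differ mainly in how the data is produced: once and by hand (POC), by hand on a regular basis (Manual), or through a controlled business process (Automatic). As a minimal sketch of that rule, the Python below maps two yes/no properties of a dataset to a level. The names `MaturityLevel`, `maturity_level`, `recurring` and `controlled_process` are hypothetical and chosen only for illustration; the slides do not describe any implementation.

```python
from enum import IntEnum


class MaturityLevel(IntEnum):
    """Maturity levels as named on the Data Maturity slides."""
    POC = 1        # data created or extracted once, by hand
    MANUAL = 2     # data created by hand on a regular basis
    AUTOMATIC = 3  # data created through a controlled business process


def maturity_level(recurring: bool, controlled_process: bool) -> MaturityLevel:
    """Classify a dataset from two hypothetical yes/no properties.

    recurring: the data is produced on a regular basis, not just once.
    controlled_process: production runs through a controlled business process.
    """
    if controlled_process:
        return MaturityLevel.AUTOMATIC
    if recurring:
        return MaturityLevel.MANUAL
    return MaturityLevel.POC


def usable_by_everyone(level: MaturityLevel) -> bool:
    """Per the slides, level 2 and 3 data can be used by all
    data scientists, data analysts and business analysts."""
    return level >= MaturityLevel.MANUAL
```

For example, a one-off extract gives `maturity_level(False, False)`, which is `MaturityLevel.POC`, and `usable_by_everyone` returns `False` for it.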

Editor's notes

  1. A secured cluster accessible through a gateway. The computing layer is based on Public Cloud instances in order to scale quickly. Cold storage, on the other hand, is based on dedicated servers for higher performance. vRack technology for the dedicated network; Public Cloud for scalability.
  2. A secured cluster accessible through a gateway. The computing layer is based on Public Cloud instances in order to scale quickly. Cold storage, on the other hand, is based on dedicated servers for higher performance. vRack technology for the dedicated network; Public Cloud for scalability -> datanode
  3. Hadoop ecosystem with HDFS for data storage, and HBase plus Phoenix for SQL support on columnar storage -> relational data storage layer. Couchbase for document data storage. Key-value data can be stored in either HDFS or Couchbase depending on its access rate. Processing is done by Spark / Flink / Pig. Each of these solutions has its strong points, but Spark and Flink may be abstracted behind an Apache Beam layer in upcoming versions.
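
Note 3 says key-value data goes to either HDFS or Couchbase depending on access rate, but gives neither the threshold nor the direction. The sketch below makes one possible reading concrete. It assumes that frequently read keys go to Couchbase and rarely read keys go to HDFS, and it uses an arbitrary threshold. `KeyStats`, `choose_store` and `HOT_READS_PER_HOUR` are hypothetical names, and no real HDFS or Couchbase client is called.

```python
from dataclasses import dataclass

# Assumption: the notes give no figure; this threshold is purely illustrative.
HOT_READS_PER_HOUR = 100.0


@dataclass(frozen=True)
class KeyStats:
    """Observed access statistics for one key (hypothetical structure)."""
    key: str
    reads_per_hour: float


def choose_store(stats: KeyStats, threshold: float = HOT_READS_PER_HOUR) -> str:
    """Pick a storage backend name from the key's access rate.

    Assumption: hot keys go to Couchbase, cold keys go to HDFS.
    The notes only say the choice depends on access rate.
    """
    if stats.reads_per_hour >= threshold:
        return "couchbase"
    return "hdfs"


if __name__ == "__main__":
    for stats in (KeyStats("customer:42", 500.0), KeyStats("archive:2009", 0.2)):
        print(stats.key, "->", choose_store(stats))
```

Keeping the decision in one pure function means the threshold, or the whole policy, can change without touching the code that reads and writes the data.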