Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Big Data Graph Analytics
Cristian Spigariol
Cloud Enterprise Architect
Types of uses of Machine Learning in all Industries
Typical use case scenarios
Classification
(predict among a set of options)
• Find and prevent customer
churn
• Target the right customer with
the right offer
• Predict customer response to an
affinity card program
Regression
(estimate a missing value)
• Predict how much a customer
will spend
Clustering
(find unknown patterns)
• Detect anomalous or suspicious
activities
Association Rules
(find correlations)
• Predict correlation among items
Graph Analysis
(understand interactions)
• Understanding influencers in
social networks
Machine Learning through Graphs
• Graph systems focus on relationships rather than entities
– They are key to understanding highly connected systems and their
behaviours (i.e. areas of strong/weak interaction) by examining how
relationships spread throughout the graph
• Graph algorithms are self-consistent
– The answer to complex problems resides in how entities (nodes) interact
and not in the entities themselves or in external resources
– Graph algorithms are effective even with graphs based on entities with few
properties
• They cover a broad range of applications
– Their simple and flexible data model is able to describe a broad range of
use cases, from financial systems and human neural networks to
infrastructure networks (transportation, telco, electricity)
Marketing Analyses using Graphs
Graph algorithms can strongly improve the effectiveness of marketing
analyses.
• In customer profiling, we can extend the individual profile of a
given customer by considering his/her ability to influence his/her circle
of friends.
• In marketing campaigns, identifying influencers can
amplify the reach of promotional activities and increase
the conversion rate.
• In marketing campaigns, identifying strongly connected
communities (people who interact on the basis of shared
behaviors) can be the basis for customer segmentation.
Network Analysis using Graphs
Detection of weak links aims to identify nodes in a
transportation/telco/energy network that carry a high number
of flows but are not backed by an adequate number of
alternative paths (betweenness centrality).
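As an illustration of the betweenness centrality measure mentioned above, here is a minimal sketch of Brandes' algorithm for an unweighted, undirected network. The node names and the toy topology are hypothetical and not taken from the slides. A node with a high score and few alternative paths around it is a candidate weak link.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps each node to the list of its neighbours.
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting from the nodes farthest from s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}

# Hypothetical network: two small clusters that can only reach each other via "x".
network = {
    "a": ["b", "x"],
    "b": ["a", "x"],
    "x": ["a", "b", "c", "d"],
    "c": ["d", "x"],
    "d": ["c", "x"],
}
scores = betweenness(network)
print(scores)
```

In this toy network every path between the two clusters crosses "x", so it receives the whole score while the other nodes receive none.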
Graph algorithms are extremely useful for optimizing networks.
Network flow analysis consists of assigning a capacity to each connection
(i.e. the link between two nodes) and evaluating the total
amount of flow that passes through it. The amount of flow on an
edge cannot exceed the capacity of the edge.
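The capacity constraint described above can be sketched with a small maximum-flow computation. The code below uses the Edmonds-Karp method (shortest augmenting paths), which is one standard choice and not necessarily what the presenter had in mind. The edge names and capacities are hypothetical, and the sketch assumes there are no pairs of opposite edges between the same two nodes.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow.

    capacity maps a directed edge (u, v) to its capacity.
    Returns the total flow and the flow assigned to each edge.
    """
    residual = dict(capacity)
    for (u, v) in capacity:
        residual.setdefault((v, u), 0)  # reverse edges start with no capacity
    neighbours = {}
    for (u, v) in residual:
        neighbours.setdefault(u, []).append(v)

    total = 0
    while True:
        # BFS for the shortest path that still has spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbours.get(u, []):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        # Walk back from the sink to collect the path and its bottleneck.
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for (u, v) in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        total += bottleneck

    flow = {e: capacity[e] - residual[e] for e in capacity}
    return total, flow

# Hypothetical network with a capacity on every link.
capacity = {
    ("s", "a"): 10,
    ("s", "b"): 5,
    ("a", "t"): 4,
    ("a", "b"): 8,
    ("b", "t"): 9,
}
total, flow = max_flow(capacity, "s", "t")
print(total, flow)
```

The flow returned for each edge never exceeds that edge's capacity, which is the constraint stated on the slide.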
Collaborative Filtering Pattern
If a person A has the same opinion as a person B on an issue, A is more likely to have B's opinion on a
different issue than that of a randomly chosen person (Wikipedia).
• Find similarities: find the people who show the same behavior as person A.
• Select potential targets: select the items chosen by the people similar to person A (i.e. the potential targets).
• Rank output by relevance: among the potential targets, give the most weight to the items with the highest relevance rank.
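The three steps above can be sketched as a small user-based collaborative filter. This is a minimal illustration under stated assumptions: the Jaccard index is used as the similarity measure (the slides do not name one), and the people, items and the `top_k` parameter are hypothetical.

```python
def jaccard(a, b):
    """Share of items two people have in common (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, choices, top_k=2):
    mine = choices[target]
    # Step 1: find the people whose behavior is most similar to the target's.
    similarity = {
        person: jaccard(mine, items)
        for person, items in choices.items()
        if person != target
    }
    similar_people = sorted(similarity, key=similarity.get, reverse=True)[:top_k]
    # Step 2: potential targets are items chosen by similar people
    # that the target person has not chosen yet.
    scores = {}
    for person in similar_people:
        for item in choices[person] - mine:
            scores[item] = scores.get(item, 0.0) + similarity[person]
    # Step 3: rank the potential targets by relevance (highest first).
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical choices made by four people.
choices = {
    "A": {"book", "lamp"},
    "B": {"book", "lamp", "desk"},
    "C": {"book", "chair"},
    "D": {"sofa"},
}
ranking = recommend("A", choices)
print(ranking)
```

Here B shares more choices with A than C does, so B's extra item ranks above C's.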
Recommendation using Graphs Analytical Algorithms
[Figure: Alice's circle of trust, with trust scores Ricky 0.4, Simon 0.3 and Lucia 0.1; John and Maria appear as further connections]
By using a centrality algorithm (Personalized
PageRank) we can determine the most
influential people in the circle of connections
originating from Alice.
We move from similarities to trust!
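A minimal sketch of Personalized PageRank follows, assuming a plain power iteration in which the random walk restarts at Alice. The "who trusts whom" links, the damping factor and the iteration count are hypothetical, so the sketch does not reproduce the trust scores shown on the slide.

```python
def personalized_pagerank(out_links, source, damping=0.85, iterations=50):
    """Power iteration for PageRank with every restart landing on `source`.

    out_links maps each node to the list of nodes it points to;
    every node that is pointed to must also appear as a key.
    """
    rank = {v: 1.0 if v == source else 0.0 for v in out_links}
    for _ in range(iterations):
        nxt = {v: 0.0 for v in out_links}
        for v, targets in out_links.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for w in targets:
                    nxt[w] += share
            else:
                # A node without outgoing links hands its rank back to the source.
                nxt[source] += damping * rank[v]
        # With probability (1 - damping) the walk restarts at the source.
        nxt[source] += 1.0 - damping
        rank = nxt
    return rank

# Hypothetical connections (the slides do not list the actual links).
out_links = {
    "Alice": ["Ricky", "Simon"],
    "Ricky": ["Simon", "Lucia"],
    "Simon": ["Ricky"],
    "Lucia": ["Ricky"],
    "John": ["Maria"],
    "Maria": ["John"],
}
trust = personalized_pagerank(out_links, "Alice")
circle = sorted((p for p in trust if p != "Alice"), key=trust.get, reverse=True)
print(circle)
```

People who cannot be reached from Alice (John and Maria in this toy graph) end up with a score of zero, which is what makes the ranking personal to Alice rather than global.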
Recommendation using Graphs Analytical Algorithms
[Figure: bipartite graph linking the circle of trust (Ricky, Simon, Lucia) to the targets (Prod#3, Prod#4, Prod#9, Prod#7)]
We determine the potential targets by selecting the products already bought by the trusted people (bipartite graph).
Recommendation using Graphs Analytical Algorithms
[Figure: circle of trust (Ricky, Simon, Lucia) and targets with their relevance scores: Prod#3 (2), Prod#4 (1), Prod#9 (1), Prod#7 (2)]
We start the relevance algorithm (SALSA) by measuring the relevance score, that is, the sum of the preferences received by each product.
Recommendation using Graphs Analytical Algorithms
[Figure: circle of trust with hub scores: Ricky (2), Simon (4), Lucia (4); targets with relevance scores: Prod#3 (2), Prod#4 (1), Prod#9 (1), Prod#7 (2)]
We then walk the connections back to front to measure the hub score of each trusted person as the sum of the relevance ranks of the products he or she bought.
The hub score measures the ability of each trusted person to intercept the tastes of the circle.
Recommendation using Graphs Analytical Algorithms
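[Figure: circle of trust with hub scores: Ricky (2), Simon (4), Lucia (4); targets with new relevance scores: Prod#3 (6), Prod#4 (4), Prod#7 (8), Prod#9 (4); the recommendation goes to Alice]
The new relevance score is measured as the weighted sum of the preferences received, where each preference is weighted by the hub rank of the person who expressed it.
Prod#7, which has the highest score (8), has the highest likelihood of being well accepted by Alice, since it has been chosen by the most “knowledgeable” trusted people.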
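The scoring walk-through on the last three slides can be replayed in a few lines. The purchase lists below are an assumption: the figure with the actual links is not recoverable, so they were chosen to be consistent with the scores shown on the slides. The sketch follows the simplified, unnormalised scoring described in the slides, not a full SALSA random walk.

```python
# Assumed purchases, chosen so that the resulting scores match the slides.
bought = {
    "Ricky": ["Prod#3"],
    "Simon": ["Prod#3", "Prod#7"],
    "Lucia": ["Prod#4", "Prod#7", "Prod#9"],
}

def relevance_scores(bought, hub):
    """Relevance of a product = sum of the hub weights of the people who bought it."""
    scores = {}
    for person, products in bought.items():
        for product in products:
            scores[product] = scores.get(product, 0) + hub[person]
    return scores

def hub_scores(bought, relevance):
    """Hub score of a person = sum of the relevance of the products they bought."""
    return {
        person: sum(relevance[product] for product in products)
        for person, products in bought.items()
    }

# First pass: every preference counts once.
initial_hub = {person: 1 for person in bought}
relevance = relevance_scores(bought, initial_hub)
# Walk the connections back to the trusted people.
hub = hub_scores(bought, relevance)
# Second pass: preferences are weighted by the hub scores.
new_relevance = relevance_scores(bought, hub)

print(relevance)
print(hub)
print(new_relevance)
```

With these assumed purchases the three dictionaries hold the same numbers as the three slides: the initial relevance scores, the hub scores, and the new relevance scores.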
Recommendation using Graphs Analytical Algorithms
The algorithm can be iterated
many times. Each iteration
reinforces the rank score and
the relevance score.
The higher the number of
iterations, the higher the
effectiveness of the algorithm.
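A minimal sketch of the iteration follows. Two things here are assumptions rather than content of the slides: the purchase lists are hypothetical, and the scores are rescaled to sum to 1 after every pass so that they stay bounded while the ranking settles.

```python
def iterate_scores(bought, iterations=20):
    """Alternate relevance and hub updates, normalising after each update."""
    hub = {person: 1.0 for person in bought}
    relevance = {}
    for _ in range(iterations):
        # Relevance of a product: sum of the hub scores of its buyers.
        relevance = {}
        for person, products in bought.items():
            for product in products:
                relevance[product] = relevance.get(product, 0.0) + hub[person]
        total = sum(relevance.values())
        relevance = {p: s / total for p, s in relevance.items()}
        # Hub score of a person: sum of the relevance of the products bought.
        hub = {
            person: sum(relevance[p] for p in products)
            for person, products in bought.items()
        }
        total = sum(hub.values())
        hub = {person: s / total for person, s in hub.items()}
    return hub, relevance

# Hypothetical purchases made by the trusted people.
bought = {
    "Ricky": ["Prod#3"],
    "Simon": ["Prod#3", "Prod#7"],
    "Lucia": ["Prod#4", "Prod#7", "Prod#9"],
}
hub, relevance = iterate_scores(bought)
top_product = max(relevance, key=relevance.get)
print(top_product, relevance)
```

Without the rescaling step the raw scores would grow with every pass; only their relative order matters for the recommendation.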
Graphs Analytical Algorithms – Possible use cases
This recommendation approach is the basis of the WTF (Who to Follow) service at Twitter.
It can be applied effectively to other industries, for example:
• to recommend insurance policies based on the most relevant opinions
of “trusted” people
• to up-sell telco services with the same trust + expertise approach.
Do it yourself? If so please consider...
• Complexity: the algorithms are not trivial and need domain-specific knowledge
• Productivity: bug fixing, tuning for precision and performance, support
• Architecture: graph algorithms need in-memory parallel execution as well as low-latency NoSQL storage
• Integration: you need to integrate your solution with the Big Data cluster to feed your graph database
Oracle Big Data Spatial and Graph – Memory Analyst
A rich set of built-in, parallel algorithms
• Detecting Components and Communities: Tarjan’s, Kosaraju’s, Weakly Connected Components, Label Propagation (w/ variants), Sparsification
• Ranking and Walking: PageRank, Personalized PageRank, Betweenness Centrality (w/ variants), Closeness Centrality, Degree Centrality, Eigenvector Centrality, HITS, Random walking and sampling (w/ variants)
• Evaluating Community Structures: Conductance, Modularity, Clustering Coefficient (Triangle Counting)
• Path-Finding: Hop-Distance (BFS), Dijkstra’s, Bi-directional Dijkstra’s, Bellman-Ford’s
• Link Prediction: SALSA (Twitter’s Who-to-follow)
• Other Classics: Vertex Cover
Parallel graph mutation operations
[Figure: an original graph on nodes a to i and the results of the mutation operations: Undirected Graph, Simplify Graph, Bipartite Graph (left set: “a,b,e”), Sort-By-Degree (Renumbering), Filtered Subgraph]
Machine Learning is Data Driven
• Data is the fuel of ML
algorithms
• The effectiveness of ML
algorithms is strictly tied to the
amount of available
data
• To translate ML results into a
competitive advantage we need
a paradigm shift in the way
information management
solutions are designed and
managed.
A Data Driven Strategy with Machine Learning
New paradigm: adopt standards, don’t move data, broadcast ML results, experiment and act.
• Data is heavy – don’t move data
• Move processing to the data instead
• Reduce complexity
• Facilitate integration
• Speak the language of data scientists (R, Python, Scala, Spark, Gremlin)
• Take advantage of new ML package releases (e.g. CRAN, MLlib)
• Define your models (Lab) and then move them into the mainstream (Prod)
• Score your models continuously (both in batch and in streaming)
• Keep them up to date
• Spread ML results throughout the user communities
• Predictions are new inputs for in-place processes or analyses (additional KPIs, properties, etc.)
Genislab builds better products and faster go-to-market with Lean project man...
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 

Big Data Graph Analytics

  • 1. Big Data Graph Analytics. Cristian Spigariol, Cloud Enterprise Architect. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
  • 2. Types of uses of Machine Learning in all Industries: typical use case scenarios
      – Classification (predict among a set of options): find and prevent customer churn; target the right customer with the right offer; predict customer response to an affinity card program
      – Regression (estimate a missing value): predict how much a customer will spend
      – Clustering (find unknown patterns): detect anomalous or suspicious activities
      – Association Rules (find correlations): predict correlation among items
      – Graph Analysis (understand interactions): understand influencers in social networks
  • 3. Machine Learning through Graphs
      • Graph systems focus on relationships rather than entities
        – They are key to understanding highly connected systems and their behaviours (i.e. areas of strong/weak interaction) by examining how relationships spread throughout the graph
      • Graph algorithms are self-consistent
        – The answer to complex problems resides in how entities (nodes) interact, not in the entities themselves or in external resources
        – Graph algorithms are effective even on graphs whose entities have few properties
      • They cover a broad range of applications
        – Their simple and flexible data model can describe a broad range of use cases, from financial systems and human neural networks to infrastructure networks (transportation, telco, electricity)
  • 4. Marketing Analyses using Graphs. Graph algorithms can strongly improve the effectiveness of marketing analyses.
      – In customer profiling, we can extend the individual profile of a given customer by considering his/her ability to influence his/her circle of friends.
      – In marketing campaigns, identifying influencers can amplify the echo of the related promotional activities and increase the conversion rate.
      – In marketing campaigns, identifying strongly connected communities (people who interact on the basis of shared behaviors) can be the basis for customer segmentation.
  • 5. Network Analysis using Graphs. Graph algorithms are extremely useful for optimizing networks.
      – Detection of weak links aims to identify nodes in a transportation/telco/energy network that carry a high number of flows without a matching number of alternative paths (betweenness centrality).
      – Network flow analysis assigns a capacity to each connection (i.e. a link between two nodes) and evaluates the total flow that passes through it. The flow on an edge cannot exceed the capacity of the edge.
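As a rough illustration of the weak-link detection described on slide 5, the sketch below computes betweenness centrality with Brandes' algorithm on a small, made-up undirected network (two triangles joined through a single node). The graph, the node names, and the function name are assumptions for illustration; the slides do not show a dataset or an implementation.

```python
from collections import deque


def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph {node: [neighbours]}."""
    centrality = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search from the source: count shortest paths (sigma)
        # and remember which predecessors lie on them.
        order = []
        predecessors = {node: [] for node in graph}
        sigma = {node: 0 for node in graph}
        distance = {node: -1 for node in graph}
        sigma[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if distance[w] < 0:  # first time w is reached
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:  # v is on a shortest path to w
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {node: 0.0 for node in graph}
        for w in reversed(order):
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # In an undirected graph every pair is visited once per direction.
    return {node: score / 2.0 for node, score in centrality.items()}


# Hypothetical network: triangles A-B-C and E-F-G connected only through D.
network = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D", "F", "G"],
    "F": ["E", "G"],
    "G": ["E", "F"],
}
scores = betweenness_centrality(network)
weakest_link = max(scores, key=scores.get)
print(weakest_link, scores[weakest_link])
```

In this toy network every path between the two triangles passes through D, so D gets the highest score and has no alternative route around it, which is the kind of node the slide calls a weak link.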
  • 6. Collaborative Filtering Pattern. If a person A has the same opinion as a person B on an issue, A is more likely to have B's opinion on a different issue than that of a randomly chosen person (Wikipedia).
      – Find similarities: find the people who show the same behavior as person A.
      – Select potential targets: select the items chosen by the people similar to person A.
      – Rank output by relevance: among the potential targets, give the most weight to the items with the highest relevance rank.
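The three steps of slide 6 can be sketched in a few lines. This is a minimal illustration, assuming Jaccard similarity over purchased item sets and a similarity threshold of 0.3; the slide does not say which similarity measure or threshold is used, and the purchase data and function names are invented.

```python
def jaccard(a, b):
    """Share of items two people have in common (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, purchases, min_similarity=0.3):
    own = purchases[target]
    # Step 1: find the people whose behavior resembles the target's.
    similar = {}
    for person, items in purchases.items():
        if person == target:
            continue
        similarity = jaccard(own, items)
        if similarity >= min_similarity:
            similar[person] = similarity
    # Step 2: the potential targets are items they chose that the target lacks.
    # Step 3: weight each candidate by the similarity of the people who chose it.
    scores = {}
    for person, similarity in similar.items():
        for item in purchases[person] - own:
            scores[item] = scores.get(item, 0.0) + similarity
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical purchase history.
purchases = {
    "A": {"book", "lamp", "pen"},
    "B": {"book", "lamp", "desk"},
    "C": {"book", "pen", "desk", "chair"},
    "D": {"sofa"},
}
ranking = recommend("A", purchases)
print(ranking)
```

Here B and C pass the threshold and D does not, so the desk (chosen by both similar people) ranks above the chair (chosen by one).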
  • 7. Recommendation using Graph Analytical Algorithms. By using a centrality algorithm (Personalized PageRank) we can determine the most influential people in the circle of connections originating from Alice. We move from similarities to trust! Circle of trust: Ricky 0.4, Simon 0.3, Lucia 0.1; John ..., Maria ..., ...
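To make the Personalized PageRank step of slide 7 concrete, here is a minimal power-iteration sketch in which the random walk always restarts at Alice. The edges, the damping factor of 0.85, and the iteration count are assumptions; the slide gives only the resulting order (Ricky, Simon, Lucia), and this sketch does not try to reproduce its exact scores.

```python
def personalized_pagerank(graph, source, damping=0.85, iterations=50):
    """graph maps each node to the nodes it points to; returns a score per node."""
    rank = {node: (1.0 if node == source else 0.0) for node in graph}
    for _ in range(iterations):
        new_rank = {node: 0.0 for node in graph}
        # The restart probability always goes back to the source node.
        new_rank[source] = 1.0 - damping
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: return its mass to the source as well.
                new_rank[source] += damping * rank[node]
        rank = new_rank
    return rank


# Hypothetical "who trusts whom" graph around Alice.
trust = {
    "Alice": ["Ricky", "Simon", "Lucia"],
    "Ricky": ["Alice"],
    "Simon": ["Ricky"],
    "Lucia": ["Simon", "John"],
    "John": ["Maria"],
    "Maria": [],
}
rank = personalized_pagerank(trust, "Alice")
circle = sorted((n for n in rank if n != "Alice"), key=rank.get, reverse=True)
print(circle[:3])
```

Because the walk restarts at Alice, the scores measure closeness to her specifically, not global popularity, which is what lets the top of the list act as her circle of trust.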
  • 8. Recommendation using Graph Analytical Algorithms. We determine the potential targets by selecting the products already bought by the trusted people (bipartite graph). Circle of trust: Ricky, Simon, Lucia. Targets: Prod#3, Prod#4, Prod#9, Prod#7.
  • 9. Recommendation using Graph Analytical Algorithms. We start the relevance algorithm (SALSA) by measuring the relevance score, that is, the sum of the preferences received by each product. Targets: Prod#3 (2), Prod#4 (1), Prod#9 (1), Prod#7 (2).
  • 10. Recommendation using Graph Analytical Algorithms. We then walk the connections back to front to measure the hub score of each trusted person as the sum of the relevance ranks of the products he/she bought. This measures the ability of each trusted person to intercept the tastes of the circle. Circle of trust: Ricky (2), Simon (4), Lucia (4). Targets: Prod#3 (2), Prod#4 (1), Prod#9 (1), Prod#7 (2).
  • 11. Recommendation using Graph Analytical Algorithms. The new relevance score is the weighted sum of the preferences received, each weighted by the hub rank of the person who expressed it. Circle of trust: Ricky (2), Simon (4), Lucia (4). Targets: Prod#3 (6), Prod#4 (4), Prod#7 (8), Prod#9 (4). Prod#7, which has the highest score, is the most likely to be well accepted by Alice, since it has been chosen by the most “knowledgeable” trusted people.
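The pass described on slides 9 to 11 can be traced with a short sketch. The slides do not state who bought which product; the purchase edges below are an assumption, chosen because they reproduce every score shown on the slides. The function names are illustrative only.

```python
# Assumed purchases (not stated on the slides, but consistent with their scores).
purchases = {
    "Ricky": ["Prod#3"],
    "Simon": ["Prod#3", "Prod#7"],
    "Lucia": ["Prod#4", "Prod#7", "Prod#9"],
}


def initial_relevance(purchases):
    """Slide 9: relevance = number of preferences each product received."""
    scores = {}
    for products in purchases.values():
        for product in products:
            scores[product] = scores.get(product, 0) + 1
    return scores


def hub_scores(purchases, relevance):
    """Slide 10: hub score = sum of the relevance of the products a person bought."""
    return {
        person: sum(relevance[product] for product in products)
        for person, products in purchases.items()
    }


def weighted_relevance(purchases, hubs):
    """Slide 11: relevance = sum of the hub scores of the people who bought it."""
    scores = {}
    for person, products in purchases.items():
        for product in products:
            scores[product] = scores.get(product, 0) + hubs[person]
    return scores


relevance = initial_relevance(purchases)
hubs = hub_scores(purchases, relevance)
new_relevance = weighted_relevance(purchases, hubs)
best = max(new_relevance, key=new_relevance.get)
print(hubs, new_relevance, best)
```

With these edges the hub scores come out as Ricky 2, Simon 4, Lucia 4, and the updated relevance as Prod#3 6, Prod#4 4, Prod#7 8, Prod#9 4, so Prod#7 ends with the highest score.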
  • 12. Recommendation using Graph Analytical Algorithms. The algorithm can be iterated many times. Each iteration reinforces the rank scores and the relevance scores. More iterations generally sharpen the result, with diminishing gains once the scores stabilize.
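A sketch of the iteration mentioned on slide 12 follows. It repeats the hub and relevance updates and rescales the relevance scores after each round so that they sum to 1. The rescaling is an assumption added here to keep the numbers bounded (it is the normalization used in HITS-style iterations, not necessarily what the presented product does), and the purchase data is invented.

```python
def iterate_scores(purchases, iterations=20):
    """Repeat the hub / relevance updates, normalizing relevance each round."""
    products = sorted({p for items in purchases.values() for p in items})
    relevance = {p: 1.0 / len(products) for p in products}
    for _ in range(iterations):
        # Hub score: sum of the relevance of the products a person bought.
        hubs = {
            person: sum(relevance[p] for p in items)
            for person, items in purchases.items()
        }
        # Relevance: sum of the hub scores of the people who bought the product.
        raw = {p: 0.0 for p in products}
        for person, items in purchases.items():
            for p in items:
                raw[p] += hubs[person]
        total = sum(raw.values())
        relevance = {p: score / total for p, score in raw.items()}
    return relevance


# Hypothetical purchases by the trusted people.
purchases = {
    "Ricky": ["Prod#3"],
    "Simon": ["Prod#3", "Prod#7"],
    "Lucia": ["Prod#4", "Prod#7", "Prod#9"],
}
after_1 = iterate_scores(purchases, iterations=1)
after_20 = iterate_scores(purchases, iterations=20)
after_21 = iterate_scores(purchases, iterations=21)
# How much one extra round still changes the scores after 20 rounds.
drift = max(abs(after_21[p] - after_20[p]) for p in after_20)
print(max(after_20, key=after_20.get), drift)
```

On this small graph the scores barely move between round 20 and round 21, so in practice one would iterate until the change falls below a tolerance instead of fixing a large iteration count.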
  • 13. Graph Analytical Algorithms: possible use cases. This recommendation approach is the basis of the WTF (Who to Follow) service at Twitter. It can be profitably applied to other industries, for example:
      – to recommend insurance policies based on the most relevant opinions of “trusted” people
      – to up-sell telco services with the same trust + expertise approach
  • 14. Do it yourself? If so, please consider...
      – Complexity: non-trivial algorithms that need domain-specific knowledge
      – Productivity: bug fixing, tuning for precision and performance, support
      – Architecture: graph algorithms need in-memory parallel execution as well as low-latency NoSQL storage
      – Integration: you need to integrate your solution with the Big Data cluster to feed your graph database
  • 15. A rich set of built-in, parallel algorithms (Oracle Big Data Spatial and Graph – Memory Analyst)
      – Detecting Components and Communities: Tarjan’s, Kosaraju’s, Weakly Connected Components, Label Propagation (w/ variants), Sparsification
      – Ranking and Walking: PageRank, Personalized PageRank, Betweenness Centrality (w/ variants), Closeness Centrality, Degree Centrality, Eigenvector Centrality, HITS, Random walking and sampling (w/ variants)
      – Evaluating Community Structures: Conductance, Modularity, Clustering Coefficient (Triangle Counting)
      – Path-Finding: Hop-Distance (BFS), Dijkstra’s, Bi-directional Dijkstra’s, Bellman-Ford’s
      – Link Prediction: SALSA (Twitter’s Who-to-follow)
      – Other Classics: Vertex Cover
      – Parallel graph mutation operations (figure panels): The original graph, Undirected Graph, Simplify Graph, Bipartite Graph (left set: “a,b,e”), Sort-By-Degree (Renumbering), Filtered Subgraph
  • 16. Machine Learning is Data Driven
      • Data is the fuel of ML algorithms
      • The effectiveness of ML algorithms is strictly tied to the amount of available data
      • To translate ML results into a competitive advantage, we need a paradigm shift in the way information management solutions are designed and managed
  • 17. A Data Driven Strategy with Machine Learning: the new paradigm
      – Don’t move data: data is heavy, so don’t move it; move elaboration to the data instead; reduce complexity; facilitate integration
      – Adopt standards: speak the language of data scientists (R, Python, Scala, Spark, Gremlin); take advantage of new ML package releases (e.g. CRAN, MLlib)
      – Experiment and act: define your models (Lab) and then move them into the mainstream (Prod); score your models continuously (both in batch and in streaming); keep them up to date
      – Broadcast ML results: spread ML results throughout the user communities; predictions are new inputs for in-place processes or analyses (additional KPIs, properties, etc.)