Demo-Driven
Graph Analytics
Rich Relationships =
Powerful Insights
John Hebeler, PhD
jhebeler@gmail.com
June 17, 2021
All Data is really a Graph...
• Classics
• Seven Bridges of Königsberg
• Traveling Salesman
• "Data" Networks
• Computer
• Social
• Maps
• Internet
• Value
• Sometimes relationships alone provide insights
• Reveals Powerful patterns
• Analytics enrich the possibilities
• Extends all the way to deep machine
learning
• Let's Get Started...
(c) John Hebeler 2021
Graph
Components
• Node (Label)
• Country
• Hierarchy
• Edge (Relationship)
• LOCATED_IN
• Can have Direction
• Properties
• StartDate
• Node and/or Edge Resident
• Metadata
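The components above can be sketched in plain Python. This is a minimal in-memory model, not the API of any graph database: the class names `Node` and `Edge`, the company and country values, and the date are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node carries one or more labels plus free-form properties."""
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, typed relationship; edges can hold properties too."""
    source: Node
    target: Node
    rel_type: str
    properties: dict = field(default_factory=dict)


# Hypothetical example data (not from the slides)
company = Node(labels={"Company"}, properties={"Name": "Acme"})
country = Node(labels={"Country"}, properties={"Name": "Canada"})
located_in = Edge(company, country, "LOCATED_IN", {"StartDate": "2015-01-01"})

# Direction matters: the edge runs from the company to the country
print(f"({located_in.source.properties['Name']})"
      f"-[:{located_in.rel_type}]->"
      f"({located_in.target.properties['Name']})")
```

Here the `StartDate` property lives on the edge and `Name` lives on the nodes, which illustrates the "Node and/or Edge Resident" point.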
[Figure: example graph with callouts for Node, Edge with direction, and Property (Name, StartDate)]
Graph Attributes
• Degree & Diameter
• Algorithms
• Path Finding
• Cycle Detection
• Trees
• Centrality
• Community
• Recommendations/Similarity
• node2vec, GraphSAGE, ...
• Characteristics
• Weighted
• Sparse/Dense
• Cyclic
• Directed/Undirected
• Distance
• Neighbors
Graph_properties.ipynb
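Degree, neighbors, distance, and diameter can be computed with a short breadth-first search. The sketch below is not taken from Graph_properties.ipynb; the four-node graph and the function names are hypothetical, and it assumes an undirected, unweighted, connected graph.

```python
from collections import deque

# Hypothetical undirected graph stored as an adjacency list
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B"],
}


def degree(g, node):
    """Degree = number of neighbors of a node."""
    return len(g[node])


def bfs_distances(g, start):
    """Hop count of the shortest path from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in g[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return dist


def diameter(g):
    """Longest shortest path between any two nodes (connected graph assumed)."""
    return max(max(bfs_distances(g, n).values()) for n in g)


print(degree(graph, "A"))   # 2
print(diameter(graph))      # 3 (the path C-A-B-D)
```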
A Graph Tool Kit
Property vs Knowledge Graphs
Property Graph
• Basic Node-Edge-Property
• No Formal Schema
• Cypher, TinkerPop/Gremlin
Knowledge Graph
• Schema
• Can describe real-world entities
• Hierarchical Classes, Containment,
Constraints, Rules, Equivalence...
• Ontologies (RDF/OWL)
• Incrementally Expressive
• SPARQL Query Language
• Select and Construct
• Reasoner-Enabled
• Derives new assertions
• Validates current assertions
• Enables entry verification
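The contrast can be sketched in plain Python using the car example from the slide. This is a toy illustration under stated assumptions: the predicate names (`type_of`, `is_a`, `car_model`, `vin`), the identifier `car567`, and the `infer_types` function are hypothetical, and a real knowledge graph would use RDF/OWL with a proper reasoner.

```python
# Property graph style: one node with labels and properties, no formal schema
car_node = {
    "id": 567,
    "labels": {"Vehicle", "Car"},
    "properties": {"Type": "Honda Accord", "VIN": "1234"},
}

# Knowledge graph style: (subject, predicate, object) triples plus a schema
triples = {
    ("Car", "type_of", "Vehicle"),      # schema: class hierarchy
    ("Truck", "type_of", "Vehicle"),
    ("car567", "is_a", "Car"),          # instance data
    ("car567", "car_model", "Honda Accord"),
    ("car567", "vin", "1234"),
}


def infer_types(facts):
    """Toy reasoner: if x is_a C and C type_of D, derive x is_a D."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (x, p1, c) in list(derived):
            if p1 != "is_a":
                continue
            for (c2, p2, d) in list(derived):
                if p2 == "type_of" and c2 == c and (x, "is_a", d) not in derived:
                    derived.add((x, "is_a", d))
                    changed = True
    return derived


# The only new assertion is that car567 is also a Vehicle
inferred = infer_types(triples) - triples
print(inferred)
```

In the property graph the `Vehicle` label has to be written on the node by hand; in the knowledge graph the same fact is derived from the schema, which is what "Derives new assertions" refers to.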
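[Figure: the same vehicle data drawn two ways. Knowledge graph: Car and Truck are each a "Type of" Vehicle; an individual car "Is A" Car, with Car model Honda Accord and VIN 1234. Property graph: a single node, Car ID: 567, with Labels: Vehicle, Car and Properties Type: Honda Accord, VIN: 1234.]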
Graph
Analytic
Pipeline
Applied
Neo4J Graph
Analytics
Cypher Language
Movie Search
Fraud Detection
Russian Trolls
Machine Learning
Applied
Wikidata
and SPARQL
SPARQL Language
Presidents
Export/Import
Applied
NVIDIA
cuGraph
Simple Graph
Community
Centrality
Applied
AWS
Neptune
Gremlin Language
AWS Data
Fraud Detection
Applied
DASK
Manage large data
tasks
Dask API
Parallel Processing
Analytics
Summary
• Powerful Data Analysis
• Significant Role Today
• Start Exploring…
Encourage Questions/Concerns:
jhebeler@gmail.com
Reference Slides
Handouts
Getting Started with Neo4j
• Open source Community version with Data Science extensions
• Neo4j: https://neo4j.com/
• Download container: https://hub.docker.com/_/neo4j
$ docker pull neo4j
# Other host ports can be used too; 7687 interferes with VMware
$ docker run \
    --publish=7474:7474 --publish=7687:7687 \
    --volume=$HOME/neo4j/data:/data --volume=$HOME/neo4j/import:/import \
    neo4j
• Add GDS (https://neo4j.com/download-center/#algorithms) and APOC methods
• Load functionality
• $ cp neo4j-graph-data-science-x.x.x.jar $NEO4J_HOME/plugins/
• $ cp $NEO4J_HOME/labs/apoc-4.2.0.1-core.jar $NEO4J_HOME/plugins/
• Update configuration in $NEO4J_HOME/conf/neo4j.conf ($NEO4J_HOME is /var/lib/neo4j)
• dbms.memory.heap.initial_size=512m
• dbms.memory.heap.max_size=5G
• dbms.security.procedures.unrestricted=gds.*,apoc.*
• dbms.security.procedures.whitelist=gds.*,apoc.*
• Cypher graph query language: https://neo4j.com/developer/cypher/intro-cypher/
• Import from popular formats (csv, json, …)
Getting Started with Wikidata and SPARQL
• (Almost) all of Wikipedia as a graph
• https://www.wikidata.org/wiki/Wikidata:Main_Page
• SPARQL overview
• https://www.w3.org/TR/2013/REC-sparql11-overview-20130321/
• https://query.wikidata.org/ (use the query helper)
• Presidential demonstration
FILTER: instance of human
        position held President of the United States
SHOW: date of birth
      child
      spouse
ADD: ORDER BY DESC(?date_of_birth)
• Can export findings to common formats (csv, json, …)
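The presidential demonstration can also be written by hand and sent from Python. The sketch below is one plausible form of the query, not necessarily what the query helper generates: the `OPTIONAL` clauses, the label service line, the `User-Agent` string, and the function names `build_url` and `run_query` are assumptions. The Wikidata identifiers are listed in the comment.

```python
import json
import urllib.parse
import urllib.request

# Wikidata identifiers: P31 instance of, Q5 human, P39 position held,
# Q11696 President of the United States, P569 date of birth,
# P40 child, P26 spouse
QUERY = """
SELECT ?person ?personLabel ?date_of_birth ?child ?spouse WHERE {
  ?person wdt:P31 wd:Q5 ;
          wdt:P39 wd:Q11696 .
  OPTIONAL { ?person wdt:P569 ?date_of_birth . }
  OPTIONAL { ?person wdt:P40 ?child . }
  OPTIONAL { ?person wdt:P26 ?spouse . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY DESC(?date_of_birth)
"""

ENDPOINT = "https://query.wikidata.org/sparql"


def build_url(query):
    """Encode the query for a GET request that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return ENDPOINT + "?" + params


def run_query(query):
    """Send the query (needs network access) and return the result rows."""
    request = urllib.request.Request(
        build_url(query),
        headers={"User-Agent": "graph-analytics-demo/0.1"},  # hypothetical name
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(QUERY)` returns a list of dictionaries, one per result row, which can then be written to csv or json.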
Getting Started with NVIDIA cuGraph
• Manages large data sets on GPUs (part of the RAPIDS suite)
• Contains all major graph analytic libraries
• Basic numeric graph data (must preprocess most data)
1 2
1 3
2 4
• Download Cuda Container
• https://ngc.nvidia.com/catalog/containers/nvidia:rapidsai:rapidsai
$ docker pull nvcr.io/nvidia/rapidsai/rapidsai:21.06-cuda11.0-runtime-ubuntu18.04
$ docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    nvcr.io/nvidia/rapidsai/rapidsai:21.06-cuda11.0-runtime-ubuntu18.04
• Contains working cuGraph notebooks
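The preprocessing step mentioned above usually means replacing names with integer IDs so the edges look like the `1 2`, `1 3`, `2 4` sample. A minimal sketch in plain Python follows; it does not use the cuGraph API, and the user names and the `encode_edges` helper are hypothetical.

```python
# Hypothetical raw relationships keyed by name rather than by number
raw_edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "dave"),
]


def encode_edges(edges):
    """Assign each distinct node a 1-based integer ID in first-seen order."""
    ids = {}
    numeric = []
    for src, dst in edges:
        for name in (src, dst):
            if name not in ids:
                ids[name] = len(ids) + 1
        numeric.append((ids[src], ids[dst]))
    return numeric, ids


numeric_edges, node_ids = encode_edges(raw_edges)
for src, dst in numeric_edges:
    print(src, dst)   # prints "1 2", "1 3", "2 4"
```

Keeping the `node_ids` mapping makes it possible to translate results, such as a community or centrality score per numeric ID, back to the original names.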
Getting Started with AWS Neptune
• Obtain an AWS Account
• Become Familiar with TinkerPop/Gremlin
• https://tinkerpop.apache.org/
• Create Neptune Database from AWS Console
• Interact with the SPARQL endpoint or the Apache TinkerPop™ Gremlin WebSocket server
Getting Started with Dask
• Manages large data sets across multiple CPUs/cores
• Python library
• Install with Anaconda or directly with pip install
• Start up the GUI
from dask.distributed import Client
client = Client(n_workers=1, threads_per_worker=4, processes=False, memory_limit='6GB')
Available on port 8787 by default
• Read in a large data file
• import dask.dataframe as dd
• df = dd.read_csv(...)
• df.x.sum().compute()  # This uses the single-machine scheduler by default
• from dask.distributed import Client
• client = Client(...)  # Connect to distributed cluster and override default
• df.x.sum().compute()  # This now runs on the distributed system
• Filter the data set
• reducedData = dataInput[dataInput['user'].isin(searchList)]  # isin() alone returns only a boolean mask
• Simplify/restructure the data
• from urllib.parse import urlparse
• http_admin['urlbrief'] = http_admin['url'].map(lambda x: urlparse(x).netloc, meta=('new_col', 'object'))
• http_admin.compute()
• http_admin.head()
• Drop unwanted columns
• http_small = http_admin.drop(['url', 'activity', 'content'], axis=1)
• # Create one large csv file with the listed columns
• http_small[['user', 'pc', 'urlbrief']].to_csv("http_admin_brief2.csv", single_file=True)
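The filter, simplify, and drop steps above can be tried on a few in-memory rows before running them on a large file. The sketch below uses only the standard library rather than Dask; the column names come from the steps above, while the row values and the search list are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical rows using the column names from the steps above
rows = [
    {"user": "u1", "pc": "pc1", "url": "http://example.com/a/b",
     "activity": "visit", "content": "x"},
    {"user": "u2", "pc": "pc2", "url": "https://example.org/c?d=1",
     "activity": "visit", "content": "y"},
    {"user": "u3", "pc": "pc3", "url": "http://example.net/",
     "activity": "visit", "content": "z"},
]
search_list = ["u1", "u2"]

# Filter: keep only rows whose user is in the search list
reduced = [r for r in rows if r["user"] in search_list]

# Simplify: reduce each URL to its network location (host name)
for r in reduced:
    r["urlbrief"] = urlparse(r["url"]).netloc

# Drop unwanted columns, keeping the three that get written out
small = [{k: r[k] for k in ("user", "pc", "urlbrief")} for r in reduced]
print(small)
```

The Dask version applies the same logic partition by partition, which is why `map` needs the `meta` hint describing the new column's name and type.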
References
• Books
• Graph Databases: Ian Robinson...
• Graph Algorithms: Mark Needham
• Graph Analytics with Neo4j: E Scifo
• Sites
• neo4j.com (also https://sandbox.neo4j.com/?usecase=graph-data-science)
• aws.amazon.com/neptune
• tinkerpop.apache.org
• wikidata.org
• dask.org
