SlideShare une entreprise Scribd logo
1  sur  41
Télécharger pour lire hors ligne
Knowledge Graph Recommendation Systems
For COVID-19
Globally, as of 23 August
2020, there have
been 23,057,288 confirmed
cases of COVID-19,
including 800,906 deaths,
reported to WHO.
https://covid19.who.int/?gclid=Cj0KCQjw6uT4BRD5ARIsADwJQ19jnAzaXO3sJguACQIGTi0oRYfs8q0KQiitpu5Sw2OOcqRppoqiyF0aAqCwEALw_wcB
Globally, as of 16 November
2020, there have
been 54,075,995 confirmed
cases of COVID-19,
including 1,313,919 deaths,
reported to WHO.
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
Opening
• The tutorial provides best practices on developing COVID-19 related academic
research recommendation with the aim of the knowledge graph
• You will learn
• The CORD-19 data set and COVID-19 related knowledge graph
• Graph-based recommendation techniques and implementations
• Operationalization methods for Graph database, model, and system
• The tutorial content, materials, and source codes can be found on the website
https://kdd2020tutorial.github.io/cord19recommender/
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
Environment setup
Details about the environment set up can be found at the following link
https://aka.ms/kdd2020-covid-demo
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
Microsoft Academic Graph
(MAG)
Academic Knowledge
Graph
CORD-19
Metadata +
Full Text Documents
Small (0.1% node size)
Specific (limited sub-domains)
Large (500G / 500M nodes)
Comprehensive (all domains)Global Knowledge
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
• WHO
• AI2 / MSR / CZI / NIH /
Georgetown U
• WHY
• AI – fight COVID-19
• WHAT
• Meta-data
• Full text
• WHEN
• Started Mar 16, 2020
• Update daily
https://azure.microsoft.com/en-us/services/open-datasets/catalog/covid-19-open-research
https://pages.semanticscholar.org/coronavirus-research
• WHERE
https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge
• HOW
Reference: Wang, Lucy Lu, et al. “CORD-19: The Covid-19 Open Research Dataset.” arXiv:2004.10706, 2020.
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
• WHO
• Microsoft Academic (MSR)
• WHY
• AI – scholarly
communications
• HOW
• WHEN
• Start from 2014
• Update weekly
Project: https://www.microsoft.com/en-us/research/project/microsoft-academic-graph/
• WHERE
Get the free graph: https://docs.microsoft.com/en-us/academic-services/graph/
• WHAT
Knowledge in the graph form
Search portal: https://academic.microsoft.com/
MAG 07-16-2020
version stats
References:
Sinha, Arnab, et al. “An Overview of Microsoft Academic Service (MAS) and Applications.” WWW, 2015.
Wang, Kuansan, et al. “A Review of Microsoft Academic Services for Science of Science Studies.” Frontiers in Big Data, 2019.
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
• WHY
• Richer Semantics
• Empower more applications
• Search
• QnA
• Content understanding / Analytics (Module 2)
• Recommendation (Module 3 + 4)
• WHAT
• ~88% CORD-19 mapped to MAG
• WHEN
• Update daily
• HOW
• Microsoft Academic Knowledge
Exploration Service (MAKES) API
Machine inference engine with
knowledge in MAG
https://academic.microsoft.com/home
MAKES References:
Wang, Kuansan, et al. “A Review of Microsoft Academic Services for Science of Science Studies.”
Frontiers in Big Data, 2019.
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
A. Get the CORD-19 to MAG paper IDs mapping: https://github.com/microsoft/mag-covid19-research-examples/blob/master/src/data/releases.md
CORD-19 data set:
https://pages.semanticscholar.org/coronavirus-research
Step-by-step instruction on using (1) PEK API for dataset linking + (2) PySpark script to get CORD-19 subgraph:
https://github.com/microsoft/mag-covid19-research-examples
Blogpost on more details of Covid-19 research on MAG [how to request MAG + PAK/MAKES access]
https://www.microsoft.com/en-us/research/project/academic/articles/microsoft-academic-resources-and-their-application-to-covid-19-research
C. Access the mapped CORD-19 publication via API: https://github.com/microsoft/mag-covid19-research-examples/blob/master/src/PAK-Samples/cord-
19-closure.md
B. Get the CORD-19 MAG subgraph: https://aka.ms/kdd2020-covid-demo
Option 1:
DIY
Option 2:
Grab the Results
Get Cord-19 data
(Text Corpus+Metadata)
Request MAG Access
(Knowledge Graph)
Request PAK API access
or MAKES API instance
(Indexed KG/search API)
CORD-19 to MAG ID
mapping (A)
CORD-19 MAG
Subgraph (B)
(1) PAK or MAKES
API for ID linking
(2) PySpark Script
to get subgraph
Module 2:
Content
Understanding
Module 3:
KG-assisted
Recommendation
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
Temporal
Publishing Venue
Geo
Team Size
Topic
CORD-19
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
Knowledge Graph size comparison
MAG CORD-19 Subgraph
(Version: CORD-19 06/14/2020; MAG 06/05/2020)
Nodes: ~623K
Edges: ~2M
Full Microsoft Academic Graph
(MAG)
Nodes: ~480M
Edges: ~4B
Small: 0.1% node size
Specific: 7.8% topics
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
CORD-19 Topic Distribution
paper nodes in MAG
Top Domains Top Sub-domains
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
CORD-19 Publishing Venue Distribution
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
CORD-19 Geo & Temporal Distribution
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
CORD-19 Team Size Distribution
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
Applications of Knowledge Graph:
Academic Recommender Systems
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
Academic recommendation with knowledge graph
You May Cite:
author-to-paper recommendation
More Like This:
paper-to-paper recommendation
…
Recommendations RecommendationsCitation
history
Current
paper
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
source:
https://github.com/microsoft/recommenders/blob
/master/examples/01_prepare_data/wikidata_kno
wledge_graph.ipynb
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
Microsoft Recommenders
https://github.com/microsoft/recommenders
• Helping researchers and
developers to quickly select,
prototype, demonstrate, and
productionize a recommender
system
• Accelerating enterprise-grade
development and deployment
of a recommender system into
production
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
Elon Musk offers Tesla Model 3
sneak peek …
Google fumbles while Tesla sprints
toward a driverless future …
Trump pledges aid to Silicon Valley
during tech meeting …
Click history Knowledge graph
General Motors is ramping up its self-driving car:
Ford should be nervous …
Will the user click it?
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
User’s clicked news
Candidate news
Attention Net
KCNN KCNN KCNN KCNN
concat. User
embedding
Candidate
news
embedding
Click probability element-wise +
element-wise ×
DKN
Attention net:
User interest extraction:
CTR prediction:
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
Context of entities
“Fight Club”
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
𝒅 × 𝒏 word
embeddings
𝒅 × 𝒏 transformed
entity embeddings
𝒅 × 𝒏 transformed
context embeddings
CNN layer
pooling
multiple
channels
𝑤1:𝑛 = [Galaxy Note 20 will launch Aug. 5]
𝑒1:𝑛 = [ 1 1 1 0 0 0]
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
source: https://aka.ms/kdd2020-covid-demo
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
Operationalization of KG-based
recommender system
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
37
• Customer experience
• Targeted market campaign
• Digital assistance
• Anti money laundering
• Fraud detection
• “Know-Your-Client”
• Wikipedia
• Academic article search
• Search engine
• Recommendation
• Customer segmentation
Retail Public good
Finance Internet
Key areas of
application
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
Industrial applications on graph
Organization Product Category Application Use case Application
Google Google Knowledgh Graph Commercial Representation Data search Enterprise
Microsoft Satori Commercial Representation Data search (internal use) Enterprise
Wikipedia Wikidata Open-source Representation Knowledge base for Wikipedia data Enterprise
Leipzig
University/University of
Mannheim
DBPedia Open-source Representation
Semantically query relationships of
entities extracted from Wikipedia
Enterprise
Lenovo HyperGraph
Commercial
Representation/Learning Knowledge graph-based analytics Enterprise
Facebook Graph Search Commercial Representation Semantic search engine Consumer
LinkedIn LinkedIn Knowledge Graph Commercial Representation
Knowledge graph built on LinkedIn
entities
Enterprise/Consumer
Amazon Product Graph Commercial Representation Knowledge graph for Amazon products Enterprise
Pinterest PinSage Commercial Learning GCN based recommendation system Consumer
Uber - Commercial Representation/Learning UberEats knowledge graph Consumer
Airbnb - Commercial Representation/Learning Contextualization Consumer
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
Graph database
• Linear growth of data entries result in exponential growth of data connections
• Graph database promises higher flexibility compared to tabular database
https://neo4j.com/developer/graph-db-vs-rdbms/
SELECT name FROM Person
LEFT JOIN Person_Department
ON Person.Id = Person_Department.PersonId
LEFT JOIN Department
ON Department.Id = Person_Department.DepartmentId
WHERE Department.name = "IT Department"
MATCH (p:Person)-[:WORKS_AT]->(d:Dept)
WHERE d.name = "IT Department"
RETURN p.nameExample graph that presents the
connection of people in Twitter
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
SQL query
Graph query
Query language Supported platform Technical features
Cypher Neo4j, AgensGraph, and RedisGraph
• Declarative graph query language
• It is based on the property graph model
that organizes data into nodes and edges
Gremlin
Janus Graph, InifniteGraph, Azure Comos
DB, DataStax Enterprise, and AWS
Neptune
• Graph traversal language
• Supports imperative and declarative
querying, host language agnosticism, etc.
SPARQL RDF4J, Jena, OpenLink Virtuoso
• SPARQL Protocol and RDF Query Language
• RDF query language which is semantic
Query language platform
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
Real-time scoring in recommendation
scenarios
Option 1: real-time look-up of pre-scored recommendation
Option 2: real-time scoring based on approximate similarity matching
Option 3: real-time scoring with optimization tricks
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
Real-time look up of pre-scored
recommendations
API running on
Kubernetes
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
Real-time scoring based on approximate
similarity matching
Steps:
1. Compute user/item embeddings using any ML/DL algorithm.
2. Build the similarity index using KD-Trees, Locality-Sensitive Hashing (LSH) or similar.
3. Upload the similarity index to machine or instantiate directly the index in memory.
4. Generate an AKS API for scoring.
5. Given a user/item query, compute the query embedding.
6. With the approximate similarity model, find embeddings that are similar to the query
embedding.
7. Retrieve the IDs of the similar objects.
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
Real-time scoring with optimization tricks
Optimization tricks:
i) request batching which merges adjacent requests from CPU to take
advantage of GPU power
ii) GPU memory optimization which improves the access pattern to reduce
wasted transactions in GPU memory
iii) Concurrent kernel computation which allows execution of matrix
computations to be processed with multiple CUDA kernels concurrently.
source: http://arxiv.org/abs/1706.06978
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
Key takeaways
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
References
1. Hongwei Wang et al, “DKN: Deep Knowledge-Aware Network for News Recommendation”, ACM WWW 2018.
2. Andreas Argyriou, Miguel González-Fierro, and Le Zhang, “Microsoft Recommenders: Best Practices for Production-Ready Recommendation
Systems”, ACM WWW 2020.
3. Shen, Zhihong, and Dong, Yuxiao. From Graph to Knowledge Graph Algorithms and Applications. edX, 2019. https://www.edx.org/course/graphsto-
knowledgegraphs-3
4. Tang, Jie, and Dong, Yuxiao. Representation Learning on Networks: Theories, Algorithms, and Applications. WWW, 2019.
https://www2019.thewebconf.org/tutorials.
5. Le Zhang et al, “Building production-ready recommendation systems at scale”, KDD 2019.
6. Perozzi, Bryan, Rami Al-Rfou, and Steven Skiena. “Deepwalk: Online learning of social representations.” Proceedings of the 20th ACM SIGKDD
international conference on Knowledge discovery and data mining. 2014.
7. Hamilton, Will, Zhitao Ying, and Jure Leskovec. “Inductive representation learning on large graphs.” Advances in neural information processing
systems. 2017.
8. CORD-19, url: https://pages.semanticscholar.org/coronavirus-research
9. Microsoft Academic Graph Documentation, url: https://docs.microsoft.com/en-us/academic-services/graph/
10. Microsoft Academic Graph Application on COVID-19 research, url: https://github.com/microsoft/mag-covid19-research-examples
11. Understanding Documentation By Using Semantics, url: https://www.microsoft.com/en-us/research/project/academic/articles/understanding-
documents-by-using-semantics/
12. Wang, Kuansan, et al. “Microsoft Academic Graph: When Experts Are Not Enough.” Quantitative Science Studies, vol. 1, no. 1, 2020, pp. 396–413.
13. Wang, Kuansan, et al. “A Review of Microsoft Academic Services for Science of Science Studies.” Frontiers in Big Data, vol. 2, 2019.
14. L. Zhang, Z. Shen, J. Lian, C. Wu, M. González-Fierro, A. Argyriou, and T. Wu, "In Search for A Cure: Recommendation with Knowledge Graph on
CORD-19", ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2020 (KDD 2020), 2020. Available online:
https://dl.acm.org/doi/10.1145/3394486.3406711
Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
Q & A

Contenu connexe

Similaire à Knowledge Graph Recommendation Systems For COVID-19

Similaire à Knowledge Graph Recommendation Systems For COVID-19 (20)

3 rd International Conference on Data Science and Machine Learning (DSML 2022)
3 rd International Conference on Data Science and Machine Learning (DSML 2022)3 rd International Conference on Data Science and Machine Learning (DSML 2022)
3 rd International Conference on Data Science and Machine Learning (DSML 2022)
 
Testable apps on steroids @ ns barcelona
Testable apps on steroids @ ns barcelonaTestable apps on steroids @ ns barcelona
Testable apps on steroids @ ns barcelona
 
CALL FOR PAPERS - International Conference on Data Science and Applications (...
CALL FOR PAPERS - International Conference on Data Science and Applications (...CALL FOR PAPERS - International Conference on Data Science and Applications (...
CALL FOR PAPERS - International Conference on Data Science and Applications (...
 
G20 “Digital Economy” Task Force Meeting - Andrew Wyckoff
G20 “Digital Economy”  Task Force Meeting - Andrew WyckoffG20 “Digital Economy”  Task Force Meeting - Andrew Wyckoff
G20 “Digital Economy” Task Force Meeting - Andrew Wyckoff
 
International Conference on Data Mining and Machine Learning (DMML 2020)
International Conference on Data Mining and Machine Learning (DMML 2020)International Conference on Data Mining and Machine Learning (DMML 2020)
International Conference on Data Mining and Machine Learning (DMML 2020)
 
3 rd International Conference on Data Science and Machine Learning (DSML 2022)
3 rd International Conference on Data Science and Machine Learning (DSML 2022)3 rd International Conference on Data Science and Machine Learning (DSML 2022)
3 rd International Conference on Data Science and Machine Learning (DSML 2022)
 
Call for Papers! International Conference on Cloud and Big Data (CLBD 2020)
Call for Papers!  International Conference on Cloud and Big Data (CLBD 2020)Call for Papers!  International Conference on Cloud and Big Data (CLBD 2020)
Call for Papers! International Conference on Cloud and Big Data (CLBD 2020)
 
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
 
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Center...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Center...Big Data Applications & Analytics Motivation: Big Data and the Cloud; Center...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Center...
 
Call for Paper - 3rd International conference on Big Data, Machine learning a...
Call for Paper - 3rd International conference on Big Data, Machine learning a...Call for Paper - 3rd International conference on Big Data, Machine learning a...
Call for Paper - 3rd International conference on Big Data, Machine learning a...
 
Machine Learning and AI: An Intuitive Introduction - CFA Institute Masterclass
Machine Learning and AI: An Intuitive Introduction - CFA Institute MasterclassMachine Learning and AI: An Intuitive Introduction - CFA Institute Masterclass
Machine Learning and AI: An Intuitive Introduction - CFA Institute Masterclass
 
4 th International Conference on Big Data (CBDA 2023)
4 th International Conference on Big Data (CBDA 2023)4 th International Conference on Big Data (CBDA 2023)
4 th International Conference on Big Data (CBDA 2023)
 
4th International Conference on Big Data (CBDA 2023)
4th International Conference on Big Data (CBDA 2023)4th International Conference on Big Data (CBDA 2023)
4th International Conference on Big Data (CBDA 2023)
 
Call for Papers - International Conference on Cloud, Big Data and IoT (CBIoT ...
Call for Papers - International Conference on Cloud, Big Data and IoT (CBIoT ...Call for Papers - International Conference on Cloud, Big Data and IoT (CBIoT ...
Call for Papers - International Conference on Cloud, Big Data and IoT (CBIoT ...
 
Call for Papers - International Conference on Cloud, Big Data and IoT (CBIoT ...
Call for Papers - International Conference on Cloud, Big Data and IoT (CBIoT ...Call for Papers - International Conference on Cloud, Big Data and IoT (CBIoT ...
Call for Papers - International Conference on Cloud, Big Data and IoT (CBIoT ...
 
9th International Conference on Advanced Information Technologies and Applica...
9th International Conference on Advanced Information Technologies and Applica...9th International Conference on Advanced Information Technologies and Applica...
9th International Conference on Advanced Information Technologies and Applica...
 
International Conference on Cloud, Big Data and IoT (CBIoT 2020)
International Conference on Cloud, Big Data and IoT (CBIoT 2020)International Conference on Cloud, Big Data and IoT (CBIoT 2020)
International Conference on Cloud, Big Data and IoT (CBIoT 2020)
 
The Future of Higher Education in the United States
The Future of Higher Education in the United StatesThe Future of Higher Education in the United States
The Future of Higher Education in the United States
 
9th international conference on advanced information technologies and applica...
9th international conference on advanced information technologies and applica...9th international conference on advanced information technologies and applica...
9th international conference on advanced information technologies and applica...
 
3rdInternational Conference on Cloud, Big Data and IoT (CBIoT 2022)
3rdInternational  Conference  on  Cloud,  Big  Data  and  IoT  (CBIoT  2022)3rdInternational  Conference  on  Cloud,  Big  Data  and  IoT  (CBIoT  2022)
3rdInternational Conference on Cloud, Big Data and IoT (CBIoT 2022)
 

Plus de Miguel González-Fierro

Thesis dissertation: Humanoid Robot Control of Complex Postural Tasks based o...
Thesis dissertation: Humanoid Robot Control of Complex Postural Tasks based o...Thesis dissertation: Humanoid Robot Control of Complex Postural Tasks based o...
Thesis dissertation: Humanoid Robot Control of Complex Postural Tasks based o...
Miguel González-Fierro
 
Distributed training of Deep Learning Models
Distributed training of Deep Learning ModelsDistributed training of Deep Learning Models
Distributed training of Deep Learning Models
Miguel González-Fierro
 
Running Intelligent Applications inside a Database: Deep Learning with Python...
Running Intelligent Applications inside a Database: Deep Learning with Python...Running Intelligent Applications inside a Database: Deep Learning with Python...
Running Intelligent Applications inside a Database: Deep Learning with Python...
Miguel González-Fierro
 

Plus de Miguel González-Fierro (12)

Los retos de la inteligencia artificial en la sociedad actual
Los retos de la inteligencia artificial en la sociedad actualLos retos de la inteligencia artificial en la sociedad actual
Los retos de la inteligencia artificial en la sociedad actual
 
Thesis dissertation: Humanoid Robot Control of Complex Postural Tasks based o...
Thesis dissertation: Humanoid Robot Control of Complex Postural Tasks based o...Thesis dissertation: Humanoid Robot Control of Complex Postural Tasks based o...
Thesis dissertation: Humanoid Robot Control of Complex Postural Tasks based o...
 
Best practices in coding for beginners
Best practices in coding for beginnersBest practices in coding for beginners
Best practices in coding for beginners
 
Distributed training of Deep Learning Models
Distributed training of Deep Learning ModelsDistributed training of Deep Learning Models
Distributed training of Deep Learning Models
 
Running Intelligent Applications inside a Database: Deep Learning with Python...
Running Intelligent Applications inside a Database: Deep Learning with Python...Running Intelligent Applications inside a Database: Deep Learning with Python...
Running Intelligent Applications inside a Database: Deep Learning with Python...
 
Deep Learning for Sales Professionals
Deep Learning for Sales ProfessionalsDeep Learning for Sales Professionals
Deep Learning for Sales Professionals
 
Deep Learning for Lung Cancer Detection
Deep Learning for Lung Cancer DetectionDeep Learning for Lung Cancer Detection
Deep Learning for Lung Cancer Detection
 
Mastering Computer Vision Problems with State-of-the-art Deep Learning
Mastering Computer Vision Problems with State-of-the-art Deep LearningMastering Computer Vision Problems with State-of-the-art Deep Learning
Mastering Computer Vision Problems with State-of-the-art Deep Learning
 
Speeding up machine-learning applications with the LightGBM library
Speeding up machine-learning applications with the LightGBM librarySpeeding up machine-learning applications with the LightGBM library
Speeding up machine-learning applications with the LightGBM library
 
Leveraging Data Driven Research Through Microsoft Azure
Leveraging Data Driven Research Through Microsoft AzureLeveraging Data Driven Research Through Microsoft Azure
Leveraging Data Driven Research Through Microsoft Azure
 
Empowering every person on the planet to achieve more
Empowering every person on the planet to achieve moreEmpowering every person on the planet to achieve more
Empowering every person on the planet to achieve more
 
Deep Learning for NLP
Deep Learning for NLP Deep Learning for NLP
Deep Learning for NLP
 

Dernier

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Dernier (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 

Knowledge Graph Recommendation Systems For COVID-19

  • 1. Knowledge Graph Recommendation Systems For COVID-19
  • 2. Globally, as of 23 August 2020, there have been 23,057,288 confirmed cases of COVID-19, including 800,906 deaths, reported to WHO. https://covid19.who.int/?gclid=Cj0KCQjw6uT4BRD5ARIsADwJQ19jnAzaXO3sJguACQIGTi0oRYfs8q0KQiitpu5Sw2OOcqRppoqiyF0aAqCwEALw_wcB Globally, as of 16 November 2020, there have been 54,075,995 confirmed cases of COVID-19, including 1,313,919 deaths, reported to WHO. Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 3. Opening • The tutorial provides best practices on developing COVID-19 related academic research recommendation with the aim of the knowledge graph • You will learn • The CORD-19 data set and COVID-19 related knowledge graph • Graph-based recommendation techniques and implementations • Operationalization methods for Graph database, model, and system • The tutorial content, materials, and source codes can be found on the website https://kdd2020tutorial.github.io/cord19recommender/ Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 4. Environment setup Details about the environment set up can be found at the following link https://aka.ms/kdd2020-covid-demo Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 5.
  • 6. Microsoft Academic Graph (MAG) Academic Knowledge Graph CORD-19 Metadata + Full Text Documents Small (0.1% node size) Specific (limited sub-domains) Large (500G / 500M nodes) Comprehensive (all domains)Global Knowledge Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 7. • WHO • AI2 / MSR / CZI / NIH / Georgetown U • WHY • AI – fight COVID-19 • WHAT • Meta-data • Full text • WHEN • Started Mar 16, 2020 • Update daily https://azure.microsoft.com/en-us/services/open-datasets/catalog/covid-19-open-research https://pages.semanticscholar.org/coronavirus-research • WHERE https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge • HOW Reference: Wang, Lucy Lu, et al. “CORD-19: The Covid-19 Open Research Dataset.” arXiv:2004.10706, 2020. Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 8. • WHO • Microsoft Academic (MSR) • WHY • AI – scholarly communications • HOW • WHEN • Start from 2014 • Update weekly Project: https://www.microsoft.com/en-us/research/project/microsoft-academic-graph/ • WHERE Get the free graph: https://docs.microsoft.com/en-us/academic-services/graph/ • WHAT Knowledge in the graph form Search portal: https://academic.microsoft.com/ MAG 07-16-2020 version stats References: Sinha, Arnab, et al. “An Overview of Microsoft Academic Service (MAS) and Applications.” WWW, 2015. Wang, Kuansan, et al. “A Review of Microsoft Academic Services for Science of Science Studies.” Frontiers in Big Data, 2019. Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 9. • WHY • Richer Semantics • Empower more applications • Search • QnA • Content understanding / Analytics (Module 2) • Recommendation (Module 3 + 4) • WHAT • ~88% CORD-19 mapped to MAG • WHEN • Update daily • HOW • Microsoft Academic Knowledge Exploration Service (MAKES) API Machine inference engine with knowledge in MAG https://academic.microsoft.com/home MAKES References: Wang, Kuansan, et al. “A Review of Microsoft Academic Services for Science of Science Studies.” Frontiers in Big Data, 2019. Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 10. A. Get the CORD-19 to MAG paper IDs mapping: https://github.com/microsoft/mag-covid19-research-examples/blob/master/src/data/releases.md CORD-19 data set: https://pages.semanticscholar.org/coronavirus-research Step-by-step instruction on using (1) PEK API for dataset linking + (2) PySpark script to get CORD-19 subgraph: https://github.com/microsoft/mag-covid19-research-examples Blogpost on more details of Covid-19 research on MAG [how to request MAG + PAK/MAKES access] https://www.microsoft.com/en-us/research/project/academic/articles/microsoft-academic-resources-and-their-application-to-covid-19-research C. Access the mapped CORD-19 publication via API: https://github.com/microsoft/mag-covid19-research-examples/blob/master/src/PAK-Samples/cord- 19-closure.md B. Get the CORD-19 MAG subgraph: https://aka.ms/kdd2020-covid-demo Option 1: DIY Option 2: Grab the Results Get Cord-19 data (Text Corpus+Metadata) Request MAG Access (Knowledge Graph) Request PAK API access or MAKES API instance (Indexed KG/search API) CORD-19 to MAG ID mapping (A) CORD-19 MAG Subgraph (B) (1) PAK or MAKES API for ID linking (2) PySpark Script to get subgraph Module 2: Content Understanding Module 3: KG-assisted Recommendation Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 11.
  • 12. Temporal Publishing Venue Geo Team Size Topic CORD-19 Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 13. Knowledge Graph size comparison MAG CORD-19 Subgraph (Version: CORD-19 06/14/2020; MAG 06/05/2020) Nodes: ~623K Edges: ~2M Full Microsoft Academic Graph (MAG) Nodes: ~480M Edges: ~4B Small: 0.1% node size Specific: 7.8% topics Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 14. CORD-19 Topic Distribution paper nodes in MAG Top Domains Top Sub-domains Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 15. CORD-19 Publishing Venue Distribution Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 16. CORD-19 Geo & Temporal Distribution Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 17. CORD-19 Team Size Distribution Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 18. Applications of Knowledge Graph: Academic Recommender Systems
  • 19. Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 20. Academic recommendation with knowledge graph You May Cite: author-to-paper recommendation More Like This: paper-to-paper recommendation … Recommendations RecommendationsCitation history Current paper Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 22. Microsoft Recommenders https://github.com/microsoft/recommenders • Helping researchers and developers to quickly select, prototype, demonstrate, and productionize a recommender system • Accelerating enterprise-grade development and deployment of a recommender system into production Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 23. Elon Musk offers Tesla Model 3 sneak peek … Google fumbles while Tesla sprints toward a driverless future … Trump pledges aid to Silicon Valley during tech meeting … Click history Knowledge graph General Motors is ramping up its self-driving car: Ford should be nervous … Will the user click it? Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 24. User’s clicked news Candidate news Attention Net KCNN KCNN KCNN KCNN concat. User embedding Candidate news embedding Click probability element-wise + element-wise × DKN Attention net: User interest extraction: CTR prediction: Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 25. Context of entities “Fight Club” Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 26. 𝒅 × 𝒏 word embeddings 𝒅 × 𝒏 transformed entity embeddings 𝒅 × 𝒏 transformed context embeddings CNN layer pooling multiple channels 𝑤1:𝑛 = [Galaxy Note 20 will launch Aug. 5] 𝑒1:𝑛 = [ 1 1 1 0 0 0] Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 27. source: https://aka.ms/kdd2020-covid-demo Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 29. Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 30. 37 • Customer experience • Targeted market campaign • Digital assistance • Anti money laundering • Fraud detection • “Know-Your-Client” • Wikipedia • Academic article search • Search engine • Recommendation • Customer segmentation Retail Public good Finance Internet Key areas of application Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 31. Industrial applications on graph Organization Product Category Application Use case Application Google Google Knowledgh Graph Commercial Representation Data search Enterprise Microsoft Satori Commercial Representation Data search (internal use) Enterprise Wikipedia Wikidata Open-source Representation Knowledge base for Wikipedia data Enterprise Leipzig University/University of Mannheim DBPedia Open-source Representation Semantically query relationships of entities extracted from Wikipedia Enterprise Lenovo HyperGraph Commercial Representation/Learning Knowledge graph-based analytics Enterprise Facebook Graph Search Commercial Representation Semantic search engine Consumer LinkedIn LinkedIn Knowledge Graph Commercial Representation Knowledge graph built on LinkedIn entities Enterprise/Consumer Amazon Product Graph Commercial Representation Knowledge graph for Amazon products Enterprise Pinterest PinSage Commercial Learning GCN based recommendation system Consumer Uber - Commercial Representation/Learning UberEats knowledge graph Consumer Airbnb - Commercial Representation/Learning Contextualization Consumer Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 32. Graph database • Linear growth of data entries result in exponential growth of data connections • Graph database promises higher flexibility compared to tabular database https://neo4j.com/developer/graph-db-vs-rdbms/ SELECT name FROM Person LEFT JOIN Person_Department ON Person.Id = Person_Department.PersonId LEFT JOIN Department ON Department.Id = Person_Department.DepartmentId WHERE Department.name = "IT Department" MATCH (p:Person)-[:WORKS_AT]->(d:Dept) WHERE d.name = "IT Department" RETURN p.nameExample graph that presents the connection of people in Twitter Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro SQL query Graph query
  • 33. Query language Supported platform Technical features Cypher Neo4j, AgensGraph, and RedisGraph • Declarative graph query language • It is based on the property graph model that organizes data into nodes and edges Gremlin Janus Graph, InifniteGraph, Azure Comos DB, DataStax Enterprise, and AWS Neptune • Graph traversal language • Supports imperative and declarative querying, host language agnosticism, etc. SPARQL RDF4J, Jena, OpenLink Virtuoso • SPARQL Protocol and RDF Query Language • RDF query language which is semantic Query language platform Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 34. Real-time scoring in recommendation scenarios Option 1: real-time look-up of pre-scored recommendation Option 2: real-time scoring based on approximate similarity matching Option 3: real-time scoring with optimization tricks Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 35. Real-time look up of pre-scored recommendations API running on Kubernetes Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 36. Real-time scoring based on approximate similarity matching Steps: 1. Compute user/item embeddings using any ML/DL algorithm. 2. Build the similarity index using KD-Trees, Locality-Sensitive Hashing (LSH) or similar. 3. Upload the similarity index to machine or instantiate directly the index in memory. 4. Generate an AKS API for scoring. 5. Given a user/item query, compute the query embedding. 6. With the approximate similarity model, find embeddings that are similar to the query embedding. 7. Retrieve the IDs of the similar objects. Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 37. Real-time scoring with optimization tricks Optimization tricks: i) request batching which merges adjacent requests from CPU to take advantage of GPU power ii) GPU memory optimization which improves the access pattern to reduce wasted transactions in GPU memory iii) Concurrent kernel computation which allows execution of matrix computations to be processed with multiple CUDA kernels concurrently. source: http://arxiv.org/abs/1706.06978 Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 38.
  • 39. Key takeaways Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 40. References 1. Hongwei Wang et al, “DKN: Deep Knowledge-Aware Network for News Recommendation”, ACM WWW 2018. 2. Andreas Argyriou, Miguel González-Fierro, and Le Zhang, “Microsoft Recommenders: Best Practices for Production-Ready Recommendation Systems”, ACM WWW 2020. 3. Shen, Zhihong, and Dong, Yuxiao. From Graph to Knowledge Graph Algorithms and Applications. edX, 2019. https://www.edx.org/course/graphsto- knowledgegraphs-3 4. Tang, Jie, and Dong, Yuxiao. Representation Learning on Networks: Theories, Algorithms, and Applications. WWW, 2019. https://www2019.thewebconf.org/tutorials. 5. Le Zhang et al, “Building production-ready recommendation systems at scale”, KDD 2019. 6. Perozzi, Bryan, Rami Al-Rfou, and Steven Skiena. “Deepwalk: Online learning of social representations.” Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 2014. 7. Hamilton, Will, Zhitao Ying, and Jure Leskovec. “Inductive representation learning on large graphs.” Advances in neural information processing systems. 2017. 8. CORD-19, url: https://pages.semanticscholar.org/coronavirus-research 9. Microsoft Academic Graph Documentation, url: https://docs.microsoft.com/en-us/academic-services/graph/ 10. Microsoft Academic Graph Application on COVID-19 research, url: https://github.com/microsoft/mag-covid19-research-examples 11. Understanding Documentation By Using Semantics, url: https://www.microsoft.com/en-us/research/project/academic/articles/understanding- documents-by-using-semantics/ 12. Wang, Kuansan, et al. “Microsoft Academic Graph: When Experts Are Not Enough.” Quantitative Science Studies, vol. 1, no. 1, 2020, pp. 396–413. 13. Wang, Kuansan, et al. “A Review of Microsoft Academic Services for Science of Science Studies.” Frontiers in Big Data, vol. 2, 2019. 14. L. Zhang, Z. Shen, J. Lian, C. Wu, M. González-Fierro, A. Argyriou, and T. Wu, "In Search for A Cure: Recommendation with Knowledge Graph on CORD-19", ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2020 (KDD 2020), 2020. Available online: https://dl.acm.org/doi/10.1145/3394486.3406711 Toronto Machine Learning Summit Nov 16th 2020 | Miguel González-Fierro @miguelgfierro /in/miguelgfierro
  • 41. Q & A