Better Together: How Graph database enables easy data integration with Spark and Kafka in the Cloud
1. Better Together: How Graph database enables easy data integration with Spark and Kafka in the Cloud
September 30th 2020
2. | GRAPHAIWORLD.COM | #GRAPHAIWORLD |
Today's Speakers
Emma Liu
Product Manager
● BS in Engineering from Harvey Mudd College, MS in Engineering Systems from MIT
● Prior work experience at Oracle and MarkLogic
● Focus - Cloud, Containers, Enterprise Infra, Monitoring, Management, Connectors
Rayees Pasha
Product Manager
● MS in Computer Science from University of Memphis
● Prior Lead PM and ENG positions at Workday, Hitachi and HP
● Expertise in Database Management and Big Data Technologies
3. Today’s Outline
1. TigerGraph Architecture and Data Ingestion Overview
2. TigerGraph and Spark Data Pipeline
3. TigerGraph and Kafka Data Pipeline
7. Data Ingestion
Step 1
User source data can be ingested in the following ways:
● Bulk load of data files or a Kafka stream in CSV or JSON format
● HTTP POSTs via REST services (JSON)
● GSQL Insert commands
Step 2
The Dispatcher takes in the data ingestion requests in the form of updates to the database.
1. Query IDS to get internal IDs
2. Convert data to internal format
3. Send data to one or more corresponding GPEs
Step 3
Each GPE consumes the partial data updates, processes them, and writes them to disk.
Loading Jobs and POST use UPSERT semantics:
● If vertex/edge doesn't yet exist, create it.
● If vertex/edge already exists, update it.
● Idempotent: repeating the same load leaves the graph unchanged.
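The UPSERT semantics on this slide can be illustrated with a minimal sketch. It uses a plain in-memory dictionary as a stand-in for the vertex store; `VertexStore` and `upsert_vertex` are hypothetical names for illustration, not TigerGraph APIs.

```python
from typing import Any, Dict

# Hypothetical in-memory stand-in for a vertex store: vertex ID -> attributes.
VertexStore = Dict[str, Dict[str, Any]]


def upsert_vertex(store: VertexStore, vertex_id: str, attrs: Dict[str, Any]) -> str:
    """Create the vertex if it is missing, otherwise update its attributes."""
    if vertex_id not in store:
        store[vertex_id] = dict(attrs)
        return "created"
    store[vertex_id].update(attrs)
    return "updated"


if __name__ == "__main__":
    store: VertexStore = {}
    upsert_vertex(store, "p1", {"name": "Ann", "age": 30})
    snapshot = {key: dict(value) for key, value in store.items()}
    # Replaying the same request leaves the store unchanged (idempotent).
    upsert_vertex(store, "p1", {"name": "Ann", "age": 30})
    assert store == snapshot
```

Because a replayed request produces the same final state, a failed or repeated load can be retried without creating duplicates.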
8. Data Ingestion
[Diagram: data ingestion flow. Labels: Incremental Data (CSV/JSON; Insert/Update/Delete Vertices and Edges); Nginx; Restpp; GSE (IDS): ID Translation; Kafka Cluster with one Kafka instance on each of Server 1, Server 2 and Server 3; three GPEs: Listen to corresponding topic for new messages, In-memory copy of data, Synchronize data to disk; three Disks; Incoming; Outgoing: Acknowledge Response.]
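The ID translation and per-GPE routing can be sketched as a toy model. Everything here is an assumption for illustration: `IdService` and `dispatch` are hypothetical names, and modulo partitioning on the internal ID is a simplification, not TigerGraph's actual partitioning scheme.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Row = Tuple[str, Dict[str, Any]]      # (external ID, attributes)
Record = Tuple[int, Dict[str, Any]]   # (internal ID, attributes)


class IdService:
    """Toy stand-in for the IDS: maps external IDs to internal integer IDs."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def internal_id(self, external_id: str) -> int:
        # Assign the next integer the first time an external ID is seen.
        if external_id not in self._ids:
            self._ids[external_id] = len(self._ids)
        return self._ids[external_id]


def dispatch(rows: Iterable[Row], ids: IdService, num_gpes: int) -> Dict[int, List[Record]]:
    """Translate IDs, convert rows to an internal form, and group them per GPE."""
    batches: Dict[int, List[Record]] = defaultdict(list)
    for external_id, attrs in rows:
        vid = ids.internal_id(external_id)       # 1. query the ID service
        record = (vid, attrs)                    # 2. convert to internal format
        batches[vid % num_gpes].append(record)   # 3. route (assumed modulo scheme)
    return dict(batches)
```

Each GPE would then consume only its own batch, which mirrors the "listen to corresponding topic" label in the diagram.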
11. Typical Spark + TigerGraph Integration
● Data Preparation and Integration (TigerGraph/Spark)
● Unsupervised Learning (TigerGraph)
● Feature Extraction for Supervised Learning (TigerGraph/Spark)
● Model Training (Spark)
● Validate and Apply Model (TigerGraph)
● Visualize and Explore Interconnected Data (TigerGraph)
12. Spark and TigerGraph Data Pipeline
[Diagram labels: Static Data Sources; Streaming Data Sources; TigerGraph JDBC Driver]
13. JDBC Driver
● Type 4 driver
● Supports bi-directional (read and write) data flow with TigerGraph
● Read: Converts ResultSet to DataFrame
● Write: Loads DataFrames and files to vertices/edges in TigerGraph
● Supports REST endpoints of built-in, compiled and interpreted GSQL queries from TigerGraph
● Open Source: https://github.com/tigergraph/ecosys/tree/master/tools/etl/tg-jdbc-driver
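A minimal PySpark sketch of the read and write paths follows. The option keys (`driver`, `url`, `username`, `password`, `graph`, `dbtable`) and the driver class name are assumptions modeled on the driver's README and should be verified against the repository linked above; the helper names are hypothetical. Running the read/write helpers requires `pyspark` and the driver JAR on the Spark classpath.

```python
from typing import Any, Dict


def tigergraph_jdbc_options(url: str, graph: str, dbtable: str,
                            username: str, password: str,
                            **extra: Any) -> Dict[str, str]:
    """Build the option map passed to Spark's JDBC data source (keys are assumed)."""
    options = {
        "driver": "com.tigergraph.jdbc.Driver",  # assumed driver class name
        "url": url,
        "username": username,
        "password": password,
        "graph": graph,
        "dbtable": dbtable,                      # e.g. "vertex Person" (assumed form)
    }
    options.update({key: str(value) for key, value in extra.items()})
    return options


def read_from_tigergraph(spark, options: Dict[str, str]):
    """Read: the driver's ResultSet is surfaced as a Spark DataFrame."""
    return spark.read.format("jdbc").options(**options).load()


def write_to_tigergraph(df, options: Dict[str, str]) -> None:
    """Write: load a DataFrame into the vertex or edge type named by dbtable."""
    df.write.mode("overwrite").format("jdbc").options(**options).save()
```

Keeping the option map in one helper means the same connection settings serve both directions of the data flow.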
14. Supervised ML with TigerGraph - Detecting Phone-Based Fraud by Analyzing Network or Graph Relationship Features at China Mobile
Download the solution brief at https://info.tigergraph.com/MachineLearning
17. Kafka and TigerGraph Data Pipeline
[Diagram labels: Static Data Sources; Streaming Data Sources; Kafka Loader]
18. Kafka Loader - Speed to Value from Real-time Streaming Data
• Reduce Data Availability Gap and Accelerate Time to Value
• Native Integration with Real-time Streaming Data and Batch Data
• Enables Real-time Graph Feature Updates with Streaming Data in Machine Learning Use Cases
• Decrease Learning Curve With Familiar Syntax
• GSQL Support with Consistent Data Loading Syntax
• Maintain Separation of Control for Data Loading
• Designed with Built-in MultiGraph Support
19. Kafka Loader: Three Steps
Consistent with GSQL Data Loading Steps
Step 1: Define the Data Source
Step 2: Create a Loading Job
Step 3: Run the Loading Job
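The three steps can be sketched as a small helper that renders one GSQL statement per step. The GSQL syntax shown (`CREATE DATA_SOURCE KAFKA`, `DEFINE FILENAME` with a `$source:` prefix, `RUN LOADING JOB`) is an assumption modeled on the Kafka Loader documentation and should be checked against the installed TigerGraph version; the function name, column mapping and all paths are hypothetical.

```python
from typing import List


def kafka_loader_statements(graph: str, source: str, job: str,
                            kafka_conf: str, topic_conf: str,
                            vertex_type: str) -> List[str]:
    """Render the GSQL for the three steps (syntax assumed, not verified here)."""
    # Step 1: define the data source from a Kafka cluster config file.
    define_source = (
        f'CREATE DATA_SOURCE KAFKA {source} = "{kafka_conf}" FOR GRAPH {graph}'
    )
    # Step 2: create a loading job that reads from the topic/partition config.
    create_job = (
        f"CREATE LOADING JOB {job} FOR GRAPH {graph} {{\n"
        f'  DEFINE FILENAME f1 = "${source}:{topic_conf}";\n'
        f'  LOAD f1 TO VERTEX {vertex_type} VALUES ($0, $1) USING SEPARATOR=",";\n'
        "}"
    )
    # Step 3: run the loading job.
    run_job = f"RUN LOADING JOB {job}"
    return [define_source, create_job, run_job]


if __name__ == "__main__":
    for statement in kafka_loader_statements(
        "test_graph", "k1", "load_person",
        "/tmp/kafka.conf", "/tmp/topic_partition.json", "Person",
    ):
        print(statement)
```

The loading job body has the same shape as a file-based GSQL loading job, which is the "consistent syntax" point made on this slide.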
20. Kafka Loader High Level Architecture
● Connect to External Kafka Cluster
● User Commands Through GSQL Server
● Configuration Settings:
○ Config 1: Kafka Cluster Configuration
○ Config 2: Topic/Partition/Offset Info
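The two configuration settings can be sketched as JSON documents generated from Python. The key names (`broker`, `topic`, `partition_list`, `partition`, `start_offset`) are assumptions modeled on the Kafka Loader documentation and should be verified before use; the helper names, broker address and topic name are hypothetical.

```python
import json
from typing import Any, Dict


def kafka_cluster_config(broker: str) -> Dict[str, Any]:
    """Config 1: where the external Kafka cluster lives (key name assumed)."""
    return {"broker": broker}


def topic_partition_config(topic: str, start_offsets: Dict[int, int]) -> Dict[str, Any]:
    """Config 2: topic plus a start offset per partition (key names assumed)."""
    partitions = [
        {"partition": partition, "start_offset": offset}
        for partition, offset in sorted(start_offsets.items())
    ]
    return {"topic": topic, "partition_list": partitions}


if __name__ == "__main__":
    print(json.dumps(kafka_cluster_config("localhost:9092"), indent=2))
    # Two partitions with placeholder start offsets.
    print(json.dumps(topic_partition_config("person_events", {0: 0, 1: 0}), indent=2))
```

Splitting cluster settings from topic, partition and offset settings lets one data source definition be reused by several loading jobs.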
23. Get Started for Free
● Try TigerGraph Cloud ( tgcloud.io )
● Download TigerGraph’s Developer Edition
● Take a Test Drive - Online Demo
● Get TigerGraph Certified
● Join the Community
@TigerGraphDB /tigergraph /TigerGraphDB /company/TigerGraph