Description:
Integrating Apache Kafka with other systems in a reliable and scalable way is often a key part of an event streaming platform. In this talk we'll introduce how to use Apache Kafka (one of the most widely used message brokers) in combination with Neo4j through the Neo4j-Streams project, demonstrating via simple use cases how you can leverage the data produced by the Change Data Capture module, and how to add Neo4j to your Kafka flow by using the Sink module in combination with the Neo4j Streams procedures.
Speaker:
Andrea Santurbano - Neo4j Architect - LARUS Business Automation
Video link: https://youtu.be/oNXWOyDd5HI
2. LARUS Business Automation Srl Italy’s #1 Neo4j Partner
WHO AM I?
Andrea
[:WORKS_AT]
[:LOVES]
[:INTEGRATOR_LEADER_FOR]
3. WHO'S LARUS?
LARUS BUSINESS AUTOMATION
● Founded in 2004
● Headquartered in Venice, ITALY
● Delivering services Worldwide
● Mission: “Bridging the gap between Business and IT”
#1 Solution Partner in Italy since 2013
● Creator of the Neo4j JDBC Driver
● Creator of the Neo4j Apache Zeppelin Interpreter
● Creator of the Neo4j ETL Tool
● Developed 90+ APOC procedures
VENICE
[:BASED_IN]
4. INTEGRATOR LEADERS FOR NEO4J
Timeline (2011-2018):
● 2011: First spikes in Retail for Articles' Clustering
● 2016: Neo4j JDBC Driver
● 2018: Neo4j APOC, ETL, Spark, Zeppelin, Kafka
5. WE ARE HIRING!
[:HIRES]
We’re looking for PASSIONATE Java DEVELOPERS
to WORK on CHALLENGING PROJECTS
with CUTTING-EDGE TECHNOLOGIES (such as Kafka and Neo4j)
(in Rome and Pescara)
7. Agenda
● What is Neo4j Streams?
○ What is Apache Kafka?
○ How did we combine Neo4j and Kafka?
● The Change Data Capture Module
○ DEMO
● The Streams Procedure
○ DEMO
● The Sink
○ DEMO
9. What is Apache Kafka?
A DISTRIBUTED STREAMING PLATFORM
Has three key capabilities:
● Publish and subscribe to streams of records;
● Store streams of records in a fault-tolerant
durable way;
● Process streams of records as they occur.
10. What is Apache Kafka?
HOW DOES IT WORK?
1. TOPICS: a topic is a category or feed name to
which records are published.
2. PARTITIONS: for each topic, the Kafka cluster
maintains a partitioned log
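To make the topic/partition model concrete, here is a minimal sketch (plain Python, not the real Kafka API) of a topic as a set of append-only partition logs, with records routed to a partition by their key:

```python
# Minimal sketch of Kafka's topic/partition model (not the real API).
class Topic:
    def __init__(self, name, num_partitions=3):
        self.name = name
        # Each partition is an append-only log.
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Records with the same key always land in the same partition,
        # so per-key ordering is preserved.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        offset = len(self.partitions[p]) - 1
        return p, offset

t = Topic("my-topic")
p1, o1 = t.publish("user-1", {"event": "created"})
p2, o2 = t.publish("user-1", {"event": "updated"})
assert p1 == p2      # same key -> same partition
assert o2 == o1 + 1  # offsets grow monotonically within a partition
```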
11. What is Apache Kafka?
HOW IS IT USED?
Kafka is generally used for two classes of
applications:
● Building real-time streaming data pipelines;
● Building real-time streaming applications.
12. What is Neo4j Streams?
Andrea
[:AUTHOR_OF][:CREATOR_OF] X
Michael
ENABLES DATA STREAM ON NEO4J
The project is a Neo4j Plugin composed of several parts:
● Neo4j Streams Change Data Capture;
● Neo4j Streams Sink;
● Neo4j Streams Procedures
We also have a Kafka Connect Plugin:
● Kafka Connect Sink plugin.
14. Neo4j Streams: Change Data Capture
Change data “what”?
In databases, Change Data Capture (CDC) is a set of software design patterns used to determine (and
track) the data that has changed so an action can be taken using the changed data.
Well suited use-cases?
● CDC solutions occur most often in data-warehouse environments;
● Allows replicating databases with little performance impact on their operation.
15. Neo4j Streams: Change Data Capture
How does it work?
Each transaction communicates its changes to our event listener:
● exposing creation, updates and deletes of Nodes and Relationships
● providing before-and-after information
● configuring property filtering for each topic
These events are sent to Kafka asynchronously, so the transaction commit path is not affected.
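As an illustration, a CDC event for a node update might look like the following sketch (the field names approximate the neo4j-streams event format and are not a verbatim payload):

```python
# Illustrative sketch of a CDC "node updated" event carrying
# before-and-after state (field names are an approximation of the
# neo4j-streams format, not a verbatim payload).
def node_update_event(node_id, before_props, after_props, labels):
    return {
        "meta": {"operation": "updated", "source": {"hostname": "neo4j-1"}},
        "payload": {
            "id": node_id,
            "type": "node",
            "before": {"labels": labels, "properties": before_props},
            "after": {"labels": labels, "properties": after_props},
        },
    }

evt = node_update_event("42", {"name": "Andrea"},
                        {"name": "Andrea", "city": "Venice"}, ["Person"])
# A consumer can diff before/after to see exactly what changed:
changed = {k: v for k, v in evt["payload"]["after"]["properties"].items()
           if evt["payload"]["before"]["properties"].get(k) != v}
assert changed == {"city": "Venice"}
```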
18. Neo4j Streams: Streams Procedures
CONSUME/PRODUCE DATA DIRECTLY FROM CYPHER
The Neo4j Streams project provides two procedures:
● streams.publish: allows custom message streaming from Neo4j to the configured environment by
using the underlying configured Producer;
● streams.consume: allows consuming messages from a given topic.
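From Cypher, usage looks like this (the topic name is illustrative):

```cypher
// Produce a custom message to the topic "my-topic"
CALL streams.publish('my-topic', 'Hello from Cypher');

// Consume messages from "my-topic" and return each event
CALL streams.consume('my-topic', {timeout: 5000}) YIELD event
RETURN event;
```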
21. Neo4j Streams: Sink
INGEST YOUR DATA, WITH YOUR RULES
The sink provides several ways to ingest data from Kafka:
● Via a Cypher template
● Via CDC events published by another Neo4j instance through the CDC module
● Via projection of a JSON event into nodes/relationships by providing an extraction pattern
22. Neo4j Streams: Sink
INGEST YOUR DATA, WITH YOUR RULES
Initially, we thought about a generic consumer with a fixed projection of events into nodes and
relationships.
We decided instead to give the user the power to use custom Cypher statements per topic to turn events
into arbitrary graph structures.
So you can choose for yourself what to do with a complex Kafka event, and which parts of it to use for
which purpose.
23. Neo4j Streams: Sink
INGESTION VIA CYPHER TEMPLATE
Besides your Kafka connection information, you just add entries like this to your Neo4j config.
streams.sink.topic.cypher.<TOPIC>=<CYPHER_QUERY>
For example:
streams.sink.topic.cypher.my-topic=MERGE (n:Label {id: event.id}) ON CREATE
SET n += event.properties
Under the hood, the consumer takes a batch of events and passes them as the $batch parameter to the
Cypher statement, which we prefix with an UNWIND, so each individual entry is available to your statement
as the `event` identifier. The final statement we execute looks like this:
UNWIND $batch AS event
MERGE (n:Label {id: event.id})
ON CREATE SET n += event.properties
24. Neo4j Streams: Sink
INGESTION VIA CDC EVENT FROM ANOTHER NEO4J INSTANCE
We allow ingesting the data in two ways:
● The SourceId strategy which merges the nodes/relationships by the CDC event `id` field (it's related to
the Neo4j physical ID)
streams.sink.topic.cdc.sourceId=<TOPICS_SEPARATED_BY_SEMICOLON>
● The Schema strategy which merges the nodes/relationships by the constraints (UNIQUENESS,
NODE_KEY) defined in your graph model
streams.sink.topic.cdc.schema=<TOPICS_SEPARATED_BY_SEMICOLON>
25. Neo4j Streams: Sink
INGESTION VIA JSON PROJECTION
You can extract nodes and relationships from a JSON event by providing an extraction pattern.
Each property can be prefixed with:
● !: identifies the id (there can be more than one id property); it's *mandatory*
● -: excludes the property from the extraction
Labels can be chained via `:`
Tombstone Record Management
This ingestion strategy also supports Tombstone Records: to leverage them, your event should contain the
key of the record you want to delete and `null` as the value.
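The tombstone logic can be sketched as follows (assumed consumer-side behavior for illustration, not the plugin's actual code):

```python
# Sketch of tombstone handling in a sink consumer (assumed logic):
# a record whose value is None/null deletes the entity identified
# by its key; any other value upserts it.
def apply_record(store, key, value):
    if value is None:      # tombstone: delete
        store.pop(key, None)
    else:                  # normal record: upsert
        store[key] = value

store = {}
apply_record(store, "user-1", {"name": "Andrea"})
apply_record(store, "user-1", None)   # tombstone
assert "user-1" not in store
```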
26. Neo4j Streams: Sink
INGESTION VIA JSON PROJECTION - NODE PATTERN EXTRACTION
Given:
{"userId": 1, "name": "Andrea", "surname": "Santurbano", "address": {"city": "Venice", "cap": "30100"}}
You can transform it into a node by specifying one of these patterns:
● User:Actor{!userId} or User:Actor{!userId,*} => (User:Actor{userId: 1, name: 'Andrea', surname:
'Santurbano', `address.city`: 'Venice', `address.cap`: '30100'})
● User{!userId, surname} => (User{userId: 1, surname: 'Santurbano'})
● User{!userId, surname, address.city} => (User{userId: 1, surname: 'Santurbano', `address.city`:
'Venice'})
● User{!userId,-address} => (User{userId: 1, name: 'Andrea', surname: 'Santurbano'})
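The projection semantics above can be sketched in a few lines of Python (a simplified re-implementation for illustration, not the plugin's parser; `flatten` and `project` are made-up helper names):

```python
# Simplified sketch of the node-pattern projection semantics:
# flatten nested objects with dotted keys, then keep/exclude properties.
def flatten(obj, prefix=""):
    out = {}
    for k, v in obj.items():
        key = f"{prefix}{k}"
        if isinstance(v, dict):
            out.update(flatten(v, key + "."))
        else:
            out[key] = v
    return out

def project(event, id_keys, include=None, exclude=()):
    flat = flatten(event)
    props = {k: flat[k] for k in id_keys}   # the id properties are mandatory
    for k, v in flat.items():
        if k in id_keys or k in exclude or any(k.startswith(e + ".") for e in exclude):
            continue
        if include is None or k in include:
            props[k] = v
    return props

event = {"userId": 1, "name": "Andrea", "surname": "Santurbano",
         "address": {"city": "Venice", "cap": "30100"}}
# User{!userId, surname}  -> only the id plus 'surname'
assert project(event, ["userId"], include={"surname"}) == \
    {"userId": 1, "surname": "Santurbano"}
# User{!userId, -address} -> everything except the 'address' subtree
assert project(event, ["userId"], exclude=("address",)) == \
    {"userId": 1, "name": "Andrea", "surname": "Santurbano"}
```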
27. Neo4j Streams: Sink
INGESTION VIA JSON PROJECTION - RELATIONSHIP PATTERN EXTRACTION
Given:
{"userId": 1, "productId": 100, "price": 10, "currency": "€", "shippingAddress": {"city": "Venice", "cap": "30100"}}
You can transform it into a relationship by specifying one of these patterns:
● (User{!userId})-[:BOUGHT]->(Product{!productId}) or (User{!userId})-[:BOUGHT{price,
currency}]->(Product{!productId}) => (User{userId: 1})-[:BOUGHT{price: 10, currency: '€',
`shippingAddress.city`: 'Venice', `shippingAddress.cap`: '30100'}]->(Product{productId: 100})
● (User{!userId})-[:BOUGHT{price}]->(Product{!productId}) => (User{userId: 1})-[:BOUGHT{price:
10}]->(Product{productId: 100})
28. Neo4j Streams: Sink
HOW WE MANAGE BAD DATA
The Neo4j Streams Sink module provides a Dead Letter Queue mechanism that, if activated, re-routes all
“bad data” to a configured topic.
What do we mean by “bad data”?
● Deserialization errors, e.g. badly formatted JSON:
{id: 1, "name": "Andrea", "surname": "Santurbano"}
● Transient errors while ingesting data into the DB.
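The Dead Letter Queue flow can be sketched as follows (assumed logic for illustration, not the module's actual code):

```python
# Sketch of a Dead Letter Queue flow: records that fail
# deserialization are re-routed to a DLQ instead of blocking
# or crashing the consumer.
import json

def consume(records, sink, dlq):
    for raw in records:
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            dlq.append(raw)   # bad data: re-route, keep consuming
            continue
        sink.append(event)    # good data: hand off to the ingestion path

sink, dlq = [], []
consume(['{"id": 1, "name": "Andrea"}',
         '{id: 1, "name": "Andrea"}'],   # malformed: unquoted key
        sink, dlq)
assert len(sink) == 1 and len(dlq) == 1
```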
29. Neo4j Streams: Kafka Connect Sink
WHAT IS KAFKA CONNECT?
An open source component of Apache Kafka, it is a
framework for connecting Kafka with external
systems such as databases, key-value stores,
search indexes, and file systems.
HOW DOES IT WORK?
It works exactly the same way as the Neo4j Sink
plugin, so you can provide your own Cypher query
for each topic.
You can download it from the Confluent HUB!
And it has the Verified GOLD badge!
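A minimal connector configuration might look like this (property names follow the project's documentation; the URI, credentials, and topic are placeholders):

```json
{
  "name": "Neo4jSinkConnector",
  "config": {
    "connector.class": "streams.kafka.connect.sink.Neo4jSinkConnector",
    "topics": "my-topic",
    "neo4j.server.uri": "bolt://localhost:7687",
    "neo4j.authentication.basic.username": "neo4j",
    "neo4j.authentication.basic.password": "password",
    "neo4j.topic.cypher.my-topic": "MERGE (n:Label {id: event.id}) ON CREATE SET n += event.properties"
  }
}
```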
31. RT Polyglot Persistence with Elastic, Kafka & Neo4j
32. RT Polyglot Persistence with Elastic, Kafka & Neo4j
33. Neo4j Streams: Lessons learned
THE POWER OF THE STREAM!
● We have seen how to use the CDC module to
stream transaction events from Neo4j to other
systems;
● We have seen how to use the SINK to ingest
data into Neo4j by providing our own business
rules;
● We have seen how to use the Streams
PROCEDURES to consume/produce data
directly from Cypher.
35. GIVE US FEEDBACK
If you plan to use the Streams plugin, please give us feedback!
https://github.com/neo4j-contrib/neo4j-streams