SlideShare a Scribd company logo
1 of 26
Download to read offline
Building a Dynamic DSL Rules
Engine with Kafka Streams
Will LaForest
Field CTO
Michael Peacock
Field Engineer
2
Why Something Other Than These
FRAMEWORKS
ksqlDB
Stream Processing Frameworks General Purpose
3
Stream processing very powerful
BUT
● Uses GPLs (Java, Python) or SQL
● 99.2% of workforce are NOT
coders
● Maybe SQL is easier
○ Still general purpose
○ Impractical for some
domains
99% 1%
• Volume of rules high? 100s or
1000s?
• If each rule is a stream
processing job
• Each a consumer and
producer
• Same network IO 1000x
• Process overhead
• Serialization/Deserialization
Technical Challenges
4
Cartoon style. Gnome sees something gross and vomits
Technical Challenges
5
• Multiple dynamic output
topics?
• Change rate of ruleset high?
• Minimize time to to
processing?
• Must be dynamic
• Don’t compile and deploy
• Don’t create a new stream
processor
• Don’t restart topology if
possible
Anyone Know what this is?
6
to spiral
make "n 1
while [:n < 100] [
make "n :n + 5
fd :n rt 90
]
end
Domain Specific Languages
Purpose built for a domain
● HTML web pages
● Regex for patterns
● MATLAB for numerical analysis
● GraphQL - Querying APIs
● Sigma for cyber and log data
If serialized as JSON, YAML, XML
● DSL as Data
● Apply data oriented operations
● Validate, secure, govern as data
● Can be transformed easily
● Generated from domain specific tool
● Dynamic
The Cybersecurity Domain
A Typical SIEM Platform Architecture
Agents /
Collectors
Reports and
alerts
Search and
investigate
Monitor and
analyze
SIEM
Tool
Network traffic
Firewall logs
RDBMS
Application logs
HTTP proxy logs
Forensic
Archive
AI/ML
Architecture with Kafka
Curate
Real-time Detection
APP SIEM
Index
SOAR
HDFS
S3
Big Query
Syslog
Network traffic
Firewall logs
RDBMS
Application logs
HTTP proxy logs
QRadar
Arcsight
Splunk
Machine Data
spooldir (files), SNMP Traps,
Databases, Sftp, MQs, S2S
Elastic
Confluent Sigma
https://github.com/confluentinc/confluent-sigma
What is Sigma?
12
https://github.com/SigmaHQ/sigma
● Sigma is an open signature (patterns) format
● Focused on log and network
● Researchers or analysts can describe developed
detection patterns
● Shareable with others (IMPORTANT)
● Rules schema agnostic
● Platform agnostic
○ Same rule can be be used for
○ Elastic, Splunk, Azure Sentinel, (now Kafka)
Confluent Sigma
13
Sigma Rules
(YAML)
Source
Data
Sigma Stream
Processor
SIEM
Applications
Cold storage
High Risk Detections
SOAR
Filtered
Cold retention
Data
Rules
Why Kafka Streams API?
14
● Simple to build a topology
● Easy to run as app
● No required runtime
● Start small easily
App 1
● Scale massively
● All you need is Kafka
● Can run anywhere!
A Walkthrough of the Sigma
Project
A Walkthrough of the Sigma Project
16
sigma-parser
sigma-streams
sigma-streams-ui
https://github.com/confluentinc/confluent-sigma
Sigma Stream Topologies
Simple Topology
supports one
sub-topology to
many rules
Aggregate Topology
requires one
sub-topology for
each rule
Simple Topology
18
flatMapValues KStream
iterate through each rule for the
processor (product/service
filtering)
validates the streaming data
against the DSL rule and add the
results to the output list if a
match is found
optional config variable to return
after the first match or return all
matches
dynamic output topic flatMapValues
Create a new KStream by transforming the value of
each record in this stream into zero or more
values with the same key in the new stream.
{
"@stream": "dns",
"@system": "bobs.bigwheel.local",
"@proc": "zeek",
"ts": 1588205199.82437,
"uid": "Cvf4XX17hSAgXDdGEd",
"id_orig_h": "10.0.1.6",
"id_orig_p": 54243,
"id_resp_h": "10.0.0.4",
"id_resp_p": 53,
"proto": "udp",
"trans_id": 41180,
"rtt": 0.001528024673461914,
"query": "newyork.dmevals.local",
"qclass": 1,
"qclass_name": "C_INTERNET",
"qtype": 1,
Sigma Rule
Streaming Data
1
2
3
4
5
Detection Results / Dynamic Output Topic
19
'^(?<timestamp>w{3}sd{2}sd{2}:d{2}:d{2})s(?< hostname>[^s]+)
s%ASA-d-(?< messageID>[^:]+):s(?< action>[^s]+)s(?< protocol>[^s]+)
ssrcsinside:(?< src>[0-9.]+)/(?< srcport>[0-9]+)sdstsoutside:(?< d
est>[0-9.]+)/(?< destport>[0-9]+)'
Sigma Rule
firewalls topic
adds the record data and rule
metadata
if the rule contains a regex with
mapping, adds the mapped
fields
adds any custom fields
dynamic output topic
1
2
3
4
Aggregate Topology
20
iterate through each rule for the
processor (product/service
filtering)
validates the streaming data
against the DSL rule
groups by the key
sliding window from rule
counts number of instances
within the window
filters based on aggregation and
operation in rule
dynamic output topic
{
"@stream": "dns",
"@system":
"bobs.bigwheel.local",
"@proc": "zeek",
"ts": 1588205199.82437,
"uid": "Cvf4XX17hSAgXDdGEd",
"id_orig_h": "10.0.1.6",
"id_orig_p": 54243,
"id_resp_h": "10.0.0.4",
"id_resp_p": 53,
"proto": "udp",
"trans_id": 41180,
"rtt": 0.001528024673461914,
"query":
"newyork.dmevals.local",
"qclass": 1,
"qclass_name": "C_INTERNET",
"qtype": 1,
Sigma Rule
Streaming Data
1
3
5
6
7
4
2
Confluent Sigma Demo
Sigma Stream Processors
Sigma Streams UI
Sigma Rule Editor
sigma rules topic
DNS
dns
detections
topic
dns topic
rule parsing,
filtering,
aggregation,
windowing
sigma
rules
cache
CONN
DHCP
HTTP
SSL
x509
Zeek Data
Demo
Considerations for DSL Stream Processing
● Consider how to handle newly arriving rules
○ Apply only to unprocessed messages
○ What about previous data?
■ Spawn separate app or topology
■ When “caught up” merge into single
● Versioned state stores now allow new strategy
○ Only join rules available for corresponding record time
○ For records only join against rules that existed at record time
■ Upcoming KIP-960, KIP-968, KIP-969 for our customized joins
23
https://www.confluent.io/blog/introducing-versioned-state-store-in-kafka-streams/
Considerations Continued
● Consider how routines may interact with each other
○ Sigma doesn’t support notion of priority or dependence
● For a DSL in which rules interact (IFTTT)
○ Use Processor API to map to an appropriate DAG
24
Whats Next?
“Roadmap” https://github.com/confluentinc/confluent-sigma/projects/1
● Refactoring framework to make DSL pluggable (on going)
○ Implement a builder class: DSL routines -> Kafka Streams topology
○ Re-use all the scaffolding we have built
○ LLM stream processing
● Optimizations
○ Aggregate topology optimizations (bin based on time and key)
○ More graceful management of aggregate topology changes
■ Self coordination
○ General profiling and optimization
● Switch to GlobalKTable from KCache 25
Building a Dynamic Rules Engine with Kafka Streams

More Related Content

What's hot

Designing Secure Cisco Data Centers
Designing Secure Cisco Data CentersDesigning Secure Cisco Data Centers
Designing Secure Cisco Data CentersCisco Russia
 
The Inside Story: How OPC UA and DDS Can Work Together in Industrial Systems
The Inside Story: How OPC UA and DDS Can Work Together in Industrial SystemsThe Inside Story: How OPC UA and DDS Can Work Together in Industrial Systems
The Inside Story: How OPC UA and DDS Can Work Together in Industrial SystemsReal-Time Innovations (RTI)
 
AWS vs Azure vs Google (GCP) - Slides
AWS vs Azure vs Google (GCP) - SlidesAWS vs Azure vs Google (GCP) - Slides
AWS vs Azure vs Google (GCP) - SlidesTobyWilman
 
Piwik - open source web analytics
Piwik - open source web analyticsPiwik - open source web analytics
Piwik - open source web analyticsMatthieu Aubry
 
The Tofu Interconnect D for the Post K Supercomputer
The Tofu Interconnect D for the Post K SupercomputerThe Tofu Interconnect D for the Post K Supercomputer
The Tofu Interconnect D for the Post K Supercomputerinside-BigData.com
 
Splunk Dashboarding & Universal Vs. Heavy Forwarders
Splunk Dashboarding & Universal Vs. Heavy ForwardersSplunk Dashboarding & Universal Vs. Heavy Forwarders
Splunk Dashboarding & Universal Vs. Heavy ForwardersHarry McLaren
 
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...confluent
 
Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaSCloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaSEDB
 
한컴MDS_NVIDIA Jetson Platform
한컴MDS_NVIDIA Jetson Platform한컴MDS_NVIDIA Jetson Platform
한컴MDS_NVIDIA Jetson PlatformHANCOM MDS
 
AMD: Where Gaming Begins
AMD: Where Gaming BeginsAMD: Where Gaming Begins
AMD: Where Gaming BeginsAMD
 
Introduction to FPGA acceleration
Introduction to FPGA accelerationIntroduction to FPGA acceleration
Introduction to FPGA accelerationMarco77328
 
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)Emprovise
 

What's hot (20)

Designing Secure Cisco Data Centers
Designing Secure Cisco Data CentersDesigning Secure Cisco Data Centers
Designing Secure Cisco Data Centers
 
Splunk-Presentation
Splunk-Presentation Splunk-Presentation
Splunk-Presentation
 
The Inside Story: How OPC UA and DDS Can Work Together in Industrial Systems
The Inside Story: How OPC UA and DDS Can Work Together in Industrial SystemsThe Inside Story: How OPC UA and DDS Can Work Together in Industrial Systems
The Inside Story: How OPC UA and DDS Can Work Together in Industrial Systems
 
GPU
GPUGPU
GPU
 
AWS vs Azure vs Google (GCP) - Slides
AWS vs Azure vs Google (GCP) - SlidesAWS vs Azure vs Google (GCP) - Slides
AWS vs Azure vs Google (GCP) - Slides
 
On-Device AI
On-Device AIOn-Device AI
On-Device AI
 
Piwik - open source web analytics
Piwik - open source web analyticsPiwik - open source web analytics
Piwik - open source web analytics
 
Ml ops on AWS
Ml ops on AWSMl ops on AWS
Ml ops on AWS
 
The Tofu Interconnect D for the Post K Supercomputer
The Tofu Interconnect D for the Post K SupercomputerThe Tofu Interconnect D for the Post K Supercomputer
The Tofu Interconnect D for the Post K Supercomputer
 
Splunk Dashboarding & Universal Vs. Heavy Forwarders
Splunk Dashboarding & Universal Vs. Heavy ForwardersSplunk Dashboarding & Universal Vs. Heavy Forwarders
Splunk Dashboarding & Universal Vs. Heavy Forwarders
 
Introduction to Software Defined Networking (SDN)
Introduction to Software Defined Networking (SDN)Introduction to Software Defined Networking (SDN)
Introduction to Software Defined Networking (SDN)
 
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...
 
Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaSCloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
 
한컴MDS_NVIDIA Jetson Platform
한컴MDS_NVIDIA Jetson Platform한컴MDS_NVIDIA Jetson Platform
한컴MDS_NVIDIA Jetson Platform
 
Cloudflare Access
Cloudflare AccessCloudflare Access
Cloudflare Access
 
AMD: Where Gaming Begins
AMD: Where Gaming BeginsAMD: Where Gaming Begins
AMD: Where Gaming Begins
 
Google Cloud Spanner Preview
Google Cloud Spanner PreviewGoogle Cloud Spanner Preview
Google Cloud Spanner Preview
 
Zero-Trust SASE DevSecOps
Zero-Trust SASE DevSecOpsZero-Trust SASE DevSecOps
Zero-Trust SASE DevSecOps
 
Introduction to FPGA acceleration
Introduction to FPGA accelerationIntroduction to FPGA acceleration
Introduction to FPGA acceleration
 
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
 

Similar to Building a Dynamic Rules Engine with Kafka Streams

Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...HostedbyConfluent
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingYaroslav Tkachenko
 
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Apex
 
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
Strata Singapore: GearpumpReal time DAG-Processing with Akka at ScaleStrata Singapore: GearpumpReal time DAG-Processing with Akka at Scale
Strata Singapore: Gearpump Real time DAG-Processing with Akka at ScaleSean Zhong
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformApache Apex
 
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Ontico
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...DataStax
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networksinside-BigData.com
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application  Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application Apache Apex
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka MeetupCliff Gilmore
 
Data Stream Processing - Concepts and Frameworks
Data Stream Processing - Concepts and FrameworksData Stream Processing - Concepts and Frameworks
Data Stream Processing - Concepts and FrameworksMatthias Niehoff
 
Cisco Connect Toronto 2017 - Model-driven Telemetry
Cisco Connect Toronto 2017 - Model-driven TelemetryCisco Connect Toronto 2017 - Model-driven Telemetry
Cisco Connect Toronto 2017 - Model-driven TelemetryCisco Canada
 
Real-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNKReal-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNKData Con LA
 
WarsawITDays_ ApacheNiFi202
WarsawITDays_ ApacheNiFi202WarsawITDays_ ApacheNiFi202
WarsawITDays_ ApacheNiFi202Timothy Spann
 
Hyperledger 구조 분석
Hyperledger 구조 분석Hyperledger 구조 분석
Hyperledger 구조 분석Jongseok Choi
 
FIWARE Global Summit - Fast RTPS: Programming with the Default middleware for...
FIWARE Global Summit - Fast RTPS: Programming with the Default middleware for...FIWARE Global Summit - Fast RTPS: Programming with the Default middleware for...
FIWARE Global Summit - Fast RTPS: Programming with the Default middleware for...FIWARE
 
Data Pipelines and Telephony Fraud Detection Using Machine Learning
Data Pipelines and Telephony Fraud Detection Using Machine Learning Data Pipelines and Telephony Fraud Detection Using Machine Learning
Data Pipelines and Telephony Fraud Detection Using Machine Learning Eugene
 
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...Altinity Ltd
 

Similar to Building a Dynamic Rules Engine with Kafka Streams (20)

Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
 
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
 
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
Strata Singapore: GearpumpReal time DAG-Processing with Akka at ScaleStrata Singapore: GearpumpReal time DAG-Processing with Akka at Scale
Strata Singapore: Gearpump Real time DAG-Processing with Akka at Scale
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networks
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application  Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka Meetup
 
Data Stream Processing - Concepts and Frameworks
Data Stream Processing - Concepts and FrameworksData Stream Processing - Concepts and Frameworks
Data Stream Processing - Concepts and Frameworks
 
Cisco Connect Toronto 2017 - Model-driven Telemetry
Cisco Connect Toronto 2017 - Model-driven TelemetryCisco Connect Toronto 2017 - Model-driven Telemetry
Cisco Connect Toronto 2017 - Model-driven Telemetry
 
Real-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNKReal-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNK
 
WarsawITDays_ ApacheNiFi202
WarsawITDays_ ApacheNiFi202WarsawITDays_ ApacheNiFi202
WarsawITDays_ ApacheNiFi202
 
Hyperledger 구조 분석
Hyperledger 구조 분석Hyperledger 구조 분석
Hyperledger 구조 분석
 
FIWARE Global Summit - Fast RTPS: Programming with the Default middleware for...
FIWARE Global Summit - Fast RTPS: Programming with the Default middleware for...FIWARE Global Summit - Fast RTPS: Programming with the Default middleware for...
FIWARE Global Summit - Fast RTPS: Programming with the Default middleware for...
 
Fast RTPS
Fast RTPSFast RTPS
Fast RTPS
 
Data Pipelines and Telephony Fraud Detection Using Machine Learning
Data Pipelines and Telephony Fraud Detection Using Machine Learning Data Pipelines and Telephony Fraud Detection Using Machine Learning
Data Pipelines and Telephony Fraud Detection Using Machine Learning
 
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
 

More from HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonHostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolHostedbyConfluent
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesHostedbyConfluent
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaHostedbyConfluent
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonHostedbyConfluent
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonHostedbyConfluent
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyHostedbyConfluent
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...HostedbyConfluent
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...HostedbyConfluent
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersHostedbyConfluent
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformHostedbyConfluent
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubHostedbyConfluent
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonHostedbyConfluent
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLHostedbyConfluent
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceHostedbyConfluent
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondHostedbyConfluent
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsHostedbyConfluent
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemHostedbyConfluent
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksHostedbyConfluent
 

More from HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Recently uploaded

Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxFIDO Alliance
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireExakis Nelite
 
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The InsideCollecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The InsideStefan Dietze
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxFIDO Alliance
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...marcuskenyatta275
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024Lorenzo Miniero
 
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPTiSEO AI
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandIES VE
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...panagenda
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Patrick Viafore
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGDSC PJATK
 
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Skynet Technologies
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch TuesdayIvanti
 
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?Paolo Missier
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftshyamraj55
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimaginedpanagenda
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceSamy Fodil
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsLeah Henrickson
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxFIDO Alliance
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FIDO Alliance
 

Recently uploaded (20)

Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptx
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - Questionnaire
 
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The InsideCollecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024
 
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & Ireland
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 Warsaw
 
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch Tuesday
 
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptx
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 

Building a Dynamic Rules Engine with Kafka Streams

  • 1. Building a Dynamic DSL Rules Engine with Kafka Streams Will LaForest Field CTO Michael Peacock Field Engineer
  • 2. 2 Why Something Other Than These FRAMEWORKS ksqlDB
  • 3. Stream Processing Frameworks General Purpose 3 Stream processing very powerful BUT ● Uses GPLs (Java, Python) or SQL ● 99.2% of workforce are NOT coders ● Maybe SQL is easier ○ Still general purpose ○ Impractical for some domains 99% 1%
  • 4. • Volume of rules high? 100s or 1000s? • If each rule is a stream processing job • Each a consumer and producer • Same network IO 1000x • Process overhead • Serialization/Deserialization Technical Challenges 4 Cartoon style. Gnome sees something gross and vomits
  • 5. Technical Challenges 5 • Multiple dynamic output topics? • Change rate of ruleset high? • Minimize time to to processing? • Must be dynamic • Don’t compile and deploy • Don’t create a new stream processor • Don’t restart topology if possible
  • 6. Anyone Know what this is? 6 to spiral make "n 1 while [:n < 100] [ make "n :n + 5 fd :n rt 90 ] end
  • 7. Domain Specific Languages Purpose built for a domain ● HTML web pages ● Regex for patterns ● MATLAB for numerical analysis ● GraphQL - Querying APIs ● Sigma for cyber and log data If serialized as JSON, YAML, XML ● DSL as Data ● Apply data oriented operations ● Validate, secure, govern as data ● Can be transformed easily ● Generated from domain specific tool ● Dynamic
  • 9. A Typical SIEM Platform Architecture Agents / Collectors Reports and alerts Search and investigate Monitor and analyze SIEM Tool Network traffic Firewall logs RDBMS Application logs HTTP proxy logs
  • 10. Forensic Archive AI/ML Architecture with Kafka Curate Real-time Detection APP SIEM Index SOAR HDFS S3 Big Query Syslog Network traffic Firewall logs RDBMS Application logs HTTP proxy logs QRadar Arcsight Splunk Machine Data spooldir (files), SNMP Traps, Databases, Sftp, MQs, S2S Elastic
  • 12. What is Sigma? 12 https://github.com/SigmaHQ/sigma ● Sigma is an open signature (patterns) format ● Focused on log and network ● Researchers or analysts can describe developed detection patterns ● Shareable with others (IMPORTANT) ● Rules schema agnostic ● Platform agnostic ○ Same rule can be be used for ○ Elastic, Splunk, Azure Sentinel, (now Kafka)
  • 13. Confluent Sigma 13 Sigma Rules (YAML) Source Data Sigma Stream Processor SIEM Applications Cold storage High Risk Detections SOAR Filtered Cold retention Data Rules
  • 14. Why Kafka Streams API? 14 ● Simple to build a topology ● Easy to run as app ● No required runtime ● Start small easily App 1 ● Scale massively ● All you need is Kafka ● Can run anywhere!
  • 15. A Walkthrough of the Sigma Project
  • 16. A Walkthrough of the Sigma Project 16 sigma-parser sigma-streams sigma-streams-ui https://github.com/confluentinc/confluent-sigma
  • 17. Sigma Stream Topologies Simple Topology supports one sub-topology to many rules Aggregate Topology requires one sub-topology for each rule
  • 18. Simple Topology 18 flatMapValues KStream iterate through each rule for the processor (product/service filtering) validates the streaming data against the DSL rule and add the results to the output list if a match is found optional config variable to return after the first match or return all matches dynamic output topic flatMapValues Create a new KStream by transforming the value of each record in this stream into zero or more values with the same key in the new stream. { "@stream": "dns", "@system": "bobs.bigwheel.local", "@proc": "zeek", "ts": 1588205199.82437, "uid": "Cvf4XX17hSAgXDdGEd", "id_orig_h": "10.0.1.6", "id_orig_p": 54243, "id_resp_h": "10.0.0.4", "id_resp_p": 53, "proto": "udp", "trans_id": 41180, "rtt": 0.001528024673461914, "query": "newyork.dmevals.local", "qclass": 1, "qclass_name": "C_INTERNET", "qtype": 1, Sigma Rule Streaming Data 1 2 3 4 5
  • 19. Detection Results / Dynamic Output Topic 19 '^(?<timestamp>w{3}sd{2}sd{2}:d{2}:d{2})s(?< hostname>[^s]+) s%ASA-d-(?< messageID>[^:]+):s(?< action>[^s]+)s(?< protocol>[^s]+) ssrcsinside:(?< src>[0-9.]+)/(?< srcport>[0-9]+)sdstsoutside:(?< d est>[0-9.]+)/(?< destport>[0-9]+)' Sigma Rule firewalls topic adds the record data and rule metadata if the rule contains a regex with mapping, adds the mapped fields adds any custom fields dynamic output topic 1 2 3 4
  • 20. Aggregate Topology 20 iterate through each rule for the processor (product/service filtering) validates the streaming data against the DSL rule groups by the key sliding window from rule counts number of instances within the window filters based on aggregation and operation in rule dynamic output topic { "@stream": "dns", "@system": "bobs.bigwheel.local", "@proc": "zeek", "ts": 1588205199.82437, "uid": "Cvf4XX17hSAgXDdGEd", "id_orig_h": "10.0.1.6", "id_orig_p": 54243, "id_resp_h": "10.0.0.4", "id_resp_p": 53, "proto": "udp", "trans_id": 41180, "rtt": 0.001528024673461914, "query": "newyork.dmevals.local", "qclass": 1, "qclass_name": "C_INTERNET", "qtype": 1, Sigma Rule Streaming Data 1 3 5 6 7 4 2
  • 21. Confluent Sigma Demo Sigma Stream Processors Sigma Streams UI Sigma Rule Editor sigma rules topic DNS dns detections topic dns topic rule parsing, filtering, aggregation, windowing sigma rules cache CONN DHCP HTTP SSL x509 Zeek Data
  • 22. Demo
  • 23. Considerations for DSL Stream Processing ● Consider how to handle newly arriving rules ○ Apply only to unprocessed messages ○ What about previous data? ■ Spawn separate app or topology ■ When “caught up” merge into single ● Versioned state stores now allow new strategy ○ Only join rules available for corresponding record time ○ For records only join against rules that existed at record time ■ Upcoming KIP-960, KIP-968, KIP-969 for our customized joins 23 https://www.confluent.io/blog/introducing-versioned-state-store-in-kafka-streams/
  • 24. Considerations Continued ● Consider how routines may interact with each other ○ Sigma doesn’t support notion of priority or dependence ● For a DSL in which rules interact (IFTTT) ○ Use Processor API to map to an appropriate DAG 24
  • 25. Whats Next? “Roadmap” https://github.com/confluentinc/confluent-sigma/projects/1 ● Refactoring framework to make DSL pluggable (on going) ○ Implement a builder class: DSL routines -> Kafka Streams topology ○ Re-use all the scaffolding we have built ○ LLM stream processing ● Optimizations ○ Aggregate topology optimizations (bin based on time and key) ○ More graceful management of aggregate topology changes ■ Self coordination ○ General profiling and optimization ● Switch to GlobalKTable from KCache 25