Add Horsepower to AI/ML streaming Pipeline - Pulsar Summit NA 2021

Pulsar Virtual Summit North America 2021
Kiran Matty
Director of Product Management
Aerospike

whoami
▪ Director of Product for Ecosystem @ Aerospike
▪ Domain experience spans Big Data Infrastructure and Data Security @ Visa, Hortonworks, and Cisco
▪ Interests include large scale distributed systems and AI/ML
▪ Lego builder in spare time

Training can take Forever…
TRAINING TIME
▪ Minutes – hours
▪ 1 - 4 Days
▪ 1 - 4 Weeks
▪ > 1 month
Source: Google I/O 2018

AI/ML needs Hybrid Storage
▪ Traditional HDD-based systems are not suitable for training
▪ Models need to be retrained to address data/model drift
Source: Micron

AI/ML needs memory-like access at Petabyte scale with lower TCO

Why do other databases fall short?

Aerospike drives data-driven decisioning use cases
▪ High Frequency Trading
▪ IIoT / Predictive Maintenance
▪ Fraud Detection
▪ Personalization / Customer 360°
▪ AdTech Real Time Bidding

A Blueprint for AI/ML
[Diagram. Labels: Cloud / On-Prem; Container Platform; Storage; Compute; Notebook & ML Packages; Python Client; Connect for Spark; Connect for Pulsar]

Why Pulsar?
▪ Durability
▪ Scalability
▪ Geo-Replication
▪ Multi-Tenancy
▪ Unified Messaging Model

Mapping Aerospike <> Pulsar Data models

Aerospike        RDBMS      Pulsar
Namespace        Database   Topic
Set (optional)   Table      Topic
Record           Row        Record
Bin              Column     Fields (based on schema)
Key              Key        Key

Mapping is via YAML files.
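
The mapping above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TOPIC_MAPPING` dictionary stands in for the connector's YAML configuration, and its keys, the topic names, and the helper functions `route_topic` and `record_to_message` are hypothetical, not the connector's actual configuration schema or API.

```python
from typing import Optional

# Hypothetical routing table: (namespace, set) -> Pulsar topic.
# The real connector reads its mapping from YAML; these entries are illustrative only.
TOPIC_MAPPING = {
    ("device", "streaming_write_set"): "persistent://public/default/device-writes",
    ("device", None): "persistent://public/default/device",
}


def route_topic(namespace: str, set_name: Optional[str] = None) -> str:
    """Pick a topic for a namespace/set pair, falling back to the namespace-level topic."""
    if (namespace, set_name) in TOPIC_MAPPING:
        return TOPIC_MAPPING[(namespace, set_name)]
    return TOPIC_MAPPING[(namespace, None)]


def record_to_message(key: str, bins: dict) -> dict:
    """Map an Aerospike record (key + bins) to a Pulsar-style message (key + fields)."""
    return {"key": key, "fields": dict(bins)}


topic = route_topic("device", "streaming_write_set")
message = record_to_message("sensor-1", {"one": 37089, "two": "two_89"})
```

Because the set is optional on the Aerospike side, the sketch falls back to a namespace-level topic when no set-specific entry exists.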

Aerospike Connect for Pulsar
[Architecture diagram. Labels: IoT/edge devices; Publisher; Subscriber; Pub/Sub API; Reader and Batch API; Pulsar IO/Connectors (Prebuilt Connectors, Custom Connectors); Aerospike Source Connector; Aerospike Sink Connector*; Schema Registry; Stream Processor; Applications; Microservices or Event-Driven Architecture; Change Notifications]
*Not GA’d
Change Notification:
{"metadata":{"namespace":"device","set":"streaming_write_set","digest":"SH0QwiJxdW5Wkf/hAVJGn7Sw37U=","msg":"write","gen":38,"lut":0,"exp":0},"three":37089,"two":"two_89","one":37089}
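
A subscriber might unpack a change notification like the one shown on this slide as follows. This is a minimal sketch using only the Python standard library; it assumes that every top-level key other than `metadata` is a bin of the record, which matches the sample message but is not confirmed by the slide. The function name `split_notification` is hypothetical.

```python
import json

# Sample change notification copied from the slide.
NOTIFICATION = (
    '{"metadata":{"namespace":"device","set":"streaming_write_set",'
    '"digest":"SH0QwiJxdW5Wkf/hAVJGn7Sw37U=","msg":"write","gen":38,"lut":0,"exp":0},'
    '"three":37089,"two":"two_89","one":37089}'
)


def split_notification(raw: str):
    """Separate the metadata envelope from the remaining top-level keys."""
    doc = json.loads(raw)
    metadata = doc.pop("metadata")
    # Assumption: whatever remains after removing "metadata" are the record's bins.
    return metadata, doc


metadata, bins = split_notification(NOTIFICATION)
print(metadata["namespace"], metadata["set"], metadata["msg"])
print(bins)
```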

Speeding up Training Pipeline (Conceptual View)
[Diagram with numbered steps 1 to 4. Labels: Third Party Data; System of Record; Aerospike Database; Connect for Spark; AI/ML Platform (Data Preparation, Exploratory Data Analysis, Model Training, Parameter Tuning, Model Validation); Data Scientist; Model Serving; HTTP; ML Application]

Real-time Inference (Conceptual View)
[Diagram. Labels: Edge Systems across Datacenters (Edge Location 1 … Edge Location n, each with an Aerospike Database); XDR; Core System (Aerospike Database); Streaming Source; Connect for Pulsar; Pulsar Spark Connector; Connect for Spark; Data Preparation; Model Serving; HTTP; API; Predictions; ML Application; Application Specialist]

CASE STUDY: GLOBAL AD TECH COMPANY
Massive Parallelism w/ Aerospike and Spark
Massive Parallelization
✔ 80% reduction in Spark job execution time
✔ Reduced training time
✔ Increased frequency of retraining
Operational reliability at extreme scale
✔ 13B objects
✔ 150 TB unique data – multiple times a day
Increased ROI
✔ Only 33 Aerospike servers
✔ Increased utilization of Spark cluster (300 nodes and 7,500 cores)
“We were using custom code before which led to data quality issues and a complex data infrastructure. With Aerospike, we are processing Spark jobs that used to take 12 hours now in just 2.4.”
Senior Director, Data Science and Engineering
Top Global Ad Tech company
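
The 80% figure is consistent with the quoted job times, as a quick check shows. The sketch below assumes the "2.4" in the quote is in hours, like the 12-hour baseline; the function name `percent_reduction` is illustrative.

```python
def percent_reduction(before_hours: float, after_hours: float) -> float:
    """Percentage by which the run time dropped relative to the original run time."""
    return (before_hours - after_hours) / before_hours * 100.0


# Figures quoted in the case study; 2.4 is assumed to be hours.
reduction = percent_reduction(12.0, 2.4)

# Cluster size quoted on the slide: 7,500 cores across 300 Spark nodes.
cores_per_node = 7500 / 300

print(f"{reduction:.0f}% reduction, {cores_per_node:.0f} cores per node")
```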

The Aerospike Difference for AI/ML
1. Reduce Training Time
▪ Execute Spark jobs faster with massive parallelism
2. Maximize ROI
▪ Aerospike data platform connects readily to Spark and Pulsar
3. Increase Frequency of Re-Training
▪ Conduct in-place data exploration
▪ Create low latency and high throughput streaming pipeline
Eliminate compliance headaches by removing the need to copy data into multiple systems
“Aerospike is second to none for ingesting and persisting millions of events per second… (Aerospike) allows me to do near-instantaneous machine learning on the data as it lands.”
Theresa Melvin
Chief Architect of AI-Driven Big Data Solutions, HPE

Thank you
We are hiring for our India and US offices.
https://aerospike.com/solutions/use-cases/ai-ml/
