In fintech, catching fraudsters is one of the primary opportunities to use streaming applications to apply ML models in real time. This talk reviews our journey to bring fraud decisioning to our tellers at Capital One using Kafka, Flink, and AWS Lambda. We will share our lessons learned and experiences with common problems such as custom windowing, breaking a monolithic app into small queryable state apps, feature engineering with Jython, dealing with back pressure from combining two disparate streams, model/feature validation in a regulatory environment, and running Flink jobs on Kubernetes.
3. Developing a Fraud Defense Platform
Fraud Defense at the Teller Using Flink
Our journey to build a Fraud Decisioning Platform and use Flink to build out the use cases
12. PROS
• Community support for Docker/Kubernetes
• Resilient
• Easy to tear down and bring back
• Maximizing resource efficiency
CONS
• Maintaining your own Kubernetes solution
• Containing the blast radius
• Edge cases when combining a number of technology solutions
Developing on Kubernetes has been challenging but very rewarding
16. A FLINK MONOLITH
• Problem: Develop a stream processing workflow for two legacy batch data sources
• First Attempt: Do everything in Flink and take advantage of Flink Connected Streams
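The connected-streams monolith can be sketched in plain Python (an illustrative simulation, not actual Flink code; the event field names like `customer_id` are assumptions): two streams are keyed on the same customer id and share state, so each teller transaction can see prior account activity.

```python
# Plain-Python sketch of the monolith's connected-streams idea:
# two streams keyed by customer id sharing state. In Flink the
# interleaving is driven by arrival order; here, for simplicity,
# account activity is processed before teller transactions.
from collections import defaultdict

def connect_streams(account_events, teller_events):
    """Merge two event streams on customer_id with shared keyed state."""
    state = defaultdict(list)   # keyed state: customer_id -> activity history
    decisions = []
    for ev in account_events:
        state[ev["customer_id"]].append(ev)
    for tx in teller_events:
        history = state[tx["customer_id"]]
        decisions.append({"customer_id": tx["customer_id"],
                          "history_size": len(history)})
    return decisions
```

In the real job, Flink's `ConnectedStreams` with a co-process function plays the role of this shared dictionary, with the state checkpointed rather than held in memory.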
18. PROS
• Cheap
• Not a lot of code/config
• Scalability / availability
• Deployments are a breeze
CONS
• Not truly stateless
• Start-up time
AWS Lambda is a good fit for our use case and works well with our underlying technologies
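A scoring Lambda in this style can be as small as the sketch below. The event shape, threshold, and scoring rule are illustrative assumptions, not the production contract; a real handler would call out to the model instead of the stand-in arithmetic here.

```python
# Hypothetical AWS Lambda handler for scoring one transaction.
import json

RISK_THRESHOLD = 0.8  # assumed cut-off for flagging a transaction

def handler(event, context=None):
    features = event.get("features", {})
    # Stand-in for a real model call (H2O/TensorFlow/etc.):
    # larger amounts score as riskier, capped at 1.0.
    score = min(1.0, features.get("amount", 0) / 10000.0)
    return {
        "statusCode": 200,
        "body": json.dumps({"score": score,
                            "flagged": score >= RISK_THRESHOLD}),
    }
```

The "start-up time" con above is the cold-start penalty: the first invocation after idle must load the runtime and model artifacts before `handler` runs.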
23. USING JYTHON TO BRIDGE THE GAP TO DATA SCIENTISTS
[Diagram: data flows into Flink, which windows it and hands it to a Jython adapter; the adapter runs .py scripts, each computing one Feature]
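One of those `.py` scripts might look like the sketch below. The function name, window shape, and transaction fields are assumptions chosen for illustration; the point is that a data scientist writes plain Python over a window of transactions and the Jython adapter runs it inside the Flink job.

```python
# Illustrative shape of a single .py feature script loaded by the
# Jython adapter: it receives the windowed transactions for one
# customer and returns a single feature value.
def compute(window):
    """Feature: total cash withdrawn within the window."""
    return sum(tx["amount"] for tx in window
               if tx.get("type") == "withdrawal")
```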
24. GITFLOW AND JYTHON IMPROVE TRACEABILITY
[Diagram: a commit goes through a pull request (which can be denied), JUnit tests, and a build (which can fail); the merged develop branch produces a versioned Feature JAR (e.g. v1.0.42), which is imported via Maven, tested again with JUnit, and built into the Flink Job JAR]
26. FEATURES EXIST TO FEED MODELS
[Diagram: Features feed a Model — H2O, TensorFlow, Seldon, whatever — which produces a Score]
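The "whatever" in that diagram suggests the model backend sits behind a common interface so it can be swapped. A minimal sketch of that idea, with a hypothetical `Model` interface (not the platform's actual API) and a toy backend standing in for an H2O/TensorFlow/Seldon call:

```python
# Sketch of "features feed models": a swappable model backend
# behind one interface. ThresholdModel is a toy stand-in.
from abc import ABC, abstractmethod

class Model(ABC):
    @abstractmethod
    def score(self, features: dict) -> float:
        """Return a fraud score in [0, 1] for one feature vector."""

class ThresholdModel(Model):
    """Toy backend: flags large amounts. A real backend would wrap
    an H2O/TensorFlow/Seldon call behind this same interface."""
    def score(self, features: dict) -> float:
        return 1.0 if features.get("amount", 0) > 5000 else 0.0

def run_pipeline(features: dict, model: Model) -> float:
    # The pipeline only depends on the interface, not the backend.
    return model.score(features)
```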
28. BREAKING UP THE MONOLITH
• Problem: Back pressure leading to delayed transactions
• Solution: Break up the monolithic Flink app into small Queryable State apps
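The resulting shape can be sketched in plain Python (an illustrative simulation, not Flink code): each small app owns keyed state for one data source and answers point queries, instead of one job joining everything via connected streams. The aggregate fields here are assumptions.

```python
# Plain-Python sketch of a "small queryable state app": it ingests
# one source's events into keyed state and serves point lookups,
# analogous to Flink keyed state read via the Queryable State API.
class QueryableStateApp:
    def __init__(self):
        self._state = {}   # key -> running aggregate

    def ingest(self, key, event):
        agg = self._state.setdefault(key, {"count": 0, "total": 0})
        agg["count"] += 1
        agg["total"] += event.get("amount", 0)

    def query(self, key):
        # In Flink this lookup would go through QueryableStateClient.
        return self._state.get(key)
```

Because each app only consumes its own source, a slow source backs up its own app rather than stalling the shared connected-streams operator.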
30. Features Used
• Connected Streams
• Flink Keyed State
• Checkpointing/Savepointing
• Queryable State
Issues
• Flink versioning (FLINK-7783, FLINK-8487)
• Keyed source function
• Kafka offsets
We had a lot of fun and success using Flink, but not without a few hiccups
31. QUESTIONS?
Speaker notes
Jeff Intro
Andrew Intro
We are part of the Forest teams (very high-level intro)
Kubernetes-based fraud decisioning platform that you can deploy multiple fraud use cases on
With the goal of being able to rapidly spin up fraud apps
Running in Production since September 2017
Our talk today:
Talk briefly about our journey building out this Forest platform using Kubernetes as well as talk about how we used Flink with Kubernetes at a high level
Then talk about a specific use case we have on the platform and do a deep dive on what’s inside our Flink app
Customers First
If one day you take a look at your bank account and it's empty
However, if your account was locked for no reason you would be upset
This balance between catching/stopping fraud and providing a great customer experience is a common tension that we have to deal with
If we wanted to stop fraud completely we could just stop letting people take their money
On a similar note, we have a limited number of fraud operators
Do not have the manpower to call every single person up and ask them
Primary directive of the platform is to empower Data Scientists/ Data Analysts by building the tools on the platform to help create the models needed to make decisions
This includes having access to all the data in a fast and easy-to-understand format
Seeing how their models are performing, and whether the features are being calculated as expected
When they need to refit the model they need to be able to do the data transformations quickly so we can turn a refreshed model around
Lastly as we are developing a fraud platform, we need to keep in mind the engineers/developers that will be developing the fraud app
it should be something that engineers enjoy developing on
When you have a feature/model/action repository it's very easy to develop and turn around fraud apps
To help us balance these different needs we have our product owners to help bridge the gap
14 EC2s
6 m4.10xlarge for general minions
5 m4.2xlarge for kafka nodes
3 m4.large for masters
Ansible to provision
200+ pods
Flink apps in Java/Scala/Kotlin
Microservices in Golang
Holy smokes that’s a lot
Zookeeper/Kafka/Flink/Nifi
Kappa Architecture
Kafka is our primary messaging bus throughout the platform
Nifi is one of the tools we use to grab data from different sources in the company
Flink does the calculations and applies needed transformations
Minio/Istio to handle http communications throughout the platform
EFK = ElasticSearch / FluentD / Kibana
Docker logs
Managed AWS service
Influx / Prometheus / Grafana
Metrics reporting and Dashboards
Platform health
Fraud health
Drill / zeppelin / s3 for data analysts to view transactions
Why are we switching from influx to prometheus
Kubernetes has been a challenge
If a task manager goes down, it will auto-heal
If your configurations are set up correctly you can just delete pods and they’ll come back
Unless your configurations are completely fleshed out, the blast radius on failure can be rippling
We hit a situation where Docker logs could not make it out to the Kubernetes logs because the Docker machines were dying
Developed internal tool for ci/cd and deployment
Use cases tell us the resources they need and we provision them a flink cluster
1 Job Manager per cluster
5 Task Managers per cluster
RocksDB backend
Checkpoint/Savepoint persist on S3
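The cluster settings above roughly correspond to a flink-conf.yaml fragment like this (these are real Flink configuration keys, but the bucket name and slot count are illustrative):

```yaml
# State backend and checkpoint/savepoint persistence as described above
state.backend: rocksdb
state.checkpoints.dir: s3://example-fraud-bucket/checkpoints   # illustrative bucket
state.savepoints.dir: s3://example-fraud-bucket/savepoints
taskmanager.numberOfTaskSlots: 4                               # illustrative value
```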
Job Deployment Options
Considerations
People obviously don’t want to wait too long
But we want to respond with the most data we have available on the customer
Two data streams need to share state
Data stream from online interactions / all other customer interactions
Data stream that we receive from the branch
Need to calculate Features
Need to apply ML model
Need to respond in real-time
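The tension in these considerations, respond quickly but with the most data available, can be sketched as a deadline-bounded enrichment lookup (an illustrative simulation; the SLA budget, scores, and function names are assumptions):

```python
# Sketch of the latency/completeness trade-off: score with whatever
# enrichment data has arrived by the SLA deadline rather than
# blocking indefinitely on the second stream.
import time

SLA_SECONDS = 0.2  # assumed response budget

def decide(tx, get_enrichment, deadline=SLA_SECONDS):
    start = time.monotonic()
    enrichment = None
    while time.monotonic() - start < deadline:
        enrichment = get_enrichment(tx["customer_id"])
        if enrichment is not None:
            break
        time.sleep(0.01)
    # Illustrative scoring: missing context is treated as riskier;
    # the point is that a partial answer beats a missed SLA.
    score = 0.9 if enrichment is None else 0.1
    return {"score": score, "enriched": enrichment is not None}
```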
Developed in python, evaluating golang
Developed internal tool for ci/cd and deployment
Teller transactions have a real-time SLA
Connected Streams is the culprit
Break Up One Flink App into Smaller Flink Queryable State Apps
Flink Apps as Functions
Disparate Data Streams: Back Pressure
In our case: we have all the account level activity for a given customer from one source and on the other we have the data from the teller machine
Not all transactions are equal due to their source. However, in an ML world we still want to examine every transaction
Results in back pressure and uneven transaction flow
Alvin for each data source
Scurry of Alvins build out our feature repository
Theodore builds his own features, adds on features from Alvin, and then passes it down
Why did we break Simon out?
We can replace it with anything such as Seldon