We all love AI. But what about financial applications? It turns out that AI, and in particular machine learning (ML) and deep learning (DL), can be applied very effectively to financial services. In this presentation, Natalino will illustrate a number of use cases, such as transaction fraud prevention and credit authorization, using AI and machine learning techniques.
Starting from there, Natalino will show how those problems can be solved with AI techniques, using code snippets and live demos built with Keras, TensorFlow and Scikit-Learn on financial datasets.
Natalino will take you on this AI-for-finance journey, describing how techniques such as deep learning, t-SNE and dimensionality reduction can serve as the "data engines" for next-generation financial applications in both retail and commercial banking.
See more at: http://globalbigdataconference.com/santa-clara/global-artificial-intelligence-conference-83/speaker-details/natalino-busa-41434.html
3. 3 Natalino Busa - @natbusa
Cognitive Finance Group Advisory Board Member
ING Group Enterprise Architect: Cybersecurity, Fintech
Teradata Head of Applied Data Science
Teradata Global Evangelist on Open Sourced Technologies
O’Reilly Author and Speaker
Philips Senior Researcher, Data Architect
LinkedIn and Twitter:
@natbusa
5.
The Medici Bank:
Italian: Banco Medici
1397–1494
6.
Data as a Relationship
● Trust
● Transparency of Use
● Customer First
● Regulations and Laws
● Respect and Protect
● Providing a Service
7.
An ethical approach for Actionable Financial Data
1. Help the customer: Propose, Advise, Select, Filter, Connect, Simplify
2. Protect the customer: Detect, Prevent, Alert, Block, Defend, Identify, Authorize
9.
Financial personalized recommenders
● Innovation helps to empower people to make better financial decisions. ING has launched several new omni-channel banking platforms.
● The platform gives customers insights into their personal finances in an easy and intuitive way.
http://www.slideshare.net/ING/4q15-media
13.
Strategic data-driven initiatives
● Fintech innovation to help strengthen our lending capabilities and better serve our consumer and SME clients.
● Kabbage, one of the leading US-based technology platforms providing automated lending to SMEs.
● In January 2016, ING made an investment in fintech WeLab, which provides consumer loans in China and Hong Kong in a fully automated process that takes just minutes, from application to approval.
http://www.slideshare.net/ING/4q15-media
14.
Approaching (Almost) Any Machine Learning Problem
- Abhishek Thakur, Kaggle Grandmaster -
raw data (tables, files) → Data munging → Useful data → Feature Engineering → Tabular Data ready for ML (data, labels)
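The pipeline on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the records in `RAW_ROWS`, the field names and the three aggregate features are all hypothetical, chosen only to show raw rows turning into a tabular `X` (data) and `y` (labels).

```python
import numpy as np

# Hypothetical raw records, as they might come out of a file or a table.
RAW_ROWS = [
    {"customer": "a", "amount": "12.50", "label": "0"},
    {"customer": "a", "amount": "7.50", "label": "0"},
    {"customer": "b", "amount": "", "label": "1"},      # missing amount
    {"customer": "b", "amount": "300.00", "label": "1"},
    {"customer": "c", "amount": "n/a", "label": "0"},   # unparseable amount
]


def munge(rows):
    """Data munging: keep only rows whose amount parses as a number."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        clean.append({"customer": row["customer"],
                      "amount": amount,
                      "label": int(row["label"])})
    return clean


def engineer_features(rows):
    """Feature engineering: one row per customer (count, mean, max of amount)."""
    by_customer = {}
    for row in rows:
        by_customer.setdefault(row["customer"], []).append(row)
    customers = sorted(by_customer)
    X = np.array([
        [len(by_customer[c]),
         np.mean([r["amount"] for r in by_customer[c]]),
         np.max([r["amount"] for r in by_customer[c]])]
        for c in customers
    ])
    y = np.array([by_customer[c][0]["label"] for c in customers])
    return customers, X, y


customers, X, y = engineer_features(munge(RAW_ROWS))
```

Customer `c` disappears because its only record cannot be parsed; in a real project that decision (drop, impute, flag) is itself part of the munging step.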
15.
Predictive APIs: How to get there?
● Rule-based System: Input → Hand Designed Program → Output
● Classic Machine Learning: Input → Hand Designed Features → Mapping from features → Output
● Representational Machine Learning: Input → Learned Features → Mapping from features → Output
● Deep Learning (end-to-end learning): Input → Learned Features → Learned Complex features → Mapping from features → Output
Prof. Yoshua Bengio - Deep Learning
https://youtu.be/15h6MeikZNg
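The first two columns of this comparison can be contrasted in code. The sketch below is an assumption-laden toy, not material from the talk: the transaction amounts and labels are synthetic, `rule_based` stands for a hand-designed program, and `fit_threshold` learns the mapping from a hand-designed feature (the amount) to the output. Learned features, as in representation and deep learning, are not shown.

```python
import numpy as np


def rule_based(amount):
    """Hand-designed program: a fixed threshold chosen by a person."""
    return int(amount > 1000.0)


def fit_threshold(amounts, labels):
    """Classic ML in miniature: the feature is hand designed,
    but the mapping from feature to output is learned from labelled data."""
    amounts = np.asarray(amounts, dtype=float)
    labels = np.asarray(labels)
    best_t, best_acc = None, -1.0
    for t in np.sort(amounts):
        acc = np.mean((amounts > t).astype(int) == labels)
        if acc > best_acc:  # keep the first threshold with the best accuracy
            best_t, best_acc = t, acc
    return float(best_t), float(best_acc)


# Synthetic toy data: transaction amounts and a 0/1 label.
amounts = [20, 50, 80, 120, 400, 650, 900, 1500]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, learned_accuracy = fit_threshold(amounts, labels)
rule_accuracy = float(np.mean([rule_based(a) == l for a, l in zip(amounts, labels)]))
```

On this toy set the learned threshold separates the two classes, while the fixed rule only catches the largest amount. That is the argument for moving rightwards in the diagram, not a claim about real data.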
16.
From Feature to Architecture Engineering:
17.
Demo:
Credit Payment Defaulting
with TensorFlow and Keras
Methodology
This research aimed at the case of customers' default payments in Taiwan and compares the predictive accuracy of probability of default among six data mining methods. From the perspective of risk management, the result of predictive accuracy of the estimated probability of default will be more valuable than the binary result of classification: credible or not credible clients.
https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
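The difference between an estimated probability of default and a binary classification can be illustrated with a small sketch. It makes several assumptions: it uses plain logistic regression written in NumPy rather than the TensorFlow/Keras model of the demo, and it runs on synthetic data with two made-up features instead of the UCI dataset. The Brier score is used here as one possible way to score probabilities; the source does not say which measure was used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the credit data: two made-up features per client.
n = 400
X = rng.normal(size=(n, 2))
true_w = np.array([1.5, -1.0])
p_true = 1.0 / (1.0 + np.exp(-(X @ true_w - 0.5)))
y = (rng.random(n) < p_true).astype(float)  # 1 = default next month


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.5, steps=500):
    """Batch gradient descent on the mean log-loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        grad = sigmoid(X @ w + b) - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


w, b = fit_logistic(X, y)
p_default = sigmoid(X @ w + b)            # estimated probability of default
binary = (p_default >= 0.5).astype(int)   # binary result: default or not

brier = float(np.mean((p_default - y) ** 2))  # scores the probabilities
accuracy = float(np.mean(binary == y))        # scores only the binary result
```

Two clients with estimated probabilities of 0.51 and 0.99 receive the same binary label, which is why a risk manager may prefer to work with `p_default` directly.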
18.
Step 0: data exploration
Target variable: default payment next month
Color scheme: yes (defaulting), no (not defaulting)
30.
Clustering geolocated data
using Spark and DBSCAN
How to group users’ events using machine learning and distributed computing
By Natalino Busa
Predictive APIs: Clustering Geolocated Data
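To make the clustering step concrete, here is a compact single-machine sketch of DBSCAN over latitude/longitude points with the haversine (great-circle) distance. It is an illustration under stated assumptions, not the talk's implementation: the coordinates, `eps_km` and `min_pts` values are hypothetical, and the Spark part (distributing the work) is not shown.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(points):
    """Pairwise great-circle distances (km) for an (n, 2) array of lat/lon degrees."""
    lat = np.radians(points[:, 0])[:, None]
    lon = np.radians(points[:, 1])[:, None]
    dlat = lat - lat.T
    dlon = lon - lon.T
    a = np.sin(dlat / 2) ** 2 + np.cos(lat) * np.cos(lat.T) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def dbscan(points, eps_km, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    dist = haversine_km(points)
    neighbours = [np.flatnonzero(row <= eps_km) for row in dist]
    labels = np.full(len(points), -1)
    visited = np.zeros(len(points), dtype=bool)
    cluster = 0
    for i in range(len(points)):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbours[i]) < min_pts:
            continue  # not a core point; stays noise unless a cluster reaches it
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border or core point joins the cluster
            if visited[j]:
                continue
            visited[j] = True
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # only core points expand the cluster
        cluster += 1
    return labels


# Hypothetical events: two dense groups and one isolated point.
events = np.array([
    [52.370, 4.895], [52.372, 4.900], [52.368, 4.890],
    [37.354, -121.955], [37.356, -121.950], [37.352, -121.960],
    [0.0, 0.0],
])
labels = dbscan(events, eps_km=2.0, min_pts=3)
```

The pairwise distance matrix grows quadratically with the number of events, so this form only suits small groups of points, for example the events of a single user.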
34. @natbusa | linkedin.com: Natalino Busa
Kafka+Cassandra+Spark: SMACK stack
Streaming Machine Learning
Cassandra
● Fast writes
● 2D Data Structure
● Replicated
● Tunable consistency
● Multi-Data centers
Kafka
● Streaming Events
● Distributed, Scalable Transport
● Events are persisted
● Decoupled Consumer-Producers
● Topics and Partitions
Spark
● Ad-Hoc Queries
● Joins, Aggregate
● User Defined Functions
● Machine Learning, Advanced Stats and Analytics
35.
Spark: Unified Distributed Computing:
SQL + Machine Learning + Graph Analytics
● Spark libraries on RDDs: Streaming, SQL, MLlib, GraphX
● Used for: Analytics, Statistics, Data Science, Model Training
● Data Sources: HDFS, NoSQL, SQL
● Also shown: Map-Reduce, HDFS, Kafka, Hive
36.
Spark and Cassandra: distributed goodness
Cassandra: Store all the data (Storage!)
Spark: Analyze all the data (Analytics!)
● DC1: replication factor 3
● DC2: replication factor 3
● DC3: replication factor 3 + Spark Executors
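A replication factor of 3 combined with tunable consistency comes down to simple arithmetic: a quorum is a strict majority of the replicas, and a read is guaranteed to overlap the latest write when the number of replicas read plus the number written exceeds the replication factor. The sketch below works this out for a single data center; the function names are made up for illustration and are not part of any Cassandra driver.

```python
def quorum(replication_factor):
    """Number of replicas in a quorum: a strict majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


RF = 3
q = quorum(RF)

quorum_both = reads_see_latest_write(q, q, RF)    # quorum reads and quorum writes
one_and_one = reads_see_latest_write(1, 1, RF)    # fastest, but reads may be stale
read_one_write_all = reads_see_latest_write(1, RF, RF)
```

With a replication factor of 3, quorum reads and writes touch 2 replicas each and tolerate one replica being down, which is one common way to trade latency against consistency.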
37.
Cassandra - Spark Connector
Cassandra: Store all the data
Spark: Distributed Data Processing
Executors and Workers
Cassandra-Spark Connector:
● Data locality
● Reduce Shuffling
● RDDs to Cassandra Partitions
DC3: replication factor 3 + Spark Executors
39.
Network Intrusion Detection
The dataset contains 130 million flow records involving 12,027 distinct computers over 36 days (not the full 58 days claimed for the entire data release).
Each record consists of: time (to the nearest second), duration, source and destination computer IDs, source and destination ports, protocol, number of packets and number of bytes.
Techniques: TDA, Dimensionality Reduction
https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction
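As a point of reference for dimensionality reduction on flow records, the sketch below projects three numeric fields onto two principal components. Note the swap: this is PCA, a linear method, used as a simple baseline; the nonlinear methods behind the link above (and t-SNE) follow the same "many columns in, few columns out" pattern but are not implemented here. The data is synthetic, standing in for log-scaled duration, packet and byte counts.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for flow records: three correlated numeric fields
# (think log-duration, log-packets, log-bytes) driven by one hidden factor.
n = 500
activity = rng.normal(size=n)
flows = np.column_stack([
    1.0 * activity + 0.1 * rng.normal(size=n),
    2.0 * activity + 0.1 * rng.normal(size=n),
    3.0 * activity + 0.1 * rng.normal(size=n),
])


def pca(X, n_components):
    """Project standardized data onto its top principal components via SVD."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    _, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)
    return Xs @ Vt[:n_components].T, explained[:n_components]


embedding, explained_ratio = pca(flows, n_components=2)
```

Records that sit far from the bulk of the points in `embedding` are candidates for a closer look; whether they are intrusions is a separate question that the projection alone does not answer.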