SlideShare une entreprise Scribd logo
1  sur  24
Télécharger pour lire hors ligne
Anahita Bhiwandiwalla – Intel Corporation
Karthik Vadla – Intel Corporation
A Predictive Analytics
Workflow on DICOM
Images using Apache
Spark
Who we are?
Agenda
• Use-case overview
• Challenges
• What is our data?
• Deriving Insights
• Machine Learning
• Tools
• Spark-tk
• Demo
• spark-tensorflow-connector
• Performance
Use-case overview
• Vast amount of medical image data available
• Meta-data accompanying the image
– Patient information
– Status of condition
– Method of image capture
– Time to events(if part of a study)
Create distributed technology needed to efficiently
store, scan and analyze data in healthcare.
Challenges
• Getting data for experimentation is hard!
• Combine data from multiple public sources for simulation
• Multiple tools – image processing libraries, visualization
tools, distributed engines, ML libraries
• Compatibility across different tools
How do we make Machine Learning easy to use?
Data: DICOM
- Digital Imaging and
COMmunications in Medicine
- International standard format to
store, exchange, and transmit
medical images
- Standards for imaging modalities -
radiography, ultrasonography,
computed tomography (CT),
magnetic resonance imaging (MRI),
and radiation therapy.
- Protocols for image exchange, image
compression, 3-D visualization,
image presentation etc.
Data: Meta-data
• Image related: storage, date, relevant parts of
the image etc.
• Patient related: age, weight, height, DOB,
medical history
• Device related: manufacturer information,
operator, method of capture
Deriving insights
• Can we develop smart filtering techniques based on
image meta-data?
• Can we conclude and predict certain conditions based
on patient meta-data?
• Can we classify image capture techniques and suggest
improvements based on device meta-data?
• Can we combine image & patient meta-data to conclude
and predict?
• Can analytics help automate/improve image capture
techniques?
Machine Learning
• Patient data
– Train classifier model on patient data to predict presence
of a condition
• Logistic Regression, Random Forest, Naïve Bayes
– Train regressor model on patient data to predict probability
of a condition
• Linear Regression, Random Forest
– Train survival analysis model on patient data to predict
time to event, compare risk of populations
• Accelerate Failure Time Model, Cox Proportional Hazards Model
Machine Learning
• Image data
– Compute the Eigen decomposition of the image
– Compute the Covariance Matrix of the image
– Compute the Principal Component Analysis of the
image pixel data
Machine Learning
• Device data:
– Predict trends in image capture techniques
• Classification/Regression techniques
– Predict device time to failure
• Cox Proportional Hazards Model, Accelerated Failure Time
– Recommendations to device capture methods
• Collaborative filtering
Tools
• pydicom
– Python package for working with DICOM files such as medical images, reports, and radiotherapy
– Single threaded
• dcm4che
– Collection of open source applications and utilities for the healthcare enterprise.
– Developed in the Java programming language for performance and portability, supporting
deployment on JDK 1.6 and up
• matplotlib
– Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats
and interactive environments across platforms.
– Can be used in Python scripts, Python and IPython shell, jupyter notebook, web application servers
• Machine Learning libraries
– Classifiers, regressors, failure analysis analysis
– Eigen decomposition
Spark-tk Introduction
• Open source library enhancing the Spark experience
• APIs for feature engineering, ETL, graph construction,
machine learning, image processing, scoring
• Abstraction level familiar to data scientists; removes
complexity of parallel processing
• Lower level Spark APIs seamlessly exposed
• Easy to build apps plugging in other tools/libraries
https://github.com/trustedanalytics/spark-tk
http://trustedanalytics.github.io/spark-tk/
Spark-tk components
• Frames
– Scalable data frame representation
– More intuitive than low level HDFS file and Spark
RDD/DataFrame/DataSet formats; schema inference
– APIs to manipulate the data frames for feature engineering and
exploration, such as joins and aggregations
– Run user-defined transformations and filters using distributed
processing
– Input to our models
– Easy to convert to/from Spark-tk Frames - Spark representations
Spark-tk components
• Graphs
– Scalable graph representation based on Frames for
vertices and edges
– In house distributed algorithms for graph analytics
using GraphX
– Supports importing/exporting to OrientDB’s scalable
graph databases – visualize, real-time graph querying
– Store massive graphs
Spark-tk components
• Machine Learning & Streaming
– Time series analysis
– Recommender systems
– Topic Modeling
– Clustering
– Classification/Regression
– Image processing
– Scikit Models
– Streaming via Scoring engine
How Spark-tk helps
• Brings in support for image analytics to Spark
– Support for ingesting and processing DICOM images in a
distributed environment
– Queries, filters, and analytics on image collections
– Integrated & distributed dcm4che3 via. Spark
– Tested against pydicom
– Used matplotlib for visualization
• Machine learning for image and health records
– Created distributed Cox Survival Proportional Hazards Model to
compare sets of populations
– Recommend algorithm and hyper-parameters
Live Demo
spark-tensorflow-connector
• Developed library for loading and storing
TensorFlow records with Apache Spark
• Implements data import from the standard
TensorFlow record format (TFRecords) into
Spark SQL DataFrames, and data export from
DataFrames to TensorFlow records
https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-
connector
Performance
0
5
10
15
20
25
30
0.25 0.5 1 2 4 8 16 32
Data(GB) Vs. Time(Min)
Processing time increases with data(Constant #Partitions=1000)
Performance
24.5
25
25.5
26
26.5
27
27.5
28
28.5
29
29.5
300 450 675 1012 2278 3147 5125 7688
#Partitions Vs. Time(Min)
Increasing partitions reduced the time and then increased time(overhead)
Performance
0
10
20
30
40
50
60
70
80
90
100
7 11 24 35 56
#Executors Vs. Time(Min)
Increasing number of executors reduces the time significantly
Performance
0
5
10
15
20
25
30
35
1 2 3 4 5
#Executor Cores Vs. Time(Min)
Increasing executor cores did not significantly change time
Thank You.
https://github.com/trustedanalytics/spark-tk
http://trustedanalytics.github.io/spark-tk/

Contenu connexe

Tendances

DockerとKubernetesをかけめぐる
DockerとKubernetesをかけめぐるDockerとKubernetesをかけめぐる
DockerとKubernetesをかけめぐるKohei Tokunaga
 
マルチクラウド環境のデータ保護の現実解(AWS,Azure,GCP-Veritas NetBackup CloudPoint)
マルチクラウド環境のデータ保護の現実解(AWS,Azure,GCP-Veritas NetBackup CloudPoint)マルチクラウド環境のデータ保護の現実解(AWS,Azure,GCP-Veritas NetBackup CloudPoint)
マルチクラウド環境のデータ保護の現実解(AWS,Azure,GCP-Veritas NetBackup CloudPoint)vxsejapan
 
Oracle Cloud Platform:IDCSを使ったアイデンティティ・ドメイン管理者ガイド
Oracle Cloud Platform:IDCSを使ったアイデンティティ・ドメイン管理者ガイドOracle Cloud Platform:IDCSを使ったアイデンティティ・ドメイン管理者ガイド
Oracle Cloud Platform:IDCSを使ったアイデンティティ・ドメイン管理者ガイドオラクルエンジニア通信
 
PostgreSQLのロール管理とその注意点(Open Source Conference 2022 Online/Osaka 発表資料)
PostgreSQLのロール管理とその注意点(Open Source Conference 2022 Online/Osaka 発表資料)PostgreSQLのロール管理とその注意点(Open Source Conference 2022 Online/Osaka 発表資料)
PostgreSQLのロール管理とその注意点(Open Source Conference 2022 Online/Osaka 発表資料)NTT DATA Technology & Innovation
 
Amazon DynamoDB Deep Dive Advanced Design Patterns for DynamoDB (DAT401) - AW...
Amazon DynamoDB Deep Dive Advanced Design Patterns for DynamoDB (DAT401) - AW...Amazon DynamoDB Deep Dive Advanced Design Patterns for DynamoDB (DAT401) - AW...
Amazon DynamoDB Deep Dive Advanced Design Patterns for DynamoDB (DAT401) - AW...Amazon Web Services
 
速習!論理レプリケーション ~基礎から最新動向まで~(PostgreSQL Conference Japan 2022 発表資料)
速習!論理レプリケーション ~基礎から最新動向まで~(PostgreSQL Conference Japan 2022 発表資料)速習!論理レプリケーション ~基礎から最新動向まで~(PostgreSQL Conference Japan 2022 発表資料)
速習!論理レプリケーション ~基礎から最新動向まで~(PostgreSQL Conference Japan 2022 発表資料)NTT DATA Technology & Innovation
 
20210216 AWS Black Belt Online Seminar AWS Database Migration Service
20210216 AWS Black Belt Online Seminar AWS Database Migration Service20210216 AWS Black Belt Online Seminar AWS Database Migration Service
20210216 AWS Black Belt Online Seminar AWS Database Migration ServiceAmazon Web Services Japan
 
BuildKitの概要と最近の機能
BuildKitの概要と最近の機能BuildKitの概要と最近の機能
BuildKitの概要と最近の機能Kohei Tokunaga
 
Quarkus による超音速な Spring アプリケーション開発
Quarkus による超音速な Spring アプリケーション開発Quarkus による超音速な Spring アプリケーション開発
Quarkus による超音速な Spring アプリケーション開発Chihiro Ito
 
20210127 AWS Black Belt Online Seminar Amazon Redshift 運用管理
20210127 AWS Black Belt Online Seminar Amazon Redshift 運用管理20210127 AWS Black Belt Online Seminar Amazon Redshift 運用管理
20210127 AWS Black Belt Online Seminar Amazon Redshift 運用管理Amazon Web Services Japan
 
押さえておきたい、PostgreSQL 13 の新機能!! (PostgreSQL Conference Japan 2020講演資料)
押さえておきたい、PostgreSQL 13 の新機能!! (PostgreSQL Conference Japan 2020講演資料)押さえておきたい、PostgreSQL 13 の新機能!! (PostgreSQL Conference Japan 2020講演資料)
押さえておきたい、PostgreSQL 13 の新機能!! (PostgreSQL Conference Japan 2020講演資料)NTT DATA Technology & Innovation
 
クリーンアーキテクチャの考え方にもとづく Laravel との付き合い方 #phpconokinawa
クリーンアーキテクチャの考え方にもとづく Laravel との付き合い方 #phpconokinawaクリーンアーキテクチャの考え方にもとづく Laravel との付き合い方 #phpconokinawa
クリーンアーキテクチャの考え方にもとづく Laravel との付き合い方 #phpconokinawaShohei Okada
 
組み込みブラウザとメディアﰀ仕様あれこれ
組み込みブラウザとメディアﰀ仕様あれこれ組み込みブラウザとメディアﰀ仕様あれこれ
組み込みブラウザとメディアﰀ仕様あれこれMasashi Umeda
 
PostgreSQLのパスワードの謎を追え!
PostgreSQLのパスワードの謎を追え!PostgreSQLのパスワードの謎を追え!
PostgreSQLのパスワードの謎を追え!Takashi Meguro
 
Google Container Engine (GKE) & Kubernetes のアーキテクチャ解説
Google Container Engine (GKE) & Kubernetes のアーキテクチャ解説Google Container Engine (GKE) & Kubernetes のアーキテクチャ解説
Google Container Engine (GKE) & Kubernetes のアーキテクチャ解説Samir Hammoudi
 
[JCConf 2022] Compose for Desktop - 開發桌面軟體的新選擇
[JCConf 2022] Compose for Desktop - 開發桌面軟體的新選擇[JCConf 2022] Compose for Desktop - 開發桌面軟體的新選擇
[JCConf 2022] Compose for Desktop - 開發桌面軟體的新選擇Shengyou Fan
 

Tendances (20)

Elastic beanstalk docker_support
Elastic beanstalk docker_supportElastic beanstalk docker_support
Elastic beanstalk docker_support
 
DockerとKubernetesをかけめぐる
DockerとKubernetesをかけめぐるDockerとKubernetesをかけめぐる
DockerとKubernetesをかけめぐる
 
マルチクラウド環境のデータ保護の現実解(AWS,Azure,GCP-Veritas NetBackup CloudPoint)
マルチクラウド環境のデータ保護の現実解(AWS,Azure,GCP-Veritas NetBackup CloudPoint)マルチクラウド環境のデータ保護の現実解(AWS,Azure,GCP-Veritas NetBackup CloudPoint)
マルチクラウド環境のデータ保護の現実解(AWS,Azure,GCP-Veritas NetBackup CloudPoint)
 
Oracle Cloud Platform:IDCSを使ったアイデンティティ・ドメイン管理者ガイド
Oracle Cloud Platform:IDCSを使ったアイデンティティ・ドメイン管理者ガイドOracle Cloud Platform:IDCSを使ったアイデンティティ・ドメイン管理者ガイド
Oracle Cloud Platform:IDCSを使ったアイデンティティ・ドメイン管理者ガイド
 
PostgreSQLのロール管理とその注意点(Open Source Conference 2022 Online/Osaka 発表資料)
PostgreSQLのロール管理とその注意点(Open Source Conference 2022 Online/Osaka 発表資料)PostgreSQLのロール管理とその注意点(Open Source Conference 2022 Online/Osaka 発表資料)
PostgreSQLのロール管理とその注意点(Open Source Conference 2022 Online/Osaka 発表資料)
 
Amazon DynamoDB Deep Dive Advanced Design Patterns for DynamoDB (DAT401) - AW...
Amazon DynamoDB Deep Dive Advanced Design Patterns for DynamoDB (DAT401) - AW...Amazon DynamoDB Deep Dive Advanced Design Patterns for DynamoDB (DAT401) - AW...
Amazon DynamoDB Deep Dive Advanced Design Patterns for DynamoDB (DAT401) - AW...
 
速習!論理レプリケーション ~基礎から最新動向まで~(PostgreSQL Conference Japan 2022 発表資料)
速習!論理レプリケーション ~基礎から最新動向まで~(PostgreSQL Conference Japan 2022 発表資料)速習!論理レプリケーション ~基礎から最新動向まで~(PostgreSQL Conference Japan 2022 発表資料)
速習!論理レプリケーション ~基礎から最新動向まで~(PostgreSQL Conference Japan 2022 発表資料)
 
20210216 AWS Black Belt Online Seminar AWS Database Migration Service
20210216 AWS Black Belt Online Seminar AWS Database Migration Service20210216 AWS Black Belt Online Seminar AWS Database Migration Service
20210216 AWS Black Belt Online Seminar AWS Database Migration Service
 
BuildKitの概要と最近の機能
BuildKitの概要と最近の機能BuildKitの概要と最近の機能
BuildKitの概要と最近の機能
 
Quarkus による超音速な Spring アプリケーション開発
Quarkus による超音速な Spring アプリケーション開発Quarkus による超音速な Spring アプリケーション開発
Quarkus による超音速な Spring アプリケーション開発
 
Oracle Digital Assistantご紹介資料(2020/04)
Oracle Digital Assistantご紹介資料(2020/04)Oracle Digital Assistantご紹介資料(2020/04)
Oracle Digital Assistantご紹介資料(2020/04)
 
20210127 AWS Black Belt Online Seminar Amazon Redshift 運用管理
20210127 AWS Black Belt Online Seminar Amazon Redshift 運用管理20210127 AWS Black Belt Online Seminar Amazon Redshift 運用管理
20210127 AWS Black Belt Online Seminar Amazon Redshift 運用管理
 
押さえておきたい、PostgreSQL 13 の新機能!! (PostgreSQL Conference Japan 2020講演資料)
押さえておきたい、PostgreSQL 13 の新機能!! (PostgreSQL Conference Japan 2020講演資料)押さえておきたい、PostgreSQL 13 の新機能!! (PostgreSQL Conference Japan 2020講演資料)
押さえておきたい、PostgreSQL 13 の新機能!! (PostgreSQL Conference Japan 2020講演資料)
 
EDB Postgres Vision 2019
EDB Postgres Vision 2019 EDB Postgres Vision 2019
EDB Postgres Vision 2019
 
クリーンアーキテクチャの考え方にもとづく Laravel との付き合い方 #phpconokinawa
クリーンアーキテクチャの考え方にもとづく Laravel との付き合い方 #phpconokinawaクリーンアーキテクチャの考え方にもとづく Laravel との付き合い方 #phpconokinawa
クリーンアーキテクチャの考え方にもとづく Laravel との付き合い方 #phpconokinawa
 
クラウド入門(AWS編)
クラウド入門(AWS編)クラウド入門(AWS編)
クラウド入門(AWS編)
 
組み込みブラウザとメディアﰀ仕様あれこれ
組み込みブラウザとメディアﰀ仕様あれこれ組み込みブラウザとメディアﰀ仕様あれこれ
組み込みブラウザとメディアﰀ仕様あれこれ
 
PostgreSQLのパスワードの謎を追え!
PostgreSQLのパスワードの謎を追え!PostgreSQLのパスワードの謎を追え!
PostgreSQLのパスワードの謎を追え!
 
Google Container Engine (GKE) & Kubernetes のアーキテクチャ解説
Google Container Engine (GKE) & Kubernetes のアーキテクチャ解説Google Container Engine (GKE) & Kubernetes のアーキテクチャ解説
Google Container Engine (GKE) & Kubernetes のアーキテクチャ解説
 
[JCConf 2022] Compose for Desktop - 開發桌面軟體的新選擇
[JCConf 2022] Compose for Desktop - 開發桌面軟體的新選擇[JCConf 2022] Compose for Desktop - 開發桌面軟體的新選擇
[JCConf 2022] Compose for Desktop - 開發桌面軟體的新選擇
 

Similaire à A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahita Bhiwandiwalla and Karthik Vadla

AI IN PATH final PPT.pptx
AI IN PATH final PPT.pptxAI IN PATH final PPT.pptx
AI IN PATH final PPT.pptxDivyaGaurav4
 
Building your big data solution
Building your big data solution Building your big data solution
Building your big data solution WSO2
 
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Ali Alkan
 
Im symposium presentation - OCR and Text analytics for Medical Chart Review ...
Im symposium presentation -  OCR and Text analytics for Medical Chart Review ...Im symposium presentation -  OCR and Text analytics for Medical Chart Review ...
Im symposium presentation - OCR and Text analytics for Medical Chart Review ...Alex Zeltov
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceeRic Choo
 
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Databricks
 
Introduction to Machine learning
Introduction to Machine learningIntroduction to Machine learning
Introduction to Machine learningNEEVEE Technologies
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-Systeminside-BigData.com
 
Application and Challenges of Streaming Analytics and Machine Learning on Mu...
 Application and Challenges of Streaming Analytics and Machine Learning on Mu... Application and Challenges of Streaming Analytics and Machine Learning on Mu...
Application and Challenges of Streaming Analytics and Machine Learning on Mu...Databricks
 
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Databricks
 
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine LearningMakine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine LearningAli Alkan
 
FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning Dr. Swaminathan Kathirvel
 
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...Maurice Nsabimana
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAjaved75
 
A machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesA machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesDataWorks Summit
 
Role of python in hpc
Role of python in hpcRole of python in hpc
Role of python in hpcDr Reeja S R
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...Ilkay Altintas, Ph.D.
 

Similaire à A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahita Bhiwandiwalla and Karthik Vadla (20)

AI IN PATH final PPT.pptx
AI IN PATH final PPT.pptxAI IN PATH final PPT.pptx
AI IN PATH final PPT.pptx
 
Building your big data solution
Building your big data solution Building your big data solution
Building your big data solution
 
unit 1 big data.pptx
unit 1 big data.pptxunit 1 big data.pptx
unit 1 big data.pptx
 
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
 
Im symposium presentation - OCR and Text analytics for Medical Chart Review ...
Im symposium presentation -  OCR and Text analytics for Medical Chart Review ...Im symposium presentation -  OCR and Text analytics for Medical Chart Review ...
Im symposium presentation - OCR and Text analytics for Medical Chart Review ...
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
 
Introduction to Machine learning
Introduction to Machine learningIntroduction to Machine learning
Introduction to Machine learning
 
The Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-SystemThe Analytics Frontier of the Hadoop Eco-System
The Analytics Frontier of the Hadoop Eco-System
 
Analytics&IoT
Analytics&IoTAnalytics&IoT
Analytics&IoT
 
Application and Challenges of Streaming Analytics and Machine Learning on Mu...
 Application and Challenges of Streaming Analytics and Machine Learning on Mu... Application and Challenges of Streaming Analytics and Machine Learning on Mu...
Application and Challenges of Streaming Analytics and Machine Learning on Mu...
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
 
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine LearningMakine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
Makine Öğrenmesi ile Görüntü Tanıma | Image Recognition using Machine Learning
 
FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning
 
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
Using Crowdsourced Images to Create Image Recognition Models with Analytics Z...
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATA
 
A machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesA machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companies
 
Role of python in hpc
Role of python in hpcRole of python in hpc
Role of python in hpc
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
 

Plus de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Plus de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Dernier

Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGIThomas Poetter
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxellehsormae
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhYasamin16
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024Timothy Spann
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 

Dernier (20)

Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptx
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 

A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahita Bhiwandiwalla and Karthik Vadla

  • 1. Anahita Bhiwandiwalla – Intel Corporation Karthik Vadla – Intel Corporation A Predictive Analytics Workflow on DICOM Images using Apache Spark
  • 3. Agenda • Use-case overview • Challenges • What is our data? • Deriving Insights • Machine Learning • Tools • Spark-tk • Demo • spark-tensorflow-connector • Performance
  • 4. Use-case overview • Vast amount of medical image data available • Meta-data accompanying the image – Patient information – Status of condition – Method of image capture – Time to events(if part of a study) Create distributed technology needed to efficiently store, scan and analyze data in healthcare.
  • 5. Challenges • Getting data for experimentation is hard! • Combine data from multiple public sources for simulation • Multiple tools – image processing libraries, visualization tools, distributed engines, ML libraries • Compatibility across different tools How do we make Machine Learning easy to use?
  • 6. Data: DICOM - Digital Imaging and COMmunications in Medicine - International standard format to store, exchange, and transmit medical images - Standards for imaging modalities - radiography, ultrasonography, computed tomography (CT), magnetic resonance imaging (MRI), and radiation therapy. - Protocols for image exchange, image compression, 3-D visualization, image presentation etc.
  • 7. Data: Meta-data • Image related: storage, date, relevant parts of the image etc. • Patient related: age, weight, height, DOB, medical history • Device related: manufacturer information, operator, method of capture
  • 8. Deriving insights • Can we develop smart filtering techniques based on image meta-data? • Can we conclude and predict certain conditions based on patient meta-data? • Can we classify image capture techniques and suggest improvements based on device meta-data? • Can we combine image & patient meta-data to conclude and predict? • Can analytics help automate/improve image capture techniques?
  • 9. Machine Learning • Patient data – Train classifier model on patient data to predict presence of a condition • Logistic Regression, Random Forest, Naïve Bayes – Train regressor model on patient data to predict probability of a condition • Linear Regression, Random Forest – Train survival analysis model on patient data to predict time to event, compare risk of populations • Accelerate Failure Time Model, Cox Proportional Hazards Model
  • 10. Machine Learning • Image data – Compute the Eigen decomposition of the image – Compute the Covariance Matrix of the image – Compute the Principal Component Analysis of the image pixel data
  • 11. Machine Learning • Device data: – Predict trends in image capture techniques • Classification/Regression techniques – Predict device time to failure • Cox Proportional Hazards Model, Accelerated Failure Time – Recommendations to device capture methods • Collaborative filtering
  • 12. Tools • pydicom – Python package for working with DICOM files such as medical images, reports, and radiotherapy – Single threaded • dcm4che – Collection of open source applications and utilities for the healthcare enterprise. – Developed in the Java programming language for performance and portability, supporting deployment on JDK 1.6 and up • matplotlib – Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. – Can be used in Python scripts, Python and IPython shell, jupyter notebook, web application servers • Machine Learning libraries – Classifiers, regressors, failure analysis analysis – Eigen decomposition
  • 13. Spark-tk Introduction • Open source library enhancing the Spark experience • APIs for feature engineering, ETL, graph construction, machine learning, image processing, scoring • Abstraction level familiar to data scientists; removes complexity of parallel processing • Lower level Spark APIs seamlessly exposed • Easy to build apps plugging in other tools/libraries https://github.com/trustedanalytics/spark-tk http://trustedanalytics.github.io/spark-tk/
  • 14. Spark-tk components • Frames – Scalable data frame representation – More intuitive than low level HDFS file and Spark RDD/DataFrame/DataSet formats; schema inference – APIs to manipulate the data frames for feature engineering and exploration, such as joins and aggregations – Run user-defined transformations and filters using distributed processing – Input to our models – Easy to convert to/from Spark-tk Frames - Spark representations
  • 15. Spark-tk components • Graphs – Scalable graph representation based on Frames for vertices and edges – In house distributed algorithms for graph analytics using GraphX – Supports importing/exporting to OrientDB’s scalable graph databases – visualize, real-time graph querying – Store massive graphs
  • 16. Spark-tk components • Machine Learning & Streaming – Time series analysis – Recommender systems – Topic Modeling – Clustering – Classification/Regression – Image processing – Scikit Models – Streaming via Scoring engine
  • 17. How Spark-tk helps • Brings in support for image analytics to Spark – Support for ingesting and processing DICOM images in a distributed environment – Queries, filters, and analytics on image collections – Integrated & distributed dcm4che3 via. Spark – Tested against pydicom – Used matplotlib for visualization • Machine learning for image and health records – Created distributed Cox Survival Proportional Hazards Model to compare sets of populations – Recommend algorithm and hyper-parameters
  • 19. spark-tensorflow-connector • Developed library for loading and storing TensorFlow records with Apache Spark • Implements data import from the standard TensorFlow record format (TFRecords) into Spark SQL DataFrames, and data export from DataFrames to TensorFlow records https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow- connector
  • 20. Performance 0 5 10 15 20 25 30 0.25 0.5 1 2 4 8 16 32 Data(GB) Vs. Time(Min) Processing time increases with data(Constant #Partitions=1000)
  • 21. Performance 24.5 25 25.5 26 26.5 27 27.5 28 28.5 29 29.5 300 450 675 1012 2278 3147 5125 7688 #Partitions Vs. Time(Min) Increasing partitions reduced the time and then increased time(overhead)
  • 22. Performance 0 10 20 30 40 50 60 70 80 90 100 7 11 24 35 56 #Executors Vs. Time(Min) Increasing number of executors reduces the time significantly
  • 23. Performance 0 5 10 15 20 25 30 35 1 2 3 4 5 #Executor Cores Vs. Time(Min) Increasing executor cores did not significantly change time