SlideShare une entreprise Scribd logo
1  sur  34
Dr. David Talby
CTO, Pacific AI
BUILD YOUR OWN OPEN SOURCE
DATA SCIENCE PLATFORM
LET’S BUILD A PLATFORM
1. Ground Rules
2. Components
3. Putting It All Together
AT THE BEGINNING, THERE WAS SEARCH
Integrate Data
ETL
Streaming
Quality
Enrichment
Dataflows
Data Analyst Data Scientist
SCOPE
Discover & Visualize
SQL
Search
Visualization
Dashboards
Real-Time Alerts
Train Models
ML, DL, DM, NLP, …
Explore & Visualize
Train & Optimize
Collaboration
Workflows
Productize Models
Deploy API’s
Publish API’s
CI & CD for Models
Measurement
Feedback
App DeveloperData Engineer
Infrastructure
Deployment Orchestration Security Monitoring Single Sign-On Backup Scaling
GOALS
Enterprise Grade
Scales from GB to PB Unified & Modular
Cutting Edge
CONSTRAINTS
No Commercial Software
No Copyleft
No Saas
Built It
LET’S BUILD A PLATFORM
1. Ground Rules
2. Components
3. Putting it All Together
Integrate Data
Data Analyst Data Scientist
SCOPE
Discover & Visualize Train Models Productize Models
App DeveloperData Engineer
Infrastructure
APACHE NIFI
NIFI FEATURES
Web-based dataflow user interface
Seamless experience between design, control, feedback, and monitoring
Highly configurable
Loss tolerant vs guaranteed delivery
Low latency vs high throughput
Dynamic prioritization
Flow can be modified at runtime
Back pressure
Data Provenance
Track dataflow from beginning to end
Designed for extension
Build your own processors and more (120+ available out-of-the-box)
Enables rapid development and effective testing
Secure
SSL, SSH, HTTPS, encrypted content, etc...
Multi-tenant authorization and internal authorization/policy management
APACHE SPARK
Integrate Data
Data Analyst Data Scientist
SCOPE
Discover & Visualize Train Models Productize Models
App DeveloperData Engineer
Infrastructure
SPARK SQL FEATURES
Distributed SQL Engine
Seamless integration with Spark DataFrames
Standards Compliant
ANSI SQL 2003 support
All 99 queries of TPC-DS supported as of Spark 2.0
High performance
New “Catalyst” cost-based optimizer in Spark 2.2
Project Tungsten: “Joining a Billion Rows per Second on a Laptop”
2.5x performance gains between 1.6 and 2.0
Accessible & Extensible
Python, R, Scala, Java, Hive direct API’s + UDF support
KIBANA
TIMELION
KIBANA FEATURES
Full-text and faceted search
Full text query language: Boolean operators, proximity, boosting
Faceted search: Filter by field, value ranges, date ranges, sort, limit, pagination
Time series analysis: aggregates, windowing, offsetting, trending, comparisons
Geospatial search: Search by shape, bounding box, polygon, by distance or range
Visualizations & Dashboards
All the basics: Area, pie, bar, heatmap, table, metric, map, scatter, timeline, tile
Drag & drop creation and editing
Organize visualizations into dashboards
Dashboards can be dynamically filtered by time, queries, filters
Publish, embed and share dashboards
Real-time updates
Performant
Fast interactive queries, faceting and filtering
REST API and clients in all major languages
Integrate Data
Data Analyst Data Scientist
SCOPE
Discover & Visualize Train Models Productize Models
App DeveloperData Engineer
Infrastructure
JUPYTER LAB
JUPYTER HUB
ANACONDA
Integrate Data
Data Analyst Data Scientist
SCOPE
Discover & Visualize Train Models Productize Models
App DeveloperData Engineer
Infrastructure
OPEN SCORING
MLEAP
KONG API GATEWAY
API Gateway on nginx
Scalable
Modular with plugins
Authentication
Basic Auth, Open ID,
OAuth, HMAC, LDAP, JWT
Security
ACL, CORS, IP Restriction,
Bot Detection, SSL
Traffic Control
Proxy Caching, Rate limit,
Size limits, terminations
Logging & Analytics
Galileo, Datadog, Runscope
TCP, HTTP, File, Syslog, StatsD
COLLABORATION, CI & CD
Plan
Projects, Boards, Issues,
Milestones, Teams
Create
Merge, Preview, Commit,
Branch, Lock, Discuss
Verify
Automated pipelines,
graphs, history, scaling
Package
Built-in container registry
Release
Continuous integration &
continuous deployment
Configure & Monitor
Infrastructure
Integrate Data
Data Analyst Data Scientist
SCOPE
Discover & Visualize Train Models Productize Models
App DeveloperData Engineer
KUBERNETES
Portable Containers
Public, Private, Hybrid,
or Multi-Cloud
Deployment
Automation, Co-Location,
Storage Mounting, Secrets
Auto-*
-Scaling, -Healing, -Restart,
-Placement, -Replication
Rolling Updates
Load Balancing
Service Discovery
Monitoring Resources
Accessing & Ingesting Logs
PROMETHEUS & GRAFANA
KEYCLOAK
LET’S BUILD A PLATFORM
1. Ground Rules
2. Components
3. Putting It All Together
The Big Picture
• This is a complex, major enterprise platform
• It’s far from free: Cost is in integration, training & ops
• Why open source?
1. Often, outright better technology
2. Faster innovation
3. More native integrations
4. More books, talks, tutorials, posts & answers
5. Cheaper, both to begin and to scale
Common Questions
Q: Do I need it all on Day One?
A: No. Use what you need, know where it fits later.
Q: What if I already have another tool in place?
A: Keep it. Architecture is about incremental evolution.
Q: What if I don’t have the in-house knowledge?
A: Outsource, but require training & onboarding.
Q: What often gets overlooked?
A: Keeping components continuously up to date.
Summary: If you remember one thing…
Build the simplest platform that serves
everyone required to turn science into $$$
Data Analyst Data Scientist App DeveloperData Engineer
david@pacific.ai
@davidtalby
in/davidtalby
THANK YOU!

Contenu connexe

Tendances

Data Day TX 2016 - Jan 16, 2016
Data Day TX 2016 - Jan 16, 2016Data Day TX 2016 - Jan 16, 2016
Data Day TX 2016 - Jan 16, 2016Michelle Casbon
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...Databricks
 
Multi runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learningMulti runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learningStepan Pushkarev
 
H2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan HergerH2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan HergerSri Ambati
 
Introduction to Sparkling Water - Spark Summit East 2016
Introduction to Sparkling Water - Spark Summit East 2016Introduction to Sparkling Water - Spark Summit East 2016
Introduction to Sparkling Water - Spark Summit East 2016Sri Ambati
 
Jakub Hava, H2O.ai - Productionizing Apache Spark Models using H2O - H2O Worl...
Jakub Hava, H2O.ai - Productionizing Apache Spark Models using H2O - H2O Worl...Jakub Hava, H2O.ai - Productionizing Apache Spark Models using H2O - H2O Worl...
Jakub Hava, H2O.ai - Productionizing Apache Spark Models using H2O - H2O Worl...Sri Ambati
 
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuireEmbracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuireDatabricks
 
Deep Learning for Data Scientists: Using Apache MXNet and R on AWS - July 201...
Deep Learning for Data Scientists: Using Apache MXNet and R on AWS - July 201...Deep Learning for Data Scientists: Using Apache MXNet and R on AWS - July 201...
Deep Learning for Data Scientists: Using Apache MXNet and R on AWS - July 201...Amazon Web Services
 
[AI] ML Operationalization with Microsoft Azure
[AI] ML Operationalization with Microsoft Azure[AI] ML Operationalization with Microsoft Azure
[AI] ML Operationalization with Microsoft AzureKorkrid Akepanidtaworn
 
Big data and AI in Socialbakers
Big data and AI in SocialbakersBig data and AI in Socialbakers
Big data and AI in Socialbakersppetr82
 
Hamburg Data Science Meetup - MLOps with a Feature Store
Hamburg Data Science Meetup - MLOps with a Feature StoreHamburg Data Science Meetup - MLOps with a Feature Store
Hamburg Data Science Meetup - MLOps with a Feature StoreMoritz Meister
 
Simplifying AI integration on Apache Spark
Simplifying AI integration on Apache SparkSimplifying AI integration on Apache Spark
Simplifying AI integration on Apache SparkDatabricks
 
DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba...
 DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba... DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba...
DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba...Databricks
 
Richard Coffey (x18140785) - Research in Computing CA2
Richard Coffey (x18140785) - Research in Computing CA2Richard Coffey (x18140785) - Research in Computing CA2
Richard Coffey (x18140785) - Research in Computing CA2Richard Coffey
 
Next.ml Boston: Data Science Dev Ops
Next.ml Boston: Data Science Dev OpsNext.ml Boston: Data Science Dev Ops
Next.ml Boston: Data Science Dev OpsEric Chiang
 
Invoice 2 Vec: Creating AI to Read Documents - Mark Landry - H2O AI World Lon...
Invoice 2 Vec: Creating AI to Read Documents - Mark Landry - H2O AI World Lon...Invoice 2 Vec: Creating AI to Read Documents - Mark Landry - H2O AI World Lon...
Invoice 2 Vec: Creating AI to Read Documents - Mark Landry - H2O AI World Lon...Sri Ambati
 
How to Measure DevRel's Perfomances: From Community to Business - Channy Yun ...
How to Measure DevRel's Perfomances: From Community to Business - Channy Yun ...How to Measure DevRel's Perfomances: From Community to Business - Channy Yun ...
How to Measure DevRel's Perfomances: From Community to Business - Channy Yun ...Channy Yun
 
Challenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in ProductionChallenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in Productioniguazio
 
Managers guide to effective building of machine learning products
Managers guide to effective building of machine learning productsManagers guide to effective building of machine learning products
Managers guide to effective building of machine learning productsGianmario Spacagna
 

Tendances (20)

Data Day TX 2016 - Jan 16, 2016
Data Day TX 2016 - Jan 16, 2016Data Day TX 2016 - Jan 16, 2016
Data Day TX 2016 - Jan 16, 2016
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
 
Multi runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learningMulti runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learning
 
H2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan HergerH2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
 
Introduction to Sparkling Water - Spark Summit East 2016
Introduction to Sparkling Water - Spark Summit East 2016Introduction to Sparkling Water - Spark Summit East 2016
Introduction to Sparkling Water - Spark Summit East 2016
 
Jakub Hava, H2O.ai - Productionizing Apache Spark Models using H2O - H2O Worl...
Jakub Hava, H2O.ai - Productionizing Apache Spark Models using H2O - H2O Worl...Jakub Hava, H2O.ai - Productionizing Apache Spark Models using H2O - H2O Worl...
Jakub Hava, H2O.ai - Productionizing Apache Spark Models using H2O - H2O Worl...
 
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuireEmbracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire
 
Deep Learning for Data Scientists: Using Apache MXNet and R on AWS - July 201...
Deep Learning for Data Scientists: Using Apache MXNet and R on AWS - July 201...Deep Learning for Data Scientists: Using Apache MXNet and R on AWS - July 201...
Deep Learning for Data Scientists: Using Apache MXNet and R on AWS - July 201...
 
[AI] ML Operationalization with Microsoft Azure
[AI] ML Operationalization with Microsoft Azure[AI] ML Operationalization with Microsoft Azure
[AI] ML Operationalization with Microsoft Azure
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
 
Big data and AI in Socialbakers
Big data and AI in SocialbakersBig data and AI in Socialbakers
Big data and AI in Socialbakers
 
Hamburg Data Science Meetup - MLOps with a Feature Store
Hamburg Data Science Meetup - MLOps with a Feature StoreHamburg Data Science Meetup - MLOps with a Feature Store
Hamburg Data Science Meetup - MLOps with a Feature Store
 
Simplifying AI integration on Apache Spark
Simplifying AI integration on Apache SparkSimplifying AI integration on Apache Spark
Simplifying AI integration on Apache Spark
 
DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba...
 DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba... DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba...
DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba...
 
Richard Coffey (x18140785) - Research in Computing CA2
Richard Coffey (x18140785) - Research in Computing CA2Richard Coffey (x18140785) - Research in Computing CA2
Richard Coffey (x18140785) - Research in Computing CA2
 
Next.ml Boston: Data Science Dev Ops
Next.ml Boston: Data Science Dev OpsNext.ml Boston: Data Science Dev Ops
Next.ml Boston: Data Science Dev Ops
 
Invoice 2 Vec: Creating AI to Read Documents - Mark Landry - H2O AI World Lon...
Invoice 2 Vec: Creating AI to Read Documents - Mark Landry - H2O AI World Lon...Invoice 2 Vec: Creating AI to Read Documents - Mark Landry - H2O AI World Lon...
Invoice 2 Vec: Creating AI to Read Documents - Mark Landry - H2O AI World Lon...
 
How to Measure DevRel's Perfomances: From Community to Business - Channy Yun ...
How to Measure DevRel's Perfomances: From Community to Business - Channy Yun ...How to Measure DevRel's Perfomances: From Community to Business - Channy Yun ...
How to Measure DevRel's Perfomances: From Community to Business - Channy Yun ...
 
Challenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in ProductionChallenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in Production
 
Managers guide to effective building of machine learning products
Managers guide to effective building of machine learning productsManagers guide to effective building of machine learning products
Managers guide to effective building of machine learning products
 

Similaire à Build your open source data science platform

Venkata Sateesh_BigData_Latest-Resume
Venkata Sateesh_BigData_Latest-ResumeVenkata Sateesh_BigData_Latest-Resume
Venkata Sateesh_BigData_Latest-Resumevenkata sateeshs
 
Time's Up! Getting Value from Big Data Now
Time's Up! Getting Value from Big Data NowTime's Up! Getting Value from Big Data Now
Time's Up! Getting Value from Big Data NowEric Kavanagh
 
Devops Powered by Splunk
Devops Powered by SplunkDevops Powered by Splunk
Devops Powered by SplunkSplunk
 
Netflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open SourceNetflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open Sourceaspyker
 
Trivandrumtechcon20
Trivandrumtechcon20Trivandrumtechcon20
Trivandrumtechcon20Jenkins NS
 
Pivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
Pivoting Spring XD to Spring Cloud Data Flow with Sabby AnandanPivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
Pivoting Spring XD to Spring Cloud Data Flow with Sabby AnandanPivotalOpenSourceHub
 
Data-Driven DevOps: Mining Machine Data for 'Metrics that Matter' in a DevOps...
Data-Driven DevOps: Mining Machine Data for 'Metrics that Matter' in a DevOps...Data-Driven DevOps: Mining Machine Data for 'Metrics that Matter' in a DevOps...
Data-Driven DevOps: Mining Machine Data for 'Metrics that Matter' in a DevOps...Splunk
 
Azure DevOps Best Practices Webinar
Azure DevOps Best Practices WebinarAzure DevOps Best Practices Webinar
Azure DevOps Best Practices WebinarCambay Digital
 
zData BI & Advanced Analytics Platform + 8 Week Pilot Programs
zData BI & Advanced Analytics Platform + 8 Week Pilot ProgramszData BI & Advanced Analytics Platform + 8 Week Pilot Programs
zData BI & Advanced Analytics Platform + 8 Week Pilot ProgramszData Inc.
 
Oracle BI 11g Insync presentation
Oracle BI 11g Insync presentationOracle BI 11g Insync presentation
Oracle BI 11g Insync presentationInSync Conference
 
Which Application Modernization Pattern Is Right For You?
Which Application Modernization Pattern Is Right For You?Which Application Modernization Pattern Is Right For You?
Which Application Modernization Pattern Is Right For You?Apigee | Google Cloud
 
OCP Datacomm RedHat - Kubernetes Launch
OCP Datacomm RedHat - Kubernetes LaunchOCP Datacomm RedHat - Kubernetes Launch
OCP Datacomm RedHat - Kubernetes LaunchPT Datacomm Diangraha
 
Sukumar Nayak-Agile-DevOps-Cloud Management
Sukumar Nayak-Agile-DevOps-Cloud ManagementSukumar Nayak-Agile-DevOps-Cloud Management
Sukumar Nayak-Agile-DevOps-Cloud ManagementSukumar Nayak
 
Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...DataWorks Summit
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in ProductionDataWorks Summit
 
Systems of Intelligence: The Biggest Change in Enterprise Applications in 50 ...
Systems of Intelligence: The Biggest Change in Enterprise Applications in 50 ...Systems of Intelligence: The Biggest Change in Enterprise Applications in 50 ...
Systems of Intelligence: The Biggest Change in Enterprise Applications in 50 ...WikibonCommunity
 

Similaire à Build your open source data science platform (20)

resume4
resume4resume4
resume4
 
Venkata Sateesh_BigData_Latest-Resume
Venkata Sateesh_BigData_Latest-ResumeVenkata Sateesh_BigData_Latest-Resume
Venkata Sateesh_BigData_Latest-Resume
 
Sparkflows.io
Sparkflows.ioSparkflows.io
Sparkflows.io
 
Pavani_Rao
Pavani_RaoPavani_Rao
Pavani_Rao
 
Time's Up! Getting Value from Big Data Now
Time's Up! Getting Value from Big Data NowTime's Up! Getting Value from Big Data Now
Time's Up! Getting Value from Big Data Now
 
Devops Powered by Splunk
Devops Powered by SplunkDevops Powered by Splunk
Devops Powered by Splunk
 
Netflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open SourceNetflix Cloud Architecture and Open Source
Netflix Cloud Architecture and Open Source
 
Trivandrumtechcon20
Trivandrumtechcon20Trivandrumtechcon20
Trivandrumtechcon20
 
Pivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
Pivoting Spring XD to Spring Cloud Data Flow with Sabby AnandanPivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
Pivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
 
Data-Driven DevOps: Mining Machine Data for 'Metrics that Matter' in a DevOps...
Data-Driven DevOps: Mining Machine Data for 'Metrics that Matter' in a DevOps...Data-Driven DevOps: Mining Machine Data for 'Metrics that Matter' in a DevOps...
Data-Driven DevOps: Mining Machine Data for 'Metrics that Matter' in a DevOps...
 
Azure DevOps Best Practices Webinar
Azure DevOps Best Practices WebinarAzure DevOps Best Practices Webinar
Azure DevOps Best Practices Webinar
 
zData BI & Advanced Analytics Platform + 8 Week Pilot Programs
zData BI & Advanced Analytics Platform + 8 Week Pilot ProgramszData BI & Advanced Analytics Platform + 8 Week Pilot Programs
zData BI & Advanced Analytics Platform + 8 Week Pilot Programs
 
Oracle BI 11g Insync presentation
Oracle BI 11g Insync presentationOracle BI 11g Insync presentation
Oracle BI 11g Insync presentation
 
GOTO Berlin 2016
GOTO Berlin 2016GOTO Berlin 2016
GOTO Berlin 2016
 
Which Application Modernization Pattern Is Right For You?
Which Application Modernization Pattern Is Right For You?Which Application Modernization Pattern Is Right For You?
Which Application Modernization Pattern Is Right For You?
 
OCP Datacomm RedHat - Kubernetes Launch
OCP Datacomm RedHat - Kubernetes LaunchOCP Datacomm RedHat - Kubernetes Launch
OCP Datacomm RedHat - Kubernetes Launch
 
Sukumar Nayak-Agile-DevOps-Cloud Management
Sukumar Nayak-Agile-DevOps-Cloud ManagementSukumar Nayak-Agile-DevOps-Cloud Management
Sukumar Nayak-Agile-DevOps-Cloud Management
 
Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
 
Systems of Intelligence: The Biggest Change in Enterprise Applications in 50 ...
Systems of Intelligence: The Biggest Change in Enterprise Applications in 50 ...Systems of Intelligence: The Biggest Change in Enterprise Applications in 50 ...
Systems of Intelligence: The Biggest Change in Enterprise Applications in 50 ...
 

Plus de David Talby

Building State-of-the-art Natural Language Processing Projects with Free Soft...
Building State-of-the-art Natural Language Processing Projects with Free Soft...Building State-of-the-art Natural Language Processing Projects with Free Soft...
Building State-of-the-art Natural Language Processing Projects with Free Soft...David Talby
 
Turning Medical Expert Knowledge into Responsible Language Models - K1st World
Turning Medical Expert Knowledge into Responsible Language Models - K1st WorldTurning Medical Expert Knowledge into Responsible Language Models - K1st World
Turning Medical Expert Knowledge into Responsible Language Models - K1st WorldDavid Talby
 
How to Apply NLP to Analyze Clinical Trials
How to Apply NLP to Analyze Clinical TrialsHow to Apply NLP to Analyze Clinical Trials
How to Apply NLP to Analyze Clinical TrialsDavid Talby
 
New Frontiers in Applied NLP​ - PAW Healthcare 2022
New Frontiers in Applied NLP​ - PAW Healthcare 2022New Frontiers in Applied NLP​ - PAW Healthcare 2022
New Frontiers in Applied NLP​ - PAW Healthcare 2022David Talby
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...David Talby
 
Applying NLP to Personalized Healthcare - 2021
Applying NLP to Personalized Healthcare - 2021Applying NLP to Personalized Healthcare - 2021
Applying NLP to Personalized Healthcare - 2021David Talby
 
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...David Talby
 
Natural Language Understanding in Healthcare
Natural Language Understanding in HealthcareNatural Language Understanding in Healthcare
Natural Language Understanding in HealthcareDavid Talby
 
Deep learning for natural language understanding
Deep learning for natural language understandingDeep learning for natural language understanding
Deep learning for natural language understandingDavid Talby
 
Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...David Talby
 
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...David Talby
 

Plus de David Talby (11)

Building State-of-the-art Natural Language Processing Projects with Free Soft...
Building State-of-the-art Natural Language Processing Projects with Free Soft...Building State-of-the-art Natural Language Processing Projects with Free Soft...
Building State-of-the-art Natural Language Processing Projects with Free Soft...
 
Turning Medical Expert Knowledge into Responsible Language Models - K1st World
Turning Medical Expert Knowledge into Responsible Language Models - K1st WorldTurning Medical Expert Knowledge into Responsible Language Models - K1st World
Turning Medical Expert Knowledge into Responsible Language Models - K1st World
 
How to Apply NLP to Analyze Clinical Trials
How to Apply NLP to Analyze Clinical TrialsHow to Apply NLP to Analyze Clinical Trials
How to Apply NLP to Analyze Clinical Trials
 
New Frontiers in Applied NLP​ - PAW Healthcare 2022
New Frontiers in Applied NLP​ - PAW Healthcare 2022New Frontiers in Applied NLP​ - PAW Healthcare 2022
New Frontiers in Applied NLP​ - PAW Healthcare 2022
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
 
Applying NLP to Personalized Healthcare - 2021
Applying NLP to Personalized Healthcare - 2021Applying NLP to Personalized Healthcare - 2021
Applying NLP to Personalized Healthcare - 2021
 
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
Introducing the Open-Source Library for Testing NLP Models - Healthcare NLP S...
 
Natural Language Understanding in Healthcare
Natural Language Understanding in HealthcareNatural Language Understanding in Healthcare
Natural Language Understanding in Healthcare
 
Deep learning for natural language understanding
Deep learning for natural language understandingDeep learning for natural language understanding
Deep learning for natural language understanding
 
Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...Natural Language Understanding with Machine Learned Annotators and Deep Learn...
Natural Language Understanding with Machine Learned Annotators and Deep Learn...
 
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
Semantic Natural Language Understanding with Spark, UIMA & Machine Learned On...
 

Dernier

The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxRTS corp
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorTier1 app
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdfAndrey Devyatkin
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesKrzysztofKkol1
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingShane Coughlan
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identityteam-WIBU
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?Alexandre Beguel
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonApplitools
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 

Dernier (20)

The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptxThe Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
The Role of IoT and Sensor Technology in Cargo Cloud Solutions.pptx
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryError
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 

Build your open source data science platform

  • 1. Dr. David Talby CTO, Pacific AI BUILD YOUR OWN OPEN SOURCE DATA SCIENCE PLATFORM
  • 2. LET’S BUILD A PLATFORM 1. Ground Rules 2. Components 3. Putting It All Together
  • 3. AT THE BEGINNING, THERE WAS SEARCH
  • 4. Integrate Data ETL Streaming Quality Enrichment Dataflows Data Analyst Data Scientist SCOPE Discover & Visualize SQL Search Visualization Dashboards Real-Time Alerts Train Models ML, DL, DM, NLP, … Explore & Visualize Train & Optimize Collaboration Workflows Productize Models Deploy API’s Publish API’s CI & CD for Models Measurement Feedback App DeveloperData Engineer Infrastructure Deployment Orchestration Security Monitoring Single Sign-On Backup Scaling
  • 5. GOALS Enterprise Grade Scales from GB to PB Unified & Modular Cutting Edge
  • 6. CONSTRAINTS No Commercial Software No Copyleft No Saas Built It
  • 7. LET’S BUILD A PLATFORM 1. Ground Rules 2. Components 3. Putting it All Together
  • 8. Integrate Data Data Analyst Data Scientist SCOPE Discover & Visualize Train Models Productize Models App DeveloperData Engineer Infrastructure
  • 10. NIFI FEATURES Web-based dataflow user interface Seamless experience between design, control, feedback, and monitoring Highly configurable Loss tolerant vs guaranteed delivery Low latency vs high throughput Dynamic prioritization Flow can be modified at runtime Back pressure Data Provenance Track dataflow from beginning to end Designed for extension Build your own processors and more (120+ available out-of-the-box) Enables rapid development and effective testing Secure SSL, SSH, HTTPS, encrypted content, etc... Multi-tenant authorization and internal authorization/policy management
  • 12. Integrate Data Data Analyst Data Scientist SCOPE Discover & Visualize Train Models Productize Models App DeveloperData Engineer Infrastructure
  • 13. SPARK SQL FEATURES Distributed SQL Engine Seamless integration with Spark DataFrames Standards Compliant ANSI SQL 2003 support All 99 queries of TPC-DS supported as of Spark 2.0 High performance New “Catalyst” cost-based optimizer in Spark 2.2 Project Tungsten: “Joining a Billion Rows per Second on a Laptop” 2.5x performance gains between 1.6 and 2.0 Accessible & Extensible Python, R, Scala, Java, Hive direct API’s + UDF support
  • 16. KIBANA FEATURES Full-text and faceted search Full text query language: Boolean operators, proximity, boosting Faceted search: Filter by field, value ranges, date ranges, sort, limit, pagination Time series analysis: aggregates, windowing, offsetting, trending, comparisons Geospatial search: Search by shape, bounding box, polygon, by distance or range Visualizations & Dashboards All the basics: Area, pie, bar, heatmap, table, metric, map, scatter, timeline, tile Drag & drop creation and editing Organize visualizations into dashboards Dashboards can be dynamically filtered by time, queries, filters Publish, embed and share dashboards Real-time updates Performant Fast interactive queries, faceting and filtering REST API and clients in all major languages
  • 17. Integrate Data Data Analyst Data Scientist SCOPE Discover & Visualize Train Models Productize Models App DeveloperData Engineer Infrastructure
  • 21. Integrate Data Data Analyst Data Scientist SCOPE Discover & Visualize Train Models Productize Models App DeveloperData Engineer Infrastructure
  • 23. MLEAP
  • 24. KONG API GATEWAY API Gateway on nginx Scalable Modular with plugins Authentication Basic Auth, Open ID, OAuth, HMAC, LDAP, JWT Security ACL, CORS, IP Restriction, Bot Detection, SSL Traffic Control Proxy Caching, Rate limit, Size limits, terminations Logging & Analytics Galileo, Datadog, Runscope TCP, HTTP, File, Syslog, StatsD
  • 25. COLLABORATION, CI & CD Plan Projects, Boards, Issues, Milestones, Teams Create Merge, Preview, Commit, Branch, Lock, Discuss Verify Automated pipelines, graphs, history, scaling Package Built-in container registry Release Continuous integration & continuous deployment Configure & Monitor
  • 26. Infrastructure Integrate Data Data Analyst Data Scientist SCOPE Discover & Visualize Train Models Productize Models App DeveloperData Engineer
  • 27. KUBERNETES Portable Containers Public, Private, Hybrid, or Multi-Cloud Deployment Automation, Co-Location, Storage Mounting, Secrets Auto-* -Scaling, -Healing, -Restart, -Placement, -Replication Rolling Updates Load Balancing Service Discovery Monitoring Resources Accessing & Ingesting Logs
  • 30. LET’S BUILD A PLATFORM 1. Ground Rules 2. Components 3. Putting It All Together
  • 31. The Big Picture • This is a complex, major enterprise platform • It’s far from free: Cost is in integration, training & ops • Why open source? 1. Often, outright better technology 2. Faster innovation 3. More native integrations 4. More books, talks, tutorials, posts & answers 5. Cheaper, both to begin and to scale
  • 32. Common Questions Q: Do I need it all on Day One? A: No. Use what you need, know where it fits later. Q: What if I already have another tool in place? A: Keep it. Architecture is about incremental evolution. Q: What if I don’t have the in-house knowledge? A: Outsource, but require training & onboarding. Q: What often gets overlooked? A: Keeping components continuously up to date.
  • 33. Summary: If you remember one thing… Build the simplest platform that serves everyone required to turn science into $$$ Data Analyst Data Scientist App DeveloperData Engineer

Notes de l'éditeur

  1. You need a platform if you’re building a set of solutions in the data science space.
  2. We’re going open source to get a better solution, not just to save money.
  3. The constraints are not meant to say that other types of solutions are worth the money – they often are, but starting with these constraints gives you a baseline of expectations.