SlideShare a Scribd company logo
1 of 12
Lambda Architecture
Use Case: Mayo Clinic
FEBRUARY 2015
Altan Khendup – Leader, UDA Architecture COE
2
Background of Lambda Architecture
Background
– Reference architecture for Big Data systems
– Designed by Nathan Marz (Twitter)
– Defined as a system that runs arbitrary functions on arbitrary
data
– “query = function(all data)”
Design Principles
– Human fault-tolerant, Immutability, Computable
Lambda Layers
– Batch - Contains the immutable, constantly growing master
dataset.
– Speed - Deals only with new data and compensates for the
high latency updates of the serving layer.
– Serving - Loads and exposes the combined view of data so
that they can be queried.
3
Overview of Lambda Architecture
4 © 2014 Teradata
USE CASE – MAYO CLINIC
Mayo Clinic History
Every year, more than a million people from all 50 states
and nearly 150 countries come for care
Dozens of locations in several states with major
campuses in Rochester, Minn.; Scottsdale and Phoenix,
Ariz.; and Jacksonville, Fla.
Mayo Clinic Rochester, Minn. recognized as the top
hospital in the nation for 2014-2015 by U.S. News &
World Report
Why Big Data?
Challenges in Medical Data
Health data tends to be “wide”, not “deep”
New data types are becoming more important
Unstructured
Real-time streaming
A challenge to generally move from retrospective “BI”
viewing to event-based and predictive analytics usage
Multiple layers
Lots of events, data
Complex
Lots of different languages and data structures
Difficult to maintain
Lots of moving pieces/components/technologies
Lots of changes in the business
Data Discovery
Many “Big Data” stories start with data discovery
The Data Lake, etc.
But, data discovery is not predictable!
Mayo Clinic needed to define a real operational need
that a “Big Data” technology stack could fulfill
Project
Optimize an existing Natural Language Processing
pipeline in support of critical Colorectal Surgery
(Move to tens of thousands of documents processed)
Replace an existing free-text search facility used by
Clinical Web Service for colorectal cancer
(Move search to milliseconds)
9
Overall Architecture
10
• Current Storm throughput up to 1.5 million documents per hour
• Average of 140,000 HL7 messages actually processed per day with
average latency of 60 milliseconds from ingest to persistence
• Average of 50,000 documents passed through annotators per day
versus 5,000 historically
• Actual annotations of documents up to 6 times faster than previously
accomplished
• Free-text search use cases that took over 30 minutes on old
infrastructure completing in milliseconds in ElasticSearch
Operational Statistics
11
• Benefits
– An architecturally-driven, internally-owned technology stack that blends:
- An event-based/”real-time” processing fabric
- A multi-destination distillation hub
- A foundation for “Classic” BI delivery techniques
- A foundation for “Services-based” delivery techniques
- A “serendipitous” discovery environment
– Mutually supportive components that combine in delivering novel clinical
solutions
– Data continuity
- Historical data can be assessed as algorithms change over time
Summary
12
Thank you! We’re Hiring!
thinkbigcareers.teradata.com
Altan Khendup (@madmongol)
Altan.khendup@teradata.com
Ron Bodkin (@ronbodkin)
Ron.bodkin@thinkbiganalytics.com

More Related Content

What's hot

Mendeley Open Repositories 2011 Paper
Mendeley Open Repositories 2011 PaperMendeley Open Repositories 2011 Paper
Mendeley Open Repositories 2011 PaperWilliam Gunn
 
From Data to Visualization: Emerging Tools for Research / Jan Johansson
From Data to Visualization: Emerging Tools for Research / Jan JohanssonFrom Data to Visualization: Emerging Tools for Research / Jan Johansson
From Data to Visualization: Emerging Tools for Research / Jan JohanssonPVC.ASIST
 
Machine Learning and Cultural Heritage: What Is It Good Enough For?
Machine Learning and Cultural Heritage: What Is It Good Enough For?Machine Learning and Cultural Heritage: What Is It Good Enough For?
Machine Learning and Cultural Heritage: What Is It Good Enough For?John Stack
 
Data Gov
Data GovData Gov
Data GovRexNige
 
Wehc - Linked Data for Economic-Social historians
Wehc - Linked Data for Economic-Social historiansWehc - Linked Data for Economic-Social historians
Wehc - Linked Data for Economic-Social historiansBram van den Hout
 
QB'er demonstration
QB'er demonstrationQB'er demonstration
QB'er demonstrationCLARIAH
 
Closing Keynote - AWS Executive Summit 2014 India
Closing Keynote - AWS Executive Summit 2014 IndiaClosing Keynote - AWS Executive Summit 2014 India
Closing Keynote - AWS Executive Summit 2014 IndiaAmazon Web Services
 
RSpace - Rory Macneil at Repository Fringe 2015
RSpace - Rory Macneil at Repository Fringe 2015RSpace - Rory Macneil at Repository Fringe 2015
RSpace - Rory Macneil at Repository Fringe 2015Repository Fringe
 
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...DeVonne Parks, CEM
 
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global PerspectiveChris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global PerspectiveGigaScience, BGI Hong Kong
 
Adding value to researchers' data
Adding value to researchers' dataAdding value to researchers' data
Adding value to researchers' dataAndrew Treloar
 
Building data networks: exploring trust and interoperability between authoris...
Building data networks: exploring trust and interoperability between authoris...Building data networks: exploring trust and interoperability between authoris...
Building data networks: exploring trust and interoperability between authoris...Repository Fringe
 
Husar Report to UIC Plenary
Husar Report to UIC PlenaryHusar Report to UIC Plenary
Husar Report to UIC PlenaryRudolf Husar
 
The pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an exampleThe pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an exampleEnis Afgan
 

What's hot (20)

Mendeley Open Repositories 2011 Paper
Mendeley Open Repositories 2011 PaperMendeley Open Repositories 2011 Paper
Mendeley Open Repositories 2011 Paper
 
Cloud Dataverse
Cloud DataverseCloud Dataverse
Cloud Dataverse
 
From Data to Visualization: Emerging Tools for Research / Jan Johansson
From Data to Visualization: Emerging Tools for Research / Jan JohanssonFrom Data to Visualization: Emerging Tools for Research / Jan Johansson
From Data to Visualization: Emerging Tools for Research / Jan Johansson
 
Top 10 data science technologies
Top 10 data science technologiesTop 10 data science technologies
Top 10 data science technologies
 
ieee cloud 2015 keynote talk
ieee cloud 2015 keynote talkieee cloud 2015 keynote talk
ieee cloud 2015 keynote talk
 
Machine Learning and Cultural Heritage: What Is It Good Enough For?
Machine Learning and Cultural Heritage: What Is It Good Enough For?Machine Learning and Cultural Heritage: What Is It Good Enough For?
Machine Learning and Cultural Heritage: What Is It Good Enough For?
 
Data Gov
Data GovData Gov
Data Gov
 
Wehc - Linked Data for Economic-Social historians
Wehc - Linked Data for Economic-Social historiansWehc - Linked Data for Economic-Social historians
Wehc - Linked Data for Economic-Social historians
 
QB'er demonstration
QB'er demonstrationQB'er demonstration
QB'er demonstration
 
Closing Keynote - AWS Executive Summit 2014 India
Closing Keynote - AWS Executive Summit 2014 IndiaClosing Keynote - AWS Executive Summit 2014 India
Closing Keynote - AWS Executive Summit 2014 India
 
RSpace - Rory Macneil at Repository Fringe 2015
RSpace - Rory Macneil at Repository Fringe 2015RSpace - Rory Macneil at Repository Fringe 2015
RSpace - Rory Macneil at Repository Fringe 2015
 
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
 
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global PerspectiveChris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
 
Adding value to researchers' data
Adding value to researchers' dataAdding value to researchers' data
Adding value to researchers' data
 
Building data networks: exploring trust and interoperability between authoris...
Building data networks: exploring trust and interoperability between authoris...Building data networks: exploring trust and interoperability between authoris...
Building data networks: exploring trust and interoperability between authoris...
 
Big data
Big dataBig data
Big data
 
Husar Report to UIC Plenary
Husar Report to UIC PlenaryHusar Report to UIC Plenary
Husar Report to UIC Plenary
 
Open Data from the European Library
Open Data from the European LibraryOpen Data from the European Library
Open Data from the European Library
 
The pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an exampleThe pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an example
 
Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorial
 

Viewers also liked

Hive & HBase For Transaction Processing
Hive & HBase For Transaction ProcessingHive & HBase For Transaction Processing
Hive & HBase For Transaction ProcessingDataWorks Summit
 
Spotify's Music Recommendations Lambda Architecture
Spotify's Music Recommendations Lambda ArchitectureSpotify's Music Recommendations Lambda Architecture
Spotify's Music Recommendations Lambda ArchitectureEsh Vckay
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
Implementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache SparkImplementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache SparkDataWorks Summit
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit
 
Apache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopCloudera, Inc.
 
AWS Black Belt Tech シリーズ 2016 - Amazon SQS / Amazon SNS
AWS Black Belt Tech シリーズ 2016 - Amazon SQS / Amazon SNSAWS Black Belt Tech シリーズ 2016 - Amazon SQS / Amazon SNS
AWS Black Belt Tech シリーズ 2016 - Amazon SQS / Amazon SNSAmazon Web Services Japan
 
AWS Black Belt Techシリーズ Amazon SNS / Amazon SQS
AWS Black Belt Techシリーズ Amazon SNS / Amazon SQSAWS Black Belt Techシリーズ Amazon SNS / Amazon SQS
AWS Black Belt Techシリーズ Amazon SNS / Amazon SQSAmazon Web Services Japan
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkRahul Jain
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaHelena Edelson
 
Solution Architecture – Approach to Rapidly Scoping The Initial Solution Options
Solution Architecture – Approach to Rapidly Scoping The Initial Solution OptionsSolution Architecture – Approach to Rapidly Scoping The Initial Solution Options
Solution Architecture – Approach to Rapidly Scoping The Initial Solution OptionsAlan McSweeney
 
Structured Approach to Solution Architecture
Structured Approach to Solution ArchitectureStructured Approach to Solution Architecture
Structured Approach to Solution ArchitectureAlan McSweeney
 

Viewers also liked (12)

Hive & HBase For Transaction Processing
Hive & HBase For Transaction ProcessingHive & HBase For Transaction Processing
Hive & HBase For Transaction Processing
 
Spotify's Music Recommendations Lambda Architecture
Spotify's Music Recommendations Lambda ArchitectureSpotify's Music Recommendations Lambda Architecture
Spotify's Music Recommendations Lambda Architecture
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Implementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache SparkImplementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache Spark
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas Geerdink
 
Apache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for Hadoop
 
AWS Black Belt Tech シリーズ 2016 - Amazon SQS / Amazon SNS
AWS Black Belt Tech シリーズ 2016 - Amazon SQS / Amazon SNSAWS Black Belt Tech シリーズ 2016 - Amazon SQS / Amazon SNS
AWS Black Belt Tech シリーズ 2016 - Amazon SQS / Amazon SNS
 
AWS Black Belt Techシリーズ Amazon SNS / Amazon SQS
AWS Black Belt Techシリーズ Amazon SNS / Amazon SQSAWS Black Belt Techシリーズ Amazon SNS / Amazon SQS
AWS Black Belt Techシリーズ Amazon SNS / Amazon SQS
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
 
Solution Architecture – Approach to Rapidly Scoping The Initial Solution Options
Solution Architecture – Approach to Rapidly Scoping The Initial Solution OptionsSolution Architecture – Approach to Rapidly Scoping The Initial Solution Options
Solution Architecture – Approach to Rapidly Scoping The Initial Solution Options
 
Structured Approach to Solution Architecture
Structured Approach to Solution ArchitectureStructured Approach to Solution Architecture
Structured Approach to Solution Architecture
 

Similar to Lambda Architecture The Hive

Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemWarren Kibbe
 
HPE and Hortonworks join forces to Deliver Healthcare Transformation
HPE and Hortonworks join forces to Deliver Healthcare TransformationHPE and Hortonworks join forces to Deliver Healthcare Transformation
HPE and Hortonworks join forces to Deliver Healthcare TransformationHortonworks
 
Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Robert Grossman
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemWarren Kibbe
 
Met soc15 roccaserra-biocrates-datasharing
Met soc15 roccaserra-biocrates-datasharingMet soc15 roccaserra-biocrates-datasharing
Met soc15 roccaserra-biocrates-datasharingPhilippe Rocca-Serra
 
Is one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical researchIs one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical researchGreg Landrum
 
Using The Hadoop Ecosystem to Drive Healthcare Innovation
Using The Hadoop Ecosystem to Drive Healthcare InnovationUsing The Hadoop Ecosystem to Drive Healthcare Innovation
Using The Hadoop Ecosystem to Drive Healthcare InnovationDan Wellisch
 
Starting the Hadoop Journey at a Global Leader in Cancer Research
Starting the Hadoop Journey at a Global Leader in Cancer ResearchStarting the Hadoop Journey at a Global Leader in Cancer Research
Starting the Hadoop Journey at a Global Leader in Cancer ResearchDataWorks Summit/Hadoop Summit
 
Starting the Hadoop Journey at a Global Leader in Cancer Research
Starting the Hadoop Journey at a Global Leader in Cancer ResearchStarting the Hadoop Journey at a Global Leader in Cancer Research
Starting the Hadoop Journey at a Global Leader in Cancer ResearchDataWorks Summit/Hadoop Summit
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeGeoffrey Fox
 
Instrument of Change: Creating the next generation of Laboratory Middleware
Instrument of Change: Creating the next generation of Laboratory MiddlewareInstrument of Change: Creating the next generation of Laboratory Middleware
Instrument of Change: Creating the next generation of Laboratory MiddlewareTodd Winey
 
From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...
From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...
From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...Databricks
 
High Performance Computing and the Opportunity with Cognitive Technology
 High Performance Computing and the Opportunity with Cognitive Technology High Performance Computing and the Opportunity with Cognitive Technology
High Performance Computing and the Opportunity with Cognitive TechnologyIBM Watson
 
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...Ola Spjuth
 
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...Bonnie Hurwitz
 
Yellowbrick Webcast with DBTA for Real-Time Analytics
Yellowbrick Webcast with DBTA for Real-Time AnalyticsYellowbrick Webcast with DBTA for Real-Time Analytics
Yellowbrick Webcast with DBTA for Real-Time AnalyticsYellowbrick Data
 
Big Data Vendor Panel - Data Stax
Big Data Vendor Panel - Data StaxBig Data Vendor Panel - Data Stax
Big Data Vendor Panel - Data StaxMikan Associates
 
Jisc's new shared data centre
Jisc's new shared data centreJisc's new shared data centre
Jisc's new shared data centreJisc
 

Similar to Lambda Architecture The Hive (20)

Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
 
HPE and Hortonworks join forces to Deliver Healthcare Transformation
HPE and Hortonworks join forces to Deliver Healthcare TransformationHPE and Hortonworks join forces to Deliver Healthcare Transformation
HPE and Hortonworks join forces to Deliver Healthcare Transformation
 
Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
 
Met soc15 roccaserra-biocrates-datasharing
Met soc15 roccaserra-biocrates-datasharingMet soc15 roccaserra-biocrates-datasharing
Met soc15 roccaserra-biocrates-datasharing
 
Is one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical researchIs one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical research
 
Big Data
Big Data Big Data
Big Data
 
Big Data in Clinical Research
Big Data in Clinical ResearchBig Data in Clinical Research
Big Data in Clinical Research
 
Using The Hadoop Ecosystem to Drive Healthcare Innovation
Using The Hadoop Ecosystem to Drive Healthcare InnovationUsing The Hadoop Ecosystem to Drive Healthcare Innovation
Using The Hadoop Ecosystem to Drive Healthcare Innovation
 
Starting the Hadoop Journey at a Global Leader in Cancer Research
Starting the Hadoop Journey at a Global Leader in Cancer ResearchStarting the Hadoop Journey at a Global Leader in Cancer Research
Starting the Hadoop Journey at a Global Leader in Cancer Research
 
Starting the Hadoop Journey at a Global Leader in Cancer Research
Starting the Hadoop Journey at a Global Leader in Cancer ResearchStarting the Hadoop Journey at a Global Leader in Cancer Research
Starting the Hadoop Journey at a Global Leader in Cancer Research
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run Time
 
Instrument of Change: Creating the next generation of Laboratory Middleware
Instrument of Change: Creating the next generation of Laboratory MiddlewareInstrument of Change: Creating the next generation of Laboratory Middleware
Instrument of Change: Creating the next generation of Laboratory Middleware
 
From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...
From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...
From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...
 
High Performance Computing and the Opportunity with Cognitive Technology
 High Performance Computing and the Opportunity with Cognitive Technology High Performance Computing and the Opportunity with Cognitive Technology
High Performance Computing and the Opportunity with Cognitive Technology
 
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
 
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
 
Yellowbrick Webcast with DBTA for Real-Time Analytics
Yellowbrick Webcast with DBTA for Real-Time AnalyticsYellowbrick Webcast with DBTA for Real-Time Analytics
Yellowbrick Webcast with DBTA for Real-Time Analytics
 
Big Data Vendor Panel - Data Stax
Big Data Vendor Panel - Data StaxBig Data Vendor Panel - Data Stax
Big Data Vendor Panel - Data Stax
 
Jisc's new shared data centre
Jisc's new shared data centreJisc's new shared data centre
Jisc's new shared data centre
 

Lambda Architecture The Hive

  • 1. Lambda Architecture Use Case: Mayo Clinic FEBRUARY 2015 Altan Khendup – Leader, UDA Architecture COE
  • 2. 2 Background of Lambda Architecture Background – Reference architecture for Big Data systems – Designed by Nathan Marz (Twitter) – Defined as a system that runs arbitrary functions on arbitrary data – “query = function(all data)” Design Principles – Human fault-tolerant, Immutability, Computable Lambda Layers – Batch - Contains the immutable, constantly growing master dataset. – Speed - Deals only with new data and compensates for the high latency updates of the serving layer. – Serving - Loads and exposes the combined view of data so that they can be queried.
  • 3. 3 Overview of Lambda Architecture
  • 4. 4 © 2014 Teradata USE CASE – MAYO CLINIC
  • 5. Mayo Clinic History Every year, more than a million people from all 50 states and nearly 150 countries come for care Dozens of locations in several states with major campuses in Rochester, Minn.; Scottsdale and Phoenix, Ariz.; and Jacksonville, Fla. Mayo Clinic Rochester, Minn. recognized as the top hospital in the nation for 2014-2015 by U.S. News & World Report
  • 6. Why Big Data? Challenges in Medical Data Health data tends to be “wide”, not “deep” New data types are becoming more important Unstructured Real-time streaming A challenge to generally move from retrospective “BI” viewing to event-based and predictive analytics usage Multiple layers Lots of events, data Complex Lots of different languages and data structures Difficult to maintain Lots of moving pieces/components/technologies Lots of changes in the business
  • 7. Data Discovery Many “Big Data” stories start with data discovery The Data Lake, etc. But, data discovery is not predictable! Mayo Clinic needed to define a real operational need that a “Big Data” technology stack could fulfill
  • 8. Project Optimize an existing Natural Language Processing pipeline in support of critical Colorectal Surgery (Move to tens of thousands of documents processed) Replace an existing free-text search facility used by Clinical Web Service for colorectal cancer (Move search to milliseconds)
  • 10. 10 • Current Storm throughput up to 1.5 million documents per hour • Average of 140,000 HL7 messages actually processed per day with average latency of 60 milliseconds from ingest to persistence • Average of 50,000 documents passed through annotators per day versus 5,000 historically • Actual annotations of documents up to 6 times faster than previously accomplished • Free-text search use cases that took over 30 minutes on old infrastructure completing in milliseconds in ElasticSearch Operational Statistics
  • 11. 11 • Benefits – An architecturally-driven, internally-owned technology stack that blends: - An event-based/”real-time” processing fabric - A multi-destination distillation hub - A foundation for “Classic” BI delivery techniques - A foundation for “Services-based” delivery techniques - A “serendipitous” discovery environment – Mutually supportive components that combine in delivering novel clinical solutions – Data continuity - Historical data can be assessed as algorithms change over time Summary
  • 12. 12 Thank you! We’re Hiring! thinkbigcareers.teradata.com Altan Khendup (@madmongol) Altan.khendup@teradata.com Ron Bodkin (@ronbodkin) Ron.bodkin@thinkbiganalytics.com

Editor's Notes

  1. Lambda = architectural pattern to talk about the complexity of dealing with real-time and historical datasets Overall use Prescriptive/Predictive uses rely on some dimension of real-time Use cases CPG – consumer goods looking at what customers are doing in real-time and making adjustments Medical – real-time medical sensors and treatment and labs for critical patient care Financial – credit risk and transaction fraud Manufacturers – IoT/Telematics getting information from their plants and logistics, cross referencing to inventory, and making adjustments to supply chain
  2. General architecture that covers how Lambda works overall Able to address real-time and historical data Layers Speed – real-time/current data streams; spark, storm, etc. Batch – historical data layer Serving – ability to take the current data and historical and merge the results and provide that to the organization Real-world experience/strategy Do not tackle all of the data but rather necessary segments of business functionality called queries Data can be tackled per query hence the idea of “query focused datasets” or qfds Allows for more focused results/faster speed gains
  3. Data Discovery & Analytics
  4. HL7 actual processing based on “pull” requests from users not actual processing power HL7 are large xml-based documents Much larger than say JSON or others (roughly 800k-900k in size) Contains significant data related to medical information End goal An architecturally-driven, internally-owned technology stack that blends: An event-based processing fabric A real-time processing framework A multi-destination distillation hub “Classic” BI delivery techniques “Services-based” delivery techniques A “serendipitous” discovery environment Mutually supportive components that combine in delivering novel clinical solutions.
  5. Challenges Cross industry similarities
  6. We’re hiring! Ron Bodkin + my contact