This document discusses how Spark can be used for production-scale applications. It gives examples of companies running Spark on MapR in production for tasks such as security analytics, genomics research, and customer analytics. It also outlines key issues to consider when taking Spark to production and describes how MapR provides the performance, reliability, support, and data services needed for mission-critical Spark applications.
6. Delivers Lightning-Fast Analytics for Clients
Building the largest Hadoop cluster in Australia
Real-time analytics using Spark on MapR, reducing data loading time from hours to minutes
Leverages the multi-tenancy, high performance, and reliability of MapR
a. Interested in the ADAM project - runs on top of Spark - for next-gen genomics - good whitepaper - search for "Apache Spark ADAM"
GitHub for ADAM - notes - links to the whitepaper - AMPLab -
http://www.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-207.html
Genomics realignment tool - crux - next-gen genomic medicine - allows you to access and manage the alignment of data much more quickly.
b. Existing genomics pipeline - many weeks to realign - drill down
augment and work through their chemical compounds - geneticists can come and test their compounds against the alignment
c. If those are not the exact sections, they cannot focus - they back out and zoom back in on the right set of sequences.
That process takes 6 weeks - geneticist - 1 day and shift it - 6 weeks to get to the change -
With ADAM, a matter of hours - geneticists can do it themselves - otherwise a whole team of HPC experts is needed
d. This is the case for almost all pharma companies - Novartis is the same
e. One tool in a bigger framework - several other use cases as well.
Objectives:
Razorsight's cloud-based predictive analytics software delivers insights to help communications service providers (CSPs) and media companies optimize operations and offer superior customer experiences.
“As we grew as a company and big data evolved, there was a lot more data available,” explains Razorsight CTO Suren Nathan. “Today’s data has higher volumes and different structures. There are new types of devices generating data for the Internet of Things, mobile phones using broadband for apps, and VoIP.”
Challenges
Razorsight’s old technology platform could not keep up with the demands and opportunities of this increasing volume of data. Storage costs were exploding, and the company wanted to maintain performance and scalability at a lower cost. The prior platform was based on IBM PureData (Netezza), a data warehouse appliance, and couldn’t support high-speed data ingestion and reporting. Additionally, it required a separate database for each customer.
Solution
MapR came out on top for several reasons and is being used as the primary data store for online and archive data:
1. Having the flexibility of the full Spark stack as part of the Hadoop distribution was very important. Spark helps transition a large part of ETL processing.
2. MapR provided production-class Hadoop with enterprise support.
3. The NFS gateway was critical for easy, high-speed access.
Business Impact
Customers such as Virgin Mobile LA are reducing churn by tailoring campaigns to the right subscriber at the right time. TCO is much lower than with IBM Netezza (one-eighth the cost). Razorsight is also able to reduce on-boarding time for new customers from 4-6 months to 8-12 weeks.
The Spark execution engine runs on top of the Mesos framework and any distributed file system (in this case, HDFS).
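As a rough illustration of that layering, the sketch below assembles a spark-submit command that points a job at a Mesos master and passes it an HDFS input path. It is only a sketch under stated assumptions: the host names, ports, the etl_job.py script, and the build_spark_submit helper are all hypothetical and do not come from the deployments described here. Only the spark-submit launcher, its --master option, and the mesos:// and hdfs:// URL schemes are standard.

```python
import shlex


def build_spark_submit(mesos_master, app_path, input_uri):
    """Return the argv list for launching a Spark job on a Mesos-managed cluster.

    Hypothetical helper for illustration only; it does not launch anything.
    """
    # Assumption: the job reads from HDFS, as in the setup described above.
    if not input_uri.startswith("hdfs://"):
        raise ValueError("this sketch expects an hdfs:// input URI")
    return [
        "spark-submit",
        "--master", f"mesos://{mesos_master}",  # cluster manager: Mesos
        app_path,                               # the Spark application to run
        input_uri,                              # passed through to the application
    ]


# Hypothetical hosts, ports, and script name.
cmd = build_spark_submit(
    "mesos-master.example.com:5050",
    "etl_job.py",
    "hdfs://namenode.example.com:8020/data/events",
)
print(shlex.join(cmd))
```

The cluster manager and the storage layer are chosen independently here: swapping the --master URL changes where executors are scheduled, while the input URI alone determines which distributed file system the job reads from.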