SlideShare a Scribd company logo
1 of 17
Download to read offline
High-Performance Computing
Dell Intel EE Lustre Storage
Melbourne Big Data User Group
January 2016
Andrew Underwood
Lead HPC Technologist
The future of data is BIG, and HPC is needed to power it
The digital universe is growing 40%* a year into the next decade…
2016 --------------------------------------------------- 2020
~90%
of the worlds data
has been created
in the last 2 years
* Source: EMC Digital Universe with Research & Analysis by IDC
44ZB
of data will exist
in the digital universe
by the year 2020
~37%
of the data generated
in 2020 will be used for
analysis and processing
The convergence of HPC and Big Data are driving change
Scalable performance and massive capacity
Stable, predictable and reliable
Balanced configuration, designed for parallel input-output
Support compute intensive, and data intensive ‘Big Data’ workloads with Hadoop
1
2
3
4
Enterprise grade technology with 24/7 access to data5
Introducing our latest generation Dell EE Lustre Storage
Limitless
Endless Scalability
11GB/s
Peak Read
per building block
Up to 4PB
raw capacity
per rack
Up to 44GB/s
throughput
per rack
The Ultimate HPC File SystemParallel
For ultimate scale-out
Hadoop
Converged Platform
7GB/s
Peak Write
per building block
What is Lustre?
Designed for maximum performance and scalability….
• Open-Source parallel file system built on open standards hardware
• Global, shared name space – accessible by over 25,000+ clients
• Object based file system with distributed file stripe across storage targets
• Highly available design, with no single point of failure
• Scalable beyond an Exabyte in capacity, and over a Petabyte of sustained throughput
• Accessible by clients over network (Ethernet, InfiniBand, Omni-Path Architecture)
The Intel Enterprise Edition of Lustre
Inside the Lustre File System
Ethernet Management
Network
High Performance Data Network
(InfiniBand, Ethernet, Omni-Path)
Metadata
Servers
Object Storage
Servers
Intel Manager for Lustre
Lustre Clients (1 – 100,000+)
Object Storage
Targets (OSTs)
Metadata
Target (MDT)
Management
Target (MGT)
Storage servers grouped
into failover pairs
A scalable building block design
Designed to scale to the Exabyte, with a Petabyte of throughput
Solution benefits & Dell differentiation
• File system based on Intel Enterprise Edition for Lustre
• Single file system namespace scalable to high capacities
and performance
• Engineered by Dell HPC Engineering to provide optimal
performance on Dell hardware platform
• Design providing maximum throughput per building
block with on-the-fly storage expansion
• Solution design for Big Data workloads using Intel
Hadoop Adapter for Lustre (HAL)
• Share data with other file systems utilizing optional
NFS/CIFS gateway
• Dell Networking 10/40GbE, InfiniBand, or Omni-Path
How Lustre Works
Basic File Storage Principles of the Lustre File System
The file system consists of many object storage targets (OST) which are presented as a single unified file system.
IO can be increased in most cases by using file striping. File striping divides data into chunks that are distributed
across OSTs within the file system.
Lustre Write File Data Flow
1) Client requests to write a file to the file system
2) Client contacts the MDS with a write request
3) MDS checks the user authentication and intended
location of the file
4) MDS responds to the Client with a list of OSTs that the
Client can write the file to
5) Client receives the response and writes to the assigned
OSTs without further communication with the MDS
Lustre Read File Data Flow
1) Client requests to read a file from the file system
2) Client contacts the MDS with a read request
3) MDS checks the user authentication and file location
4) MDS responds to the Client with a list of OSTs that the
stripes of the file are located
5) Client receives the response and reads the data from
the OSTs without further communication with the MDS
Connectivity and HSM
Lustre design is performance centric
CIFS Access
As over 99% of the worlds HPC and Supercomputing environments are
built on Linux, Lustre is designed to be accessible by clients running
RHEL/CentOS/SUSE Linux operating systems.
Access can still be provided via CIFS using Dell PowerEdge R630 clients
running SAMBA gateways.
Hierarchical Storage Management (HSM)
HSM can be configured and managed via the Intel Manager for Lustre
(IML), and provides a reliable mechanism for archiving data onto tiers of
secondary, high-capacity, affordable storage.
The Intel Lustre HSM framework uses the Linux copytool and Robinhood
Where performance meets scalability
• Peak Write = 7GB/s
• Peak Read = 11GB/s
• Peak Write = 12.6K IOPS
• Peak Read = 96K IOPS
Sequential IO Performance – 1 MB Stripe Random IO Performance – 4 MB Stripe
Increase metadata performance with Luster Distributed
Namespace feature (Lustre DNE)
• Lustre DNE Phase 1 is the
designation for Luster DNE
Remote Directories.
• Lustre sub-directories can
now be distributed across
multiple MDTs to increase
metadata capacity
capabilities and
performance.
Configure, optimize and manage using Intel Manager for Lustre
IML simplifies your management workflow:
• UI driven configuration, monitoring, and
overall management lowers complexity
and cost
• Advanced charting options illustrate
storage performance in near-real time
• Automated configuration of storage
servers pairs for increased high-availability
• Configure and manage power distribution
units for automated fail-over
• Smart, intuitive alerts and logs help storage
administrators monitor storage
performance
Intel Manager for Lustre (IML) software dashboard
The ‘dashboard’ canvas displays a variety of dynamic charts
illustrating performance levels and resource utilization.
Administrators can easily view file systems, check resource
consumption for jobs, and monitor performance.
In depth storage hardware reporting is possible when combined
with optional hardware vendor provided plug-ins.
System status indictor
provides the status for all
managed file systems. Click
to go to detailed
information.
Easily configure servers,
volumes and power
controls. Optionally, enable
HSM per file system
Intelligent, intuitive log files
– quickly understand how
your storage is performing
Where Big Data meets high-performance computing
As data sets expand, the infrastructure that supports them needs to be faster, bigger, more scalable
Hadoop over Lustre (HAL)
Intel EE Lustre has plugins for Apache Hadoop and Cloudera Distribution of Hadoop, which require no changes to the
Lustre architecture and allows for big data workloads to take advantage of high-performance infrastructure, by replacing
the HDFS portion of Hadoop with the Lustre file system.
This is driving a new market for High-Performance Data Analytics, where HPC boosts performance of Big Data workloads.
Measuring the performance of Hadoop running on Lustre – Check out this video of Dell deploying Hadoop on Lustre in production
* Source: Tests are from Intel and Tata Consultancy Services Lustre Big Data White Paper (Click Here)
Start by making your data future ready!
Contact your local Dell Account Executive for a free HPC workshop, and we can help identify
bottlenecks, opportunities for optimization, and provide a plan for how Lustre can boost your
performance, and build a future ready architecture for the convergence of HPC and Big Data
19
Dell - Internal Use - Confidential

More Related Content

What's hot

Backup Options for IBM PureData for Analytics powered by Netezza
Backup Options for IBM PureData for Analytics powered by NetezzaBackup Options for IBM PureData for Analytics powered by Netezza
Backup Options for IBM PureData for Analytics powered by NetezzaTony Pearson
 
DDN EXA 5 - Innovation at Scale
DDN EXA 5 - Innovation at ScaleDDN EXA 5 - Innovation at Scale
DDN EXA 5 - Innovation at Scaleinside-BigData.com
 
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...Red_Hat_Storage
 
Maximizing Oracle Database performance with Intel SSD DC P3600 Series NVMe SS...
Maximizing Oracle Database performance with Intel SSD DC P3600 Series NVMe SS...Maximizing Oracle Database performance with Intel SSD DC P3600 Series NVMe SS...
Maximizing Oracle Database performance with Intel SSD DC P3600 Series NVMe SS...Principled Technologies
 
Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...
Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...
Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...Red_Hat_Storage
 
Getting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Getting Started with Apache Spark and Alluxio for Blazingly Fast AnalyticsGetting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Getting Started with Apache Spark and Alluxio for Blazingly Fast AnalyticsAlluxio, Inc.
 
Technical Report NetApp Clustered Data ONTAP 8.2: An Introduction
Technical Report NetApp Clustered Data ONTAP 8.2: An IntroductionTechnical Report NetApp Clustered Data ONTAP 8.2: An Introduction
Technical Report NetApp Clustered Data ONTAP 8.2: An IntroductionNetApp
 
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise CustomersHadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise CustomersDataWorks Summit/Hadoop Summit
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAlluxio, Inc.
 
Flexible and Fast Storage for Deep Learning with Alluxio
Flexible and Fast Storage for Deep Learning with Alluxio Flexible and Fast Storage for Deep Learning with Alluxio
Flexible and Fast Storage for Deep Learning with Alluxio Alluxio, Inc.
 
Hadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big DataHadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big DataWANdisco Plc
 
Spectrum Scale final
Spectrum Scale finalSpectrum Scale final
Spectrum Scale finalJoe Krotz
 
Make sense of important data faster with AWS EC2 M6i instances
Make sense of important data faster with AWS EC2 M6i instancesMake sense of important data faster with AWS EC2 M6i instances
Make sense of important data faster with AWS EC2 M6i instancesPrincipled Technologies
 
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral ProgramBig Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Programinside-BigData.com
 
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and StorageAccelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and StorageAlluxio, Inc.
 

What's hot (20)

Backup Options for IBM PureData for Analytics powered by Netezza
Backup Options for IBM PureData for Analytics powered by NetezzaBackup Options for IBM PureData for Analytics powered by Netezza
Backup Options for IBM PureData for Analytics powered by Netezza
 
DDN EXA 5 - Innovation at Scale
DDN EXA 5 - Innovation at ScaleDDN EXA 5 - Innovation at Scale
DDN EXA 5 - Innovation at Scale
 
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...
 
Maximizing Oracle Database performance with Intel SSD DC P3600 Series NVMe SS...
Maximizing Oracle Database performance with Intel SSD DC P3600 Series NVMe SS...Maximizing Oracle Database performance with Intel SSD DC P3600 Series NVMe SS...
Maximizing Oracle Database performance with Intel SSD DC P3600 Series NVMe SS...
 
HDFS Analysis for Small Files
HDFS Analysis for Small FilesHDFS Analysis for Small Files
HDFS Analysis for Small Files
 
HDFS Tiered Storage
HDFS Tiered StorageHDFS Tiered Storage
HDFS Tiered Storage
 
Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...
Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...
Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...
 
Getting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Getting Started with Apache Spark and Alluxio for Blazingly Fast AnalyticsGetting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Getting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
 
Technical Report NetApp Clustered Data ONTAP 8.2: An Introduction
Technical Report NetApp Clustered Data ONTAP 8.2: An IntroductionTechnical Report NetApp Clustered Data ONTAP 8.2: An Introduction
Technical Report NetApp Clustered Data ONTAP 8.2: An Introduction
 
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise CustomersHadoop in the Cloud: Real World Lessons from Enterprise Customers
Hadoop in the Cloud: Real World Lessons from Enterprise Customers
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
 
HDFS Erasure Coding in Action
HDFS Erasure Coding in Action HDFS Erasure Coding in Action
HDFS Erasure Coding in Action
 
Flexible and Fast Storage for Deep Learning with Alluxio
Flexible and Fast Storage for Deep Learning with Alluxio Flexible and Fast Storage for Deep Learning with Alluxio
Flexible and Fast Storage for Deep Learning with Alluxio
 
Hadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big DataHadoop and WANdisco: The Future of Big Data
Hadoop and WANdisco: The Future of Big Data
 
Spectrum Scale final
Spectrum Scale finalSpectrum Scale final
Spectrum Scale final
 
IBM GPFS
IBM GPFSIBM GPFS
IBM GPFS
 
Big Data Platform Industrialization
Big Data Platform Industrialization Big Data Platform Industrialization
Big Data Platform Industrialization
 
Make sense of important data faster with AWS EC2 M6i instances
Make sense of important data faster with AWS EC2 M6i instancesMake sense of important data faster with AWS EC2 M6i instances
Make sense of important data faster with AWS EC2 M6i instances
 
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral ProgramBig Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
 
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and StorageAccelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
 

Viewers also liked

Nagy-teljesítményű, költséghatékony adattárolási technológiák könyvtári körny...
Nagy-teljesítményű, költséghatékony adattárolási technológiák könyvtári körny...Nagy-teljesítményű, költséghatékony adattárolási technológiák könyvtári körny...
Nagy-teljesítményű, költséghatékony adattárolási technológiák könyvtári körny...Ferenc Szalai
 
Lustre client performance comparison and tuning (1.8.x to 2.x)
Lustre client performance comparison and tuning (1.8.x to 2.x)Lustre client performance comparison and tuning (1.8.x to 2.x)
Lustre client performance comparison and tuning (1.8.x to 2.x)inside-BigData.com
 
HPC Storage Appliances for the Enterpris
HPC Storage Appliances for the EnterprisHPC Storage Appliances for the Enterpris
HPC Storage Appliances for the EnterprisIntel IT Center
 
HPC Storage and IO Trends and Workflows
HPC Storage and IO Trends and WorkflowsHPC Storage and IO Trends and Workflows
HPC Storage and IO Trends and Workflowsinside-BigData.com
 
GPFS - graphical intro
GPFS - graphical introGPFS - graphical intro
GPFS - graphical introAlex Balk
 
What HPC can learn from DevOps?
What HPC can learn from DevOps?What HPC can learn from DevOps?
What HPC can learn from DevOps?Walid Shaari
 
Streamlining HPC Workloads with Containers
Streamlining HPC Workloads with ContainersStreamlining HPC Workloads with Containers
Streamlining HPC Workloads with ContainersDustin Kirkland
 
Trends towards the merge of HPC + Big Data systems
Trends towards the merge of HPC + Big Data systemsTrends towards the merge of HPC + Big Data systems
Trends towards the merge of HPC + Big Data systemsIgor José F. Freitas
 
Performance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksPerformance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksMarian Marinov
 
[Container world 2017] The Questions You're Afraid to Ask about Containers
[Container world 2017] The Questions You're Afraid to Ask about Containers[Container world 2017] The Questions You're Afraid to Ask about Containers
[Container world 2017] The Questions You're Afraid to Ask about ContainersDustin Kirkland
 
Exploring the Momentum: The Intersection of AI and HPC
Exploring the Momentum: The Intersection of AI and HPCExploring the Momentum: The Intersection of AI and HPC
Exploring the Momentum: The Intersection of AI and HPCNVIDIA
 
Distributed & parallel system
Distributed & parallel systemDistributed & parallel system
Distributed & parallel systemManish Singh
 
Top 5 Deep Learning Stories 2/24
Top 5 Deep Learning Stories 2/24Top 5 Deep Learning Stories 2/24
Top 5 Deep Learning Stories 2/24NVIDIA
 
HPC Top 5 Stories: March 22, 2017
HPC Top 5 Stories: March 22, 2017HPC Top 5 Stories: March 22, 2017
HPC Top 5 Stories: March 22, 2017NVIDIA
 

Viewers also liked (15)

Nagy-teljesítményű, költséghatékony adattárolási technológiák könyvtári körny...
Nagy-teljesítményű, költséghatékony adattárolási technológiák könyvtári körny...Nagy-teljesítményű, költséghatékony adattárolási technológiák könyvtári körny...
Nagy-teljesítményű, költséghatékony adattárolási technológiák könyvtári körny...
 
Lustre client performance comparison and tuning (1.8.x to 2.x)
Lustre client performance comparison and tuning (1.8.x to 2.x)Lustre client performance comparison and tuning (1.8.x to 2.x)
Lustre client performance comparison and tuning (1.8.x to 2.x)
 
HPC Storage Appliances for the Enterpris
HPC Storage Appliances for the EnterprisHPC Storage Appliances for the Enterpris
HPC Storage Appliances for the Enterpris
 
HPC Storage and IO Trends and Workflows
HPC Storage and IO Trends and WorkflowsHPC Storage and IO Trends and Workflows
HPC Storage and IO Trends and Workflows
 
GPFS - graphical intro
GPFS - graphical introGPFS - graphical intro
GPFS - graphical intro
 
What HPC can learn from DevOps?
What HPC can learn from DevOps?What HPC can learn from DevOps?
What HPC can learn from DevOps?
 
Streamlining HPC Workloads with Containers
Streamlining HPC Workloads with ContainersStreamlining HPC Workloads with Containers
Streamlining HPC Workloads with Containers
 
Trends towards the merge of HPC + Big Data systems
Trends towards the merge of HPC + Big Data systemsTrends towards the merge of HPC + Big Data systems
Trends towards the merge of HPC + Big Data systems
 
Performance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksPerformance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networks
 
[Container world 2017] The Questions You're Afraid to Ask about Containers
[Container world 2017] The Questions You're Afraid to Ask about Containers[Container world 2017] The Questions You're Afraid to Ask about Containers
[Container world 2017] The Questions You're Afraid to Ask about Containers
 
Exploring the Momentum: The Intersection of AI and HPC
Exploring the Momentum: The Intersection of AI and HPCExploring the Momentum: The Intersection of AI and HPC
Exploring the Momentum: The Intersection of AI and HPC
 
Distributed & parallel system
Distributed & parallel systemDistributed & parallel system
Distributed & parallel system
 
Top 5 Deep Learning Stories 2/24
Top 5 Deep Learning Stories 2/24Top 5 Deep Learning Stories 2/24
Top 5 Deep Learning Stories 2/24
 
[DDBJing31] DDBJ と NIG SuperComputer の使い方
[DDBJing31] DDBJ と NIG SuperComputer の使い方[DDBJing31] DDBJ と NIG SuperComputer の使い方
[DDBJing31] DDBJ と NIG SuperComputer の使い方
 
HPC Top 5 Stories: March 22, 2017
HPC Top 5 Stories: March 22, 2017HPC Top 5 Stories: March 22, 2017
HPC Top 5 Stories: March 22, 2017
 

Similar to Dell Lustre Storage Architecture Presentation - MBUG 2016

Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFGestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFSUSE Italy
 
CCD-410 Cloudera Study Material
CCD-410 Cloudera Study MaterialCCD-410 Cloudera Study Material
CCD-410 Cloudera Study MaterialRoxycodone Online
 
HPE Hadoop Solutions - From use cases to proposal
HPE Hadoop Solutions - From use cases to proposalHPE Hadoop Solutions - From use cases to proposal
HPE Hadoop Solutions - From use cases to proposalDataWorks Summit
 
Hortonworks Data Platform with IBM Spectrum Scale
Hortonworks Data Platform with IBM Spectrum ScaleHortonworks Data Platform with IBM Spectrum Scale
Hortonworks Data Platform with IBM Spectrum ScaleAbhishek Sood
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016MLconf
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceChris Nauroth
 
HPE Solutions for Challenges in AI and Big Data
HPE Solutions for Challenges in AI and Big DataHPE Solutions for Challenges in AI and Big Data
HPE Solutions for Challenges in AI and Big DataLviv Startup Club
 
Saviak lviv ai-2019-e-mail (1)
Saviak lviv ai-2019-e-mail (1)Saviak lviv ai-2019-e-mail (1)
Saviak lviv ai-2019-e-mail (1)Lviv Startup Club
 
Hadoop Virtualization - Intel White Paper
Hadoop Virtualization - Intel White PaperHadoop Virtualization - Intel White Paper
Hadoop Virtualization - Intel White PaperBlueData, Inc.
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecasesudhakara st
 
EMC Isilon Best Practices for Hadoop Data Storage
EMC Isilon Best Practices for Hadoop Data StorageEMC Isilon Best Practices for Hadoop Data Storage
EMC Isilon Best Practices for Hadoop Data StorageEMC
 
Accelerate Big Data Processing with High-Performance Computing Technologies
Accelerate Big Data Processing with High-Performance Computing TechnologiesAccelerate Big Data Processing with High-Performance Computing Technologies
Accelerate Big Data Processing with High-Performance Computing TechnologiesIntel® Software
 
clusterstor-hadoop-data-sheet
clusterstor-hadoop-data-sheetclusterstor-hadoop-data-sheet
clusterstor-hadoop-data-sheetAndrei Khurshudov
 
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmark
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmarkThe Apache Spark config behind the indsutry's first 100TB Spark SQL benchmark
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmarkLenovo Data Center
 
2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit MumbaiAnand Haridass
 
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.MaharajothiP
 
Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Ontico
 

Similar to Dell Lustre Storage Architecture Presentation - MBUG 2016 (20)

Unit-3.pptx
Unit-3.pptxUnit-3.pptx
Unit-3.pptx
 
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFGestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
 
CCD-410 Cloudera Study Material
CCD-410 Cloudera Study MaterialCCD-410 Cloudera Study Material
CCD-410 Cloudera Study Material
 
HPE Hadoop Solutions - From use cases to proposal
HPE Hadoop Solutions - From use cases to proposalHPE Hadoop Solutions - From use cases to proposal
HPE Hadoop Solutions - From use cases to proposal
 
Hortonworks Data Platform with IBM Spectrum Scale
Hortonworks Data Platform with IBM Spectrum ScaleHortonworks Data Platform with IBM Spectrum Scale
Hortonworks Data Platform with IBM Spectrum Scale
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduce
 
HPE Solutions for Challenges in AI and Big Data
HPE Solutions for Challenges in AI and Big DataHPE Solutions for Challenges in AI and Big Data
HPE Solutions for Challenges in AI and Big Data
 
Saviak lviv ai-2019-e-mail (1)
Saviak lviv ai-2019-e-mail (1)Saviak lviv ai-2019-e-mail (1)
Saviak lviv ai-2019-e-mail (1)
 
Hadoop Virtualization - Intel White Paper
Hadoop Virtualization - Intel White PaperHadoop Virtualization - Intel White Paper
Hadoop Virtualization - Intel White Paper
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecase
 
Sgi hadoop
Sgi hadoopSgi hadoop
Sgi hadoop
 
EMC Isilon Best Practices for Hadoop Data Storage
EMC Isilon Best Practices for Hadoop Data StorageEMC Isilon Best Practices for Hadoop Data Storage
EMC Isilon Best Practices for Hadoop Data Storage
 
Accelerate Big Data Processing with High-Performance Computing Technologies
Accelerate Big Data Processing with High-Performance Computing TechnologiesAccelerate Big Data Processing with High-Performance Computing Technologies
Accelerate Big Data Processing with High-Performance Computing Technologies
 
clusterstor-hadoop-data-sheet
clusterstor-hadoop-data-sheetclusterstor-hadoop-data-sheet
clusterstor-hadoop-data-sheet
 
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmark
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmarkThe Apache Spark config behind the indsutry's first 100TB Spark SQL benchmark
The Apache Spark config behind the indsutry's first 100TB Spark SQL benchmark
 
2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai2016 August POWER Up Your Insights - IBM System Summit Mumbai
2016 August POWER Up Your Insights - IBM System Summit Mumbai
 
Hadoop Research
Hadoop Research Hadoop Research
Hadoop Research
 
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.
 
Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...Key trends in Big Data and new reference architecture from Hewlett Packard En...
Key trends in Big Data and new reference architecture from Hewlett Packard En...
 

Recently uploaded

Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 

Recently uploaded (20)

Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

Dell Lustre Storage Architecture Presentation - MBUG 2016

  • 1. High-Performance Computing Dell Intel EE Lustre Storage Melbourne Big Data User Group January 2016 Andrew Underwood Lead HPC Technologist
  • 2. The future of data is BIG, and HPC is needed to power it The digital universe is growing 40%* a year into the next decade… 2016 --------------------------------------------------- 2020 ~90% of the worlds data has been created in the last 2 years * Source: EMC Digital Universe with Research & Analysis by IDC 44ZB of data will exist in the digital universe by the year 2020 ~37% of the data generated in 2020 will be used for analysis and processing
  • 3. The convergence of HPC and Big Data are driving change Scalable performance and massive capacity Stable, predictable and reliable Balanced configuration, designed for parallel input-output Support compute intensive, and data intensive ‘Big Data’ workloads with Hadoop 1 2 3 4 Enterprise grade technology with 24/7 access to data5
  • 4. Introducing our latest generation Dell EE Lustre Storage Limitless Endless Scalability 11GB/s Peak Read per building block Up to 4PB raw capacity per rack Up to 44GB/s throughput per rack The Ultimate HPC File SystemParallel For ultimate scale-out Hadoop Converged Platform 7GB/s Peak Write per building block
  • 5. What is Lustre? Designed for maximum performance and scalability…. • Open-Source parallel file system built on open standards hardware • Global, shared name space – accessible by over 25,000+ clients • Object based file system with distributed file stripe across storage targets • Highly available design, with no single point of failure • Scalable beyond an Exabyte in capacity, and over a Petabyte of sustained throughput • Accessible by clients over network (Ethernet, InfiniBand, Omni-Path Architecture)
  • 6. The Intel Enterprise Edition of Lustre
  • 7. Inside the Lustre File System Ethernet Management Network High Performance Data Network (InfiniBand, Ethernet, Omni-Path) Metadata Servers Object Storage Servers Intel Manager for Lustre Lustre Clients (1 – 100,000+) Object Storage Targets (OSTs) Metadata Target (MDT) Management Target (MGT) Storage servers grouped into failover pairs
  • 8. A scalable building block design Designed to scale to the Exabyte, with a Petabyte of throughput Solution benefits & Dell differentiation • File system based on Intel Enterprise Edition for Lustre • Single file system namespace scalable to high capacities and performance • Engineered by Dell HPC Engineering to provide optimal performance on Dell hardware platform • Design providing maximum throughput per building block with on-the-fly storage expansion • Solution design for Big Data workloads using Intel Hadoop Adapter for Lustre (HAL) • Share data with other file systems utilizing optional NFS/CIFS gateway • Dell Networking 10/40GbE, InfiniBand, or Omni-Path
  • 9. How Lustre Works Basic File Storage Principles of the Lustre File System The file system consists of many object storage targets (OST) which are presented as a single unified file system. IO can be increased in most cases by using file striping. File striping divides data into chunks that are distributed across OSTs within the file system. Lustre Write File Data Flow 1) Client requests to write a file to the file system 2) Client contacts the MDS with a write request 3) MDS checks the user authentication and intended location of the file 4) MDS responds to the Client with a list of OSTs that the Client can write the file to 5) Client receives the response and writes to the assigned OSTs without further communication with the MDS Lustre Read File Data Flow 1) Client requests to read a file from the file system 2) Client contacts the MDS with a read request 3) MDS checks the user authentication and file location 4) MDS responds to the Client with a list of OSTs that the stripes of the file are located 5) Client receives the response and reads the data from the OSTs without further communication with the MDS
  • 10. Connectivity and HSM Lustre design is performance centric CIFS Access As over 99% of the worlds HPC and Supercomputing environments are built on Linux, Lustre is designed to be accessible by clients running RHEL/CentOS/SUSE Linux operating systems. Access can still be provided via CIFS using Dell PowerEdge R630 clients running SAMBA gateways. Hierarchical Storage Management (HSM) HSM can be configured and managed via the Intel Manager for Lustre (IML), and provides a reliable mechanism for archiving data onto tiers of secondary, high-capacity, affordable storage. The Intel Lustre HSM framework uses the Linux copytool and Robinhood
  • 11. Where performance meets scalability • Peak Write = 7GB/s • Peak Read = 11GB/s • Peak Write = 12.6K IOPS • Peak Read = 96K IOPS Sequential IO Performance – 1 MB Stripe Random IO Performance – 4 MB Stripe
  • 12. Increase metadata performance with Luster Distributed Namespace feature (Lustre DNE) • Lustre DNE Phase 1 is the designation for Luster DNE Remote Directories. • Lustre sub-directories can now be distributed across multiple MDTs to increase metadata capacity capabilities and performance.
  • 13. Configure, optimize and manage using Intel Manager for Lustre IML simplifies your management workflow: • UI driven configuration, monitoring, and overall management lowers complexity and cost • Advanced charting options illustrate storage performance in near-real time • Automated configuration of storage servers pairs for increased high-availability • Configure and manage power distribution units for automated fail-over • Smart, intuitive alerts and logs help storage administrators monitor storage performance
  • 14. Intel Manager for Lustre (IML) software dashboard The ‘dashboard’ canvas displays a variety of dynamic charts illustrating performance levels and resource utilization. Administrators can easily view file systems, check resource consumption for jobs, and monitor performance. In depth storage hardware reporting is possible when combined with optional hardware vendor provided plug-ins. System status indictor provides the status for all managed file systems. Click to go to detailed information. Easily configure servers, volumes and power controls. Optionally, enable HSM per file system Intelligent, intuitive log files – quickly understand how your storage is performing
  • 15. Where Big Data meets high-performance computing As data sets expand, the infrastructure that supports them needs to be faster, bigger, more scalable Hadoop over Lustre (HAL) Intel EE Lustre has plugins for Apache Hadoop and Cloudera Distribution of Hadoop, which require no changes to the Lustre architecture and allows for big data workloads to take advantage of high-performance infrastructure, by replacing the HDFS portion of Hadoop with the Lustre file system. This is driving a new market for High-Performance Data Analytics, where HPC boosts performance of Big Data workloads. Measuring the performance of Hadoop running on Lustre – Check out this video of Dell deploying Hadoop on Lustre in production * Source: Tests are from Intel and Tata Consultancy Services Lustre Big Data White Paper (Click Here)
  • 16. Start by making your data future ready! Contact your local Dell Account Executive for a free HPC workshop, and we can help identify bottlenecks, opportunities for optimization, and provide a plan for how Lustre can boost your performance, and build a future ready architecture for the convergence of HPC and Big Data
  • 17. 19 Dell - Internal Use - Confidential