Network Engineering for High Speed
Data Sharing
Eli Dart, Science Engagement
Energy Sciences Network (ESnet)
Lawrence Berkeley National Laboratory
AGU 2018
Washington, DC
December 12, 2018
Outline
• Motivation, Context
• Modern Research Data Portal
• Petascale DTN Project
• Long Term Vision
Motivation
• Networks are an essential part of data-intensive science
– Connect data sources to data analysis
– Connect collaborators to each other
– Enable machine-consumable interfaces to data and analysis resources (e.g. portals), automation, scale
• Performance is critical
– Exponential data growth
– Constant human factors
– Data movement and data analysis must keep up
• Effective use of wide area (long-haul) networks by scientists has historically been difficult
3 – ESnet Science Engagement (engage@es.net) - 12/20/2018
© 2016, Energy Sciences Network
Data Placement: A Common Problem
• Scientists often need to move data from where it is to where it needs to be
– Observation to analysis
– Assemble data set from multiple sources
– Transfer data to/from supercomputer center
• Data movement tools run on systems which use networks – lots involved:
– Servers, storage
– Networks, security policy
• Lots of ways to assemble these things → architecture matters
• Traditional architectures are not performant in today’s context
– Large data objects
– Data sets with tens of thousands (or more) files
Science Data Portals
• Large repositories of scientific data
– Climate data
– Sky surveys (astronomy, cosmology)
– Many others
– Data search, browsing, access
• Many scientific data portals were designed 15+ years ago
– Single-web-server design
– Data browse/search, data access, user awareness all in a single system
– All the data goes through the portal server
• In many cases by design
• E.g. embargo before publication (enforce access control)
– Better than old command-line FTP, but outdated by today’s standards
Legacy Portal Design (figure): a single portal server, attached to the filesystem (data store) over 10GE, sits behind the enterprise firewall and border router. The browsing path, query path, and data path all traverse the firewall to the portal server, which runs the web server, search, database, authentication, and data service applications. perfSONAR nodes monitor the path.
Architectural Examination of Data Portals
• Very difficult to improve performance without architectural change
– Software components all tangled together
– Complexity makes security hard
– Many components aren’t scalable
• What does architectural change mean?
• Common data portal functions (most portals have these)
– Search/query/discovery
– Data download method for data access
– GUI for browsing by humans
– API for machine access – ideally incorporates search/query + download
• Performance pain is primarily in the data handling piece
– Rapid increase in data scale eclipsed legacy software stack capabilities
– Portal servers often stuck in enterprise network
• Can we “disassemble” the portal and put the pieces back together better?
Next-Generation Portal Leverages Science DMZ (figure): the portal server, still behind the enterprise firewall, keeps the portal query/browse path and runs the web server, search, database, and authentication applications. The data transfer path now runs through a cluster of API DTNs in the Science DMZ (data access governed by the portal), connected to the border router through the Science DMZ switch/router and to the filesystem (data store) over 10GE links, with perfSONAR monitoring throughout.
https://peerj.com/articles/cs-144/
NCAR RDA Data Portal
• Let’s say I have a nice compute allocation at NERSC – a national
supercomputer center
• Let’s say I need some data from NCAR for my project
• https://rda.ucar.edu/
• Data sets (there are many more, but these are two examples):
• https://rda.ucar.edu/datasets/ds199.1/
• https://rda.ucar.edu/datasets/ds313.0/
• Download to NERSC (could also do ALCF or NCSA or OLCF)
Portal creates a Globus transfer job for us
Submit the transfer job, go about our business
Data Transfer from RDA Portal – Results
Science DMZ Design Pattern (Abstract) (figure): the border router connects the WAN to a Science DMZ switch/router on a clean, high-bandwidth WAN path. A high-performance Data Transfer Node with high-speed storage sits behind per-service security policy control points; the enterprise border router/firewall carries the site/campus LAN, which retains access to Science DMZ resources. perfSONAR nodes at multiple points. (© 2014, Energy Sciences Network)
http://fasterdata.es.net/science-dmz/
Put The Data On Dedicated Infrastructure
• We have separated the data handling from the portal logic
• Portal is still its normal self, but enhanced
– Portal GUI, database, search, etc. all function as they did before
– Query returns pointers to data objects in the Science DMZ
– Portal is now freed from the data servers (run it in the Cloud if you want!)
• Data handling is separate, and scalable
– High-performance data cluster in the Science DMZ
– Scale as much as you need to without modifying the portal software
• Shift data management to computing centers
– Computing centers are set up for large-scale data
– Let them handle the large-scale data, and let the portal do the orchestration
of data placement
• https://peerj.com/articles/cs-144/ - Modern Research Data Portal paper
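The “portal orchestrates, DTNs move data” split lends itself to automation: the portal’s query returns pointers, and a transfer request is handed to a service like Globus. The sketch below is illustrative only: the endpoint names are invented and the field layout is a simplified stand-in, not the exact Globus wire format (the real Transfer API and SDK are documented at globus.org).

```python
def build_transfer_request(source_endpoint, dest_endpoint, items, verify_checksum=True):
    """Assemble a transfer request: a source, a destination, a list of
    (source_path, dest_path, recursive) items, and integrity checking on
    by default, as in the Petascale DTN tests."""
    return {
        "source_endpoint": source_endpoint,
        "destination_endpoint": dest_endpoint,
        "verify_checksum": verify_checksum,
        "items": [
            {"source_path": src, "destination_path": dst, "recursive": rec}
            for (src, dst, rec) in items
        ],
    }

# Hypothetical endpoints: a portal-side data endpoint and an HPC DTN cluster.
request = build_transfer_request(
    "portal-data-endpoint",
    "hpc-dtn-endpoint",
    [("/datasets/ds199.1/", "/scratch/me/ds199.1/", True)],
)
```

The portal never touches the bytes; it only authors requests like this and lets the DTNs do the moving.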
Data And HPC: The Petascale DTN Project
• Built on top of the Science DMZ
• Effort to improve data transfer performance between the DOE ASCR HPC
facilities at ANL, LBNL, and ORNL, and also NCSA.
– Multiple current and future science projects need to transfer data between HPC
facilities
– Performance was slow, configurations inconsistent
– Performance goal of 15 gigabits per second (equivalent to 1PB/week)
– Realize performance goal for routine Globus transfers without special tuning
• Reference data set is 4.4TB of cosmology simulation data
• Use performant, easy-to-use tools, with production options enabled
– Globus Transfer service (previously Globus Online)
– Use GUI just like a user would, with default options
• E.g. integrity checksums enabled, as they should be
• No arcane magic!
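Integrity checksums of the kind Globus enables by default can be sketched in a few lines: stream each file through a hash in fixed-size chunks, so multi-gigabyte files never need to fit in memory, then compare source and destination digests (function name and chunk size here are illustrative):

```python
import hashlib

def file_sha256(path, chunk_size=4 * 1024 * 1024):
    """Stream a file through SHA-256 in 4 MiB chunks and return the hex
    digest; computed at both ends, matching digests verify the transfer."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```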
DTN Cluster Performance – HPC Facilities (2017)
Petascale DTN Project, November 2017, L380 data set. The figure shows gigabits per second (min/avg/max over three Globus transfers) for each directed pair of DTN clusters: 21.2/22.6/24.5, 23.1/33.7/39.7, 26.7/34.7/39.9, 33.2/43.4/50.3, 35.9/39.0/40.7, 29.9/33.1/35.5, 34.6/47.5/56.8, 44.1/46.8/48.4, 41.0/42.2/43.9, 33.0/35.0/37.8, 43.0/50.0/56.3, and 55.4/56.7/57.4 Gbps.
Endpoints:
– NERSC DTN cluster: Globus endpoint nersc#dtn, filesystem /project
– ALCF DTN cluster: Globus endpoint alcf#dtn_mira, filesystem /projects
– OLCF DTN cluster: Globus endpoint olcf#dtn_atlas, filesystem atlas2
– NCSA DTN cluster: Globus endpoint ncsa#BlueWaters, filesystem /scratch
L380 data set: 19260 files in 211 directories (0 other files), 4442781786482 bytes total (4.4 TB); smallest file 0 bytes, largest file 11313896248 bytes (11 GB).
Size distribution:
1 - 10 bytes: 7 files
10 - 100 bytes: 1 file
100 - 1K bytes: 59 files
1K - 10K bytes: 3170 files
10K - 100K bytes: 1560 files
100K - 1M bytes: 2817 files
1M - 10M bytes: 3901 files
10M - 100M bytes: 3800 files
100M - 1G bytes: 2295 files
1G - 10G bytes: 1647 files
10G - 100G bytes: 3 files
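A size distribution like the L380 listing can be regenerated for any directory tree with a short script. This sketch (bucket edges mirror the listing; zero-byte files fall outside the decade buckets) walks the tree and counts files per size decade:

```python
import os
from bisect import bisect_right

# Decade bucket upper edges in bytes, mirroring the slide's table.
EDGES = [10, 100, 10**3, 10**4, 10**5, 10**6, 10**7, 10**8, 10**9, 10**10, 10**11]
LABELS = ["1-10", "10-100", "100-1K", "1K-10K", "10K-100K", "100K-1M",
          "1M-10M", "10M-100M", "100M-1G", "1G-10G", "10G-100G"]

def size_distribution(root):
    """Count files under root per size decade, as in the L380 listing."""
    counts = {label: 0 for label in LABELS}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            size = os.path.getsize(os.path.join(dirpath, name))
            if size > 0:  # zero-byte files have no decade bucket
                counts[LABELS[bisect_right(EDGES, size - 1)]] += 1
    return counts
```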
NCAR RDA Performance to DOE HPC Facilities (figure): transfers of a 1.5 TB, 1121-file data set from the NCAR RDA Globus endpoint (rda#datashare) to the NERSC (nersc#dtn), OLCF (olcf#dtn_atlas), and ALCF (alcf#dtn_mira) DTN clusters achieved 13.9, 16.6, and 11.9 Gbps respectively.
MRDP Partially Integrated Into ESGF
Modernized Cyberinfrastructure
• This is an example of the capabilities of modern cyberinfrastructure
– High speed networks
– Science DMZ design pattern
– Modern Research Data Portal design pattern
– HPC facilities
– High performance data platforms
• Together these enable dramatically-improved data placement performance
• Large-scale data analysis is now possible
– Data from portals analyzed at supercomputer centers
– Data shared between supercomputer centers
Larger Strategic Picture
• Across the scientific community, larger structures are being built
– HPC facilities combined with experiments
– DTNs between campuses
– These create the platform for future scientific discoveries.
• Building DMZs, DTNs, and similar things for scientists puts the power of
modern cyberinfrastructure in the hands of the people who will make the
discoveries that change our world for the better.
• By doing this work, we help bring about the future that we all want - better
medicine, better technology, more energy, a cleaner environment, etc.
Long-Term Vision (figure): a high-performance, feature-rich science network ecosystem interconnecting ESnet (Big Science facilities, DOE labs), Internet2 and the regionals (US universities and affiliated institutions), international networks (universities and labs in Europe, Asia, the Americas, Australia, etc.), commercial clouds (Amazon, Google, Microsoft, etc.), agency networks (NASA, NOAA, etc.), and campus HPC + data.
It’s All A Bunch Of Science DMZs (figure): the high-performance, feature-rich science network ecosystem interconnects many Science DMZs, each fronting local resources with DTNs – HPC facilities with parallel filesystems, single-lab experiments, data portals, LHC experiments, experiment data archives, and university computing.
Thanks!
Eli Dart - dart@es.net
Energy Sciences Network (ESnet)
Lawrence Berkeley National Laboratory
engage@es.net
http://my.es.net/
http://www.es.net/
http://fasterdata.es.net/
Extra slides – data download from portal
ESnet - the basic facts:
High-speed international networking facility,
optimized for data-intensive science:
• connecting 50 labs, plants and facilities with >150 networks,
universities, research partners globally
• supporting every science office, and serving as an integral
extension of many instruments
• 400Gbps transatlantic extension in production since Dec 2014
• >1.3 Tbps of external connectivity, including high speed access
to commercial partners such as Amazon
• growing number of university connections to better serve LHC
science (and eventually: Belle II)
• older than commercial Internet, growing ~twice as fast
Areas of strategic focus: software, science engagement.
• Engagement effort now 12% of staff
• Software capability critical to next-generation network
Extra Slides – Science DMZ
Overview
• Science DMZ Motivation and Introduction
• Science DMZ Architecture
• Network Monitoring For Performance
• Data Transfer Nodes & Applications
• Science Engagement
The Central Role of the Network
• The very structure of modern science assumes science networks exist: high
performance, feature rich, global scope
• What is “The Network” anyway?
– “The Network” is the set of devices and applications involved in the use of a
remote resource
• This is not about supercomputer interconnects
• This is about data flow from experiment to analysis, between facilities, etc.
– User interfaces for “The Network” – portal, data transfer tool, workflow engine
– Therefore, servers and applications must also be considered
• What is important? Ordered list:
1. Correctness
2. Consistency
3. Performance
TCP – Ubiquitous and Fragile
• Networks provide connectivity between hosts – how do hosts see the
network?
– From an application’s perspective, the interface to “the other end” is a
socket
– Communication is between applications – mostly over TCP
• TCP – the fragile workhorse
– TCP is (for very good reasons) timid – packet loss is interpreted as
congestion
– Packet loss in conjunction with latency is a performance killer
– Like it or not, TCP is used for the vast majority of data transfer
applications (more than 95% of ESnet traffic is TCP)
A small amount of packet loss makes a huge
difference in TCP performance
(figure: measured TCP throughput versus distance – local/LAN, metro area, regional, continental, international – for measured TCP Reno, measured HTCP, theoretical TCP Reno, and a measured no-loss case. With loss, high performance beyond metro distances is essentially impossible.)
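The shape of that figure follows from the well-known Mathis model for TCP Reno, which bounds single-stream throughput at roughly (MSS/RTT) · C/√p for loss rate p. A quick calculation (MSS and loss rate chosen for illustration) shows why even 0.01% loss is fatal beyond metro latencies:

```python
from math import sqrt

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=1.22):
    """Upper bound on single-stream TCP Reno throughput (Mathis model)."""
    return (mss_bytes * 8 / rtt_s) * (c / sqrt(loss_rate))

mss, p = 1460, 1e-4  # a standard segment size and a "small" 0.01% loss rate
for label, rtt in [("LAN", 0.001), ("metro", 0.005),
                   ("continental", 0.053), ("international", 0.150)]:
    gbps = mathis_throughput_bps(mss, rtt, p) / 1e9
    print(f"{label:13s} RTT {rtt*1e3:5.0f} ms -> {gbps:6.3f} Gbps")
```

At LAN latency the bound is still above a gigabit per second; at continental RTTs the same loss rate caps a single stream in the tens of megabits.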
Working With TCP In Practice
• Far easier to support TCP than to fix TCP
– People have been trying to fix TCP for years – limited success
– Like it or not we’re stuck with TCP in the general case
• Pragmatically speaking, we must accommodate TCP
– Sufficient bandwidth to avoid congestion
– Zero packet loss
– Verifiable infrastructure
• Networks are complex
• Must be able to locate problems quickly
• Small footprint is a huge win – small number of devices so that problem
isolation is tractable
Putting A Solution Together
• Effective support for TCP-based data transfer
– Design for correct, consistent, high-performance operation
– Design for ease of troubleshooting
• Easy adoption is critical
– Large laboratories and universities have extensive IT deployments
– Drastic change is prohibitively difficult
• Cybersecurity – defensible without compromising performance
• Borrow ideas from traditional network security
– Traditional DMZ
• Separate enclave at network perimeter (“Demilitarized Zone”)
• Specific location for external-facing services
• Clean separation from internal network
– Do the same thing for science – Science DMZ
The Science DMZ Design Pattern
Dedicated systems for data transfer – the Data Transfer Node:
• High performance
• Configured specifically for data transfer
• Proper tools
Network architecture – the Science DMZ:
• Dedicated network location for high-speed data resources
• Appropriate security
• Easy to deploy - no need to redesign the whole network
Performance testing & measurement – perfSONAR:
• Enables fault isolation
• Verify correct operation
• Widely deployed in ESnet and other networks, as well as sites and facilities
Abstract or Prototype Deployment
• Add-on to existing network infrastructure
– All that is required is a port on the border router
– Small footprint, pre-production commitment
• Easy to experiment with components and technologies
– DTN prototyping
– perfSONAR testing
• Limited scope makes security policy exceptions easy
– Only allow traffic from partners
– Add-on to production infrastructure – lower risk
Local And Wide Area Data Flows (figure): the same abstract Science DMZ design with the paths highlighted – the Data Transfer Node reaches the WAN over a high-latency WAN path through the Science DMZ switch/router and border router, while traffic between the DTN and the site/campus LAN follows a low-latency LAN path. perfSONAR nodes at multiple points.
Support For Multiple Projects
• Science DMZ architecture allows multiple projects to put DTNs in place
– Modular architecture
– Centralized location for data servers
• This may or may not work well depending on institutional politics
– Issues such as physical security can make this a non-starter
– On the other hand, some shops already have service models in place
• On balance, this can provide a cost savings – it depends
– Central support for data servers vs. carrying data flows
– How far do the data flows have to go?
Multiple Projects (figure): the Science DMZ switch/router hosts DTNs for Project A, Project B, and Project C, each behind per-project security policy control points, on the clean, high-bandwidth WAN path; the site/campus LAN behind the enterprise border router/firewall retains access to Science DMZ resources. perfSONAR nodes monitor the DMZ.
Multiple Science DMZs – Dark Fiber (figure): dark fiber runs from the border router to Science DMZ switch/routers in several buildings – a Project A DTN in building A, a Facility B DTN in building B, and a Cluster DTN fronting a cluster in building C – each with per-project security policy and its own perfSONAR node.
Supercomputer Center Deployment
• High-performance networking is assumed in this environment
– Data flows between systems, between systems and storage, wide area, etc.
– Global filesystem often ties resources together
• Portions of this may not run over Ethernet (e.g. IB)
• Implications for Data Transfer Nodes
• “Science DMZ” may not look like a discrete entity here
– By the time you get through interconnecting all the resources, you end up
with most of the network in the Science DMZ
– This is as it should be – the point is appropriate deployment of tools,
configuration, policy control, etc.
• Office networks can look like an afterthought, but they aren’t
– Deployed with appropriate security controls
– Office infrastructure need not be sized for science traffic
Supercomputer Center (figure): the border router terminates both routed and virtual-circuit WAN paths on a core switch/router. A firewall protects the office networks only; the supercomputer, the parallel filesystem, and the Data Transfer Nodes connect through front-end switches with no firewall in the data path. perfSONAR nodes at multiple points.
Supercomputer Center Data Path (figure): the same deployment with the paths highlighted – the high-latency WAN path and the high-latency virtual-circuit path reach the Data Transfer Nodes without traversing the firewall, while the supercomputer, parallel filesystem, and DTNs share a low-latency LAN path.
Common Threads
• Two common threads exist in all these examples
• Accommodation of TCP
– Wide area portion of data transfers traverses purpose-built path
– High performance devices that don’t drop packets
• Ability to test and verify
– When problems arise (and they always will), they can be solved if the
infrastructure is built correctly
– Small device count makes it easier to find issues
– Multiple test and measurement hosts provide multiple views of the data
path
• perfSONAR nodes at the site and in the WAN
• perfSONAR nodes at the remote site
Performance Monitoring
• Everything may function perfectly when it is deployed
• Eventually something is going to break
– Networks and systems are complex
– Bugs, mistakes, …
– Sometimes things just break – this is why we buy support contracts
• Must be able to find and fix problems when they occur
• Must be able to find problems in other networks (your network may
be fine, but someone else’s problem can impact your users)
• TCP was intentionally designed to hide all transmission errors from
the user:
– “As long as the TCPs continue to function properly and the internet
system does not become completely partitioned, no transmission errors
will affect the users.” (RFC 793, 1981)
Soft Network Failures – Hidden Problems
• Hard failures are well-understood
– Link down, system crash, software crash
– Routing protocols are designed to cope with hard failures (route around
damage)
– Traditional network/system monitoring tools designed to quickly find hard
failures
• Soft failures result in degraded capability
– Connectivity exists
– Performance impacted
– Typically something in the path is functioning, but not well
• Soft failures are hard to detect with traditional methods
– No obvious single event
– Sometimes no indication at all of any errors
• Independent testing is the only way to reliably find soft failures
Sample Soft Failures (figure): throughput plots in Gb/s over about a month. In one case a router had to be rebooted to fix its forwarding table; in another, the gradual failure of an optical line card caused steadily degrading performance until repair restored the normal rate.
Testing Infrastructure – perfSONAR
• perfSONAR is:
– A widely-deployed test and measurement infrastructure
• ESnet, Internet2, US regional networks, international networks
• Laboratories, supercomputer centers, universities
• Individual Linux hosts at key network locations (POPs, Science DMZs, etc.)
– A suite of test and measurement tools
– A collaboration that builds and maintains the toolkit
• By installing perfSONAR, a site can leverage over 1400 test servers
deployed around the world
• perfSONAR is ideal for finding soft failures
– Alert to existence of problems
– Fault isolation
– Verification of correct operation
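Regular perfSONAR measurements make simple automated alerting possible. As an illustration only (the function and threshold are invented here, not part of perfSONAR), a soft failure can be flagged when recent throughput sits well below baseline even though connectivity persists:

```python
def soft_failure(baseline_gbps, recent_gbps, threshold=0.5):
    """Flag a soft failure: the path still passes traffic (nonzero
    throughput) but recent tests average well below the baseline."""
    recent = sum(recent_gbps) / len(recent_gbps)
    return 0 < recent < threshold * baseline_gbps

# A path with a 9 Gbps baseline now testing around 3 Gbps: degraded, not down.
print(soft_failure(9.0, [3.1, 2.8, 3.4]))  # True
```

A hard failure (zero throughput) is caught by ordinary monitoring; it is the degraded-but-connected case that needs this kind of independent testing.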
Lookup Service Directory Search: http://stats.es.net/ServicesDirectory/
perfSONAR Dashboard: http://ps-dashboard.es.net
Dedicated Systems – The Data Transfer Node
• The DTN is dedicated to data transfer
• Set up specifically for high-performance data movement
– System internals (BIOS, firmware, interrupts, etc.)
– Network stack
– Storage (global filesystem, Fibrechannel, local RAID, etc.)
– High performance tools
– No extraneous software
• Limitation of scope and function is powerful
– No conflicts with configuration for other tasks
– Small application set makes cybersecurity easier
• Limitation of application set is often a core security policy component
Data Transfer Tools For DTNs
• Parallelism is important
– It is often easier to achieve a given performance level with four parallel
connections than one connection
– Several tools offer parallel transfers, including Globus/GridFTP
• Latency interaction is critical
– Wide area data transfers have much higher latency than LAN transfers
– Many tools and protocols assume a LAN
• Workflow integration is important
• Key tools: Globus Online, HPN-SSH
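The value of parallelism can be illustrated with a toy mover: hand a list of files to a pool of workers so several streams are in flight at once. This shows only the principle, not how GridFTP or Globus are implemented:

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def parallel_copy(pairs, workers=4):
    """Copy (src, dst) file pairs with `workers` concurrent streams,
    mimicking the parallel connections used by high-performance tools."""
    def copy_one(pair):
        src, dst = pair
        Path(dst).parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        return dst
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(copy_one, pairs))
```

With per-connection bottlenecks (TCP windows, per-file latency costs) overlapping instead of serializing, four streams often reach a target rate that one stream cannot.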
Say NO to SCP (2016)
• Using the right data transfer tool is very important
• Sample Results: Berkeley, CA to Argonne, IL (near Chicago). RTT = 53 ms,
network capacity = 10 Gbps.
• Notes
– scp is 24x slower than GridFTP on this path!!
– to get more than 1 Gbps (125 MB/s) disk to disk requires a RAID array.
– (Assumes host TCP buffers are set correctly for the RTT)
Tool: Throughput
scp: 330 Mbps
wget, GridFTP, FDT (1 stream): 6 Gbps
GridFTP and FDT (4 streams): 8 Gbps (disk limited)
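The TCP-buffer note above is the bandwidth-delay product: to keep a path full, the sender must be able to keep bandwidth × RTT bytes in flight. For the Berkeley-to-Argonne path in the table:

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: bytes in flight needed to fill the path."""
    return bandwidth_bps * rtt_s / 8

# Berkeley -> Argonne path from the slide: 10 Gbps, 53 ms RTT.
print(bdp_bytes(10e9, 0.053) / 2**20, "MiB")  # roughly 63 MiB of buffer
```

Default host buffers are typically far smaller than this, which is why untuned transfers stall well below link capacity on long paths.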
The Data Transfer Superfecta: Science DMZ Model
Data Transfer Node
• Configured for data transfer
• High performance
• Proper tools
perfSONAR
• Enables fault isolation
• Verify correct operation
• Widely deployed in ESnet and other networks, as well as sites and facilities
Science DMZ
• Dedicated location for DTN
• Proper security
• Easy to deploy - no need to redesign the whole network
Engagement
• Resources & Knowledgebase
• Partnerships
• Education & Consulting
Context Setting
• DOE, NSF, and other agencies are investing billions of dollars in state-of-the-art cyberinfrastructure to support data-intensive science.
• Many researchers do not understand the value of these services and have difficulty using them.
• A proactive effort is needed to drive adoption of advanced services and accelerate science output: Science Engagement
Science Engagement
• The ESnet Science Engagement team's mission is to ensure that science
collaborations at every scale, in every domain, have the information and
tools they need to achieve maximum benefit from global networks through
the creation of scalable, community-driven strategies and approaches.
• Science Engagement team works in several areas at once
– Understand key elements which contribute to desired outcomes
• Requirements analysis – what is needed
• Also identify choke points, road blocks, missing components
– Network architecture, performance, best practice
– Systems engineering, consulting, troubleshooting
– Collaboration with others
– Workshops and webinars
Science DMZ Wrapup
• The Science DMZ design pattern provides a flexible model for supporting high-performance data transfers and workflows
• Key elements:
– Accommodation of TCP
• Sufficient bandwidth to avoid congestion
• Loss-free IP service
– Location – near the site perimeter if possible
– Test and measurement
– Dedicated systems
– Appropriate security
– Science Engagement to foster adoption
Links and Lists
– ESnet fasterdata knowledge base
• http://fasterdata.es.net/
– Science DMZ paper
• http://www.es.net/assets/pubs_presos/sc13sciDMZ-final.pdf
– Science DMZ email list
• Send mail to sympa@lists.lbl.gov with subject "subscribe esnet-sciencedmz"
– perfSONAR
• http://fasterdata.es.net/performance-testing/perfsonar/
• http://www.perfsonar.net
– Globus
• https://www.globus.org/
Context: Science DMZ Adoption
• DOE National Laboratories
– Supercomputer centers, LHC sites, experimental facilities
– Both large and small sites
• NSF CC* programs have funded many Science DMZs
– Large investments across the US university complex: over $100M
– Significant strategic importance
• Outside the USA
– Australia
– Brazil
– UK
– More…
Strategic Impacts
• What does this mean?
– We are in the midst of a significant cyberinfrastructure upgrade
– Enterprise networks need not be unduly perturbed
• Significantly enhanced capabilities compared to 5 years ago
– Terabyte-scale data movement is much easier
– Petabyte-scale data movement possible outside the LHC experiments
• ~3.1Gbps = 1PB/month
• ~14Gbps = 1PB/week
– Widely-deployed tools are much better (e.g. Globus)
• Metcalfe’s Law of Network Utility
– Value of Science DMZ proportional to the number of DMZs
• n² or n·log n doesn’t matter – the effect is real
– Cyberinfrastructure value increases as we all upgrade
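The rules of thumb above are plain unit conversions (taking 1 PB as 10^15 bytes and a month as 30 days); a two-line check:

```python
def gbps_for(bytes_total, seconds):
    """Sustained rate in gigabits per second to move bytes_total in `seconds`."""
    return bytes_total * 8 / seconds / 1e9

PB = 1e15  # 1 petabyte, decimal
print(round(gbps_for(PB, 30 * 86400), 1))  # 3.1 Gbps sustains 1 PB/month
print(round(gbps_for(PB, 7 * 86400), 1))   # 13.2 Gbps sustains 1 PB/week
```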
Next Steps – Building On The Science DMZ
• Enhanced cyberinfrastructure substrate now exists
– Wide area networks (ESnet, GEANT, NRENs, Internet2, Regionals)
– Science DMZs connected to those networks
– DTNs in the Science DMZs
• What does the scientist see?
– Scientist sees a science application
• Data transfer
• Data portal
• Data analysis
– Science applications are the user interface to networks and DMZs
• Large-scale data-intensive science requires that we build science
applications on top of the substrate components
Network Engineering for High Speed Data Sharing

[Diagram: legacy portal – 10GE border router, firewall, enterprise network, perfSONAR, and a portal server attached to the filesystem (data store); the browsing path, query path, and data path all pass through the portal server, which runs the web server, search, database, authentication, and data service]
• Very difficult to improve performance without architectural change
– Software components all tangled together
– Complexity makes security hard
– Many components aren’t scalable
• What does architectural change mean?

Architectural Examination of Data Portals
• Common data portal functions (most portals have these)
– Search/query/discovery
– Data download method for data access
– GUI for browsing by humans
– API for machine access – ideally incorporates search/query + download
• Performance pain is primarily in the data handling piece
– Rapid increase in data scale eclipsed legacy software stack capabilities
– Portal servers often stuck in enterprise network
• Can we “disassemble” the portal and put the pieces back together better?

Legacy Portal Design (revisited)
[Diagram: the legacy single-server portal again – browsing, query, and data paths all pass through the one portal server]

Next-Generation Portal Leverages Science DMZ
[Diagram: border router and Science DMZ switch/router in front of a cluster of API DTNs (data access governed by the portal) attached to the filesystem (data store), with perfSONAR nodes; the portal server (web server, search, database, authentication) handles only the portal query/browse path, while the data transfer path goes directly through the Science DMZ DTNs]
• https://peerj.com/articles/cs-144/

NCAR RDA Data Portal
• Let’s say I have a nice compute allocation at NERSC – a national supercomputer center
• Let’s say I need some data from NCAR for my project
• https://rda.ucar.edu/
• Data sets (there are many more, but these are two examples):
– https://rda.ucar.edu/datasets/ds199.1/
– https://rda.ucar.edu/datasets/ds313.0/
• Download to NERSC (could also do ALCF or NCSA or OLCF)

Portal creates a Globus transfer job for us

Submit the transfer job, go about our business
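The transfer job the portal hands off can be sketched as a Globus Transfer API submission. This is a minimal, hedged sketch of the "transfer" document shape only: the endpoint UUIDs, paths, and the `build_transfer_document` helper are placeholders for illustration, not the real RDA or NERSC values, and a real submission would also need authentication and a submission ID.

```python
import json

def build_transfer_document(source_endpoint, dest_endpoint, items):
    """Build a Globus Transfer API 'transfer' document (shape only)."""
    return {
        "DATA_TYPE": "transfer",
        "source_endpoint": source_endpoint,
        "destination_endpoint": dest_endpoint,
        "verify_checksum": True,  # integrity checksums on, as they should be
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": src,
                "destination_path": dst,
                "recursive": recursive,
            }
            for src, dst, recursive in items
        ],
    }

# Placeholder endpoint UUIDs and paths -- not real identifiers.
doc = build_transfer_document(
    "aaaaaaaa-0000-0000-0000-000000000000",  # hypothetical source (portal) endpoint
    "bbbbbbbb-0000-0000-0000-000000000000",  # hypothetical destination DTN endpoint
    [("/ds199.1/", "/project/myproj/ds199.1/", True)],
)
print(json.dumps(doc, indent=2))
```

The key point for the portal pattern is that the portal only builds and submits this document; the bytes themselves flow between the DTNs, not through the portal server.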
Data Transfer from RDA Portal – Results

Science DMZ Design Pattern (Abstract)
[Diagram: border router on a clean, high-bandwidth WAN path feeding a Science DMZ switch/router; a high-performance Data Transfer Node with high-speed storage sits in the DMZ behind per-service security policy control points; the enterprise border router/firewall fronts the site/campus LAN, which retains access to Science DMZ resources; perfSONAR nodes at each measurement point]
© 2014, Energy Sciences Network
http://fasterdata.es.net/science-dmz/

Put The Data On Dedicated Infrastructure
• We have separated the data handling from the portal logic
• Portal is still its normal self, but enhanced
– Portal GUI, database, search, etc. all function as they did before
– Query returns pointers to data objects in the Science DMZ
– Portal is now freed from the data servers (run it in the Cloud if you want!)
• Data handling is separate, and scalable
– High-performance data cluster in the Science DMZ
– Scale as much as you need to without modifying the portal software
• Shift data management to computing centers
– Computing centers are set up for large-scale data
– Let them handle the large-scale data, and let the portal do the orchestration of data placement
• https://peerj.com/articles/cs-144/ – Modern Research Data Portal paper

Data And HPC: The Petascale DTN Project
• Built on top of the Science DMZ
• Effort to improve data transfer performance between the DOE ASCR HPC facilities at ANL, LBNL, and ORNL, and also NCSA
– Multiple current and future science projects need to transfer data between HPC facilities
– Performance was slow, configurations inconsistent
– Performance goal of 15 gigabits per second (equivalent to 1PB/week)
– Realize performance goal for routine Globus transfers without special tuning
• Reference data set is 4.4TB of cosmology simulation data
• Use performant, easy-to-use tools with production options on
– Globus Transfer service (previously Globus Online)
– Use GUI just like a user would, with default options
• E.g. integrity checksums enabled, as they should be
• No arcane magic!
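The 15 Gbps goal and the 1 PB/week equivalence above are easy to sanity-check with a few lines of arithmetic (decimal petabyte assumed):

```python
# How fast must a sustained transfer run to move one petabyte in a week?
PB = 1e15           # bytes (decimal petabyte)
week = 7 * 86400    # seconds in a week

gbps_for_pb_per_week = PB * 8 / week / 1e9
print(f"1 PB/week = {gbps_for_pb_per_week:.1f} Gbps sustained")
# ~13.2 Gbps -- so a 15 Gbps goal leaves some headroom for protocol
# overhead and checksumming
```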
DTN Cluster Performance – HPC Facilities (2017)
Petascale DTN Project, November 2017 – L380 data set, gigabits per second (min/avg/max) over three transfers per endpoint pair (pairing as in the original figure): 21.2/22.6/24.5, 23.1/33.7/39.7, 26.7/34.7/39.9, 33.2/43.4/50.3, 35.9/39.0/40.7, 29.9/33.1/35.5, 34.6/47.5/56.8, 44.1/46.8/48.4, 41.0/42.2/43.9, 33.0/35.0/37.8, 43.0/50.0/56.3, 55.4/56.7/57.4 Gbps.
Endpoints:
• NERSC DTN cluster – Globus endpoint nersc#dtn, filesystem /project
• ALCF DTN cluster – Globus endpoint alcf#dtn_mira, filesystem /projects
• OLCF DTN cluster – Globus endpoint olcf#dtn_atlas, filesystem atlas2
• NCSA DTN cluster – Globus endpoint ncsa#BlueWaters, filesystem /scratch
L380 data set: 19260 files, 211 directories, 0 other files, 4442781786482 total bytes (4.4T); smallest file 0 bytes, largest file 11313896248 bytes (11G).
Size distribution: 1–10 bytes: 7 files; 10–100 bytes: 1; 100–1K: 59; 1K–10K: 3170; 10K–100K: 1560; 100K–1M: 2817; 1M–10M: 3901; 10M–100M: 3800; 100M–1G: 2295; 1G–10G: 1647; 10G–100G: 3.

NCAR RDA Performance to DOE HPC Facilities
[Diagram: transfers from the NCAR RDA Globus endpoint rda#datashare to nersc#dtn (NERSC), olcf#dtn_atlas (OLCF), and alcf#dtn_mira (ALCF) at 13.9, 16.6, and 11.9 Gbps; endpoint-to-rate pairing as shown in the original figure]
• 1.5TB data set
• 1121 files

MRDP Partially Integrated Into ESGF

Modernized Cyberinfrastructure
• This is an example of the capabilities of modern cyberinfrastructure
– High speed networks
– Science DMZ design pattern
– Modern Research Data Portal design pattern
– HPC facilities
– High performance data platforms
• Together these enable dramatically-improved data placement performance
• Large-scale data analysis is now possible
– Data from portals analyzed at supercomputer centers
– Data shared between supercomputer centers

Larger Strategic Picture
• Across the scientific community, larger structures are being built
– HPC facilities combined with experiments
– DTNs between campuses
– These create the platform for future scientific discoveries
• Building DMZs, DTNs, and similar things for scientists puts the power of modern cyberinfrastructure in the hands of the people who will make the discoveries that change our world for the better
• By doing this work, we help bring about the future that we all want – better medicine, better technology, more energy, a cleaner environment, etc.

Long-Term Vision
[Diagram: a high-performance, feature-rich science network ecosystem connecting ESnet (big science facilities, DOE labs), Internet2 and regionals (US universities and affiliated institutions), international networks (universities and labs in Europe, Asia, the Americas, Australia, etc.), commercial clouds (Amazon, Google, Microsoft, etc.), agency networks (NASA, NOAA, etc.), and campus HPC + data]
It’s All A Bunch Of Science DMZs
[Diagram: the high-performance, feature-rich science network ecosystem interconnecting many Science DMZs, each with DTNs fronting local data resources – a parallel filesystem, an experiment, a data archive, etc.]

It’s All A Bunch Of Science DMZs (labeled)
[Diagram: the same picture with the sites labeled – HPC facilities, a single lab, experiments, a data portal, LHC experiments, and university computing]
Thanks!
Eli Dart – dart@es.net
Energy Sciences Network (ESnet)
Lawrence Berkeley National Laboratory
engage@es.net
http://my.es.net/
http://www.es.net/
http://fasterdata.es.net/

Extra slides – data download from portal

ESnet – the basic facts
High-speed international networking facility, optimized for data-intensive science:
• connecting 50 labs, plants and facilities with >150 networks, universities, and research partners globally
• supporting every science office, and serving as an integral extension of many instruments
• 400Gbps transatlantic extension in production since Dec 2014
• >1.3 Tbps of external connectivity, including high speed access to commercial partners such as Amazon
• growing number of university connections to better serve LHC science (and eventually: Belle II)
• older than the commercial Internet, growing ~twice as fast
Areas of strategic focus: software, science engagement
• Engagement effort now 12% of staff
• Software capability critical to next-generation network

Extra Slides – Science DMZ
Overview
• Science DMZ Motivation and Introduction
• Science DMZ Architecture
• Network Monitoring For Performance
• Data Transfer Nodes & Applications
• Science Engagement

Motivation
• Networks are an essential part of data-intensive science
– Connect data sources to data analysis
– Connect collaborators to each other
– Enable machine-consumable interfaces to data and analysis resources (e.g. portals), automation, scale
• Performance is critical
– Exponential data growth
– Constant human factors
– Data movement and data analysis must keep up
• Effective use of wide area (long-haul) networks by scientists has historically been difficult

The Central Role of the Network
• The very structure of modern science assumes science networks exist: high performance, feature rich, global scope
• What is “The Network” anyway?
– “The Network” is the set of devices and applications involved in the use of a remote resource
• This is not about supercomputer interconnects
• This is about data flow from experiment to analysis, between facilities, etc.
– User interfaces for “The Network” – portal, data transfer tool, workflow engine
– Therefore, servers and applications must also be considered
• What is important? Ordered list:
1. Correctness
2. Consistency
3. Performance

TCP – Ubiquitous and Fragile
• Networks provide connectivity between hosts – how do hosts see the network?
– From an application’s perspective, the interface to “the other end” is a socket
– Communication is between applications – mostly over TCP
• TCP – the fragile workhorse
– TCP is (for very good reasons) timid – packet loss is interpreted as congestion
– Packet loss in conjunction with latency is a performance killer
– Like it or not, TCP is used for the vast majority of data transfer applications (more than 95% of ESnet traffic is TCP)

A small amount of packet loss makes a huge difference in TCP performance
[Chart: throughput vs. distance – local (LAN), metro area, regional, continental, international – comparing measured TCP Reno with loss, measured HTCP, theoretical TCP Reno, and measured throughput with no loss; with loss, high performance beyond metro distances is essentially impossible]
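The chart's behavior follows from the well-known Mathis et al. (1997) bound on loss-based TCP (Reno) throughput: rate ≤ (MSS/RTT)·(C/√p), with C ≈ √(3/2). The sketch below plugs in illustrative numbers (a typical 1460-byte MSS and an assumed small loss rate; the RTT values are rough distance-class stand-ins, not measurements from the slide):

```python
import math

def mathis_throughput_mbps(mss_bytes, rtt_s, loss_rate):
    """Upper bound on single-stream TCP Reno throughput (Mathis et al.):
    rate <= (MSS / RTT) * (C / sqrt(p)), with C = sqrt(3/2)."""
    c = math.sqrt(3.0 / 2.0)
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate)) / 1e6

mss = 1460          # bytes, typical Ethernet MSS
loss = 1 / 22000    # one lost packet in 22,000 -- a "small" loss rate
for name, rtt in [("LAN", 0.001), ("metro", 0.005),
                  ("continental", 0.053), ("international", 0.150)]:
    rate = mathis_throughput_mbps(mss, rtt, loss)
    print(f"{name:13s} RTT {rtt*1000:5.0f} ms -> <= {rate:8.1f} Mbps")
```

The same tiny loss rate that is harmless on a LAN caps a continental path at tens of Mbps, which is exactly the cliff the chart shows.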
Working With TCP In Practice
• Far easier to support TCP than to fix TCP
– People have been trying to fix TCP for years – limited success
– Like it or not we’re stuck with TCP in the general case
• Pragmatically speaking, we must accommodate TCP
– Sufficient bandwidth to avoid congestion
– Zero packet loss
– Verifiable infrastructure
• Networks are complex
• Must be able to locate problems quickly
• Small footprint is a huge win – small number of devices so that problem isolation is tractable

Putting A Solution Together
• Effective support for TCP-based data transfer
– Design for correct, consistent, high-performance operation
– Design for ease of troubleshooting
• Easy adoption is critical
– Large laboratories and universities have extensive IT deployments
– Drastic change is prohibitively difficult
• Cybersecurity – defensible without compromising performance
• Borrow ideas from traditional network security
– Traditional DMZ
• Separate enclave at network perimeter (“Demilitarized Zone”)
• Specific location for external-facing services
• Clean separation from internal network
– Do the same thing for science – Science DMZ

The Science DMZ Design Pattern
Dedicated Systems for Data Transfer – the Data Transfer Node:
• High performance
• Configured specifically for data transfer
• Proper tools
Network Architecture – the Science DMZ:
• Dedicated network location for high-speed data resources
• Appropriate security
• Easy to deploy – no need to redesign the whole network
Performance Testing & Measurement – perfSONAR:
• Enables fault isolation
• Verify correct operation
• Widely deployed in ESnet and other networks, as well as sites and facilities

Abstract or Prototype Deployment
• Add-on to existing network infrastructure
– All that is required is a port on the border router
– Small footprint, pre-production commitment
• Easy to experiment with components and technologies
– DTN prototyping
– perfSONAR testing
• Limited scope makes security policy exceptions easy
– Only allow traffic from partners
– Add-on to production infrastructure – lower risk
Science DMZ Design Pattern (Abstract)
[Diagram: as on the earlier abstract design-pattern slide – border router, Science DMZ switch/router, high-performance DTN with high-speed storage, per-service security policy control points, clean high-bandwidth WAN path, site/campus access to Science DMZ resources, and perfSONAR at each layer]

Local And Wide Area Data Flows
[Diagram: the abstract Science DMZ with the paths highlighted – the high-latency WAN path terminates on the DTN in the Science DMZ, while the low-latency LAN path stays behind the enterprise border router/firewall on the site/campus LAN]

Support For Multiple Projects
• Science DMZ architecture allows multiple projects to put DTNs in place
– Modular architecture
– Centralized location for data servers
• This may or may not work well depending on institutional politics
– Issues such as physical security can make this a non-starter
– On the other hand, some shops already have service models in place
• On balance, this can provide a cost savings – it depends
– Central support for data servers vs. carrying data flows
– How far do the data flows have to go?

Multiple Projects
[Diagram: a single Science DMZ switch/router hosting Project A, B, and C DTNs, each behind per-project security policy control points, with a clean high-bandwidth WAN path and site/campus access to Science DMZ resources]

Multiple Science DMZs – Dark Fiber
[Diagram: dark fiber runs from the border router to Science DMZ switch/routers in separate buildings – Project A DTN (building A), Facility B DTN (building B), and a cluster DTN serving a cluster (building C) – each with per-project security policy and perfSONAR]
Supercomputer Center Deployment
• High-performance networking is assumed in this environment
– Data flows between systems, between systems and storage, wide area, etc.
– Global filesystem often ties resources together
• Portions of this may not run over Ethernet (e.g. IB)
• Implications for Data Transfer Nodes
• “Science DMZ” may not look like a discrete entity here
– By the time you get through interconnecting all the resources, you end up with most of the network in the Science DMZ
– This is as it should be – the point is appropriate deployment of tools, configuration, policy control, etc.
• Office networks can look like an afterthought, but they aren’t
– Deployed with appropriate security controls
– Office infrastructure need not be sized for science traffic

Supercomputer Center
[Diagram: border router with virtual-circuit and routed WAN connections into a core switch/router; the supercomputer, parallel filesystem, and data transfer nodes hang off front-end switches, offices sit behind a firewall, and perfSONAR nodes monitor the border, core, and data systems]

Supercomputer Center Data Path
[Diagram: the same topology with the paths highlighted – the high-latency WAN path and high-latency virtual-circuit path terminate on the data transfer nodes, while the low-latency LAN path stays internal]

Common Threads
• Two common threads exist in all these examples
• Accommodation of TCP
– Wide area portion of data transfers traverses purpose-built path
– High performance devices that don’t drop packets
• Ability to test and verify
– When problems arise (and they always will), they can be solved if the infrastructure is built correctly
– Small device count makes it easier to find issues
– Multiple test and measurement hosts provide multiple views of the data path
• perfSONAR nodes at the site and in the WAN
• perfSONAR nodes at the remote site
Overview
• Science DMZ Motivation and Introduction
• Science DMZ Architecture
• Network Monitoring For Performance
• Data Transfer Nodes & Applications
• Science Engagement

Performance Monitoring
• Everything may function perfectly when it is deployed
• Eventually something is going to break
– Networks and systems are complex
– Bugs, mistakes, …
– Sometimes things just break – this is why we buy support contracts
• Must be able to find and fix problems when they occur
• Must be able to find problems in other networks (your network may be fine, but someone else’s problem can impact your users)
• TCP was intentionally designed to hide all transmission errors from the user:
– “As long as the TCPs continue to function properly and the internet system does not become completely partitioned, no transmission errors will affect the users.” (From RFC 793, 1981)

Soft Network Failures – Hidden Problems
• Hard failures are well-understood
– Link down, system crash, software crash
– Routing protocols are designed to cope with hard failures (route around damage)
– Traditional network/system monitoring tools designed to quickly find hard failures
• Soft failures result in degraded capability
– Connectivity exists
– Performance impacted
– Typically something in the path is functioning, but not well
• Soft failures are hard to detect with traditional methods
– No obvious single event
– Sometimes no indication at all of any errors
• Independent testing is the only way to reliably find soft failures

Sample Soft Failures
[Chart: throughput in Gb/s over one month – a gradual failure of an optical line card degrades performance from normal until repair, and a router reboot fixes a forwarding-table problem]

Testing Infrastructure – perfSONAR
• perfSONAR is:
– A widely-deployed test and measurement infrastructure
• ESnet, Internet2, US regional networks, international networks
• Laboratories, supercomputer centers, universities
• Individual Linux hosts at key network locations (POPs, Science DMZs, etc.)
– A suite of test and measurement tools
– A collaboration that builds and maintains the toolkit
• By installing perfSONAR, a site can leverage over 1400 test servers deployed around the world
• perfSONAR is ideal for finding soft failures
– Alert to existence of problems
– Fault isolation
– Verification of correct operation
Lookup Service Directory Search: http://stats.es.net/ServicesDirectory/

perfSONAR Dashboard: http://ps-dashboard.es.net
Overview
• Science DMZ Motivation and Introduction
• Science DMZ Architecture
• Network Monitoring For Performance
• Data Transfer Nodes & Applications
• Science Engagement

Dedicated Systems – The Data Transfer Node
• The DTN is dedicated to data transfer
• Set up specifically for high-performance data movement
– System internals (BIOS, firmware, interrupts, etc.)
– Network stack
– Storage (global filesystem, Fibrechannel, local RAID, etc.)
– High performance tools
– No extraneous software
• Limitation of scope and function is powerful
– No conflicts with configuration for other tasks
– Small application set makes cybersecurity easier
• Limitation of application set is often a core security policy component

Data Transfer Tools For DTNs
• Parallelism is important
– It is often easier to achieve a given performance level with four parallel connections than one connection
– Several tools offer parallel transfers, including Globus/GridFTP
• Latency interaction is critical
– Wide area data transfers have much higher latency than LAN transfers
– Many tools and protocols assume a LAN
• Workflow integration is important
• Key tools: Globus Online, HPN-SSH

Say NO to SCP (2016)
• Using the right data transfer tool is very important
• Sample results: Berkeley, CA to Argonne, IL (near Chicago); RTT = 53 ms, network capacity = 10Gbps

Tool | Throughput
scp | 330 Mbps
wget, GridFTP, FDT (1 stream) | 6 Gbps
GridFTP and FDT (4 streams) | 8 Gbps (disk limited)

• Notes
– scp is 24x slower than GridFTP on this path!!
– To get more than 1 Gbps (125 MB/s) disk to disk requires a RAID array
– (Assumes host TCP buffers are set correctly for the RTT)
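The footnote about TCP buffers amounts to sizing the send and receive buffers to at least one bandwidth-delay product (BDP) of the path. A quick calculation for the Berkeley–Argonne numbers above (the 256 KB default buffer is an illustrative assumption, not a value from the slide):

```python
# To fill a path, TCP buffers must hold at least one bandwidth-delay product.
capacity_bps = 10e9   # 10 Gbps Berkeley-Argonne path
rtt_s = 0.053         # 53 ms round-trip time

bdp_bytes = capacity_bps * rtt_s / 8
print(f"BDP = {bdp_bytes / 1e6:.1f} MB")  # buffer needed to fill the path

# An (assumed) default buffer of 256 KB caps a single stream far below capacity:
default_buf = 256 * 1024
max_rate_mbps = default_buf * 8 / rtt_s / 1e6
print(f"256 KB buffer -> at most {max_rate_mbps:.0f} Mbps on this path")
```

With an unscaled buffer, a single stream is windowing-limited to a few tens of Mbps regardless of how fast the network is, which is why buffer tuning (or a tool that tunes for you) matters on long paths.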
Overview
• Science DMZ Motivation and Introduction
• Science DMZ Architecture
• Network Monitoring For Performance
• Data Transfer Nodes & Applications
• Science Engagement

The Data Transfer Superfecta: Science DMZ Model
Data Transfer Node
• Configured for data transfer
• High performance
• Proper tools
perfSONAR
• Enables fault isolation
• Verify correct operation
• Widely deployed in ESnet and other networks, as well as sites and facilities
Science DMZ
• Dedicated location for DTN
• Proper security
• Easy to deploy – no need to redesign the whole network
Engagement
• Resources & Knowledgebase
• Partnerships
• Education & Consulting

Context Setting
• DOE, NSF, and other agencies are investing billions of dollars in state-of-the-art cyberinfrastructure to support data-intensive science
• Many researchers do not understand the value of these services and have difficulty using them
• A proactive effort is needed to drive adoption of advanced services and accelerate science output: Science Engagement

Science Engagement
• The ESnet Science Engagement team's mission is to ensure that science collaborations at every scale, in every domain, have the information and tools they need to achieve maximum benefit from global networks through the creation of scalable, community-driven strategies and approaches
• The Science Engagement team works in several areas at once
– Understand key elements which contribute to desired outcomes
• Requirements analysis – what is needed
• Also identify choke points, road blocks, missing components
– Network architecture, performance, best practice
– Systems engineering, consulting, troubleshooting
– Collaboration with others
– Workshops and webinars

Science DMZ Wrapup
• The Science DMZ design pattern provides a flexible model for supporting high-performance data transfers and workflows
• Key elements:
– Accommodation of TCP
• Sufficient bandwidth to avoid congestion
• Loss-free IP service
– Location – near the site perimeter if possible
– Test and measurement
– Dedicated systems
– Appropriate security
– Science Engagement to foster adoption
Links and Lists
– ESnet fasterdata knowledge base
• http://fasterdata.es.net/
– Science DMZ paper
• http://www.es.net/assets/pubs_presos/sc13sciDMZ-final.pdf
– Science DMZ email list
• Send mail to sympa@lists.lbl.gov with subject "subscribe esnet-sciencedmz"
– perfSONAR
• http://fasterdata.es.net/performance-testing/perfsonar/
• http://www.perfsonar.net
– Globus
• https://www.globus.org/

Context: Science DMZ Adoption
• DOE National Laboratories
– Supercomputer centers, LHC sites, experimental facilities
– Both large and small sites
• NSF CC* programs have funded many Science DMZs
– Large investments across the US university complex: over $100M
– Significant strategic importance
• Outside the USA
– Australia
– Brazil
– UK
– More…

Strategic Impacts
• What does this mean?
– We are in the midst of a significant cyberinfrastructure upgrade
– Enterprise networks need not be unduly perturbed
• Significantly enhanced capabilities compared to 5 years ago
– Terabyte-scale data movement is much easier
– Petabyte-scale data movement possible outside the LHC experiments
• ~3.1 Gbps = 1PB/month
• ~14 Gbps = 1PB/week
– Widely-deployed tools are much better (e.g. Globus)
• Metcalfe’s Law of Network Utility
– Value of a Science DMZ is proportional to the number of DMZs
• n² or n·log(n) doesn’t matter – the effect is real
– Cyberinfrastructure value increases as we all upgrade

Next Steps – Building On The Science DMZ
• Enhanced cyberinfrastructure substrate now exists
– Wide area networks (ESnet, GEANT, NRENs, Internet2, Regionals)
– Science DMZs connected to those networks
– DTNs in the Science DMZs
• What does the scientist see?
– The scientist sees a science application
• Data transfer
• Data portal
• Data analysis
– Science applications are the user interface to networks and DMZs
• Large-scale data-intensive science requires that we build science applications on top of the substrate components