Trends from the Trenches
2019 Bio-IT World Conference
Chris Dagdigian
https://bioteam.net
Want these slides?
slideshare.net/chrisdag or
https://bioteam.net
Image by Deanna & Amy; used with permission
https://metoostem.com/
● Seems appropriate to include this
● Recurring 2019 theme for me has
been listening to the stories of
women forced out of early-career
academic paths or jobs because of
systemic harassment & bias in
STEM fields
@chris_dag - https://bioteam.net
I’m Chris. I work for
BioTeam
● Failed scientist turned infrastructure nerd
● 20 years working on infrastructure for life
science research; Now I’m old & lame
● As a consultant I get to see how many
different groups of smart people tackle similar
challenges
● Often I’m allowed to talk about what I see so I
collect trends, observations and common pain
points
● Started talking at BioIT in 2010 and they
won’t let me gracefully retire
Thought Excretor
Magic Quadrant
Competence / Domain Insight
Can talk bluntly
in public
@hpc_guru
@fdmnts
@glennklockwood
… you get
the idea
@{ many smart
people }
@{ vendor shills }
@mndoci
Tune Me Out or Filter My Words Accordingly
● Not a pundit
● Not a “thought leader”
● Not pretending to speak on behalf of our
huge and diverse industry
● This is a personal talk delivered through the
prism of prior work, clients, projects and
conversations
● Lots of industry/government work recently
● My observations have the same
diversity/inclusion problems as
science/workplaces in general
● Heavily influenced by past and current
projects and the interesting people I’ve
spoken or interacted with
2019 Catch-all: Observations, Anecdotes & Emergent Stuff
01: We’ve done OK entering “data intensive science” era
Turbulent for sure but we’ve managed
...
● Compute
○ Physical, virtual and cloud based computing
is a tractable problem at most scales
● Networking
○ > 10-Gbps still painful and expensive
○ Science DMZ design patterns are working
● Storage
○ Large capacity is a solved problem
○ Consumption rate still scary
One of the biggest unsolved problems
● Data Management, Discovery, Cataloging
and Classification
● It’s easy to store vast piles of data; we are still
terrible at understanding what we have
Question: How many vendors and
products did you see this week at BioIT’19
explicitly focusing on data management,
curation, metadata or discovery?
02: Scientific Computing: Still Undervalued By Leadership
Prior Talks / Younger Me
● “Computers are digital benchtops, not the
simple business process endpoints that
Enterprise IT treats them as”
● “HPC capability is essential for R&D; we
need leadership and investment parity with
the wetlab folk”
Today’s Talk / Older, Heftier & Wiser
● Still viewed as cost center to be minimized,
optimized and “value engineered”
● Only a few treat “extract insight & value
from data” as core competitive differentiator
beyond vapid words in mission statements
● HR not touting it as a major recruitment and
retention asset
Incompetence in this space is an existential survival threat to your company or organization.
03: Scientific Computing: User Trends
User Base Climbing Rapidly
● HPC and analytic capabilities are
extending from discovery and spreading
across the enterprise
● Pervasive need for HPC and analytic
competence across the entire
organization
○ % of staff outgrowing “laptop scale”
analysis is climbing fast
○ Competitive differentiator
○ Recruitment/Retention resource
○ Survival requirement in 2019
Users getting MORE and LESS
Sophisticated
● Users forced away from laptop-scale methods
require significant training and onboarding
● Yet new hires and early-career recruits often
show up with prior HPC and cloud expertise
● In general we are still pretty bad at training,
best practice propagation and knowledge
transfer
○ Especially in helping “intermediate” level users
become experts
Past talks ...
Today ...
04: Definition of HPC Being Stretched In Extreme Ways
● “Beyond laptop scale” computing requirement is
becoming pervasive across organizations
● We are often the only multi-user / shared-service
entities with large scale compute, storage, memory,
GPU and visualization capabilities
● HPC in danger of becoming the dumping ground for
any problem that does not fit on a laptop
● Parasitic usage causes:
○ Infrastructure tuned & biased for generic workflows
○ Support org becomes even more overwhelmed
○ Angry users demanding high-touch support and
special accommodations for niche stuff
“Any analysis that can’t
run on a cheap leased
laptop must require
HPC”
-- league of bad mgmt
05: Compilers & Toolchains: Mini Trend?
Coming out of a relatively stable era
● Intel dominated compute
● Genomics/informatics dominated workload
● Hardware & software well characterized
● 10+ years since I had to mess with
commercial compilers
This may be changing …
● Ludicrous rate of innovation seen in the
instrument space is starting to appear in our
tooling & applications
● Now?
○ Software in rapid improve/innovate cycles
○ Kernels and kernel modules matter
○ Compiler & glibc versions matter
○ Conservative RHEL/CentOS Linux distributions
may be moving too slowly for some scientific
domains
● May be time to re-evaluate some of our
foundational environments & toolchains
05: Compilers & Toolchains: Mini Trend?
The latest/fastest CPUs are expensive.
GPUs are expensive.
NVLINK is expensive.
DGX-2 list price is $400,000/ea
A reasonable investment in compiler and toolchain optimizations could pay
significant dividends
06: Compilers & Toolchains : Relion CryoEM Homework
Try this at home, kids! (if you have CPUs with AVX-512 support)
● Download latest Relion codebase from https://github.com/3dem/relion
● Test #1
○ Build using stock compiler and developer tools with CPU acceleration enabled
○ Run & time “relion_refine” using the common benchmark data set and commands
● Test #2
○ Repeat build with upgraded compiler and developer tools (e.g. GCC 7 on CentOS/RHEL 7)
○ Time how long the run takes
● Test #3 (if possible)
○ Repeat work using Intel ICC compiler (Intel Parallel Studio)
○ Time how long the run takes
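The three tests above only compare fairly if every build is timed the same way. A minimal timing harness might look like the sketch below. The helper names (time_command, summarize) are hypothetical, and the command being timed is a placeholder: substitute the real relion_refine benchmark command line for each build, which is deliberately not spelled out here.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Run cmd (a list of arguments) `repeats` times; return wall-clock seconds per run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True aborts the comparison if the benchmark itself fails
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(label, timings):
    """Return a one-line summary with the median and the fastest run."""
    return f"{label}: median {statistics.median(timings):.2f}s, best {min(timings):.2f}s"


if __name__ == "__main__":
    # Placeholder workload so the sketch runs anywhere. Replace it with the
    # benchmark command for the stock, upgraded-GCC and ICC builds in turn.
    placeholder = [sys.executable, "-c", "sum(range(10**6))"]
    print(summarize("stock toolchain (placeholder)", time_command(placeholder)))
```

Repeating each run and reporting the median is a design choice to dampen cache and scheduler noise. Keep the data set, node and thread counts identical across builds so the toolchain is the only variable.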
07: Call To Action - Bigger Relion CryoEM Benchmark Sets
● The most prevalent/popular Relion benchmark uses a ~50GB input data set
● Everyone appears to be using it; especially vendors trying to sell you stuff
○ Small enough to fit in RAM and hit the caching effect of almost all storage systems
○ We (BioTeam) do not believe this is a realistic test in 2019
■ … for anything other than getting compiler and CPU/GPU optimizations correct
● Seeking multi-terabyte CryoEM data organized for Relion 2D or 3D classification
○ We think the community needs MUCH larger benchmarking resources and data sets
○ We will happily host, share & re-distribute
○ We will publish our own results testing against this data
○ Contact chris@bioteam.net
How many scientists do you know with CryoEM experimental data sets that are less than 60GB in size?
08: Machine Learning & Training Data - Awesome and Ugly
● Proper ML/AI requires lots of training data
● Need “training” & “validation” sets
● The data engineering work is non-trivial
● Metadata is essential; bad data will sink you
● Competitors with “better” data will beat you
The race to acquire, generate or license
training data has the potential to be both
awesome and ugly
● Significant opportunities for both
innovation and abuse
Innovation Example
● We are starting to see organizations doing
really interesting things to acquire the
training data they need
● An Example:
Publish/Host useful tools on the cloud
○ Users get access to sophisticated analysis
resources they do not have locally
○ Opt-in data sharing process generates ...
○ 30,000 de-identified MRI scans per week
Topic: Lean Times & Resource Scarcity
"Unit cost of storage is decreasing but not as fast as data production
is increasing. Our computing costs grow ~10%/year while budget
grows at ~3% so we've had to cut [research] mission to preserve
essential capability "
-- Scientific leader @ nationally recognized institution
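To see how quickly that gap compounds, here is a small worked calculation. It is a sketch under two assumptions that are not in the quote: the budget exactly covers computing cost in year 0, and both rates compound annually. The function name budget_coverage is made up for illustration.

```python
def budget_coverage(years, cost_growth=0.10, budget_growth=0.03):
    """Fraction of computing cost the budget still covers after `years`.

    Assumes budget == cost in year 0 and that both compound once per year.
    """
    return ((1 + budget_growth) / (1 + cost_growth)) ** years


for year in (1, 3, 5, 10):
    print(f"year {year:2d}: budget covers {budget_coverage(year):.0%} of cost")
```

Under those assumptions the budget covers only about 72% of computing cost after five years, which is the kind of shortfall that forces mission cuts.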
Lean Times: Prior Talks
● Cheaper to repeat experiment than store the data over full lifecycle
● Unit cost of storage out of sync with ease of data generation
● Petabytes of open access data easily available; & valid reasons to use it
● IT knows you haven’t touched that data in years
Also:
● Deleting raw and derived scientific data is OK
● Performing data triage is OK
● … as long as data deletion decisions are made by scientists, not IT
Lean Times: Today
● Data management still a source of existential dread for Bio-IT
● Core problem has seeped beyond “it is easier to acquire vast piles of
scientific data than it is to sensibly and safely store it over time”
● Today we see single scientists asking research questions that can totally
consume a leadership-class supercomputer or system like ANTON-2
For biotech/pharma this means our researchers can easily swamp any
system of any size or capability we can reasonably deliver. That is … not
sustainable.
Lean Times: What this could mean in coming years
● We stop half-assing governance in discovery-oriented Bio-IT?
● Scientific computing orgs tighten scope, scale & supported services
● HPC resource allocation explicitly under control of scientific leadership
○ Remember it is *never* appropriate for IT to make these types of decisions
● What about moonshots and open-ended research?
○ Maybe we adopt DOE/NSF national lab model and hand out
internal credits, grants or allocations for researchers to “spend”
however they see fit ...
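One way to picture the internal credit idea is a tiny allocation ledger. This is only an illustrative sketch: the Allocation class, its fields and the core-hour unit are all assumptions, not a description of how any national lab or scheduler actually implements allocations.

```python
from dataclasses import dataclass, field


@dataclass
class Allocation:
    """Core-hour credits granted to one project by scientific leadership."""
    project: str
    granted_core_hours: float
    used_core_hours: float = 0.0
    history: list = field(default_factory=list)

    @property
    def remaining(self):
        return self.granted_core_hours - self.used_core_hours

    def charge(self, job_name, cores, hours):
        """Debit a job; refuse it if the project would overspend its grant."""
        cost = cores * hours
        if cost > self.remaining:
            raise ValueError(
                f"{self.project}: {job_name} needs {cost} core-hours, "
                f"only {self.remaining} left"
            )
        self.used_core_hours += cost
        self.history.append((job_name, cost))
        return self.remaining


moonshot = Allocation(project="moonshot-demo", granted_core_hours=1000)
print(moonshot.charge("exploratory-run", cores=16, hours=10))
```

The point of the design is that the grant size is a science decision made up front, while the ledger just enforces it. IT never has to judge which job deserves to run.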
Lean Times: Effective Operation Principles
Required:
● Governance driven by Science (not IT groups) becomes essential
● Honest & transparent operational cost data spanning cloud/on-prem
● Full transparency of usage and resource allocation metrics
● Good logging of scientific tools and codes being invoked
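As a rough illustration of the last bullet, a thin wrapper can record every tool invocation as one JSON line. This is a sketch, not a recommendation of a specific product: the run_logged name, the record fields and the JSON-lines log format are all assumptions. Many sites would get the same data from scheduler accounting or module-load hooks instead.

```python
import json
import os
import subprocess
import sys
import time


def run_logged(cmd, log_path):
    """Run cmd (a list of arguments), then append one JSON line describing it."""
    start = time.time()
    result = subprocess.run(cmd)
    record = {
        # Fall back to "unknown" when the USER variable is not set
        "user": os.environ.get("USER", "unknown"),
        "tool": cmd[0],
        "args": cmd[1:],
        "start_epoch": start,
        "wall_seconds": round(time.time() - start, 3),
        "exit_code": result.returncode,
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return record


if __name__ == "__main__":
    # Placeholder command; a real deployment would wrap the scientific tool.
    print(run_logged([sys.executable, "-c", "pass"], "invocations.jsonl"))
```

Aggregating these records by tool and user gives the usage metrics that science-led governance needs in order to decide what stays supported.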
Lean Times: It’s not all doom and gloom
Talking to people who have lived this before:
● Forced hard examination of bespoke/custom/standalone systems (silos)
● Helped push for internal agreement and alignment re: adopting
common platforms, APIs and shared services/sysadmin operations
● “Made us think hard about how to run technological operations in a
different way”
Topic: Silicon Matters Again
Silicon Matters Again: Then & Now
Prior Talks / Younger Me
● “Compute is commodity”
● “Intel x86 rules the world”
● “GPU usage starting to differentiate between
visualization and MD/Chemistry”
Today’s Talk / Older, Larger & Wiser
● Ahh crap …
● CPUs, GPUs, FPGAs and custom silicon
are back on the table again and it’s getting
messy
Bottom Line:
● Significantly more benchmark & eval work
● Developer preference vs. Cost/ROI analysis
● GIANT EXCEPTION
○ Serverless folk don’t care
Silicon Matters Again:
CPU
● AMD is back with EPYC
● … it’s benchmarking time again
GPUs
● Increasingly complicated landscape
● Needed for: VDI, Viz, MD/Chem/Structure,
ML/AI and CryoEM
● Pain points
○ Need different products (VDI vs. Science)
○ Need various GPU memory configs
○ Need various #s of GPUs per chassis
○ NVLINK - when, where & how much?
○ Will cloud have them when you need them?
TPUs, FPGAs & Custom Silicon
● Many trying to differentiate in ML/AI space
via custom devices
● Clouds now have proprietary accelerators
● More benchmarking required
● SDK/Framework decisions required
● Deeper engagement with IT required
Topic: Facility & Infrastructure
Facility & Infrastructure: General Observations
● Yes, you still have to do hybrid vs. cloud vs. on-prem analysis & math
● Economics still favor on-prem or colo for 24x7 scientific workloads
○ When other capability or business requirements don’t supersede cost concerns
● Why?
○ Cloud-based on-demand elastic computing is easy and well understood
○ Serverless is effing transformative; both for capability and cost, but ...
○ … Persistent, accessible petascale cloud storage is still expensive month over month
○ At petascale egress fees start to matter
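The "analysis & math" can start as simply as the sketch below: a recurring cloud bill for storage plus egress, compared against an on-prem or colo purchase. Every number in the example is a made-up placeholder, not a real price from any provider, and the function names are hypothetical. Plug in your own quotes.

```python
def monthly_cloud_cost(stored_tb, egress_tb, storage_usd_per_tb_month, egress_usd_per_tb):
    """Recurring monthly bill for data at rest plus data moved out."""
    return stored_tb * storage_usd_per_tb_month + egress_tb * egress_usd_per_tb


def months_to_match_capex(capex_usd, onprem_opex_usd_per_month, cloud_usd_per_month):
    """Months until cumulative cloud spend equals on-prem capex plus opex.

    Returns None when cloud is not more expensive per month than on-prem opex.
    """
    monthly_delta = cloud_usd_per_month - onprem_opex_usd_per_month
    if monthly_delta <= 0:
        return None
    return capex_usd / monthly_delta


# Placeholder inputs only: 1 PB at rest, 50 TB egress per month.
cloud = monthly_cloud_cost(
    stored_tb=1000, egress_tb=50,
    storage_usd_per_tb_month=20.0, egress_usd_per_tb=80.0,
)
print(f"cloud: ${cloud:,.0f}/month")
print(months_to_match_capex(
    capex_usd=500_000, onprem_opex_usd_per_month=8_000, cloud_usd_per_month=cloud,
))
```

This ignores compute, staffing, refresh cycles and capability arguments on purpose. It only isolates the two terms the slide calls out: persistent storage charged month over month, and egress at petascale.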
Definite Trend: Colocation Suites & Cabinets
● Seeing this actively in 2019
○ I’ve got an active on-prem to colo (Markley Group) project right now
● Sign of the times:
○ Steve Lister from Novartis is now CTO for HPC & Data Analytics @ Markley Group
● Drivers
○ Cost of new-build or upgrades to on-premise facilities
○ Poor cloud economics for certain 24x7 workloads and use cases
○ Growth, merger & consolidation activities
○ Colocation is the new “Network Hub” for
■ Offsite backup and data continuity efforts
■ Flexible aggregation of cloud connectivity (single cloud or multi-cloud)
■ Speciality links to partners, collaborators and high speed research networks (Internet2)
■ Bespoke IaaS, PaaS, SaaS offerings from colo operators
Story: “Innovation Space” Horror Show
Dumbest Thing I Saw in 2018
● Where: Boston, Massachusetts
● What: Shiny new top tier incubator/innovation space for life science startups
● Wow: Office, lab space, managed stockroom/chem service, etc.
● WTF:
○ No IT/infrastructure space for tenants. At all.
■ Big Data? Exotic instruments? Data intensive science? Eff you.
■ Don’t want to place a tower server deskside or in the wet lab? Eff you.
■ Shared internet circuit & firewall (logical tenant isolation & traffic shaping though)
○ Any changes require 3-party negotiations (Space Operator, Floor leaseholder, Building owners)
I’m not kidding - Dumbest thing I saw in 2018
● Brand new incubator space targeting life science startups in the middle of Boston
● … did a new facility build with the assumption that biotech/pharma startups need
nothing but laptops, 1Gig network drops and a bit of cloud + managed IT services
● Any physical IT infrastructure not owned by the space operator or managed IT
service provider has to live under-desk or inside the wet lab space
Yes. A subset of agile, fast-moving startups need nothing more than internet and a set of cloud-based wifi &
domain controllers. Building a new facility that caters only to these shops means you’ve guzzled a bit too
much telecom vendor and cloud marketing (or you fell asleep on a pile of “CIO Magazine”)
Topic: Org Charts & Scientific Support
Support: Data Intensive Life Science Is A Different Beast
Other HPC / Supercomputing :
● Modest set of dominant, well profiled & well
optimized domain-specific codes
● May have large user base or very extreme
HPC needs but the domain and application
landscape is approachable
In life science HPC …
● 5 - 5000 users
● 600+ applications spanning 10+ domains
○ Molecular Dynamics, Fluid Dynamics,
Structural Biology, Chemistry, Genomics,
Bioinformatics, Medical/Clinical, Optical
Imaging, EM Imaging, Sensor/IoT, etc. etc.
○ Each of these breaks down into specialities
typed by species, disease, organ, pathway etc.
etc.
Support: Data Intensive Life Science Is A Different Beast
I hate to bust out the “... but life science is SPECIAL and UNIQUE …” take
But … If you survey commodity supercomputing and capability
supercomputing environments at both small and “national lab” scale you will
see stark differences
Domain and workload diversity (and crap code) are our distinguishing
characteristics
Support: Data Intensive Life Science Is A Different Beast
Not a trend because org charts vary
wildly by mission & org but ...
● Domain expertise needs to spread to the
edge of the org while Scientific Computing
groups retain and grow the expertise that
spans groups/projects/orgs and domains
Domain-Specific Expertise:
● Embedded within the group, lab, program or
R&D organization
Cross-Domain/Cross-Org Expertise:
● Science Gateways, Portals, Middleware & APIs
● User & Workflow Optimization
● High Value Application Optimization
● Data Engineering
● Data Science & Analytics
● Data Visualization
● CUDA / ML / AI Expertise
● Training
Broadly useful cross-org capabilities get
consolidated within Scientific Computing; Exotic
domain expertise moves to stakeholder teams.
Model We Like: Service Oriented Delivery
● Large scientific computing shops
reorganize around scientific use
cases and end-user requirements
● … not on technological expertise
● Great way to blow away traditional IT
silos
● “Team of teams” approach to service
delivery
Topic: Storage
Storage Landscape: Prior Talks
● Everyone needs peta-capable storage
● { insert scary growth of storage graph } OMG OMG OMG
● In tough times it is OK to favor storage capacity over performance
● Scale-out NAS is best storage platform for most
● Parallel File Systems only when the workload requires them, due to
higher ops burden
● Object Storage is the future of scientific data at rest
● It is easier to generate/acquire vast piles of data than it is to safely and
sensibly store and manage it over a full lifecycle - this is a big problem
Legacy Dag
Storage Landscape: Today [1]
Major fundamental changes
● The capacity|performance calculus has swung the other way
● We now need very fast storage to handle machine learning, AI and
image-based workflow requirements
● ML training & validation requires persistent access to the “old” data so
we still need massive storage capacity
● Dominant file type at the moment is image-based, no longer genomes
Current
Dag
Storage Landscape: Today [2]
Contributing Factors
● Lots of deployed storage is nearing EOL or end of support contract
● Some really interesting next-gen storage companies have launched
● Parallel storage is a lot more attractive w/ performance as key driver
● Operational benefits of scale-out NAS slightly less valuable in context
Current
Dag
Storage Landscape: Data As Currency
● Your organization has a big problem when the default stance among
leadership or users is “all our data is important” - way worse than “we
don’t know how to figure out what is important …”
● Not understanding the true value of data leads to hoarding, massive
inefficiencies and inability to properly leverage the data at hand
● Data management, scientifically-relevant metadata, tracking the use,
derivative uses, and amount of repeated uses of data could totally
change how we approach scientific data storage
● It's about the data, not the storage platform
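A first, crude step toward understanding what you have is simply listing data nobody has modified in a long time. The sketch below walks a directory tree and reports such files. The cold_files name and the idle threshold are assumptions for illustration. It uses modification time rather than access time because many file systems are mounted with access-time updates reduced or disabled.

```python
import os
import time


def cold_files(root, min_idle_days, now=None):
    """Yield (path, size_bytes, idle_days) for files not modified in min_idle_days."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            idle_days = (now - info.st_mtime) / 86400
            if idle_days >= min_idle_days:
                yield path, info.st_size, idle_days


if __name__ == "__main__":
    total = sum(size for _path, size, _idle in cold_files(".", min_idle_days=365))
    print(f"{total / 1e9:.2f} GB untouched for a year or more under the current directory")
```

A report like this says nothing about scientific value. It is input for the scientists who own the data, who still make the keep, move or delete call.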
Current
Dag
Storage Tiering & Namespaces: Prior Talks
● Single namespace storage is really important
● If we don’t give users a single view of storage we end up with:
○ Multiple islands of data
○ Scientists store the same data in N different locations
○ Nasty data location and data provenance issues
● Seamless tiering within the namespace is desirable if possible
Legacy Dag
Storage Tiering & Namespaces: Today [1]
● We've done a bad job at encouraging active data handling
● Data is currency; IT training focuses on "spend wisely" not "manage
effectively"
● This is a multi-partner, multi-platform, multi-cloud world
● Global data protection methods hedge against disaster but not
personal/group/lab/publication needs
● Still inappropriate for IT to make data classification decisions
Current
Dag
Storage Tiering & Namespaces: Today [2]
Current
Dag
My biggest attitude change:
IT attempts to make seamless namespaces and automatic tiering
generally have failed to meet expectations; also hard to do
efficiently or without researcher input anyway
We need to place data management responsibilities back onto
the end-users*
Users whining about having to move/manage data when their
career and publications are based on “data intensive science” will
no longer be coddled. SCIENTIFIC DATA IS YOUR JOB.
Storage Tiering & Namespaces: Today [4]
* Some Exceptions:
● There ARE data management tasks that are a waste of time and skill
for highly trained scientists
● Biggest example: large-scale physical data ingest and export - scientists
should not be dealing with portable hard drives beyond a certain scale
● Large-scale physical data movement needs written SOPs and a
process owned by IT
Current
Dag
Storage Tiering & Namespaces: Today [3]
New social contract between IT and Users
IT Provides:
● Storage that meets business and scientific requirements
○ Including scratch, active, nearline, archive and object
○ Durable, available and reliable
● Metrics, monitoring, reporting and tools that enable user self-service
End User Responsibilities:
● Users responsible for scientific data management through full lifecycle
○ Including classifying, curating, organizing and moving it
Current
Dag
Storage: What this all means
● The new requirement for speed + capacity is deeply scary
● Image workloads and ML/AI mean we can’t trade away performance in
exchange for larger capacity any more
● Enterprise IT has more justification to transition platforms:
○ Conservative shops can buy the faster flash-powered levels of Scale-out NAS
○ Conservative shops can go IBM Spectrum Scale (managed GPFS)
○ Forward-looking shops will bring in new platforms and vendors
○ BeeGFS, Ceph & Lustre will find new audiences
● I’m cool with tiers, namespaces and making end-users more responsible
Current
Dag
Storage: Interesting Players
Metadata, Discovery, Data Protection
● Starfish Storage, https://starfishstorage.com/
● Atavium, https://www.atavium.com/
● Arcitecta, https://www.arcitecta.com/
● Igneous, https://www.igneous.io/
Next-Gen / Flash Storage Architectures
● VAST Data, https://www.vastdata.com/
● WekaIO, https://www.weka.io/
● Pure Storage, https://www.purestorage.com/
Current
Dag
Storage: Interesting Players, Continued
Data Movement
● Globus, https://www.globus.org/
● DataDobi, https://datadobi.com/
● Zettar, https://www.zettar.com/
Current
Dag
Topic: Networking
Networking: Still the #1 hassle but little change since 2018
Still the #1 IT infrastructure problem in data intensive life science
● Still have trouble moving scientific data at scale across networks
● We still lag in deploying 40-gig and 100-gig networking
● Enterprise IT still focusing on datacenter rather than edge & lab
● We still need to separate business network traffic from science data
traffic using Science DMZ design patterns
● Our connections to the Internet and Cloud are still too small
● Our firewalls and security controls are still designed for business
traffic and not monster “elephant” flows
● Biggest new thing was Nvidia purchasing Mellanox!
Past &
current!
Topic: Cloud
Cloud: Meta issues still the same but some changes ...
Past &
current!
Consistent message for 10 years now
● Cloud is a capability play for
life science research organizations
● Saving money is not the primary driver*
Cloud: Meta issues still the same but some changes ...
* About that “not a cost saving thing” message …
● Serverless Computing is transformational for capability
● Serverless Computing is transformational for cost
Read this:
https://rise.cs.berkeley.edu/blog/a-berkeley-view-on-serverless-computing/
Search engine shortcut: “berkeley view on serverless 2019”
● Primary caveat is that discovery oriented science still relies heavily on
interactive human efforts with bespoke tooling. A large chunk of our Bio-IT
landscape cannot be codified into APIs & service mesh architectures
Cloud: Meta issues still the same but some changes ...
● Microsoft acquisition of Cycle Computing is really starting to become
apparent on Azure Cloud - lots of interesting HPC and storage
offerings
● Cloud efforts to build bespoke accelerated hardware for AI/ML and
inference are of some concern. What used to be a simple cost or
capability eval will now require deep IT interaction with end-users to
learn their preferences and needs for SDKs, frameworks and tooling
● Scarcity of GPU resources on AWS has been a consistent trend across
multiple projects. We can’t get them at all, let alone within a placement
group!
Wrapping Up
Recap - Bottom Line 2019 Summary
1. Unit cost of storage vs. consumption rate
will force hard choices and new governance
2. Data discovery, management, curation and
movement are still major concerns
3. Storage selection pendulum has moved in a
big way. We now have to be BIG and FAST.
This will have a major impact
4. Responsibility for scientific data
management must lie with end-users,
not IT
5. Compilers, toolchains and silicon matter
again; it’s time to resurrect the benchmark
and eval crew
6. Science users can now swamp systems of any
scale with valid research questions; expect
governance and service scope constraints to
become more prevalent
7. Colo Facilities are being used more often
8. Life Science stands apart in the HPC and
supercomputing worlds for the sheer size
and diversity of our domains and workloads
Crowdsourcing thanks!
Sincere thanks to the folk who responded online
with comments and suggestions.
Including:
● Philippe Neron
● Matthew Trunnell
● Glenn Lockwood
● Tim Cutts
● Nick Weber
● Tom Bolton
● Dirk Petersen
● Gregg TeHennepe
● Eduardo Zaborowski
● Remy Evard
● Tom Plasterer
● Joe Stanganelli
● Jason Tetrault
2020 is the 10-year BioIT World
anniversary! The conference organizers are
very interested in what you’d like to see
and hear to make next year very special.
End; Thanks!;
Portrait commissioned from the artist who did the
illustrations for the “Heroines of JavaScript Trading
Cards”.
Want your own?
https://twitter.com/mirlu_exe

Contenu connexe

Tendances

2015 Bio-IT Trends From the Trenches
2015 Bio-IT Trends From the Trenches2015 Bio-IT Trends From the Trenches
2015 Bio-IT Trends From the TrenchesChris Dagdigian
 
BioIT Trends - 2014 Internet2 Technology Exchange
BioIT Trends - 2014 Internet2 Technology ExchangeBioIT Trends - 2014 Internet2 Technology Exchange
BioIT Trends - 2014 Internet2 Technology ExchangeChris Dagdigian
 
BioIT World 2016 - HPC Trends from the Trenches
BioIT World 2016 - HPC Trends from the TrenchesBioIT World 2016 - HPC Trends from the Trenches
BioIT World 2016 - HPC Trends from the TrenchesChris Dagdigian
 
Cloud Security for Life Science R&D
Cloud Security for Life Science R&DCloud Security for Life Science R&D
Cloud Security for Life Science R&DChris Dagdigian
 
Bio-IT & Cloud Sobriety: 2013 Beyond The Genome Meeting
Bio-IT & Cloud Sobriety: 2013 Beyond The Genome MeetingBio-IT & Cloud Sobriety: 2013 Beyond The Genome Meeting
Bio-IT & Cloud Sobriety: 2013 Beyond The Genome MeetingChris Dagdigian
 
Agents for Agility - The Just-in-Time Enterprise Has Arrived
Agents for Agility - The Just-in-Time Enterprise Has ArrivedAgents for Agility - The Just-in-Time Enterprise Has Arrived
Agents for Agility - The Just-in-Time Enterprise Has ArrivedInside Analysis
 
Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Architecting a Data Platform For Enterprise Use (Strata NY 2018)Architecting a Data Platform For Enterprise Use (Strata NY 2018)
Architecting a Data Platform For Enterprise Use (Strata NY 2018)mark madsen
 
Microsoft cloud migration and modernization playbook 031819 (1) (2)
Microsoft cloud migration and modernization playbook 031819 (1) (2)Microsoft cloud migration and modernization playbook 031819 (1) (2)
Microsoft cloud migration and modernization playbook 031819 (1) (2)didicadoida
 
Trends from the Trenches (Singapore Edition)
Trends from the Trenches (Singapore Edition)Trends from the Trenches (Singapore Edition)
Trends from the Trenches (Singapore Edition)Chris Dagdigian
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Shirshanka Das
 
How Global Data Availability Accelerates Collaboration And Delivers Business ...
How Global Data Availability Accelerates Collaboration And Delivers Business ...How Global Data Availability Accelerates Collaboration And Delivers Business ...
How Global Data Availability Accelerates Collaboration And Delivers Business ...Dana Gardner
 
Data Architecture: OMG It’s Made of People
Data Architecture: OMG It’s Made of PeopleData Architecture: OMG It’s Made of People
Data Architecture: OMG It’s Made of Peoplemark madsen
 
Big Data and Fast Data – Big and Fast Combined, is it Possible?
Big Data and Fast Data – Big and Fast Combined, is it Possible?Big Data and Fast Data – Big and Fast Combined, is it Possible?
Big Data and Fast Data – Big and Fast Combined, is it Possible?Guido Schmutz
 
Everything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data WarehouseEverything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data Warehousemark madsen
 
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...Edward Curry
 
Big Data and Fast Data - big and fast combined, is it possible?
Big Data and Fast Data - big and fast combined, is it possible?Big Data and Fast Data - big and fast combined, is it possible?
Big Data and Fast Data - big and fast combined, is it possible?Guido Schmutz
 
Leveraging open source for big data stack
Leveraging open source for big data stackLeveraging open source for big data stack
Leveraging open source for big data stackFlytxt
 
International Journal of Computer Science, Engineering and Information Techn...
International Journal of Computer Science, Engineering and  Information Techn...International Journal of Computer Science, Engineering and  Information Techn...
International Journal of Computer Science, Engineering and Information Techn...ijcseit
 
Big Data & the Cloud
Big Data & the CloudBig Data & the Cloud
Big Data & the CloudDATAVERSITY
 
Assumptions about Data and Analysis: Briefing room webcast slides
Assumptions about Data and Analysis: Briefing room webcast slidesAssumptions about Data and Analysis: Briefing room webcast slides
Assumptions about Data and Analysis: Briefing room webcast slidesmark madsen
 

Trends from the Trenches: 2019

  • 1. Trends from the Trenches 2019 Bio-IT World Conference Chris Dagdigian https://bioteam.net
  • 3. Image by Deanna & Amy; used with permission https://metoostem.com/
      ● Seems appropriate to include this
      ● A recurring 2019 theme for me has been listening to the stories of women forced out of early-career academic paths or jobs by systemic harassment & bias in STEM fields
  • 4. @chris_dag - https://bioteam.net
      I’m Chris. I work for BioTeam
      ● Failed scientist turned infrastructure nerd
      ● 20 years working on infrastructure for life science research; now I’m old & lame
      ● As a consultant I get to see how many different groups of smart people tackle similar challenges
      ● Often I’m allowed to talk about what I see, so I collect trends, observations and common pain points
      ● Started talking at BioIT in 2010 and they won’t let me gracefully retire
  • 5. Thought Excretor Magic Quadrant (figure)
      Axes: Competence / Domain Insight; Can talk bluntly in public
      Labels: @hpc_guru, @fdmnts, @glennklockwood, … you get the idea, @{ many smart people }, @{ vendor shills }, @mndoci
  • 6. Tune Me Out or Filter My Words Accordingly
      ● Not a pundit
      ● Not a “thought leader”
      ● Not pretending to speak on behalf of our huge and diverse industry
      ● This is a personal talk delivered through the prism of prior work, clients, projects and conversations
      ● Lots of industry/government work recently
      ● My observations have the same diversity/inclusion problems as science/workplaces in general
      ● Heavily influenced by past and current projects and the interesting people I’ve spoken or interacted with
  • 7. 2019 Catch-all: Observations, Anecdotes & Emergent Stuff
  • 8. 01: We’ve done OK entering “data intensive science” era
      Turbulent for sure but we’ve managed ...
      ● Compute
        ○ Physical, virtual and cloud based computing is a tractable problem at most scales
      ● Networking
        ○ > 10-Gbps still painful and expensive
        ○ Science DMZ design patterns are working
      ● Storage
        ○ Large capacity is a solved problem
        ○ Consumption rate still scary
      One of the biggest unsolved problems
      ● Data Management, Discovery, Cataloging and Classification
      ● It’s easy to store vast piles of data; we are still terrible at understanding what we have
      Question: How many vendors and products did you see this week at BioIT’19 explicitly focusing on data management, curation, metadata or discovery?
  • 9. 02: Scientific Computing: Still Undervalued By Leadership
      Prior Talks / Younger Me
      ● “Computers are digital benchtops, not the simple business process endpoints that Enterprise IT treats them as”
      ● “HPC capability is essential for R&D; we need leadership and investment parity with the wetlab folk”
      Today’s Talk / Older, Heftier & Wiser
      ● Still viewed as cost center to be minimized, optimized and “value engineered”
      ● Only a few treat “extract insight & value from data” as core competitive differentiator beyond vapid words in mission statements
      ● HR not touting as major recruitment and retention asset
      Incompetence in this space is an existential survival threat to your company or organization.
  • 10. 02: Scientific Computing: Still Undervalued By Leadership
      Incompetence in this space is an existential survival threat to your company or organization.
  • 11. 03: Scientific Computing: User Trends
      User Base Climbing Rapidly
      ● HPC and analytic capabilities are extending from discovery and spreading across the enterprise
      ● Pervasive need for HPC and analytic competence across the entire organization
        ○ % of staff outgrowing “laptop scale” analysis is climbing fast
        ○ Competitive differentiator
        ○ Recruitment/Retention resource
        ○ Survival requirement in 2019
      Users getting MORE and LESS Sophisticated
      ● Users forced away from laptop-scale methods require significant training and onboarding
      ● Yet new hires and early-career recruits often show up with prior HPC and cloud expertise
      ● In general we are still pretty bad at training, best practice propagation and knowledge transfer
        ○ Especially in helping “intermediate” level users become experts
      Past talks ... Today ...
  • 12. 04: Definition of HPC Being Stretched In Extreme Ways
      ● “Beyond laptop scale” computing requirement is becoming pervasive across organizations
      ● We are often the only multi-user / shared-service entities with large scale compute, storage, memory, GPU and visualization capabilities
      ● HPC in danger of becoming the dumping ground for any problem that does not fit on a laptop
      ● Parasitic usage causes:
        ○ Infrastructure tuned & biased for generic workflows
        ○ Support org becomes even more overwhelmed
        ○ Angry users demanding high-touch support and special accommodations for niche stuff
      “Any analysis that can’t run on a cheap leased laptop must require HPC” -- league of bad mgmt
  • 13. 05: Compilers & Toolchains: Mini Trend?
      Coming out of a relatively stable era
      ● Intel dominated compute
      ● Genomics/informatics dominated workload
      ● Hardware & software well characterized
      ● 10+ years since I had to mess with commercial compilers
      This may be changing …
      ● Ludicrous rate of innovation seen in the instrument space is starting to appear in our tooling & applications
      ● Now?
        ○ Software in rapid improve/innovate cycles
        ○ Kernels and kernel modules matter
        ○ Compiler & glibc versions matter
        ○ Conservative RHEL/CentOS Linux distributions may be moving too slowly for some scientific domains
      ● May be time to re-evaluate some of our foundational environments & toolchains
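If kernel, compiler and glibc versions now matter, it helps to record them next to every benchmark or analysis run. A minimal sketch in Python, not anything from the talk: the `toolchain_report` helper and its dictionary keys are hypothetical names, and it assumes the compiler answers to `--version` (GCC, Clang and ICC do).

```python
import platform
import shutil
import subprocess


def toolchain_report(compiler="gcc"):
    """Collect kernel, C library and compiler versions for the current host."""
    libc_name, libc_version = platform.libc_ver()
    report = {
        "kernel": platform.release(),
        # libc_ver() returns empty strings when it cannot identify the C library.
        "libc": f"{libc_name} {libc_version}".strip() or "unknown",
        "compiler": "not found",
    }
    path = shutil.which(compiler)
    if path:
        result = subprocess.run([path, "--version"], capture_output=True, text=True)
        lines = (result.stdout or result.stderr).splitlines()
        report["compiler"] = lines[0] if lines else "unknown"
    return report


if __name__ == "__main__":
    for key, value in toolchain_report().items():
        print(f"{key:9s} {value}")
```

Saving this output alongside timing results makes it possible to tell later whether a speedup came from the code or from the environment underneath it.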
  • 14. 05: Compilers & Toolchains: Mini Trend?
      The latest/fastest CPUs are expensive. GPUs are expensive. NVLINK is expensive. DGX-2 list price is $400,000/ea
      A reasonable investment in compiler and toolchain optimizations could pay significant dividends
  • 15. 06: Compilers & Toolchains: Relion CryoEM Homework
      Try this at home, kids! (if you have CPUs with AVX-512 support)
      ● Download latest Relion codebase from https://github.com/3dem/relion
      ● Test #1
        ○ Build using stock compiler and developer tools with CPU acceleration enabled
        ○ Run & time “relion_refine” using the common benchmark data set and commands
      ● Test #2
        ○ Repeat build with upgraded compiler and developer tools (e.g. GCC-7 on CentOS/RHEL 7)
        ○ Time how long the run takes
      ● Test #3 (if possible)
        ○ Repeat work using Intel ICC compiler (Intel Parallel Studio)
        ○ Time how long the run takes
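The three tests above amount to running the same benchmark command against differently built binaries and comparing wall-clock times. A minimal sketch in Python of such a harness: `time_command`, `compare_builds` and the build labels are hypothetical, and the placeholder command stands in for the real `relion_refine` invocation, whose arguments should be taken from the benchmark instructions rather than from here.

```python
import subprocess
import sys
import time


def time_command(argv, repeats=1):
    """Run argv `repeats` times; return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def compare_builds(builds, repeats=1):
    """Time each labelled command; report (seconds, speedup vs. first entry)."""
    timings = {label: time_command(argv, repeats) for label, argv in builds.items()}
    baseline = next(iter(timings.values()))
    return {label: (secs, baseline / secs) for label, secs in timings.items()}


if __name__ == "__main__":
    # Placeholder workload; swap in the full benchmark command for each build,
    # e.g. the relion_refine binary from each install prefix plus its arguments.
    placeholder = [sys.executable, "-c", "sum(range(100000))"]
    builds = {"stock": placeholder, "gcc-7": placeholder, "icc": placeholder}
    for label, (secs, speedup) in compare_builds(builds).items():
        print(f"{label:8s} {secs:8.3f} s  {speedup:5.2f}x vs. stock")
```

Taking the best of several repeats reduces noise from other activity on the node. For runs that take hours, a single timed run per build is usually all that is practical.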
  • 16. 07: Call To Action - Bigger Relion CryoEM Benchmark Sets
      ● The most prevalent/popular Relion benchmark uses a ~50GB input data set
      ● Everyone appears to be using it; especially vendors trying to sell you stuff
        ○ Small enough to fit in RAM and hit the caching effect of almost all storage systems
        ○ We (BioTeam) do not believe this is a realistic test in 2019
          ■ … for anything other than getting compiler and CPU/GPU optimizations correct
      ● Seeking multi-terabyte CryoEM data organized for Relion 2D or 3D classification
        ○ We think the community needs MUCH larger benchmarking resources and data sets
        ○ We will happily host, share & re-distribute
        ○ We will publish our own results testing against this data
        ○ Contact chris@bioteam.net
      How many scientists do you know with CryoEM experimental data sets that are less than 60GB in size?
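The caching concern can be checked before trusting a storage benchmark: if the input set fits in a node's RAM, the page cache may be serving the reads instead of the storage system. A rough sketch in Python under stated assumptions: the helper names are hypothetical, the `safety_factor` of 2 is an arbitrary illustrative threshold and not a BioTeam recommendation, and `physical_ram_bytes` relies on `sysconf` names available on Linux.

```python
import os


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def physical_ram_bytes():
    """Total physical memory of this host (Linux sysconf names)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def defeats_ram_caching(dataset_bytes, ram_bytes, safety_factor=2.0):
    """True when the data set is large enough that RAM cannot hold most of it."""
    return dataset_bytes > safety_factor * ram_bytes
```

For example, `defeats_ram_caching(directory_size_bytes("/path/to/benchmark"), physical_ram_bytes())` returning `False` suggests the timing mostly reflects CPU, GPU and memory rather than storage; the path shown is a placeholder.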
  • 17. @chris_dag - https://bioteam.net 08: Machine Learning & Training Data - Awesome and Ugly ● Proper ML/AI requires lots of training data ● Need “training” & “validation” sets ● The data engineering work is non-trivial ● Metadata is essential; bad data will sink you ● Competitors with “better” data will beat you The race to acquire, generate or license training data has the potential to be both awesome and ugly ● Significant opportunities for both innovation and abuse Innovation Example ● We are starting to see organizations doing really interesting things to acquire the training data they need ● An Example: Publish/Host useful tools on the cloud ○ Users get access to sophisticated analysis resources they do not have locally ○ Opt-in data sharing process generates ... ○ 30,000 de-identified MRI scans per week
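One small piece of that data engineering work, sketched under assumptions: when new records arrive every week, it helps if each record lands in the same "training" or "validation" set every time. The approach below (hashing a record ID) is one common way to get that, not something the slides prescribe; the function name, the ID format and the 20% validation fraction are all illustrative.

```python
import hashlib


def assign_split(record_id, validation_fraction=0.2):
    """Deterministically assign a record to 'training' or 'validation' from its ID."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "validation" if bucket < validation_fraction else "training"
```

Because the assignment depends only on the ID, adding more data later never moves an existing record from one set to the other.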
  • 18. @chris_dag - https://bioteam.net Topic: Lean Times & Resource Scarcity
  • 19. @chris_dag - https://bioteam.net "Unit cost of storage is decreasing but not as fast as data production is increasing. Our computing costs grow ~10%/year while budget grows at ~3% so we've had to cut [research] mission to preserve essential capability " -- Scientific leader @ nationally recognized institution
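The arithmetic behind that quote compounds quickly. A minimal sketch, assuming (my assumption, not the speaker's) that budget and computing cost start out equal in year zero and that both growth rates hold steady:

```python
def budget_coverage(years, cost_growth=0.10, budget_growth=0.03):
    """Fraction of computing cost the budget still covers after `years`,
    assuming cost and budget start equal and grow at constant annual rates."""
    return ((1 + budget_growth) / (1 + cost_growth)) ** years


def years_until_coverage_below(threshold, cost_growth=0.10, budget_growth=0.03):
    """First whole year in which the budget covers less than `threshold` of the cost."""
    years = 0
    while budget_coverage(years, cost_growth, budget_growth) >= threshold:
        years += 1
    return years
```

With the quoted ~10% and ~3% rates, `budget_coverage(5)` comes out at roughly 0.72, and `years_until_coverage_below(0.5)` returns 11: under these assumptions the budget covers less than half the bill in just over a decade.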
  • 20. @chris_dag - https://bioteam.net Lean Times: Prior Talks ● Cheaper to repeat experiment than store the data over full lifecycle ● Unit cost of storage out of sync with ease of data generation ● Petabytes of open access data easily available; & valid reasons to use it ● IT knows you haven’t touched that data in years Also: ● Deleting raw and derived scientific data is OK ● Performing data triage is OK ● … as long as data deletion decisions are made by scientists, not IT
  • 21. @chris_dag - https://bioteam.net Lean Times: Today ● Data management still a source of existential dread for Bio-IT ● Core problem has seeped beyond “it is easier to acquire vast piles of scientific data than it is to sensibly and safely store it over time” ● Today we see single scientists asking research questions that can totally consume a leadership-class supercomputer or system like ANTON-2 For biotech/pharma this means our researchers can easily swamp any system of any size or capability we can reasonably deliver. That is … not sustainable.
  • 22. @chris_dag - https://bioteam.net Lean Times: What this could mean in coming years ● We stop half-assing governance in discovery-oriented Bio-IT? ● Scientific computing orgs tighten scope, scale & supported services ● HPC resource allocation explicitly under control of scientific leadership ○ Remember it is *never* appropriate for IT to make these types of decisions ● What about moonshots and open-ended research? ○ Maybe we adopt DOE/NSF national lab model and hand out internal credits, grants or allocations for researchers to “spend” however they see fit ...
  • 23. @chris_dag - https://bioteam.net Lean Times: Effective Operation Principles Required: ● Governance driven by Science (not IT groups) becomes essential ● Honest & transparent operational cost data spanning cloud/on-prem ● Full transparency of usage and resource allocation metrics ● Good logging of scientific tools and codes being invoked
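For the last point, logging which tools and codes get invoked, a thin wrapper is often enough to start collecting data. The sketch below is one hypothetical way to do it; the record fields and the JSON-lines log format are my assumptions, and a real site would more likely hook this into its scheduler or module system.

```python
import json
import os
import subprocess
import time


def run_logged(cmd, log_path):
    """Run cmd (a list of arguments), then append one JSON record describing the invocation."""
    started = time.time()
    completed = subprocess.run(cmd)
    record = {
        "user": os.environ.get("USER", "unknown"),
        "tool": os.path.basename(cmd[0]),
        "args": cmd[1:],
        "started": started,
        "wall_seconds": round(time.time() - started, 3),
        "exit_code": completed.returncode,
    }
    # One JSON object per line keeps the log easy to append to and easy to parse later.
    with open(log_path, "a") as log:
        log.write(json.dumps(record) + "\n")
    return completed.returncode
```

Records like these are what make the "usage and resource allocation metrics" bullet answerable with data rather than anecdotes.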
  • 24. @chris_dag - https://bioteam.net Lean Times: It’s not all doom and gloom Talking to people who have lived this before: ● Forced hard examination of bespoke/custom/standalone systems (silos) ● Helped push for internal agreement and alignment re: adopting common platforms, APIs and shared services/sysadmin operations ● “Made us think hard about how to run technological operations in a different way”
  • 26. @chris_dag - https://bioteam.net Silicon Matters Again: Then & Now Prior Talks / Younger Me ● “Compute is commodity” ● “Intel x86 rules the world” ● “GPU usage starting to differentiate between visualization and MD/Chemistry” Today’s Talk / Older, Larger & Wiser ● Ahh crap … ● CPUs, GPUs, FPGAs and custom silicon are back on the table again and it’s getting messy Bottom Line: ● Significantly more benchmark & eval work ● Developer preference vs. Cost/ROI analysis ● GIANT EXCEPTION ○ Serverless folk don’t care
  • 27. @chris_dag - https://bioteam.net Silicon Matters Again: CPU ● AMD is back with EPYC ● … it’s benchmarking time again GPUs ● Increasingly complicated landscape ● Needed for: VDI, Viz, MD/Chem/Structure, ML/AI and CryoEM ● Pain points ○ Need different products (VDI vs. Science) ○ Need various GPU memory configs ○ Need various #s of GPUs per chassis ○ NVLINK - when, where & how much? ○ Will cloud have them when you need them? TPUs, FPGAs & Custom Silicon ● Many trying to differentiate in ML/AI space via custom devices ● Clouds now have proprietary accelerators ● More benchmarking required ● SDK/Framework decisions required ● Deeper engagement with IT required
  • 28. @chris_dag - https://bioteam.net Topic: Facility & Infrastructure
  • 29. @chris_dag - https://bioteam.net Facility & Infrastructure: General Observations ● Yes, you still have to do hybrid vs. cloud vs. on-prem analysis & math ● Economics still favor on-prem or colo for 24x7 scientific workloads ○ When other capability or business requirements don’t supersede cost concerns ● Why? ○ Cloud-based on-demand elastic computing is easy and well understood ○ Serverless is effing transformative; both for capability and cost, but ... ○ … Persistent, accessible petascale cloud storage is still expensive month over month ○ At petascale egress fees start to matter
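The "analysis & math" does not have to be elaborate to be useful. Here is a back-of-the-envelope sketch of the storage side of it. Every dollar figure you feed in is yours to supply; the function names and the simple capex-plus-opex model are illustrative assumptions, not a pricing reference for any cloud or vendor.

```python
import math


def monthly_cloud_cost(stored_tb, egress_tb, price_per_tb_month, egress_price_per_tb):
    """Recurring monthly bill for keeping `stored_tb` online and pulling `egress_tb` back out."""
    return stored_tb * price_per_tb_month + egress_tb * egress_price_per_tb


def breakeven_months(onprem_capex, onprem_monthly_opex, cloud_monthly):
    """Months until cumulative on-prem cost drops below cumulative cloud cost.

    Returns None when on-prem running costs alone match or exceed the cloud bill,
    in which case on-prem never catches up under this simple model.
    """
    monthly_saving = cloud_monthly - onprem_monthly_opex
    if monthly_saving <= 0:
        return None
    return math.ceil(onprem_capex / monthly_saving)
```

As a purely made-up example: 1,000 TB stored at a placeholder 20 per TB-month plus 100 TB of egress at a placeholder 90 per TB gives a 29,000 monthly bill; against 500,000 of capex and 4,000 of monthly opex, `breakeven_months` returns 20. The point is the shape of the calculation, not these numbers.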
  • 30. @chris_dag - https://bioteam.net Definite Trend: Colocation Suites & Cabinets ● Seeing this actively in 2019 ○ I’ve got an active on-prem to colo (Markley Group) project right now ● Sign of the times: ○ Steve Lister from Novartis is now CTO for HPC & Data Analytics @ Markley Group ● Drivers ○ Cost of new-build or upgrades to on-premise facilities ○ Poor cloud economics for certain 24x7 workloads and use cases ○ Growth, merger & consolidation activities ○ Colocation is the new “Network Hub” for ■ Offsite backup and data continuity efforts ■ Flexible aggregation of cloud connectivity (single cloud or multi-cloud) ■ Speciality links to partners, collaborators and high speed research networks (Internet2) ■ Bespoke IaaS, PaaS, SaaS offerings from colo operators
  • 31. @chris_dag - https://bioteam.net Story: “Innovation Space” Horror Show
  • 32. @chris_dag - https://bioteam.net Dumbest Thing I Saw in 2018 ● Where: Boston, Massachusetts ● What: Shiny new top tier incubator/innovation space for life science startups ● Wow: Office, lab space, managed stockroom/chem service, etc. ● WTF: ○ No IT/infrastructure space for tenants. At all. ■ Big Data? Exotic instruments? Data intensive science? Eff you. ■ Don’t want to place a tower server deskside or in the wet lab? Eff you. ■ Shared internet circuit & firewall (logical tenant isolation & traffic shaping though) ○ Any changes require 3-party negotiations (Space Operator, Floor leaseholder, Building owners)
  • 33. @chris_dag - https://bioteam.net I’m not kidding - Dumbest thing I saw in 2018 ● Brand new incubator space targeting life science startups in the middle of Boston ● … did a new facility build with the assumption that biotech/pharma startups need nothing but laptops, 1Gig network drops and a bit of cloud + managed IT services ● Any physical IT infrastructure not owned by the space operator or managed IT service provider has to live under-desk or inside the wet lab space Yes. A subset of agile, fast-moving startups need nothing more than internet and a set of cloud-based wifi & domain controllers. Building a new facility that caters only to these shops means you’ve guzzled a bit too much telecom vendor and cloud marketing (or you fell asleep on a pile of “CIO Magazine”)
  • 34. @chris_dag - https://bioteam.net Topic: Org Charts & Scientific Support
  • 35. @chris_dag - https://bioteam.net Support: Data Intensive Life Science Is A Different Beast Other HPC / Supercomputing : ● Modest set of dominant, well profiled & well optimized domain-specific codes ● May have large user base or very extreme HPC needs but the domain and application landscape is approachable In life science HPC … ● 5 - 5000 users ● 600+ applications spanning 10+ domains ○ Molecular Dynamics, Fluid Dynamics, Structural Biology, Chemistry, Genomics, Bioinformatics, Medical/Clinical, Optical Imaging, EM Imaging, Sensor/IoT, etc. etc. ○ Each of these breaks down into specialities typed by species, disease, organ, pathway etc. etc.
  • 36. @chris_dag - https://bioteam.net Support: Data Intensive Life Science Is A Different Beast I hate to bust out the “... but life science is SPECIAL and UNIQUE …” take But … If you survey commodity supercomputing and capability supercomputing environments at both small and “national lab” scale you will see stark differences Domain and workload diversity (and crap code) are our distinguishing characteristics
  • 37. @chris_dag - https://bioteam.net Support: Data Intensive Life Science Is A Different Beast Not a trend because org charts vary wildly by mission & org but ... ● Domain expertise needs to spread to the edge of the org while Scientific Computing groups retain and grow the expertise that spans groups/projects/orgs and domains Domain-Specific Expertise: ● Embedded within the group, lab, program or R&D organization Cross-Domain/Cross-Org Expertise: ● Science Gateways, Portals, Middleware & APIs ● User & Workflow Optimization ● High Value Application Optimization ● Data Engineering ● Data Science & Analytics ● Data Visualization ● CUDA / ML / AI Expertise ● Training Broadly useful cross-org capabilities get consolidated within Scientific Computing; Exotic domain expertise moves to stakeholder teams.
  • 38. @chris_dag - https://bioteam.net Model We Like: Service Oriented Delivery ● Large scientific computing shops reorganize around scientific use cases and end-user requirements ● … not on technological expertise ● Great way to blow away traditional IT silos ● “Team of teams” approach to service delivery
  • 40. @chris_dag - https://bioteam.net Storage Landscape: Prior Talks ● Everyone needs peta-capable storage ● { insert scary growth of storage graph } OMG OMG OMG ● In tough times it is OK to favor storage capacity over performance ● Scale-out NAS is best storage platform for most ● Parallel File Systems only when the workload requires them, due to higher ops burden ● Object Storage is the future of scientific data at rest ● It is easier to generate/acquire vast piles of data than it is to safely and sensibly store and manage it over a full lifecycle - this is a big problem Legacy Dag
  • 41. @chris_dag - https://bioteam.net Storage Landscape: Today [1] Major fundamental changes ● The capacity|performance calculus has swung the other way ● We now need very fast storage to handle machine learning, AI and image-based workflow requirements ● ML training & validation requires persistent access to the “old” data so we still need massive storage capacity ● Dominant file type at the moment is image-based, no longer genomes Current Dag
  • 42. @chris_dag - https://bioteam.net Storage Landscape: Today [2] Contributing Factors ● Lots of deployed storage is nearing EOL or end of support contract ● Some really interesting next-gen storage companies have launched ● Parallel storage is a lot more attractive w/ performance as key driver ● Operational benefits of scale-out NAS slightly less valuable in context Current Dag
  • 43. @chris_dag - https://bioteam.net Storage Landscape: Data As Currency ● Your organization has a big problem when the default stance among leadership or users is “all our data is important” - way worse than “we don’t know how to figure out what is important …” ● Not understanding the true value of data leads to hoarding, massive inefficiencies and inability to properly leverage the data at hand ● Data management, scientifically-relevant metadata, tracking the use, derivative uses, and amount of repeated uses of data could totally change how we approach scientific data storage ● It's about the data, not the storage platform Current Dag
  • 44. @chris_dag - https://bioteam.net Storage Tiering & Namespaces: Prior Talks ● Single namespace storage is really important ● If we don’t give users a single view of storage we end up with: ○ Multiple islands of data ○ Scientists store the same data in N different locations ○ Nasty data location and data provenance issues ● Seamless tiering within the namespace is desirable if possible Legacy Dag
  • 45. @chris_dag - https://bioteam.net Storage Tiering & Namespaces: Today [1] ● We've done a bad job at encouraging active data handling ● Data is currency; IT training focuses on "spend wisely" not "manage effectively" ● This is a multi-partner, multi-platform, multi-cloud world ● Global data protection methods hedge against disaster but not personal/group/lab/publication needs ● Still inappropriate for IT to make data classification decisions Current Dag
  • 46. @chris_dag - https://bioteam.net Storage Tiering & Namespaces: Today [2] Current Dag My biggest attitude change: IT attempts to make seamless namespaces and automatic tiering generally have failed to meet expectations; also hard to do efficiently or without researcher input anyway We need to place data management responsibilities back onto the end-users* Users whining about having to move/manage data when their career and publications are based on “data intensive science” will no longer be coddled. SCIENTIFIC DATA IS YOUR JOB.
  • 47. @chris_dag - https://bioteam.net Storage Tiering & Namespaces: Today [4] * Some Exceptions: ● There ARE data management tasks that are a waste of time and skill for highly trained scientists ● Biggest example: large-scale physical data ingest and export - scientists should not be dealing with portable hard drives beyond a certain scale ● Large-scale physical data movement needs written SOPs and a process owned by IT Current Dag
  • 48. @chris_dag - https://bioteam.net Storage Tiering & Namespaces: Today [3] New social contract between IT and Users IT Provides: ● Storage that meets business and scientific requirements ○ Including scratch, active, nearline, archive and object ○ Durable, available and reliable ● Metrics, monitoring, reporting and tools that enable user self-service End User Responsibilities: ● Users responsible for scientific data management through full lifecycle ○ Including classifying, curating, organizing and moving it Current Dag
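As one example of the self-service reporting IT could hand to users under this contract, the sketch below totals how much data under a directory has not been read in a given number of days. It leans on file access times, which is an assumption worth checking locally: filesystems mounted with `noatime` or similar options will not give meaningful answers. The function name is illustrative.

```python
import os
import time


def stale_bytes(root, older_than_days, now=None):
    """Return (total_bytes, stale_bytes) for files under root, where 'stale' means
    the last access time is more than `older_than_days` days in the past."""
    now = time.time() if now is None else now
    cutoff = now - older_than_days * 86400
    total = stale = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total += info.st_size
            if info.st_atime < cutoff:
                stale += info.st_size
    return total, stale
```

A report like this only informs the decision. What to keep, move or delete still belongs to the scientists who own the data.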
  • 49. @chris_dag - https://bioteam.net Storage: What this all means ● The new requirements for speed + capacity are deeply scary ● Image workloads and ML/AI mean we can’t trade away performance in exchange for larger capacity any more ● Enterprise IT has more justification to transition platforms: ○ Conservative shops can buy the faster flash-powered levels of Scale-out NAS ○ Conservative shops can go IBM Spectrum Scale (managed GPFS) ○ Forward-looking shops will bring in new platforms and vendors ○ BeeGFS, Ceph & Lustre will find new audiences ● I’m cool with tiers, namespaces and making end-users more responsible Current Dag
  • 50. @chris_dag - https://bioteam.net Storage: Interesting Players Metadata, Discovery, Data Protection ● Starfish Storage, https://starfishstorage.com/ ● Atavium, https://www.atavium.com/ ● Arcitecta, https://www.arcitecta.com/ ● Igneous, https://www.igneous.io/ Next-Gen / Flash Storage Architectures ● VAST Data, https://www.vastdata.com/ ● WekaIO, https://www.weka.io/ ● Pure Storage, https://www.purestorage.com/ Current Dag
  • 51. @chris_dag - https://bioteam.net Storage: Interesting Players, Continued Data Movement ● Globus, https://www.globus.org/ ● DataDobi, https://datadobi.com/ ● Zettar, https://www.zettar.com/ Current Dag
  • 53. @chris_dag - https://bioteam.net Networking: Still the #1 hassle but little change since 2018 Still the #1 IT infrastructure problem in data intensive life science ● Still have trouble moving scientific data at scale across networks ● We still lag in deploying 40-gig and 100-gig networking ● Enterprise IT still focusing on datacenter rather than edge & lab ● We still need to separate business network traffic from science data traffic using Science DMZ design patterns ● Our connections to the Internet and Cloud are still too small ● Our firewalls and security controls are still designed for business traffic and not monster “elephant” flows ● Biggest new thing was Nvidia purchasing Mellanox ! Past & current!
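A quick way to see why link speed matters so much for moving scientific data at scale is to work out the transfer time directly. A minimal sketch, assuming decimal units (1 TB = 8e12 bits) and an `efficiency` factor that stands in for protocol overhead, firewalls and shared links; the 0.7 default is a guess for illustration, not a measurement.

```python
def transfer_hours(terabytes, link_gbps, efficiency=0.7):
    """Hours needed to move `terabytes` over a `link_gbps` link at the given efficiency."""
    bits = terabytes * 8e12
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits / usable_bits_per_second / 3600
```

Even with a perfectly efficient link, 100 TB over 10-gig takes about 22 hours (`transfer_hours(100, 10, efficiency=1.0)`), and a 100-gig link cuts that by a factor of ten.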
  • 55. @chris_dag - https://bioteam.net Cloud: Meta issues still the same but some changes ... Past & current! Consistent message for 10 years now ● Cloud is a capability play for life science research organizations ● Saving money is not the primary driver*
  • 56. @chris_dag - https://bioteam.net Cloud: Meta issues still the same but some changes ... * About that “not a cost saving thing” message … ● Serverless Computing is transformational for capability ● Serverless Computing is transformational for cost Read this: https://rise.cs.berkeley.edu/blog/a-berkeley-view-on-serverless-computing/ Search engine shortcut: “berkeley view on serverless 2019” ● Primary caveat is that discovery oriented science still relies heavily on interactive human efforts with bespoke tooling. A large chunk of our Bio-IT landscape cannot be codified into APIs & service mesh architectures
  • 57. @chris_dag - https://bioteam.net Cloud: Meta issues still the same but some changes ... ● Microsoft acquisition of Cycle Computing is really starting to become apparent on Azure Cloud - lots of interesting HPC and storage offerings ● Cloud efforts to build bespoke accelerated hardware for AI/ML and inference are of some concern. What used to be a simple cost or capability eval now will require deep IT interaction with end-users to learn their preferences and needs for SDKs, frameworks and tooling ● Scarcity of GPU resources on AWS has been a consistent trend across multiple projects. We can’t get them at all, let alone within a placement group!
  • 59. @chris_dag - https://bioteam.net Recap - Bottom Line 2019 Summary 1. Unit cost of storage vs. consumption rate will force hard choices and new governance 2. Data discovery, management, curation and movement are still major concerns 3. Storage selection pendulum has moved in a big way. We now have to be BIG and FAST. This will have a major impact 4. Responsibility for scientific data management must lie with end-users, not IT 5. Compilers, toolchains and silicon matter again; it’s time to resurrect the benchmark and eval crew 6. Science users can now swamp systems of any scale with valid research questions; expect governance and service scope constraints to become more prevalent 7. Colo Facilities are being used more often 8. Life Science stands apart in the HPC and supercomputing worlds for the sheer size and diversity of our domains and workloads
  • 60. @chris_dag - https://bioteam.net Crowdsourcing thanks! Sincere thanks to the folk who responded online with comments and suggestions. Including: ● Philippe Neron ● Matthew Trunnell ● Glenn Lockwood ● Tim Cutts ● Nick Weber ● Tom Bolton ● Dirk Petersen ● Gregg TeHennepe ● Eduardo Zaborowski ● Remy Evard ● Tom Plasterer ● Joe Stanganelli ● Jason Tetrault 2020 is the 10-year BioIT World anniversary! The conference organizers are very interested in what you’d like to see and hear to make next year very special.
  • 61. End; Thanks!; Want these slides? slideshare.net/chrisdag or https://bioteam.net
  • 62. Portrait commissioned from the artist who did the illustrations for the “Heroines of JavaScript Trading Cards”. Want your own? https://twitter.com/mirlu_exe