Presented at GlobusWorld 2022 by a set of panelists moderated by Bob Flynn from Internet2. Panelists offer their perspectives on migrating between cloud storage providers.
Coping Strategies for the Death of Unlimited Storage
1. Coping Strategies for the Death of Unlimited Storage
GlobusWorld 2022
Panelists
Sarah Bailey, UC Berkeley
Christopher Clements, San Diego State University
Jim Leous, The Pennsylvania State University
Charles McClary, Indiana University
Hellen Zziwa, Harvard University
Moderator
Bob Flynn, Internet2
2. History of Cloud Storage Quotas/Licenses/Account Limits
Microsoft
• September 2013: 7 GB/user
• June 2014: 1 TB/user; ??/enterprise
• October 2014: Unlimited
• November 2015: 1 TB/user; ??/enterprise
• 2019: Up to 25 TB/user, upon request
• 2019: Many universities move to license certain products for only “knowledge workers”
Google Drive
• April 2012: 5 GB/user
• May 2013: 30 GB/user
• August 2014: Unlimited
• December 2019: Researching charges for accounts and unlimited storage
• February 2021: End of unlimited storage. Change to tiered pricing model
Box
• 2012: 50 GB/user; # users x 2 GB/enterprise
• 2013: 100 GB/user; # users x 4 GB/enterprise
• August 2015: Unlimited
• December 2019: Change to $820/TB/year pricing model
• Spring 2020: Change to $130/TB/year
Slide Credit: Ian Crew, UC Berkeley
9. Your Panel
Sarah Bailey, University of California, Berkeley
Chris Clements, San Diego State University
Jim Leous, The Pennsylvania State University
Charles McClary, Indiana University
Hellen Zziwa, Harvard University
11. Background
● Available services - Box, Google Drive, and SharePoint
● OneDrive is not currently available at UCB
● SharePoint is our secure storage solution
● Priorities: effective change management and communication with the campus community, and transparent and balanced management of services in the portfolio.
12. What issues have emerged related to migrating data?
● Tools for migration are scoped too broadly (Globus)
● Would require secure certification for the server running Globus, and a high level of vendor trust
● Data throughput is too low
● Tools for monitoring service usage are inadequate for meeting the requirements established by service providers
14. User Services Portfolio
Enterprise Application Support
• Google Workspace
• O365/Azure
• ServiceNow
• Zoom
• Canvas
• Mediasite
• Adobe Acrobat Sign
• Adobe Creative Cloud
• Globus
• Duo Security
• Slack
Support Services
• Help Desk Services
• Desktop Services
• Identity Management Support (SDSUid)
• Duo Multi-Factor (MFA) Support
• Network and Wireless Troubleshooting
• Software Distribution
• ZoomCorps
• ServiceNow Corps
15. Cloud Storage Today
Google Workspace
• 76,000 Active Accounts
• 1 petabyte of data
• Primary repository for PL data
• Campus standard for communication and collaboration (faculty, staff, and students)
Microsoft OneDrive
• Used for PC backup and folder redirection
SharePoint
• Departmental use only
• The long-term strategy is to phase this out
Azure
• 356 terabytes of primary and backup storage
• Replaced traditional tape backup
Amazon S3
• Projects requiring GovCloud
16. Preparing For Limited Storage
What we are doing right
• Automation (provisioning and timely deprovisioning of accounts)
• Staff, volunteer, and consultant accounts are deleted 90 days after separation
• Faculty and students who are graduating have a 1-year grace period
• Accounts for life are not offered to our alumni
• Routine auditing of accounts
What we are working on
• Appropriate retention policies
• Consider how Google’s new storage tools can be applied
• Training up and expanding our enterprise platform administrators
• Increase user base training and documentation
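The deprovisioning rules on this slide can be sketched as a small date calculation. This is only an illustration, not SDSU's actual automation: the role names, the function names, and the reading of "1-year" as 365 days are assumptions made for the example.

```python
from datetime import date, timedelta

# Grace periods (in days) before an account may be deleted.
# The 90-day and 1-year figures come from the slide; the role keys and
# treating "1 year" as 365 days are assumptions for this sketch.
GRACE_DAYS = {
    "staff": 90,
    "volunteer": 90,
    "consultant": 90,
    "faculty": 365,
    "student": 365,
}


def deletion_date(role: str, separation: date) -> date:
    """Return the earliest date on which the account may be deleted."""
    if role not in GRACE_DAYS:
        raise ValueError(f"no retention rule for role {role!r}")
    return separation + timedelta(days=GRACE_DAYS[role])


def is_due(role: str, separation: date, today: date) -> bool:
    """True once the grace period for this account has fully elapsed."""
    return today >= deletion_date(role, separation)
```

A routine audit could call `is_due` for every separated account and queue the ones that return True for review before deletion.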
17. What Does The Future Hold?
Future direction
• SDSU prefers the Google Workspace platform
• Our users prefer the file sharing interface of Google Workspace. Moving data to cold storage isn’t a desirable option
• Continue to use Globus with our Google and Amazon connectors for sharing research data
• Moving forward, we will consider all storage options
What we need from our cloud partners
• Additional add-on storage that is competitively priced and easily acquired
• Tools for administrators and end-users to help manage their storage
• Better communication as it relates to changes and assistance with implementation strategies, e.g., training materials, sample communications…
19. Harvard’s Google Use Cases (HUIT)
0.4% of users account for 81.6% of Google storage use
■ Medium term storage
■ Sharing externally
■ Archival storage
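A concentration figure like the one above can be derived from a per-account usage report. The sketch below shows one way to compute it; the function name and the idea of feeding it a plain list of per-account byte counts are assumptions, and any numbers passed to it in an example are made up rather than Harvard's data.

```python
def top_share(usage_bytes, top_fraction):
    """Fraction of total storage held by the largest `top_fraction` of accounts.

    usage_bytes: iterable of per-account storage use (any consistent unit).
    top_fraction: share of accounts to examine, e.g. 0.004 for 0.4%.
    """
    ranked = sorted(usage_bytes, reverse=True)
    if not ranked:
        raise ValueError("usage_bytes is empty")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    # Always look at least at the single largest account.
    n_top = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:n_top]) / total if total else 0.0
```

Running this over a usage export shows quickly whether a handful of accounts dominate storage, which helps decide who needs individual outreach before quotas take effect.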
20. Current state: Sample Workflow
Google Drive (Sync) used to Collect, Share and Archive large datasets/files (cryo-EM/video)
[Diagram: workflow stages Create / Capture, Collect, Process, Share, Archive, Destroy?; storage locations Local Storage, Central Storage, Google Drive]
21. An unmet storage need: Medium-term storage
As a faculty member in the History of Art and Architecture, I want to store and manage a large image collection, with various permissions, so that I can use them for teaching, research and publication.
As a special collections curator, I want to temporarily store a large collection of audio/visual files, so that I can review and process them before depositing into the DRS.
As a digital archivist, I want to store large amounts of digital content transferred by donors, so that I can securely appraise and describe it before it is archived.
22. Future state: Sample Workflow
Leverage Globus to expand options for data collection / sharing
[Diagram: workflow stages Create / Capture, Collect, Process, Share, Archive, Destroy?; storage locations Local Storage, Central Storage, Google Drive, AWS S3, OneDrive, Tape Library, AWS S3 Glacier]
23. Open Questions
■ How do we protect users against inadvertent retrieval penalties from AWS S3 Glacier Deep Archive?
■ What happens to shared data/files when the owner leaves the University?
■ How do we provide insights into the data to help with data lifecycle management?
25. Globus@IU
Overview of IU on-prem Storage and data transfer
1. Storage
   1. Redundant HPSS Tape Library archive storage service (total 354 PB)
   2. Redundant GPFS storage service (total 7.2 PB)
   3. Lustre storage service (total 11.6 PB)
2. Data transfer methods
   1. HPSS – HSI, SFTP, Globus
   2. GPFS – test native client (HPC), Samba, SFTP, Globus
   3. Lustre – native client (HPC), Globus
26. Cloud storage @IU
In the beginning – Box.com
• Well adopted by users
• Significant price increase
• Major IU project with vended migration tool to move to Google & MS
• Prepped Globus Box connector to assist but…
• Vended migration tool with movement to Google and MS was the focus
• Serious concern for well-being of Tape Library service
• Drag-n-drop “many small files”
27. Cloud storage @IU (cont.)
Now - Google Drive & MS OneDrive/SharePoint/Teams
• Reasonably well adopted but split usage
• Google with price increase
• Major IU project to coordinate reduction of Google use and data migration
• Concerns for what data should go where (wrt…data classifications)
• Effort is in progress
• Prepped Globus connectors for Google Drive & MS to assist
• Serious concern for well-being of Tape Library service
• Premium connectors are set to “invite only”
• Plan to “white glove” research users (i.e. protect the tape library)
28. Tape Library and Globus
• It's WAY too easy to ingest "many small files"
• Retrieving "many small files" is a challenge
• Accessing each file has overhead and can beat up tape drives and robot pickers
• Users perceive slow performance
Best Practice: encourage users to aggregate small files
Feature Request: Provide a way to auto-aggregate small files or block/throttle the upload of small files.
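The best practice above, aggregating small files before they reach tape, can be sketched with Python's standard `tarfile` module. This is a minimal illustration rather than an IU or Globus tool; the function name `bundle_directory` and the choice of an uncompressed tar are assumptions.

```python
import tarfile
from pathlib import Path


def bundle_directory(src_dir, archive_path):
    """Pack every regular file under src_dir into a single tar archive.

    Returns the number of files added. Members are stored relative to the
    parent of src_dir, so the archive unpacks into one top-level folder.
    """
    src = Path(src_dir)
    if not src.is_dir():
        raise NotADirectoryError(str(src))
    archive = Path(archive_path).resolve()
    count = 0
    # "w" writes an uncompressed tar; "w:gz" would add gzip compression.
    with tarfile.open(archive, "w") as tar:
        for path in sorted(src.rglob("*")):
            # Skip directories, and skip the archive itself if it was
            # placed inside the directory being bundled.
            if not path.is_file() or path.resolve() == archive:
                continue
            tar.add(path, arcname=path.relative_to(src.parent).as_posix())
            count += 1
    return count
```

Transferring the single archive instead of the original directory means the tape system handles one large object rather than thousands of small ones, at the cost of having to retrieve the whole bundle to read any one file.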
29. Cloud Storage Transitions
Jim Leous
Office of the Associate CIO for Research (OACIOR)
Penn State
30. Box to O365 Transition
• "We're just moving people from one cloud storage to another, right?"
• We started with 143k accounts and are down to roughly a couple of hundred left to migrate.
• Office 365 hasn't been the answer to everything
• The "tough stuff" is left...
31. What else is there?
What remains in the "Clean-up List" -- a list of mostly researchers where O365 is not the solution (currently ~250 accounts):
• AWS S3, Glacier
• Azure
• On-prem
• Wasabi
• Dropbox
• Google Drive
32. Workflows Matter!
• We've been telling people to move research data into the cloud for 6-7 years
• "It's FREE!" we said…
• Researchers designed ways to get data from instruments into Box
• Researchers designed ways to edit video in the cloud
• What we realized is that workflows matter!
33. Cloud Limitations Limit Science
Some current solutions, notably Office 365, have constraints or limits that make them less than useful for Big Data studies:
• Limits on individual file size
• Limits on total "volume" of data
• Limits on pathname length
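The three kinds of limits listed above can be checked before a migration starts. The sketch below walks a directory tree and reports violations; the `Limits` fields and the `preflight` function are hypothetical names, and the thresholds are left as parameters because the real values depend on the target service and should be taken from its current documentation.

```python
import os
from dataclasses import dataclass


@dataclass
class Limits:
    # All three thresholds are supplied by the caller; none is a vendor figure.
    max_file_bytes: int
    max_total_bytes: int
    max_path_chars: int


def preflight(root, limits):
    """Return a list of human-readable problems found under `root`."""
    problems = []
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            size = os.path.getsize(full)
            total += size
            if size > limits.max_file_bytes:
                problems.append(f"file too large: {rel} ({size} bytes)")
            if len(rel) > limits.max_path_chars:
                problems.append(f"path too long: {rel} ({len(rel)} chars)")
    if total > limits.max_total_bytes:
        problems.append(f"total volume too large: {total} bytes")
    return problems
```

An empty result means the tree fits within the supplied limits; anything else gives researchers a concrete list to fix, or a reason to choose a different storage target, before data starts moving.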