Scale Out Backup and Recovery (SOBAR)
GPFS Scale Out Backup and Recovery
• SOBAR relies on two well-integrated components
– TSM HSM capability to premigrate files
– GPFS capability to dump a file system image
• TSM HSM integrates with the GPFS policy engine
– Allows premigration (backup) of files that have changed between backup cycles
– Versioning is not possible
• The GPFS file system image includes all file system metadata (inodes, etc.)
– The file system image is backed up to TSM
• For recovery, the file system image is reapplied to a new GPFS file system
– All data appears migrated and can be recalled
• SOBAR provides disaster protection
– Backup data resides in the TSM server
[Diagram: GPFS cluster connected over the LAN to TSM servers and tape, showing the TSM HSM and SOBAR data flows]
GPFS SOBAR Characteristics
• Recovery of the GPFS file system includes all directories and files in stub format
– Most recent files can be selectively recalled
• High backup scalability
– Leverages the GPFS policy engine for fast file identification
– Files are backed up "incrementally forever"
• High restore performance
– Only file metadata is applied, without transferring file data
– File data resides on the TSM Server and recall happens on demand
• Lifts the ACL/Extended Attribute limitation of the TSM Server
– Complete inode information is part of the image file
• Requires the TSM HSM and Backup clients to be licensed and installed
• No versioning is possible
TSM HSM file states
[Diagram: HSM client on the GPFS cluster with the file HSM states and the TSM Server. A resident file is premigrated to become a premigrated file (file on disk plus a copy in TSM, identified by an Object ID / DMAPI handle), then migrated to become a migrated file (stub on disk with an Object ID / DMAPI handle, data in TSM), and recalled to bring the data back to disk. After an image restore, the stub carries a new DMAPI handle and migstate=yes.]
Scale Out Backup And Restore – Backup process
[Diagram: file data, directory data and directory tree relation, metadata (inode and ACL), and file system and cluster configuration data, all flowing to the TSM Server]
• Continuously: premigrate all file data to the TSM Server using TSM for Space Management and the GPFS policy engine (a policy sketch follows below)
• Backup Step 1: collect and back up the file system configuration to the TSM Server using the TSM client
• Backup Step 2: create file system image files and back them up to the TSM Server using the TSM client
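The continuous premigration step is typically driven by a policy run. A minimal sketch, assuming an external HSM pool has been defined for the TSM HSM client (the pool name, script path, policy file name, and device name below are illustrative, not a shipped configuration):
/* premigrate.pol */
RULE EXTERNAL POOL 'hsm' EXEC '/var/mmfs/etc/mmpolicyExec-hsm.sample'
RULE 'premig' MIGRATE FROM POOL 'system' THRESHOLD(0,100,0) TO POOL 'hsm'
/* THRESHOLD(high,low,premigrate): with 0,100,0 all file data is premigrated but no files are stubbed */
A run such as mmapplypolicy fs1 -P premigrate.pol -N hsmnodes would then drive the premigration on the nodes that have the HSM client installed.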
Scale Out Backup And Restore – Recovery process
[Diagram: file data, directory data and directory tree relation, metadata (inode and ACL), and file system and cluster configuration data restored from the TSM Server]
• Recovery Step 1: restore the file system configuration and recreate the file system manually
• Recovery Step 2: mount the file system and restore the file system image files from the TSM Server; the file system metadata and directory tree are recreated automatically
• Recovery Step 3: enable space management and start production; recall file data on demand and in the background using the GPFS policy engine
Comparison of GPFS backup methods

Characteristic                   Snapshot                   mmbackup              SOBAR
RTO (Recovery Time Objective)    Low                        High (reading tape)   Medium (partial, on-demand tape read)
RPO (Recovery Point Objective)   Low                        Medium to High        Medium
Backup window                    Low                        High                  Medium
Versioning                       Yes (multiple snapshots)   Yes                   No (stub files)
Disaster protected               No                         Yes                   Yes
Complete restore                 Yes                        Maybe                 Yes
Backup to tape                   No                         Yes                   Yes
Integration with ILM             No                         Yes                   Yes
Clones
Notes on TSM & HSM & Extended Attributes
This one-hour session on Spectrum Scale and ESS features provides awareness of solutions and competitive advantages and how clients use them to improve their data and object management, with some discussion of limitations and best practices, as well as a basic understanding of extended metadata, backup integration and its requirements, SOBAR, quotas, snapshots, and clones from a sales perspective and a basic implementation perspective.
There are 6 basic ways to back up data with GPFS.
1st, and likely the best, is replication to a second cluster (almost instantly available after a disaster).
2nd is TSM backup to tape or VTL.
3rd is SOBAR (using TSM and HSM for tape-optimized disaster recovery).
4th is snapshots (point-in-time copies of files, or file versioning).
5th is file cloning.
6th is using client-mounted exports to back up data to any common backup system.
Let’s take a minute to discuss TSM Backups.
Notes:
GPFS can back up either the GPFS configuration or file data, but standard file system backups alone do not provide disaster recoverability.
Two things remain essential considerations for backup:
GPFS Configuration & File Data
The configuration consists of all the Cluster Configuration information as well as the File System Configuration information.
The File Data consists of Data & Metadata.
Notes:
Before talking about backing up configuration data, we need to look at a feature of GPFS called a "user exit" for backups.
A user exit is an event-triggered script; the script determines which event will trigger its execution.
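For example, GPFS looks for a configuration-backup user exit at /var/mmfs/etc/mmsdrbackup and runs it whenever the GPFS configuration data changes; a sample script ships with the product. A minimal sketch (the destination path is illustrative):
#!/bin/ksh
# /var/mmfs/etc/mmsdrbackup - invoked by GPFS when the configuration (mmsdrfs) changes
cp /var/mmfs/gen/mmsdrfs /safe/backups/mmsdrfs.$(date +%Y%m%d%H%M%S)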
Instructor notes:
Purpose — Describe user exits.
Details —
Additional information —
Transition statement — However you back up your data, there are some files you want to be sure to back up.
Notes:
Backing up the GPFS file system configuration is extremely important. This can be done as the slide details.
If your intent is to back up data for individual file recovery, that's one thing; but if you want to restore a file system, you will need the information that sets it up for you.
Instructor notes:
Purpose — Describe backing up the GPFS configuration.
Details —
Additional information —
Transition statement — Let’s remember the cluster configuration.
Notes:
There is a copy of the mmsdrfs file on every node in the cluster to cover some failure scenarios, but it is always a good idea to back it up.
You can back it up in the file system, or use the user exit to make sure it is backed up whenever the configuration changes.
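To put a saved copy back, the mmsdrrestore command can restore a node's configuration; for example (the backup file path and node name are illustrative):
mmsdrrestore -F /safe/backups/mmsdrfs.latest    # restore the configuration from a saved mmsdrfs file
mmsdrrestore -p primarynode                     # or recover it from a node that still has valid configuration data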
Instructor notes:
Purpose — Describe backup and restoring of the cluster configuration.
Details —
Additional information —
Transition statement — Let’s look at GPFS and TSM.
Notes:
Use the mmbackup command to back up a GPFS file system to a backup server. mmbackup takes a temporary snapshot named .mmbuSnapshot of the specified file system and backs this snapshot up to a back-end data store. Accordingly, the files backed up by the command will be stored in the directory /Device/.snapshots/.mmbuSnapshot in the remote data store. This command may be issued from any GPFS node in the cluster to which the file system being backed up belongs, and on which the file system is mounted.
As part of a backup strategy, you should also back up your configuration files that are not stored in GPFS.
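A minimal invocation, with an illustrative file system name and node class (verify the options against your GPFS and TSM client levels):
mmbackup /gpfs/fs1 -t incremental          # incremental-forever backup of the file system mounted at /gpfs/fs1
mmbackup fs1 -t full -N backupnodes        # full backup, run across the nodes in the class 'backupnodes'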
Instructor notes:
Purpose — Describe using GPFS with TSM.
Details —
Additional information —
Transition statement — Let’s review what we’ve learned to this point.
We often get the question of whether we support NetBackup or some other third-party backup solution. Although we do not have plug-in backup capabilities, it is possible to design a solution that supports third-party backup by using the policy engine to build the work list the backup system uses to back up our file system from a mount on its media servers.
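As a sketch of that approach (rule, list, and path names are illustrative; this is not a packaged integration), the policy engine can produce a work list of recently changed files for the third-party backup tool to consume:
/* backup-list.pol: list files modified within the last day */
RULE EXTERNAL LIST 'tobackup' EXEC ''
RULE 'changed' LIST 'tobackup' WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(MODIFICATION_TIME)) <= 1
Running something like mmapplypolicy fs1 -P backup-list.pol -f /tmp/backup -I defer leaves the candidate list under /tmp for the media servers to read and back up through their own mount of the file system.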
Scale Out Backup and Restore (SOBAR) is a specialized mechanism for data protection against disaster, only for GPFS™ file systems that are managed by Tivoli® Storage Manager (TSM) Hierarchical Storage Management (HSM). So you do need TSM and HSM licensing, and storage pools for both TSM and HSM, to use it.
For such systems, the opportunity exists to premigrate all file data into the HSM storage and take a snapshot of the file system structural metadata, and save a backup image of the file system structure. This metadata image backup, consisting of several image files, can be safely stored in the backup pool of the TSM server and later used to restore the file system in the event of a disaster.
The SOBAR utilities include the commands mmbackupconfig, mmrestoreconfig, mmimgbackup, and mmimgrestore. The mmbackupconfig command will record all the configuration information about the file system to be protected and the mmimgbackup command performs a backup of GPFS file system metadata. The resulting configuration data file and the metadata image files can then be copied to the TSM server for protection.
In the event of a disaster, the file system can be recovered by recreating the necessary NSD disks, restoring the file system configuration with the mmrestoreconfig command, and then restoring the image of the file system with the mmimgrestore command. Note that the mmrestoreconfig command must be run prior to running the mmimgrestore command.
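A minimal sketch of the command flow, with illustrative device and path names (check the SOBAR documentation for the exact options supported at your code level):
# backup side
mmbackupconfig fs1 -o /sobar/fs1.config     # record the file system configuration
mmimgbackup fs1 -g /shared/sobar            # create the metadata image files in a shared work directory
# copy fs1.config and the image files to the TSM server with the TSM backup-archive client
# recovery side, after the NSDs have been recreated
mmrestoreconfig fs1 -i /sobar/fs1.config
mmimgrestore fs1 -g /shared/sobar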
The extended attributes of all files in GPFS will list the state of HSM migration: resident, premigrated, or migrated.
SOBAR will reduce the time needed for a complete restore by utilizing all available bandwidth and all available nodes in the GPFS cluster to process the image data in a highly parallel fashion. It will also permit users to access the file system before all file data has been restored, thereby minimizing the file system down time. Recall from HSM of needed file data is performed automatically when a file is first accessed.
The first step of the backup process is to collect and back up the file system configuration to TSM using the TSM client. The second step is to create the file system image and back it up to the TSM server, again using the TSM client. In parallel, all file data is continuously premigrated to the TSM server using HSM with the GPFS policy engine (incremental forever).
To restore, we begin with the file system configuration, which is used to recreate the file system manually; we then mount the file system, restore the image files from TSM, and automatically recreate the metadata and directory trees; finally, we enable space management and start production, and clients recall file data on demand and in the background using the GPFS policy engine.
One limitation to note is that these commands cannot be run from a Windows node in a GPFS cluster.
Just to clean up the process in review, this chart walks through the entire process of a SOBAR backup.
A snapshot of an entire GPFS™ file system can be created to preserve the contents of the file system at a single point in time.
Snapshots of the entire file system are also known as global snapshots. A snapshot is an instantaneous capture of the state of the metadata that points to a set of data blocks.
The storage overhead for maintaining a snapshot is keeping a copy of data blocks that would otherwise be changed or deleted after the time of the snapshot.
Snapshots provide an online backup capability that allows easy recovery from common problems such as accidental deletion of a file, and comparison with older versions of a file.
However, because snapshots are not copies of the entire file system, they should not be used as protection against media failures.
Notes:
A snapshot is a logical, read-only copy of the file system or fileset (and all of its data) at a point in time.
A file system or independent fileset can be captured with a snapshot.
Dependent filesets can only be captured by a snapshot of their parent file system.
Transition statement — Let’s see how snapshots work.
Notes:
Various commands allow for the administration of snapshots:
Create a snapshot: mmcrsnapshot Device Directory [-j Fileset]
Viewing snapshot information: mmlssnapshot Device [-d [--block-size {BlockSize | auto}]] [-s {all | global | Snapshot[,Snapshot...]} | -j Fileset[,Fileset...]]
Delete a snapshot: mmdelsnapshot Device Directory [-N {Node[,Node...] | NodeFile | NodeClass}]
(the -N parameter allows for a faster snapshot delete)
Restore a file system from a snapshot—most of the time restore from snapshot is partial using file system commands to copy from snapshot directory to active area.
To restore the entire file system from a snapshot: mmrestorefs Device Directory [-c] However, this can obviously overwrite files that were intentionally changed after the snapshot was taken.
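For example, with an illustrative device and snapshot name:
mmcrsnapshot fs1 nightly1       # create a global snapshot of file system fs1
mmlssnapshot fs1 -d             # list snapshots and the data blocks they retain
mmdelsnapshot fs1 nightly1      # remove the snapshot when it is no longer needed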
Instructor notes:
Purpose — Describe snapshot administration.
Details —
Additional information —
Transition statement — One use for a snapshot is for running a point in time backup. Let’s look at accessing snapshot data.
Notes:
Snapshots are accessible through the “.snapshots” sub-directory in the file system root directory. This location can be changed.
One common mistaken assumption:
When snapshots are present, deleting files from the active file system does not always result in any space actually being freed up; rather, blocks may be pushed to the previous snapshot.
To see the space truly reclaimed from those deletions, all snapshots that reference the files must be deleted.
Cloning a file is similar to creating a copy of a file, but the creation process is faster and more space efficient because no additional disk space is consumed until the clone or the original file is modified. Multiple clones of the same file can be created with no additional space overhead. You can also create clones of clones.
Read the chart
GPFS clones are often used for VMs, as well as for test, QA, and development teams that all need to work independently from the same base state of files.
They are especially useful when you want to save time and capacity, as only the changes are actually written.
Creating clones is a simple process; however, it is file-based. File systems and filesets cannot be cloned without a complex use of the policy engine to process a work list of files to clone.
To create a read-only snapshot (clone parent) of a file to be cloned, issue: mmclone snap file1 snap1
To create a writable clone copy from the clone parent, issue: mmclone copy snap1 file2
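A short sequence with illustrative file names:
mmclone snap golden.img golden.snap        # create the read-only clone parent
mmclone copy golden.snap vm1.img           # writable clone for one consumer
mmclone copy golden.snap vm2.img           # a second clone sharing the same parent blocks
mmclone show golden.img vm1.img vm2.img    # report the clone attributes of each file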
The GPFS™ quota system helps you to control the allocation of files and data blocks in a file system.
Quotas are enabled by the system administrator when control over the amount of space used by the individual users, groups of users, or individual filesets is required.
GPFS quotas can be defined on
Individual users
Groups of users
Individual filesets
There are currently two approaches to quotas:
Enforced: These are traditional quotas where a soft and hard limit is set
Pros: Automatically implements hard limits
Cons: Performance overhead on each allocation and file create
Usage Reports: Use the sample utility (or your own tool) to report on usage on a periodic basis (commonly used for chargeback-like purposes)
Pros: A short batch run can be done at an off-peak time, with no performance impact on allocation or file create
Cons: No hard limits, enforcement is through a mechanism like “nag” emails
Transition statement — Usage report quotas are done in batch mode, issued by cron for example, and you can customize the reports. Let's take a look at how the "traditional" quota mechanism works.
Notes:
This is the process for setting up quotas
1. Set the -Q parameter on the file system
2. It is not required to set default quotas
3. Quotas can be set on users/groups/filesets and based on allocated bytes and/or number of inodes
4. Hard quotas will be enforced automatically; tools are provided to report on quota usage.
Transition statement — Let’s walk through each step of this process.
This is a file system parameter that can be set on creation or after the file system is created.
ProtoUser is a user that is used as a prototype for setting quotas on other users. For example, you can have an HR user prototype that sets quotas the same for all HR employees.
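A minimal sketch, with illustrative file system, user, and prototype names:
mmchfs fs1 -Q yes                       # enable quota enforcement on an existing file system
mmedquota -u -p hr_proto alice bob      # apply the limits of prototype user hr_proto to users alice and bob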
Transition statement — Now that quotas are enabled you can set default quotas
Notes:
By default, user and group quota limits are enforced across the entire file system. Optionally, the scope of quota enforcement can be limited to individual fileset boundaries.
Transition statement — You can use default quotas, individual quotas or a mix.
mmedquota opens a "quota file" for that user/group or fileset and allows you to edit it using vi, for example.
Once the file is saved and closed, the changes take effect.
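For example, with illustrative names:
mmedquota -u alice          # edit the block and inode limits for user alice
mmedquota -j fs1:projA      # edit the limits for fileset projA in file system fs1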
Transition statement — Now that quotas are set you use mmrepquota to see the current quota usage.
In review, you can use two methods of implementing quotas:
Traditional quotas with hard stops
Or a reporting method using the high performance metadata interface
Remember that checking quotas is a metadata-intensive operation, and it is best practice to check quotas in off-peak hours on a very busy system.
Transition statement — Let’s look at Extended Attributes
Another command for reporting Quota usage is mmlsquota.
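For example, with illustrative names:
mmrepquota -u fs1           # report current usage and limits for all users of fs1
mmlsquota -u alice fs1      # show the limits and usage for a single user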
Transition statement — Let’s look at Extended Attributes
When HSM is incorporated with TSM to provide tape-based archive and a file is migrated to tape, it is (by default) first backed up through TSM and a stub file is left on the GPFS file system. The stub file contains the necessary metadata and a portion of the data to allow the file to be recalled from the tape.
When a migrated file is accessed, it is recalled to the local file system to replace the stub. The recall is automatic or selective, depending on how the recall is initiated. It is wise to manage recalls carefully and avoid large-scale recalls, because they can hang things up as they exhaust GPFS worker threads waiting for tapes to be loaded into drives to retrieve the files.
HSM will manage files as Resident (on Disk), Pre-Migrated (Disk & Tape), and Migrated (File on Tape with a Stub on disk).
Understanding Extended Attributes can be helpful in understanding things such as if the file is backed up, if snapshots have been applied, if data is migrated into HSM pools, what storage pool the data lives in and the state of replication for the data and the metadata. Learning to use the “mmlsattr” command can prove useful in validating assumptions. Reading attributes of files will not recall them from tape.
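For example (the file name is illustrative, and the exact output fields vary by release):
mmlsattr -L /gpfs/fs1/data/report.dat
The -L output includes the storage pool, fileset, replication settings, and miscellaneous attributes such as ARCHIVE for a resident file or ARCHIVE OFFLINE for a migrated file, which is how the HSM states on the following slides appear.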
Note that the resident file report appears as ARCHIVE above. A resident file lives on disk.
Next slide..
A migrated file shows as Archive Offline, with DMAPI Object, ID, and Region attributes. A migrated file lives on tape (with a stub file on disk); by default, HSM requires that the file is backed up before it moves to tape archive (this can be overridden).
A premigrated file shows as Archive with DMAPI Region, Mig, and ID attributes.