ILM Features of GPFS
Applications built to solve the world’s most challenging problems need high performance computing (HPC). We have found that GPFS’s information lifecycle management toolset helps clients achieve efficiencies required for HPC via policy-driven automation and tiered storage management.
GPFS Storage Management w/ Storage Pools
GPFS helps better align the cost of storage to the value of your data with storage pools, filesets and user-defined policies. Storage pools are used to manage groups of disks within a file system; with storage pools you can create tiers of storage by grouping disks based on performance, locality or reliability characteristics. When data is placed in or moved between internal storage pools all of the data management is done by GPFS.
In addition to internal storage pools, GPFS supports external storage pools, which are used to interact with external storage management applications. When moving data to an external pool, GPFS handles all of the metadata processing and then hands the data to the external application for storage on alternate media.
Filesets, which are subtrees of the file system namespace, provide an administrative boundary that can be used to set quotas, take snapshots, define AFM (Active File Management) relationships, and control initial data placement or data migration through user-defined policies. Data within a single fileset can reside in one or more storage pools.
User Defined Policies for File Placement & Management
Where file data resides, and how it is managed once it is created, is determined by a set of rules in a user-defined policy. There are two types of user-defined policies in GPFS: file placement and file management. File placement policies determine the storage pool in which file data is initially placed. They are defined using attributes that are known when a file is created, such as the file name, the fileset or the user who is creating the file.
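To make the idea of a placement policy concrete, here is a minimal Python sketch of how a set of placement rules could pick a pool using only creation-time attributes. It is a model for illustration, not GPFS policy syntax: the class names, pool names ("fast", "capacity") and the first-match-wins evaluation order are all assumptions of this sketch.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Callable, List, Optional


@dataclass
class NewFile:
    """Attributes known at creation time (hypothetical model)."""
    name: str
    fileset: str
    user: str


@dataclass
class PlacementRule:
    rule_name: str
    pool: str
    matches: Callable[[NewFile], bool]


def choose_pool(f: NewFile, rules: List[PlacementRule]) -> Optional[str]:
    """Return the pool of the first matching rule (assumed ordering)."""
    for rule in rules:
        if rule.matches(f):
            return rule.pool
    return None  # no rule matched


# Hypothetical rule set: names and pools are made up for this example.
RULES = [
    PlacementRule("scratch_files", "capacity", lambda f: fnmatch(f.name, "*.tmp")),
    PlacementRule("db_fileset", "fast", lambda f: f.fileset == "database"),
    PlacementRule("default", "capacity", lambda f: True),
]

if __name__ == "__main__":
    print(choose_pool(NewFile("ledger.dat", "database", "alice"), RULES))
```

The catch-all rule at the end means every new file lands somewhere, which is a common design choice for ordered rule lists.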
File Migration, Deletion, Status, Reporting
Once files exist in a file system, file management policies can be used for file migration, deletion, changing file replication status or generating reports.
File migration & replication: You can use a migration policy to transparently move data from one storage pool to another without changing the file’s location in the directory structure. Similarly, you can use a policy to change the replication status of a file or set of files, allowing fine-grained control over the space used for data availability. You can use migration and replication policies together, for example a policy that says: ‘migrate all of the files located in the subdirectory /database/payroll which end in *.dat and are greater than 1 MB in size to storage pool #2 and un-replicate these files’.
File deletion & reporting: File deletion policies allow you to prune the file system, deleting files as defined by policy rules. Reporting on the contents of a file system can be done through list policies. List policies allow you to quickly scan the file system metadata and produce a listing of selected attributes of candidate files.
File management: File management policies can be based on more attributes of a file than placement policies because once a file exists there is more known about the file. For example, file management policies can use attributes such as last access time, size of the file or a mix of user and file size. Rule processing can be further automated with the threshold option, which triggers rules based on attributes of a storage pool rather than of a file.
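The combined migration and replication example above (files under /database/payroll ending in .dat and larger than 1 MB) can be sketched as a predicate plus an action. This is an illustrative Python model rather than GPFS rule syntax. The record fields, the pool name "pool2", treating 1 MB as 1024 × 1024 bytes, and matching files in nested subdirectories are all assumptions.

```python
from dataclasses import dataclass, replace
from pathlib import PurePosixPath

MIB = 1024 * 1024  # assumption: "1 MB" is read as 1 MiB
PAYROLL_DIR = PurePosixPath("/database/payroll")


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical view of the metadata a management rule can see."""
    path: str
    size_bytes: int
    pool: str
    replicas: int


def is_candidate(f: FileRecord) -> bool:
    """True for *.dat files under /database/payroll larger than 1 MiB."""
    p = PurePosixPath(f.path)
    return (
        PAYROLL_DIR in p.parents      # includes nested directories (assumed)
        and p.suffix == ".dat"
        and f.size_bytes > MIB
    )


def apply_rule(f: FileRecord, target_pool: str = "pool2") -> FileRecord:
    """Move a candidate to the target pool and drop it to a single replica.

    The path is left untouched: migration changes where the data lives,
    not where the file appears in the directory tree.
    """
    if not is_candidate(f):
        return f
    return replace(f, pool=target_pool, replicas=1)


if __name__ == "__main__":
    before = FileRecord("/database/payroll/q1.dat", 5 * MIB, "pool1", 2)
    print(apply_rule(before))
```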
Thresholds Allow Optimized Storage Performance
Thresholds allow you to fully utilize your highest performance storage and automate the task of making room for new high priority content. The threshold option comes with the ability to set high, low and pre-migrate thresholds. GPFS begins migrating data when pool occupancy reaches the high threshold and continues until occupancy falls to the low threshold. If a pre-migrate threshold is set, GPFS then continues copying data until the pre-migrate threshold is reached. Pre-migrated files are files that remain on disk and have also been copied to tape. This method is typically used to allow disk access to the data while allowing disk space to be freed up quickly when a maximum space threshold is reached: the data can continue to be accessed in the original pool until it is quickly deleted to free up space the next time the high threshold is reached.
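The interaction of the three thresholds is easier to follow as a small simulation. The sketch below is a simplified model written for this article, not GPFS code. The function and field names are hypothetical, thresholds are percentages of pool capacity, candidates are taken in the order given, and already pre-migrated files are freed first (an assumption that reflects the "quick delete" behaviour described above).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PoolFile:
    name: str
    size: int
    premigrated: bool = False  # True once a copy exists on tape


def run_threshold_pass(
    files: List[PoolFile], capacity: int, high: int, low: int, premigrate: int
) -> Tuple[List[str], List[str]]:
    """Return (migrated, copied) file names and update `files` in place."""
    used = sum(f.size for f in files)
    migrated: List[str] = []
    copied: List[str] = []
    if used * 100 < high * capacity:
        return migrated, copied  # below the high threshold: nothing to do

    # Assumption: pre-migrated files go first, since freeing them is cheap.
    resident = sorted(files, key=lambda f: not f.premigrated)

    # Phase 1: migrate (free disk space) until occupancy falls to `low`.
    while resident and used * 100 > low * capacity:
        f = resident.pop(0)
        used -= f.size
        migrated.append(f.name)

    # Phase 2: pre-migrate. Copy files without freeing them until the data
    # that has no tape copy yet falls to the `premigrate` percentage.
    not_copied = sum(f.size for f in resident if not f.premigrated)
    for f in resident:
        if not_copied * 100 <= premigrate * capacity:
            break
        if not f.premigrated:
            f.premigrated = True
            not_copied -= f.size
            copied.append(f.name)

    files[:] = resident
    return migrated, copied


if __name__ == "__main__":
    pool = [PoolFile("a", 30), PoolFile("b", 20), PoolFile("c", 20), PoolFile("d", 20)]
    print(run_threshold_pass(pool, capacity=100, high=85, low=60, premigrate=40))
```

In this run the pool starts at 90% occupancy. File a is migrated to reach the 60% low threshold, then b is copied so that only 40% of capacity holds data without a second copy. File b stays readable on disk and can be dropped immediately on the next pass.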
Metadata Processing
Executing file management operations requires the ability to efficiently process the file metadata. GPFS includes a high performance metadata scan interface that allows you to efficiently process the metadata for billions of files. This makes the GPFS ILM toolset a very scalable tool for automating file management. This high performance metadata scan engine employs a scale-out approach. The identification of candidate files and data movement operations can be performed concurrently by one or more nodes in the cluster.
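As a rough illustration of the scale-out idea, the sketch below splits a list of metadata records into slices and evaluates a rule predicate on each slice concurrently. Worker threads stand in for cluster nodes here. The record layout, function names and slicing scheme are assumptions made for the example and do not describe the GPFS scan interface itself.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, object]  # hypothetical metadata record


def scan_slice(records: List[Record], predicate: Callable[[Record], bool]) -> List[str]:
    """Evaluate the rule predicate on one slice of the metadata."""
    return [str(r["path"]) for r in records if predicate(r)]


def parallel_scan(
    records: List[Record], predicate: Callable[[Record], bool], workers: int = 4
) -> List[str]:
    """Split the metadata into `workers` slices and scan them concurrently."""
    slices = [records[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda s: scan_slice(s, predicate), slices))
    # Merge per-worker candidate lists into one sorted result.
    return sorted(path for part in parts for path in part)


if __name__ == "__main__":
    metadata = [{"path": f"/data/f{i}", "size": i} for i in range(20)]
    print(parallel_scan(metadata, lambda r: r["size"] > 15))
```

Because each slice is evaluated independently, adding workers shortens the scan without changing the resulting candidate list.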
GPFS can spread rule evaluation and data movement responsibilities over multiple nodes in the cluster providing a very scalable, high performance rule processing engine. Learn more about GPFS on our website or contact our GPFS experts to learn more about how it can work for you.