A Parallel File System Defined: Understanding Lustre, Lustre over ZFS and Spectrum Scale

A parallel file system lets many clients access data simultaneously, in a coordinated way, across multiple storage servers over a high-performance network such as Omni-Path or InfiniBand.

High-performance computing (HPC) applications rely on parallel file systems to exploit multiple I/O paths and distributed storage devices. Because large numbers of clients (even thousands) can read and write at the same time, many processors can work on complex data sets concurrently, reducing the time to results.

Common applications include genome sequencing, climate modeling, seismic processing, machine learning and artificial intelligence, financial modeling and video editing.  Parallel file systems are often found in universities, government agencies, national laboratories, as well as industries such as financial services, life sciences, manufacturing, media and entertainment, and oil and gas exploration.

A parallel file system is a type of distributed file system. Both provide a global namespace and high-bandwidth connections, and both spread data across multiple storage servers, scaling to petabytes in size. The two still differ in important ways.

Key distinctions include:

  • A distributed file system gives clients access to a given portion of the global namespace through a single storage node, even if other parts of a file are stored on other storage nodes. A parallel file system lets clients transfer data directly to and from all storage nodes instead of going through a single node.
  • Distributed file systems typically use a standard network protocol such as NFS or SMB to access data on a storage node. Parallel file systems normally require client nodes to install a client-side software driver to access shared storage over a high-speed interconnect such as InfiniBand or Omni-Path.
  • A distributed file system typically stores a file on a single storage node, whereas a parallel file system stripes the file in chunks (segments) across multiple storage nodes. Reads and writes of the file then proceed in parallel across those servers, increasing performance.
  • Distributed file systems are commonly used for active archives and data-heavy applications, whereas parallel file systems excel at high-performance workloads that take advantage of the parallelism and extra bandwidth.
  • Distributed file systems frequently use erasure coding or three-way replication for data redundancy, whereas parallel file systems generally rely on shared storage for fault tolerance.
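The striping idea in the list above can be illustrated with a short sketch. It assumes simple round-robin (RAID-0 style) placement with a fixed stripe size and stripe count, similar in spirit to a default striped layout in a parallel file system such as Lustre; metadata lookups, locking, and failover are not modeled. The names `Extent`, `locate`, and `split_request` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """A contiguous piece of a file request that lands on one storage target."""
    target: int         # index of the storage server/target holding this piece
    target_offset: int  # byte offset inside that target's object for this file
    length: int         # number of bytes in this piece


def locate(offset: int, stripe_size: int, stripe_count: int) -> tuple[int, int]:
    """Map a logical file offset to (target index, offset within target object)
    under simple round-robin (RAID-0 style) striping. Hypothetical helper."""
    stripe_number = offset // stripe_size   # which stripe unit of the file overall
    target = stripe_number % stripe_count   # stripe units rotate across targets
    row = stripe_number // stripe_count     # full rotations completed before this unit
    target_offset = row * stripe_size + offset % stripe_size
    return target, target_offset


def split_request(offset: int, length: int, stripe_size: int,
                  stripe_count: int) -> list[Extent]:
    """Break one read/write request into per-target extents that a client
    could issue to the storage servers in parallel. Hypothetical helper."""
    extents = []
    end = offset + length
    while offset < end:
        target, target_offset = locate(offset, stripe_size, stripe_count)
        # Never cross a stripe-unit boundary within a single extent.
        chunk = min(stripe_size - offset % stripe_size, end - offset)
        extents.append(Extent(target, target_offset, chunk))
        offset += chunk
    return extents


if __name__ == "__main__":
    MiB = 1 << 20
    # Assumed layout: 1 MiB stripes across 4 targets; read 3 MiB from offset 512 KiB.
    for extent in split_request(512 * 1024, 3 * MiB, MiB, 4):
        print(extent)
```

Here the 3 MiB request splits into four extents on targets 0 through 3, so a client could send all four pieces to different servers concurrently. That concurrency, rather than any single faster server, is where the bandwidth gain from striping comes from.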

RAID Inc. Optimized Parallel File Systems

Parallel file systems are used in many industries and applications to provide fast access to storage, in deployments ranging from small scale-up systems to enterprise-class systems with thousands of hosts.

Parallel file system solutions from RAID Inc. are tailored to your needs. We build, configure, and tune each system, deliver it on schedule, and help with installation.

Brief descriptions of some file systems we specialize in can be found below.

Lustre Parallel File System

Lustre is regarded by many as the leading parallel file system on the market and is used by many of the TOP500 supercomputers in the world. Lustre runs on Linux and provides a POSIX-compliant UNIX® file system interface. Whether you have an existing Lustre file system that is underperforming or are planning something new, we can help. We offer end-to-end expertise, from initial planning through implementation to ongoing optimization as your data-intensive environment changes.

Lustre over ZFS

Our Lustre over ZFS offering is a robust, scalable file system solution that combines Lustre (free, open-source software) with ZFS on Linux. The combination pairs the performance and scalability of the Lustre parallel file system for HPC workloads with the features of ZFS, higher storage density, and a lower total cost of ownership (TCO).

Spectrum Scale Parallel File System

Developed by IBM and introduced in 1998, Spectrum Scale™ (formerly GPFS) is a high-performance parallel file system that can be deployed in “shared-nothing” or “shared-disk” distributed parallel modes. Like Lustre, Spectrum Scale is used by numerous large enterprises worldwide and by several supercomputers on the TOP500 list. Whatever your needs, we will ensure that your Spectrum Scale solution delivers maximum return on your technical investment.

Start a Conversation

Working with RAID Incorporated is simple and straightforward, backed by two and a half decades of experience with high-performance file system solutions. Contact one of our experts today to learn more about the data storage solutions described above, or simply to ask questions.