Companies with more than one site often find that working from a branch office is not as easy as working from the head office. One of the most effective ways to address this problem is to use a distributed file system. Because the topic may be unfamiliar to many of you, today we want to explain what a distributed file system is and look at the best solutions available on the market.
What is a distributed file system?
As its name suggests, a distributed file system (DFS) is a file system that is distributed across different servers or locations. Programs can access or store remote files just as they do local files, which lets users reach their files from any network or device.
The purpose of these systems is to let users of physically distributed machines share their data and resources through a common file system. A DFS typically runs as part of the operating system. It creates a namespace, and this process is transparent to the clients. A DFS has two main components:
- Location transparency: achieved by means of the namespace component.
- Redundancy: achieved by means of the file replication component.
Together, these components improve data availability in the event of a failure, because data shared in different locations is logically grouped under a single folder known as the ‘DFS root’.
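To make the two components more concrete, here is a minimal Python sketch of a namespace that maps a logical folder to several replicated targets and falls back to the next replica when a server is down. It is only an illustration under stated assumptions: the `Namespace` class, the server names and the UNC-style paths are all hypothetical and do not correspond to the API of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    """Toy DFS namespace: maps logical folders to replicated physical targets."""

    root: str  # the 'DFS root', e.g. "//example.com/files" (hypothetical)
    # logical folder name -> physical targets such as "//server/share"
    targets: dict[str, list[str]] = field(default_factory=dict)
    # names of servers currently considered unreachable
    offline: set[str] = field(default_factory=set)

    def add_folder(self, folder: str, replicas: list[str]) -> None:
        """Register a logical folder and the replicas that hold its data."""
        self.targets[folder] = list(replicas)

    def resolve(self, logical_path: str) -> str:
        """Return the first reachable physical path for a logical path."""
        prefix = self.root + "/"
        if not logical_path.startswith(prefix):
            raise ValueError(f"{logical_path!r} is outside {self.root!r}")
        folder, _, rest = logical_path[len(prefix):].partition("/")
        for target in self.targets.get(folder, []):
            # "//server/share".split("/") -> ["", "", "server", "share"]
            server = target.split("/")[2]
            if server not in self.offline:
                return f"{target}/{rest}" if rest else target
        raise LookupError(f"no reachable replica for {logical_path!r}")


ns = Namespace(root="//example.com/files")
ns.add_folder("reports", ["//hq-server/reports", "//branch-server/reports"])

# Clients always use the same logical path (location transparency).
print(ns.resolve("//example.com/files/reports/q1.pdf"))
# //hq-server/reports/q1.pdf

# If the head office server fails, the replica is used (redundancy).
ns.offline.add("hq-server")
print(ns.resolve("//example.com/files/reports/q1.pdf"))
# //branch-server/reports/q1.pdf
```

The client never changes the path it asks for; only the namespace's answer changes. A real DFS would also have to keep the replicas synchronised, which this sketch leaves out.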
Best-of-breed DFS systems
Today we can find different solutions when it comes to working with distributed file systems. Let’s take a look at some of the best options we can find.
Ceph
Ceph is an open source storage solution that uses its own file system, known as the Ceph File System (CephFS). With Ceph, data can be distributed across different components of the same network and stored on different physical storage media. Ceph offers great flexibility in the choice of storage medium as well as high scalability.
DRBD
DRBD (Distributed Replicated Block Device) is Linux-based software that can replace shared storage systems by replicating block devices between hosts. DRBD keeps data consistent between several systems on the same network and provides high availability for Linux applications.
LeoFS
LeoFS is a distributed object storage platform that stands out for its high availability and consistency. It is well suited to storing large amounts of data of different sizes. LeoFS requires a smaller server cluster than other storage platforms.
GlusterFS
GlusterFS is a distributed, scalable file system that brings together storage units from different servers into a single system. Because it works transparently, it is easy to forget about once installed, but you do have to act when data loss occurs or when the maximum available space is reached.
HDFS
HDFS (Hadoop Distributed File System) is a Java-based distributed file system that makes many machines act as a single unit. It does this by creating an abstraction layer over them so that they appear as a single file system. HDFS stores data on multiple nodes while keeping its metadata intact, which allows faster processing, parallel operations and data replication. HDFS is a fundamental technology for Big Data; in other words, it is the Big Data file system par excellence.
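The idea of spreading a file over several nodes while keeping track of it through metadata can be sketched in a few lines of Python. This is a simplified illustration, not the real HDFS placement policy: the function `place_blocks` and the node names are hypothetical, and replicas are assigned round-robin, whereas HDFS itself takes rack topology into account. The defaults of 128 MB blocks and three replicas match the usual HDFS defaults.

```python
def place_blocks(
    file_size: int,
    nodes: list[str],
    block_size: int = 128 * 1024**2,  # 128 MB
    replication: int = 3,
) -> list[tuple[int, int, list[str]]]:
    """Return metadata as (block_index, block_length, replica_nodes) tuples."""
    if file_size < 0:
        raise ValueError("file_size must not be negative")
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")

    n_blocks = -(-file_size // block_size)  # ceiling division
    layout = []
    for i in range(n_blocks):
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - i * block_size)
        # Round-robin assumption: block i starts at node i and wraps around,
        # so the replicas of one block always land on distinct nodes.
        replicas = [nodes[(i + r) % len(nodes)] for r in range(replication)]
        layout.append((i, length, replicas))
    return layout


MB = 1024**2
for index, length, replicas in place_blocks(300 * MB, ["dn1", "dn2", "dn3", "dn4"]):
    print(f"block {index}: {length // MB} MB on {replicas}")
```

With these inputs, a 300 MB file is split into three blocks of 128 MB, 128 MB and 44 MB, each held by three different nodes. Because every block lives on several machines, they can be read and processed in parallel, and losing one node does not lose any block.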
Lustre
Lustre is an open source distributed file system, typically used in large-scale clusters. The name is a blend of Linux and cluster. The project aims to provide a file system for clusters of tens of thousands of nodes with petabytes of storage capacity, without compromising speed or security.
MooseFS
MooseFS is a fault-tolerant distributed file system. It distributes data across multiple physical servers, which users see as a single resource.
MogileFS
MogileFS is another distributed file system, designed as a document server for high-traffic websites. It is written in Perl and uses MySQL or PostgreSQL. Its drawback is that it does not provide any mechanism for checking the integrity of the information stored in the grid, although it does allow multiple copies of the same object to be kept.
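When a storage system keeps several copies of an object but does not verify them itself, an application can run its own integrity check by comparing checksums of the copies. The following Python sketch shows the idea. It does not use the MogileFS API: the function `find_corrupt_copies` and the storage location names are hypothetical, and the copies are passed in as plain bytes that the application is assumed to have fetched already.

```python
import hashlib
from collections import Counter


def find_corrupt_copies(copies: dict[str, bytes]) -> list[str]:
    """Return the locations whose content differs from the majority digest."""
    if not copies:
        return []
    # SHA-256 digest of every copy, keyed by storage location.
    digests = {
        location: hashlib.sha256(data).hexdigest()
        for location, data in copies.items()
    }
    # Treat the most common digest as the correct one.
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(loc for loc, digest in digests.items() if digest != majority)


copies = {
    "store1": b"annual report",
    "store2": b"annual report",
    "store3": b"annual rep0rt",  # simulated bit rot
}
print(find_corrupt_copies(copies))  # ['store3']
```

The majority vote only works when most copies are intact. With two copies that disagree there is no way to tell which one is right, so in practice it is safer to record a checksum when the object is first stored and compare against that.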
OpenAFS
OpenAFS is an open source implementation of the Andrew File System (AFS), a distributed file system. AFS was originally developed at Carnegie Mellon University and was later turned into a commercial product by Transarc Corporation, which was subsequently acquired by IBM. At LinuxWorld on 15 August 2000, IBM announced plans to release a version of its commercial AFS product under the IBM Public License. This became OpenAFS. Today OpenAFS supports AIX, Mac OS X, Darwin, Digital UNIX, HP-UX, Irix, Solaris, Linux, Microsoft Windows, FreeBSD and OpenBSD.
Tahoe-LAFS
Tahoe-LAFS is a decentralised, open source system that distributes a user’s documents across different Tahoe servers. In this way, the information can still be accessed transparently even if several of these servers are down. Only the owner of the documents, or those with the appropriate permissions, can see the content.
XtreemFS
XtreemFS is an object-based distributed file system for wide area networks. It replicates objects for fault tolerance and caches metadata and data for better performance.