Because traditional file systems are designed to present a single disk managed by a single computer, many new challenges arise in a grid scenario, in which any single disk within the grid should be capable of handling requests for any data contained in the grid.
Features
Most file storage utilizes layers of redundancy to achieve a high level of data protection. Current means of redundancy include replication and parity checks. Such redundancy can be implemented via a RAID array. Similarly, a grid file system would consist of some level of redundancy across the various disks present in the "grid".
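As a rough illustration of the parity idea, the following minimal sketch computes a byte-wise XOR parity block over equally sized data blocks and uses it to rebuild one lost block, in the spirit of parity-based RAID levels. The function names and sample blocks are hypothetical, and the sketch assumes all blocks have the same length; real systems add striping, checksums and failure detection.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks (assumed same length)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity block."""
    return xor_parity(list(surviving_blocks) + [parity])


# Hypothetical data: three blocks that would live on three different disks.
data_blocks = [b"grid", b"file", b"syst"]
parity = xor_parity(data_blocks)

# Simulate losing the second disk and recovering its block.
recovered = rebuild_missing([data_blocks[0], data_blocks[2]], parity)
```

Replication, by contrast, would simply store full copies of each block on more than one disk, trading extra storage for simpler recovery.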
Framework
A grid file system needs two mechanisms. First, a file table is necessary, and it must include a means of locating each file within the grid. Second, a mechanism for working with file data must exist; this mechanism is responsible for making file data available to requests.
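The two mechanisms can be sketched in a few lines. This is only an in-memory illustration under simplifying assumptions: the class names FileTable and DataNode, the fetch helper, and the node identifiers are all hypothetical, and a real grid would place these components on separate machines and communicate over a network.

```python
class FileTable:
    """Hypothetical file table: maps a file path to the nodes that hold it."""

    def __init__(self):
        self._locations = {}  # path -> set of node ids

    def register(self, path, node_id):
        self._locations.setdefault(path, set()).add(node_id)

    def locate(self, path):
        # Sorted so that lookups are deterministic in this sketch.
        return sorted(self._locations.get(path, ()))


class DataNode:
    """Hypothetical file data component: stores and serves file contents."""

    def __init__(self, node_id):
        self.node_id = node_id
        self._blobs = {}  # path -> bytes

    def store(self, path, data, table):
        self._blobs[path] = data
        table.register(path, self.node_id)

    def read(self, path):
        return self._blobs[path]


def fetch(path, table, nodes):
    """Serve a request by asking the file table where the data lives."""
    for node_id in table.locate(path):
        node = nodes.get(node_id)
        if node is not None:  # skip nodes that are currently unreachable
            return node.read(path)
    raise FileNotFoundError(path)


table = FileTable()
nodes = {"a": DataNode("a"), "b": DataNode("b")}
nodes["b"].store("/docs/readme.txt", b"hello grid", table)
content = fetch("/docs/readme.txt", table, nodes)
```

Because a request goes through the file table first, it does not matter which node originally received the file.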
Implementation
A parallel can be drawn between BitTorrent technology and a grid file system: a torrent tracker would be the "file table", and the torrent applications would be the "file data" component. An RSS-feed-like mechanism could be used by file table nodes to indicate when new files are added to the table, triggering replication and similar processes. A file system may incorporate similar technology. If both such systems were capable of being addressed as a single entity, then growth of such a system could be controlled simply by deciding which uses each grid member would be responsible for.
The largest problem currently revolves around distributing data updates. Torrents support minimal hierarchy. Updating multiple nodes concurrently introduces latency during updates and additions, usually to the point of being infeasible. Additionally, a grid file system breaks traditional TCP/IP paradigms in that a file system requires complicated TCP/IP implementations, which introduces layers of abstraction and complication to the process of creating such a grid file system.
Examples
Examples of highly available data include:
Network load balancing / CARP – splitting incoming requests to multiple computers, usually configured identically or as one whole.
Shared storage clustering / SANs – a single disk is presented to multiple computers which split incoming requests. This is usually used when more computing power is required than disk access.
Data replication / mirroring – multiple computers may attempt to synchronize data. Used more often for either reporting or backup purposes.
Data partitioning – splitting data among multiple computers. In databases, data is often partitioned by table; general files tend to be partitioned either by category or by location.
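The data partitioning entry above can be illustrated with a small sketch that assigns files to computers by category. Everything here is an assumption made for illustration: the node names, the category map, the use of the first path component as the category, and the hash-based fallback for uncategorized files (a common technique, not one described in this article).

```python
import hashlib

# Hypothetical grid members and category assignments.
NODES = ["node-0", "node-1", "node-2"]
CATEGORY_MAP = {"logs": "node-0", "media": "node-1"}


def node_for(path):
    """Pick the node responsible for a file path."""
    # Treat the first path component as the file's category.
    category = path.strip("/").split("/")[0]
    if category in CATEGORY_MAP:
        return CATEGORY_MAP[category]
    # Fallback (an assumption): spread other files with a stable hash,
    # so the same path always maps to the same node.
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


log_node = node_for("/logs/app.log")
media_node = node_for("/media/photo.png")
other_node = node_for("/misc/notes.txt")
```

A stable hash is used instead of Python's built-in hash() because the latter is randomized per process for strings, which would send the same file to different nodes after a restart.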