GlusterFS is an open source, distributed file system capable of scaling to several petabytes and handling thousands of clients. GlusterFS clusters together storage building blocks over InfiniBand RDMA or TCP/IP interconnects.

GlusterFS can also be used as a replacement for HDFS and to run Map/Reduce jobs on data residing on it. The GlusterFS Hadoop plugin allows existing Map/Reduce jobs to work seamlessly without any changes. This is done by implementing Hadoop's FileSystem interface and communicating with GlusterFS via its native protocol (using FUSE).
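
The idea of serving a file system interface through a FUSE mount can be sketched in a few lines of Java. This is a minimal illustration only, not the plugin's actual code: `SimpleFileSystem` and `MountBackedFileSystem` are hypothetical names, the interface is a much-reduced stand-in for Hadoop's FileSystem abstraction, and the mount root is assumed to be a local directory where a GlusterFS volume would be mounted.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical, much-reduced stand-in for a file system interface.
interface SimpleFileSystem {
    void write(String path, String data) throws IOException;
    String read(String path) throws IOException;
    boolean exists(String path);
}

// Hypothetical implementation: every logical path is mapped onto a local
// directory. In a real deployment that directory is assumed to be the FUSE
// mount point of a GlusterFS volume, so plain local file I/O reaches the
// cluster through the GlusterFS native client.
final class MountBackedFileSystem implements SimpleFileSystem {
    private final Path mountRoot;

    MountBackedFileSystem(Path mountRoot) {
        this.mountRoot = mountRoot.toAbsolutePath().normalize();
    }

    // Resolve a logical path such as "/user/demo/input.txt" against the
    // mount root, rejecting paths that would escape it (e.g. via "..").
    Path resolve(String path) {
        String relative = path.replaceFirst("^/+", "");
        Path resolved = mountRoot.resolve(relative).normalize();
        if (!resolved.startsWith(mountRoot)) {
            throw new IllegalArgumentException("Path escapes mount root: " + path);
        }
        return resolved;
    }

    @Override
    public void write(String path, String data) throws IOException {
        Path target = resolve(path);
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent); // create missing parent directories
        }
        Files.write(target, data.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public String read(String path) throws IOException {
        return new String(Files.readAllBytes(resolve(path)), StandardCharsets.UTF_8);
    }

    @Override
    public boolean exists(String path) {
        return Files.exists(resolve(path));
    }
}
```

Because callers only see the interface, code written against it does not need to know whether the data lives on HDFS or on a mounted GlusterFS volume. A real plugin would presumably extend Hadoop's `org.apache.hadoop.fs.FileSystem` class rather than a toy interface like this one.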

Currently the plugin works with Hadoop version 0.20.2 *only*. Work needs to be done to make it work with recent versions of Hadoop.

The working version of the plugin (that works with 0.20.2) is hosted here:


(under glusterfs-hadoop)


We are looking for people who can work on this and contribute code back to GlusterFS.

Project Members