Tachyon is a memory-centric distributed storage system that enables reliable data sharing at memory speed across cluster frameworks such as Spark and MapReduce. It achieves high performance by leveraging lineage information and using memory aggressively. Tachyon caches working-set files in memory, so frequently read datasets do not have to be loaded from disk. This lets different jobs, queries, and frameworks access cached files at memory speed.
Tachyon Cluster Configuration Setup Manual
On the Master Node
- tar -xvzf tachyon-0.7.1-bin.tar.gz
- cd tachyon-0.7.1
- cp conf/tachyon-env.sh.template conf/tachyon-env.sh
- vi conf/workers  (add the IP address or hostname of every worker node, one per line; the default entry is localhost)
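If you want to script the master-side edits instead of using vi, the sketch below is one minimal way to do it in POSIX sh. The helper name write_workers and the example IP addresses are hypothetical; it assumes conf/workers takes one worker hostname or IP per line, as the default localhost entry suggests.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tachyon): write the given worker hosts
# into a workers file, one host per line, replacing whatever was there
# (including the default "localhost" entry).
write_workers() {
  if [ "$#" -lt 2 ]; then
    echo "usage: write_workers FILE HOST..." >&2
    return 1
  fi
  workers_file=$1
  shift
  # printf reuses its format for every remaining argument: one host per line.
  printf '%s\n' "$@" > "$workers_file"
}

# Example, run from the tachyon-0.7.1 directory (addresses are placeholders):
# cp conf/tachyon-env.sh.template conf/tachyon-env.sh
# write_workers conf/workers 192.168.1.11 192.168.1.12
```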
On Each Slave (Worker) Node
- tar -xvzf tachyon-0.7.1-bin.tar.gz
- cd tachyon-0.7.1
- ./bin/tachyon bootstrap-conf <tachyon_master_hostname>
(This script needs to be run on each node you wish to configure. It configures the worker to use 2/3 of the total memory on that node.)
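As a rough illustration of the "2/3 of the total memory" rule, this sketch does the same arithmetic for a Linux machine by reading MemTotal from /proc/meminfo. It is not the bootstrap-conf code itself: the function names are hypothetical, and the real script may round or format the value differently. Assuming the 0.7.1 template, the result corresponds to the worker memory size setting, TACHYON_WORKER_MEMORY_SIZE, in conf/tachyon-env.sh.

```shell
#!/bin/sh
# Hypothetical helper: approximate the worker memory size chosen by
# bootstrap-conf, i.e. two thirds of total memory. Input is MemTotal in kB;
# output is whole megabytes with an "MB" suffix (integer, truncating math).
two_thirds_mb() {
  total_kb=$1
  echo "$(( total_kb * 2 / 3 / 1024 ))MB"
}

# Hypothetical helper: read MemTotal (in kB) from /proc/meminfo. Linux only.
mem_total_kb() {
  awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# Example on a worker node:
# two_thirds_mb "$(mem_total_kb)"
```

For example, a worker with 16 GiB of RAM (MemTotal 16777216 kB) comes out at 10922MB with this integer arithmetic.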
On the Master Node
- sudo ./bin/tachyon format
- sudo ./bin/tachyon-start.sh all Mount  (the Mount option also mounts the RAM disk used for worker storage, which is why sudo is needed)
- Open http://masterIP:19999/home in a browser to check that the master web UI is up
HDFS as UnderFS (Tachyon can run on top of different underlying storage systems)
By default, Tachyon is built against Hadoop (HDFS) version 1.0.4. To use another Hadoop version, change the hadoop.version tag in Tachyon's pom.xml and recompile. You can also set the Hadoop version on the command line when compiling with Maven:
- mvn -Dhadoop.version=2.2.0 clean package
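The other option, editing the hadoop.version tag in pom.xml, can also be scripted. This is a minimal sketch, assuming the element sits on a single line such as <hadoop.version>1.0.4</hadoop.version>; the helper name set_hadoop_version is hypothetical. It writes to a temporary file and moves it back, which sidesteps the differences between GNU and BSD `sed -i`.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the <hadoop.version> value in a Maven pom.xml.
# Assumes the element is on one line, e.g. <hadoop.version>1.0.4</hadoop.version>.
set_hadoop_version() {
  pom=$1
  version=$2
  if ! grep -q '<hadoop.version>' "$pom"; then
    echo "no <hadoop.version> element found in $pom" >&2
    return 1
  fi
  # Write to a temp file and move it back instead of using sed -i,
  # whose syntax differs between GNU and BSD sed.
  sed "s|<hadoop.version>[^<]*</hadoop.version>|<hadoop.version>${version}</hadoop.version>|" \
    "$pom" > "$pom.tmp" && mv "$pom.tmp" "$pom"
}

# Example, run from the Tachyon source tree, followed by a rebuild:
# set_hadoop_version pom.xml 2.2.0 && mvn clean package
```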
After rebuilding,
- Edit conf/tachyon-env.sh and set TACHYON_UNDERFS_ADDRESS to point at your HDFS namenode:
TACHYON_UNDERFS_ADDRESS=hdfs://HDFS_HOSTNAME:HDFS_PORT
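Setting the address can be scripted as well. This is a minimal sketch, assuming conf/tachyon-env.sh uses export lines as the shipped template does; the helper name set_underfs_address and the namenode address in the example are placeholders. An existing export TACHYON_UNDERFS_ADDRESS line is replaced; otherwise a new one is appended.

```shell
#!/bin/sh
# Hypothetical helper: point Tachyon's under filesystem at HDFS by rewriting
# the "export TACHYON_UNDERFS_ADDRESS=..." line in a tachyon-env.sh file.
# If no such line exists, a new export line is appended instead.
set_underfs_address() {
  env_file=$1
  address=$2
  if grep -q '^export TACHYON_UNDERFS_ADDRESS=' "$env_file"; then
    # "|" is used as the sed delimiter because the address contains "/".
    sed "s|^export TACHYON_UNDERFS_ADDRESS=.*|export TACHYON_UNDERFS_ADDRESS=${address}|" \
      "$env_file" > "$env_file.tmp" && mv "$env_file.tmp" "$env_file"
  else
    printf 'export TACHYON_UNDERFS_ADDRESS=%s\n' "$address" >> "$env_file"
  fi
}

# Example (namenode host and port are placeholders):
# set_underfs_address conf/tachyon-env.sh hdfs://namenode.example.com:9000
```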
That's all.
=======================================================================
Possible Errors:
For more details, see: http://tachyon-project.org/documentation/v0.7.1/Running-Tachyon-on-a-Cluster.html