Thursday, December 17, 2015

Tachyon Cluster Configuration Setup Manual


Tachyon is a memory-centric distributed storage system enabling reliable data sharing at memory-speed across cluster frameworks, such as Spark and MapReduce. It achieves high performance by leveraging lineage information and using memory aggressively. Tachyon caches working set files in memory, thereby avoiding going to disk to load datasets that are frequently read. This enables different jobs/queries and frameworks to access cached files at memory speed.


In Master Node

In Slaves


(This script needs to be run on each node you wish to configure. It will configure each worker to use 2/3 of its total memory.)

In Master Node


HDFS as underFS (Tachyon can run on top of different underlying storage systems)

By default, Tachyon is set to use HDFS version 1.0.4. You can use another Hadoop version by changing the hadoop.version tag in Tachyon's pom.xml and recompiling it. You can also set the Hadoop version when compiling with Maven:

  • $ mvn -Dhadoop.version=2.2.0 clean package

After completing this,

  • Edit the tachyon-env.sh file and set TACHYON_UNDERFS_ADDRESS:

TACHYON_UNDERFS_ADDRESS=hdfs://HDFS_HOSTNAME:HDFS_PORT

That's all.
=======================================================================

Possible Errors:



For more: http://tachyon-project.org/documentation/v0.7.1/Running-Tachyon-on-a-Cluster.html

Tachyon was not formatted!

When you are using HDFS as your underFS address, you may face this error.

Inspecting master.log shows an error log like the following.

devan@Dev-ThinkPad-X230:~/tachyon-0.7.1$ tailf logs/master.log 

2015-12-18 10:14:34,571 ERROR MASTER_LOGGER (TachyonMaster.java:main) - Uncaught exception terminating Master
java.lang.IllegalStateException: Tachyon was not formatted! The journal folder is $TACHYON_HOME/journal/
at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
at tachyon.master.TachyonMaster.<init>(TachyonMaster.java:151)
at tachyon.master.TachyonMaster.main(TachyonMaster.java:63)

Solution

You need to delete the HDFS temporary files created by Tachyon on your underFS HDFS server.

hadoop fs -rm -r /tmp/tachyon/

Note: Tachyon creates a directory structure like the following on the underFS HDFS server:

/tmp/tachyon/data/1
/tmp/tachyon/data/2
....

Wednesday, December 16, 2015

Permission denied for root@localhost for SSH connection

Reason:

The SSH server denies password-based login for root by default.

Solution:

In /etc/ssh/sshd_config, change:
PermitRootLogin without-password
to
PermitRootLogin yes
And restart SSH:
sudo service ssh restart

Monday, December 7, 2015

Column renaming after DataFrame.groupBy and agg


In the following code the column name is "SUM(_1#179)". Is there a way to rename it to a friendlier name?

scala> val d = sqlContext.createDataFrame(Seq((1, 2), (1, 3), (2, 10)))

scala> d.groupBy("_1").sum().printSchema
root
 |-- _1: integer (nullable = false)
 |-- SUM(_1#179): long (nullable = true)
 |-- SUM(_2#180): long (nullable = true)


http://apache-spark-user-list.1001560.n3.nabble.com/Column-renaming-after-DataFrame-groupBy-td22586.html




The simple way to achieve this is to use the toDF() function.

scala> val d = sqlContext.createDataFrame(Seq((1, 2), (1, 3), (2, 10)))
scala> d.groupBy("_1").sum().toDF("a","b","c").printSchema


root
 |-- a: integer (nullable = false)
 |-- b: long (nullable = true)
 |-- c: long (nullable = true)
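
You can also name the aggregate columns explicitly with agg and alias. A minimal sketch, assuming the same DataFrame d as above and Spark 1.3+ (the names sum_1 and sum_2 are just illustrative):

scala> import org.apache.spark.sql.functions.sum
scala> d.groupBy("_1").agg(sum("_1").as("sum_1"), sum("_2").as("sum_2")).printSchema

This should print a schema like:

root
 |-- _1: integer (nullable = false)
 |-- sum_1: long (nullable = true)
 |-- sum_2: long (nullable = true)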

Thursday, December 3, 2015

IntelliJ IDEA : Error: scalac: Output path is shared between: Module ...., Output path is shared between: Module .... Please configure separate output paths to proceed with the compilation.

You may be challenged by this error while working with IntelliJ IDEA:

Output path is shared between: Module ...., Output path is shared between: Module .... Please configure separate output paths to proceed with the compilation.
TIP: you can use Project Artifacts to combine compiled classes if needed.



Cause 

This happens when multiple modules exist in the same IntelliJ IDEA project.

Solution

Remove the other modules and keep a single module (.iml) file in the IntelliJ IDEA project.




Delete all the .iml files that do not match your project name (the project may have been refactored from another name; here two modules are available, SparkCDH and sparkcdh). Do not delete the .iml file with "-build" in its name. The result will look like the following.




Now you can run the project without the shared output path error.





Wednesday, November 25, 2015

Hive Auto increment UDF in Apache Spark

Not all the functions we need are supported or provided by Hive by default, for example an auto-increment (row number) column in a select query. Hive supports User Defined Functions (UDFs), so users can create custom UDFs for their own use.

In this blog I am going to describe how to write an auto-increment UDF for Hive and for Spark's Hive support. (The query should run with a single mapper; otherwise it will not work properly.)

Code Snippet for Hive : 

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

@UDFType(deterministic = false, stateful = true)
public class AutoIncrUdf extends UDF {
    // Last value handed out; state is kept per mapper, which is why a single mapper is required.
    int lastValue;

    public int evaluate() {
        lastValue++;
        return lastValue;
    }
}
After registering this class as a UDF function*, you can use the function name as shown below:
hive> SELECT incr(),* FROM t1;
*For more about how to create the jar and use it in Hive, please refer to: http://hadooptutorial.info/writing-custom-udf-in-hive-auto-increment-column-hive/


Code Snippet for Spark Hive :

var i = 0 // define one variable for the increment
sqlContext.udf.register("incr", () => {
  i = i + 1
  i
})
After this code (registering the UDF function) you can use the function in a Hive query:

sqlContext.sql("select incr(),* from tableName")

Spark 1.4+ PermGenSize Error - IntelliJ IDEA

Error : 


java.lang.OutOfMemoryError: PermGen space
Stopping spark context.
Exception in thread "main" 
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"


Solution : 

Add the following to the VM options.

Go to Run => Edit Configurations => VM options and add the following line (I am giving a 1 GB max heap and 512 MB MaxPermSize):

 -Xmx1024m -XX:MaxPermSize=512m -Xms512m





That's it!

Wednesday, October 21, 2015

Apache Mesos single node cluster install on Ubuntu server using apt-get

Apache Mesos is a distributed scheduling framework that allows us to build fault-tolerant distributed systems. It pools your infrastructure, automatically allocating resources and scheduling tasks based on demand and policy.

This blog post describes configuring and installing Mesos using a simple apt-get install.

1. Install requirements using the following commands





sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E56151BF
DISTRO=$(lsb_release -is | tr '[:upper:]' '[:lower:]')
CODENAME=$(lsb_release -cs)
echo "deb http://repos.mesosphere.io/${DISTRO} ${CODENAME} main" | 
sudo tee /etc/apt/sources.list.d/mesosphere.list
sudo apt-get update
sudo apt-get install mesos

2. Run Mesos Master using the following command




sudo mesos-master --ip=GIVE_MASTER_IP --work_dir=/tmp/mesos

3. Run Mesos Slave using the following commands

sudo mesos-slave --master=GIVE_MASTER_IP:5050 --resources='cpus:4;mem:8192;disk:409600;'

You can check whether the cluster started by pointing your web browser at MASTER_IP:5050. If you see 1 under “Slaves -> Activated”, then you have a single-node cluster running.




Wednesday, October 7, 2015

ActiveMQ MQTT broker Websockets support and Paho JavaScript Client integration

Hi all, this blog post is about configuring the ActiveMQ MQTT broker to enable Websockets and subscribing to MQTT over Websockets from the Paho JavaScript client. Follow these steps:

1. Configure ActiveMQ MQTT for Websocket support

a) Add the following line to the activemq.xml file in your Apache ActiveMQ conf directory:

<transportConnector name="ws" uri="ws://0.0.0.0:1884?maximumConnections=1000&amp;wireFormat.maxFrameSize=104857600"/>
b) Restart ActiveMQ. This will open port 1884 on your MQTT broker.

2. Download the mqttws31.js file.


3. Create a new file "config.js":

host = '127.0.1.1'; // hostname or IP address
port = 1884;
topic = 'mqtt/devTest';  // topic to subscribe to
useTLS = false;
username = null;
password = null;
// username = "devan";
// password = "devan123";
cleansession = true;


4. Download jquery.min.js 

5. Create an HTML file "mqttcheck.html":


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>MQTT Websockets</title>
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <script src="mqttws31.js" type="text/javascript"></script>
    <script src="jquery.min.js" type="text/javascript"></script>
    <script src="config.js" type="text/javascript"></script>

    <script type="text/javascript">
    var mqtt;
    var reconnectTimeout = 2000;

    function MQTTconnect() {
        mqtt = new Paho.MQTT.Client(
                        host,
                        port,
                        "web_" + parseInt(Math.random() * 100,
                        10));
        var options = {
            timeout: 3,
            useSSL: useTLS,
            cleanSession: cleansession,
            onSuccess: onConnect,
            onFailure: function (message) {
                $('#status').val("Connection failed: " + message.errorMessage + "Retrying");
                setTimeout(MQTTconnect, reconnectTimeout);
            }
        };

        mqtt.onConnectionLost = onConnectionLost;
        mqtt.onMessageArrived = onMessageArrived;

        if (username != null) {
            options.userName = username;
            options.password = password;
        }
        console.log("Host="+ host + ", port=" + port + " TLS = " + useTLS + " username=" + username + " password=" + password);
        mqtt.connect(options);
    }

    function onConnect() {
        $('#status').val('Connected to ' + host + ':' + port);
        // Connection succeeded; subscribe to our topic
        mqtt.subscribe(topic, {qos: 0});
        $('#topic').val(topic);
    }

    function onConnectionLost(responseObject) {
        setTimeout(MQTTconnect, reconnectTimeout);
        $('#status').val("connection lost: " + responseObject.errorMessage + ". Reconnecting");
    };

    function onMessageArrived(message) {
        var topic = message.destinationName;
        var payload = message.payloadString;
        document.getElementById("ws").innerHTML = 'Topic = ' + topic + '<br> Payload = ' + payload;
    };
    $(document).ready(function() {
        MQTTconnect();
    });

    </script>
  </head>
  <body>
<center>
    <h1>MQTT Websockets</h1>
    <div>
        <div>Subscribed to <input type='text' id='topic' disabled />
        Status: <input type='text' id='status'  disabled /></div>
<h2>
        <div id='ws'> </div>
 </h2>
   </div>
</center>
  </body>
</html>

6. Open the mqttcheck.html file in your web browser and you will see a result like the following.



Download  source code : https://goo.gl/jK3TsV

Fancy Interface source code  : https://goo.gl/IsVs0A

Friday, July 31, 2015

Would you explain, in simple terms, exactly what object-oriented software is? Here is the answer by Steve Jobs


I wish I had a teacher like him to explain how OOP concepts work. :)
Here is Steve Jobs's answer to the question, from a 1994 Rolling Stone interview:

Objects are like people. They're living, breathing things that have knowledge inside them about how to do things and have memory inside them so they can remember things. And rather than interacting with them at a very low level, you interact with them at a very high level of abstraction, like we're doing right here.

Here's an example: If I'm your laundry object, you can give me your dirty clothes and send me a message that says, "Can you get my clothes laundered, please." I happen to know where the best laundry place in San Francisco is. And I speak English, and I have dollars in my pockets. So I go out and hail a taxicab and tell the driver to take me to this place in San Francisco. I go get your clothes laundered, I jump back in the cab, I get back here. I give you your clean clothes and say, "Here are your clean clothes."

You have no idea how I did that. You have no knowledge of the laundry place. Maybe you speak French, and you can't even hail a taxi. You can't pay for one, you don't have dollars in your pocket. Yet I knew how to do all of that. And you didn't have to know any of it. All that complexity was hidden inside of me, and we were able to interact at a very high level of abstraction. That's what objects are. They encapsulate complexity, and the interfaces to that complexity are high level.

Thursday, April 30, 2015

Building chromium from source code with depot_tools

     Install depot_tools

  1. Confirm git is installed. git 2.2.1+ recommended.
  2. Fetch depot_tools: (from home directory)
    $ git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git
  3. Add the following to your ~/.bashrc file:
export GYP_GENERATORS=ninja
export PATH=$PATH:$HOME/depot_tools
export CHROME_DEVEL_SANDBOX=$HOME/chromium/src/out/Debug/chrome_sandbox
    • Yes, you want to put depot_tools ahead of everything else, otherwise gcl will refer to the GNU Common Lisp compiler.


    Get the Chromium Source Code

Create a new directory chromium in your home directory and clone the source:
mkdir ~/chromium
cd ~/chromium
git clone --depth 1 https://chromium.googlesource.com/chromium/src.git

The depth argument results in a shallow clone so that you don't pull down the massive history. You can remove it if you want a full copy. The clone will take a while (~30 minutes) to complete.

cd ~/chromium/src

fetch --nohooks --no-history chromium --nosvn=True


This also takes a lot of time to fetch.

Install any necessary dependencies


$ ./build/install-build-deps.sh

Run post-sync hooks

Finally, run gclient runhooks to execute any post-sync scripts:

$ gclient runhooks --force

Build Chromium

ninja -C out/Debug chrome

Set Up the Sandbox

cd ~/chromium/src
ninja -C out/Debug chrome_sandbox
sudo chown root:root out/Debug/chrome_sandbox
sudo chmod 4755 out/Debug/chrome_sandbox

Run Chromium

cd ~/chromium/src
out/Debug/chrome


Or run the shell script out/Debug/chrome-wrapper.
If a sandbox error shows up, run the following command to disable the sandbox. (Please note: without the sandbox, security issues may arise.)

./out/Debug/chrome-wrapper --no-sandbox %U



Here we go, browse using your own browser ... :) 





Friday, April 17, 2015

Turn off hibernate logging to console

Using Hibernate with Java is simply brilliant.
But logs like this may make you think differently :)

Hibernate: select securityus0_.ID ....
Hibernate: select securityus0_.ID ....
Hibernate: select securityus0_.ID ....
Hibernate: select securityus0_.ID ....
So you may need to turn off these logs; that is simple.


 Change the property show_sql from true to false in your Hibernate config file.
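
For example, if your session factory is configured through hibernate.cfg.xml, the relevant property line would look like this (adjust to wherever your Hibernate properties live):

<property name="show_sql">false</property>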



Monday, February 23, 2015

Log4j with Maven



1. Add the Log4j dependency:

<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>

2. Add a log4j.properties file inside the "src/main/java" folder (when the project is built, this file will end up in /target/classes; you can select any folder that is a Java build path source):



# Root logger option
log4j.rootLogger=DEBUG, stdout, file
# Redirect log messages to console
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
# Redirect log messages to a log file, support file rolling.
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=GIVE_PATH_FOR_LOGFILE
log4j.appender.file.MaxFileSize=5MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n


3. Main function


import org.apache.log4j.Logger;

public class LogMain {

    final static Logger logger = Logger.getLogger(LogMain.class);

    public static void main(String[] args) {
        logger.debug("test debug");
        logger.error("test error");
    }
}

4. Run !!

You will get this in “GIVE_PATH_FOR_LOGFILE”

2015-02-23 15:42:39 DEBUG LogMain:12 - test debug
2015-02-23 15:42:39 ERROR LogMain:13 - test error



Monday, January 5, 2015

Sqoop2 client 1.99.4

There are several changes in the 1.99.4 version, so this may be helpful to you.


import org.apache.sqoop.client.SqoopClient;
import org.apache.sqoop.model.MFromConfig;
import org.apache.sqoop.model.MJob;
import org.apache.sqoop.model.MLink;
import org.apache.sqoop.model.MLinkConfig;
import org.apache.sqoop.model.MSubmission;
import org.apache.sqoop.model.MToConfig;
import org.apache.sqoop.submission.counter.Counter;
import org.apache.sqoop.submission.counter.CounterGroup;
import org.apache.sqoop.submission.counter.Counters;
import org.apache.sqoop.validation.Status;

public class MysqlToHDFS {
    public static void main(String[] args) {

        String connectionString = "jdbc:mysql://YourMysqlIP:3306/test";
        String username = "YourMysqUserName";
        String password = "YourMysqlPassword";
        String schemaName = "YourMysqlDB";
        String tableName = "Persons";
        String partitionColumn = "PersonID";
        String outputDirectory = "/output/Persons";
        String url = "http://YourSqoopIP:12000/sqoop/";
        String hdfsURI = "hdfs://namenodeIP:8020/";

        SqoopClient client = new SqoopClient(url);

        // create JDBC (MySQL) link
        long fromConnectorId = 2;
        MLink fromLink = client.createLink(fromConnectorId);
        fromLink.setName("JDBC connector1");
        fromLink.setCreationUser("devan");
        MLinkConfig fromLinkConfig = fromLink.getConnectorLinkConfig();
        fromLinkConfig.getStringInput("linkConfig.connectionString").setValue(connectionString);
        fromLinkConfig.getStringInput("linkConfig.jdbcDriver").setValue("com.mysql.jdbc.Driver");
        fromLinkConfig.getStringInput("linkConfig.username").setValue(username);
        fromLinkConfig.getStringInput("linkConfig.password").setValue(password);
        Status fromStatus = client.saveLink(fromLink);
        if (fromStatus.canProceed()) {
            System.out.println("Created JDBC link, ID: " + fromLink.getPersistenceId());
        } else {
            System.out.println("JDBC link creation failed");
        }

        // create HDFS link
        long toConnectorId = 1;
        MLink toLink = client.createLink(toConnectorId);
        toLink.setName("HDFS connector");
        toLink.setCreationUser("devan");
        MLinkConfig toLinkConfig = toLink.getConnectorLinkConfig();
        toLinkConfig.getStringInput("linkConfig.uri").setValue(hdfsURI);
        Status toStatus = client.saveLink(toLink);
        if (toStatus.canProceed()) {
            System.out.println("Created HDFS link, ID: " + toLink.getPersistenceId());
        } else {
            System.out.println("HDFS link creation failed");
        }

        // create a job with the JDBC and HDFS links
        long fromLinkId = fromLink.getPersistenceId();
        long toLinkId = toLink.getPersistenceId();
        MJob job = client.createJob(fromLinkId, toLinkId);
        job.setName("MySQL to HDFS job");
        job.setCreationUser("devan");
        MFromConfig fromJobConfig = job.getFromJobConfig();
        fromJobConfig.getStringInput("fromJobConfig.schemaName").setValue(schemaName);
        fromJobConfig.getStringInput("fromJobConfig.tableName").setValue(tableName);
        fromJobConfig.getStringInput("fromJobConfig.partitionColumn").setValue(partitionColumn);
        MToConfig toJobConfig = job.getToJobConfig();
        toJobConfig.getStringInput("toJobConfig.outputDirectory").setValue(outputDirectory);
        Status status = client.saveJob(job);
        if (status.canProceed()) {
            System.out.println("Created job, ID: " + job.getPersistenceId());
        } else {
            System.out.println("Job can't be created");
        }

        // start the job and poll its progress
        long jobId = job.getPersistenceId();
        MSubmission submission = client.startJob(jobId);
        System.out.println("Job status: " + submission.getStatus());
        while (submission.getStatus().isRunning()
                && submission.getProgress() != -1) {
            System.out.println("Job progress: "
                    + String.format("%.2f %%", submission.getProgress() * 100));
            try {
                Thread.sleep(3000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
        System.out.println("Job finished");
        System.out.println("Hadoop external job ID: " + submission.getExternalId());

        // print job counters, if any
        Counters counters = submission.getCounters();
        if (counters != null) {
            System.out.println("Counters:");
            for (CounterGroup group : counters) {
                System.out.print("\t");
                System.out.println(group.getName());
                for (Counter counter : group) {
                    System.out.print("\t\t");
                    System.out.print(counter.getName());
                    System.out.print(": ");
                    System.out.println(counter.getValue());
                }
            }
        }
        if (submission.getExceptionInfo() != null) {
            System.out.println("Job exception info: " + submission.getExceptionInfo());
        }
        System.out.println("Sqoop job successfully submitted");
    }
}

==========================================================
If you are creating a Maven project, add the following dependency.

 <dependency>
<groupId>org.apache.sqoop</groupId>
<artifactId>sqoop-client</artifactId>
<version>1.99.4</version>
</dependency>