Wednesday, September 18, 2013

Sqoop Java Client

Sqoop Java Client

Apache Sqoop

Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
Sqoop successfully graduated from the Incubator in March of 2012 and is now a Top-Level Apache project: More information
Latest stable release is 1.4.4 (downloaddocumentation). Latest cut of Sqoop2 is 1.99.2 (downloaddocumentation).

Here is the Java Client for Apache Sqoop import data from MySql to Hadoop hdfs :)

//Here I am using a table Persons, with columns PersonID and LastName
import org.apache.sqoop.client.SqoopClient;
import org.apache.sqoop.model.MConnection;
import org.apache.sqoop.model.MConnectionForms;
import org.apache.sqoop.model.MJob;
import org.apache.sqoop.model.MJobForms;
import org.apache.sqoop.model.MSubmission;
import org.apache.sqoop.validation.Status;

/**
 * @author  devan
 * @date 19-Sep-2013
 * @mail msdevanms@gmail.com
 */

public class SqoopImport {
 public static void main(String[] args) {
  
  
  String connectionString = "jdbc:mysql://YourMysqlIP:3306/test";
  String username = "YourMysqUserName";
  String password = "YourMysqlPassword";
  String schemaName = "YourMysqlDB";
  String tableName = "Persons";
  String columns = "PersonID,LastName"; //comma seperated column names
  String partitionColumn = "PersonID";
  String outputDirectory = "/output/Persons";
  String url = "http://YourSqoopIP:12000/sqoop/";

  
  SqoopClient client = new SqoopClient(url);
  //client.setServerUrl(newUrl);
  //Dummy connection object
  MConnection newCon = client.newConnection(1);

  //Get connection and framework forms. Set name for connection
  MConnectionForms conForms = newCon.getConnectorPart();
  MConnectionForms frameworkForms = newCon.getFrameworkPart();
  newCon.setName("MyConnection");

  //Set connection forms values
  conForms.getStringInput("connection.connectionString").setValue(connectionString);
  conForms.getStringInput("connection.jdbcDriver").setValue("com.mysql.jdbc.Driver");
  conForms.getStringInput("connection.username").setValue(username);
  conForms.getStringInput("connection.password").setValue(password);

  frameworkForms.getIntegerInput("security.maxConnections").setValue(0);

  Status status  = client.createConnection(newCon);
  if(status.canProceed()) {
   System.out.println("Created. New Connection ID : " +newCon.getPersistenceId());
  } else {
   System.out.println("Check for status and forms error ");
  }

  //Creating dummy job object
  MJob newjob = client.newJob(newCon.getPersistenceId(), org.apache.sqoop.model.MJob.Type.IMPORT);
  MJobForms connectorForm = newjob.getConnectorPart();
  MJobForms frameworkForm = newjob.getFrameworkPart();

  newjob.setName("ImportJob");
  //Database configuration
  connectorForm.getStringInput("table.schemaName").setValue(schemaName);
  //Input either table name or sql
  connectorForm.getStringInput("table.tableName").setValue(tableName);
  //connectorForm.getStringInput("table.sql").setValue("select id,name from table where ${CONDITIONS}");
  
  
  connectorForm.getStringInput("table.columns").setValue(columns);
  connectorForm.getStringInput("table.partitionColumn").setValue(partitionColumn);
  
  //Set boundary value only if required
  //connectorForm.getStringInput("table.boundaryQuery").setValue("");

  //Output configurations
  frameworkForm.getEnumInput("output.storageType").setValue("HDFS");
  frameworkForm.getEnumInput("output.outputFormat").setValue("TEXT_FILE");//Other option: SEQUENCE_FILE / TEXT_FILE
  frameworkForm.getStringInput("output.outputDirectory").setValue(outputDirectory);
  //Job resources
  frameworkForm.getIntegerInput("throttling.extractors").setValue(1);
  frameworkForm.getIntegerInput("throttling.loaders").setValue(1);

  status = client.createJob(newjob);
  if(status.canProceed()) {
   System.out.println("New Job ID: "+ newjob.getPersistenceId());
  } else {
   System.out.println("Check for status and forms error ");
  }
  //Now Submit the Job
  MSubmission submission = client.startSubmission(newjob.getPersistenceId());
  System.out.println("Status : " + submission.getStatus());
 
 }

 
}

19 comments:

  1. I tried to import data from MYSQL to HDFS through your program but program throws exception on the below line which looks irrelevant exception. Sqoop Server is running properly.

    MConnection newCon = client.newConnection(1);
    Exception: Exception in thread "main" java.lang.IllegalArgumentException: No enum const class org.apache.sqoop.model.MInputType.BOOLEAN
    at java.lang.Enum.valueOf(Enum.java:214)

    ReplyDelete
  2. Added my project to github. Here is the link https://github.com/msdevanms/2013.git please check it

    ReplyDelete
  3. Have you tried setting the table.sql input instead of pulling the entire set of rows in the table? I tried something like this and it always gave 'unacceptable' status on createJob.
    select id,value from table where ${conditions} and groupid=123

    ReplyDelete
  4. Good tutorial. Is there a way to add --enclosed-by and --escape-by properties(or any kind of delimiters) to the above given code sample.
    Thanks in advance

    ReplyDelete

  5. MConnection newCon = client.newConnection(1);
    Exception in thread "main" java.lang.NullPointerException
    at org.apache.sqoop.json.ConnectorBean.restore(ConnectorBean.java:111)
    at org.apache.sqoop.client.request.ConnectorRequest.read(ConnectorRequest.java:43)
    at org.apache.sqoop.client.request.SqoopRequests.readConnector(SqoopRequests.java:94)
    at org.apache.sqoop.client.SqoopClient.retrieveConnector(SqoopClient.java:188)
    at org.apache.sqoop.client.SqoopClient.getConnector(SqoopClient.java:140)
    at org.apache.sqoop.client.SqoopClient.newConnection(SqoopClient.java:273)

    ReplyDelete
    Replies
    1. Please help me in this?
      Thanks in advance

      Delete
    2. found any solution? i am also stuck at same error

      Delete
  6. Iam using url as "http://localhost:12000"
    SqoopClient client = new SqoopClient(url);
    MConnection newCon = client.newConnection(1);
    and iam getting this below exception while getting client.newConnection(1)
    Exception in thread "main" com.sun.jersey.api.client.UniformInterfaceException: GET http://localhost:12000/v1/connector/1 returned a response status of 404 Not Found
    at com.sun.jersey.api.client.WebResource.handle(WebResource.java:676)
    at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
    at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:503)
    at org.apache.sqoop.client.request.Request.get(Request.java:63)
    at org.apache.sqoop.client.request.ConnectorRequest.read(ConnectorRequest.java:38)
    at org.apache.sqoop.client.request.SqoopRequests.readConnector(SqoopRequests.java:94)
    at org.apache.sqoop.client.SqoopClient.retrieveConnector(SqoopClient.java:188)
    at org.apache.sqoop.client.SqoopClient.getConnector(SqoopClient.java:140)
    at org.apache.sqoop.client.SqoopClient.newConnection(SqoopClient.java:273)
    at com.zoho.mysqlbackup.Sqoop2Client.createConnection(Sqoop2Client.java:24)
    at com.zoho.mysqlbackup.Sqoop2Client.main(Sqoop2Client.java:49)

    ReplyDelete
  7. Please try"http://localhost:12000/sqoop/" as url

    ReplyDelete
  8. Replies
    1. Iam getting same error even i tried with above given url.

      Exception in thread "main" com.sun.jersey.api.client.UniformInterfaceException: GET http://localhost:12000/sqoop/v1/connector/1 returned a response status of 404 Not Found
      at com.sun.jersey.api.client.WebResource.handle(WebResource.java:676)
      at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
      at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:503)
      at org.apache.sqoop.client.request.Request.get(Request.java:63)
      at org.apache.sqoop.client.request.ConnectorRequest.read(ConnectorRequest.java:38)
      at org.apache.sqoop.client.request.SqoopRequests.readConnector(SqoopRequests.java:94)
      at org.apache.sqoop.client.SqoopClient.retrieveConnector(SqoopClient.java:188)
      at org.apache.sqoop.client.SqoopClient.getConnector(SqoopClient.java:140)
      at org.apache.sqoop.client.SqoopClient.newConnection(SqoopClient.java:273)
      at com.zoho.mysqlbackup.Sqoop2Client.createConnection(Sqoop2Client.java:24)
      at com.zoho.mysqlbackup.Sqoop2Client.main(Sqoop2Client.java:49)

      Delete
    2. try wget http://localhost:12000/sqoop and post the resutl please. I think your sqoop server is not in running mode.

      Delete
    3. which sqoop version you are using ??

      Delete
    4. result of wget http://localhost:12000/sqoop

      wget http://localhost:12000/sqoop
      --2015-01-05 10:40:26-- http://localhost:12000/sqoop
      Resolving localhost... ::1, 127.0.0.1
      Connecting to localhost|::1|:12000... connected.
      HTTP request sent, awaiting response... 302 Found
      Location: http://localhost:12000/sqoop/ [following]
      --2015-01-05 10:40:26-- http://localhost:12000/sqoop/
      Connecting to localhost|::1|:12000... connected.
      HTTP request sent, awaiting response... 200 OK
      Length: 889 [text/html]
      Saving to: `index.html'

      100%[======================================================================================================>] 889 --.-K/s in 0s

      2015-01-05 10:40:26 (272 MB/s) - `index.html' saved [889/889]

      Delete
    5. Iam using latest sqoop2 release 1.99.4

      Delete
  9. Hi

    I am getting null pointer at this line
    MConnection newCon = client.newConnection(1);

    Is there anything that I am missing

    ReplyDelete
  10. i am using this version Sqoop 1.4.6.2.6.0.3-8
    i am using Hearton Works (hdp cluster)
    we dont have sqoop ui
    how can i get url connection

    ReplyDelete
  11. any solution for this? I am also unable to connect to using sqoop client

    ReplyDelete