Wednesday, September 18, 2013

Sqoop Java Client

Sqoop Java Client

Apache Sqoop

Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
Sqoop successfully graduated from the Incubator in March of 2012 and is now a Top-Level Apache project: More information
Latest stable release is 1.4.4 (downloaddocumentation). Latest cut of Sqoop2 is 1.99.2 (downloaddocumentation).

Here is the Java Client for Apache Sqoop import data from MySql to Hadoop hdfs :)

//Here I am using a table Persons, with columns PersonID and LastName
import org.apache.sqoop.client.SqoopClient;
import org.apache.sqoop.model.MConnection;
import org.apache.sqoop.model.MConnectionForms;
import org.apache.sqoop.model.MJob;
import org.apache.sqoop.model.MJobForms;
import org.apache.sqoop.model.MSubmission;
import org.apache.sqoop.validation.Status;

/**
 * @author  devan
 * @date 19-Sep-2013
 * @mail msdevanms@gmail.com
 */

public class SqoopImport {
 public static void main(String[] args) {
  
  
  String connectionString = "jdbc:mysql://YourMysqlIP:3306/test";
  String username = "YourMysqUserName";
  String password = "YourMysqlPassword";
  String schemaName = "YourMysqlDB";
  String tableName = "Persons";
  String columns = "PersonID,LastName"; //comma seperated column names
  String partitionColumn = "PersonID";
  String outputDirectory = "/output/Persons";
  String url = "http://YourSqoopIP:12000/sqoop/";

  
  SqoopClient client = new SqoopClient(url);
  //client.setServerUrl(newUrl);
  //Dummy connection object
  MConnection newCon = client.newConnection(1);

  //Get connection and framework forms. Set name for connection
  MConnectionForms conForms = newCon.getConnectorPart();
  MConnectionForms frameworkForms = newCon.getFrameworkPart();
  newCon.setName("MyConnection");

  //Set connection forms values
  conForms.getStringInput("connection.connectionString").setValue(connectionString);
  conForms.getStringInput("connection.jdbcDriver").setValue("com.mysql.jdbc.Driver");
  conForms.getStringInput("connection.username").setValue(username);
  conForms.getStringInput("connection.password").setValue(password);

  frameworkForms.getIntegerInput("security.maxConnections").setValue(0);

  Status status  = client.createConnection(newCon);
  if(status.canProceed()) {
   System.out.println("Created. New Connection ID : " +newCon.getPersistenceId());
  } else {
   System.out.println("Check for status and forms error ");
  }

  //Creating dummy job object
  MJob newjob = client.newJob(newCon.getPersistenceId(), org.apache.sqoop.model.MJob.Type.IMPORT);
  MJobForms connectorForm = newjob.getConnectorPart();
  MJobForms frameworkForm = newjob.getFrameworkPart();

  newjob.setName("ImportJob");
  //Database configuration
  connectorForm.getStringInput("table.schemaName").setValue(schemaName);
  //Input either table name or sql
  connectorForm.getStringInput("table.tableName").setValue(tableName);
  //connectorForm.getStringInput("table.sql").setValue("select id,name from table where ${CONDITIONS}");
  
  
  connectorForm.getStringInput("table.columns").setValue(columns);
  connectorForm.getStringInput("table.partitionColumn").setValue(partitionColumn);
  
  //Set boundary value only if required
  //connectorForm.getStringInput("table.boundaryQuery").setValue("");

  //Output configurations
  frameworkForm.getEnumInput("output.storageType").setValue("HDFS");
  frameworkForm.getEnumInput("output.outputFormat").setValue("TEXT_FILE");//Other option: SEQUENCE_FILE / TEXT_FILE
  frameworkForm.getStringInput("output.outputDirectory").setValue(outputDirectory);
  //Job resources
  frameworkForm.getIntegerInput("throttling.extractors").setValue(1);
  frameworkForm.getIntegerInput("throttling.loaders").setValue(1);

  status = client.createJob(newjob);
  if(status.canProceed()) {
   System.out.println("New Job ID: "+ newjob.getPersistenceId());
  } else {
   System.out.println("Check for status and forms error ");
  }
  //Now Submit the Job
  MSubmission submission = client.startSubmission(newjob.getPersistenceId());
  System.out.println("Status : " + submission.getStatus());
 
 }

 
}

20 comments:

  1. I tried to import data from MYSQL to HDFS through your program but program throws exception on the below line which looks irrelevant exception. Sqoop Server is running properly.

    MConnection newCon = client.newConnection(1);
    Exception: Exception in thread "main" java.lang.IllegalArgumentException: No enum const class org.apache.sqoop.model.MInputType.BOOLEAN
    at java.lang.Enum.valueOf(Enum.java:214)

    ReplyDelete
  2. Added my project to github. Here is the link https://github.com/msdevanms/2013.git please check it

    ReplyDelete
  3. Have you tried setting the table.sql input instead of pulling the entire set of rows in the table? I tried something like this and it always gave 'unacceptable' status on createJob.
    select id,value from table where ${conditions} and groupid=123

    ReplyDelete
  4. Good tutorial. Is there a way to add --enclosed-by and --escape-by properties(or any kind of delimiters) to the above given code sample.
    Thanks in advance

    ReplyDelete

  5. MConnection newCon = client.newConnection(1);
    Exception in thread "main" java.lang.NullPointerException
    at org.apache.sqoop.json.ConnectorBean.restore(ConnectorBean.java:111)
    at org.apache.sqoop.client.request.ConnectorRequest.read(ConnectorRequest.java:43)
    at org.apache.sqoop.client.request.SqoopRequests.readConnector(SqoopRequests.java:94)
    at org.apache.sqoop.client.SqoopClient.retrieveConnector(SqoopClient.java:188)
    at org.apache.sqoop.client.SqoopClient.getConnector(SqoopClient.java:140)
    at org.apache.sqoop.client.SqoopClient.newConnection(SqoopClient.java:273)

    ReplyDelete
    Replies
    1. Please help me in this?
      Thanks in advance

      Delete
    2. found any solution? i am also stuck at same error

      Delete
  6. Iam using url as "http://localhost:12000"
    SqoopClient client = new SqoopClient(url);
    MConnection newCon = client.newConnection(1);
    and iam getting this below exception while getting client.newConnection(1)
    Exception in thread "main" com.sun.jersey.api.client.UniformInterfaceException: GET http://localhost:12000/v1/connector/1 returned a response status of 404 Not Found
    at com.sun.jersey.api.client.WebResource.handle(WebResource.java:676)
    at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
    at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:503)
    at org.apache.sqoop.client.request.Request.get(Request.java:63)
    at org.apache.sqoop.client.request.ConnectorRequest.read(ConnectorRequest.java:38)
    at org.apache.sqoop.client.request.SqoopRequests.readConnector(SqoopRequests.java:94)
    at org.apache.sqoop.client.SqoopClient.retrieveConnector(SqoopClient.java:188)
    at org.apache.sqoop.client.SqoopClient.getConnector(SqoopClient.java:140)
    at org.apache.sqoop.client.SqoopClient.newConnection(SqoopClient.java:273)
    at com.zoho.mysqlbackup.Sqoop2Client.createConnection(Sqoop2Client.java:24)
    at com.zoho.mysqlbackup.Sqoop2Client.main(Sqoop2Client.java:49)

    ReplyDelete
  7. Please try"http://localhost:12000/sqoop/" as url

    ReplyDelete
  8. Replies
    1. Iam getting same error even i tried with above given url.

      Exception in thread "main" com.sun.jersey.api.client.UniformInterfaceException: GET http://localhost:12000/sqoop/v1/connector/1 returned a response status of 404 Not Found
      at com.sun.jersey.api.client.WebResource.handle(WebResource.java:676)
      at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
      at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:503)
      at org.apache.sqoop.client.request.Request.get(Request.java:63)
      at org.apache.sqoop.client.request.ConnectorRequest.read(ConnectorRequest.java:38)
      at org.apache.sqoop.client.request.SqoopRequests.readConnector(SqoopRequests.java:94)
      at org.apache.sqoop.client.SqoopClient.retrieveConnector(SqoopClient.java:188)
      at org.apache.sqoop.client.SqoopClient.getConnector(SqoopClient.java:140)
      at org.apache.sqoop.client.SqoopClient.newConnection(SqoopClient.java:273)
      at com.zoho.mysqlbackup.Sqoop2Client.createConnection(Sqoop2Client.java:24)
      at com.zoho.mysqlbackup.Sqoop2Client.main(Sqoop2Client.java:49)

      Delete
    2. try wget http://localhost:12000/sqoop and post the resutl please. I think your sqoop server is not in running mode.

      Delete
    3. which sqoop version you are using ??

      Delete
    4. result of wget http://localhost:12000/sqoop

      wget http://localhost:12000/sqoop
      --2015-01-05 10:40:26-- http://localhost:12000/sqoop
      Resolving localhost... ::1, 127.0.0.1
      Connecting to localhost|::1|:12000... connected.
      HTTP request sent, awaiting response... 302 Found
      Location: http://localhost:12000/sqoop/ [following]
      --2015-01-05 10:40:26-- http://localhost:12000/sqoop/
      Connecting to localhost|::1|:12000... connected.
      HTTP request sent, awaiting response... 200 OK
      Length: 889 [text/html]
      Saving to: `index.html'

      100%[======================================================================================================>] 889 --.-K/s in 0s

      2015-01-05 10:40:26 (272 MB/s) - `index.html' saved [889/889]

      Delete
    5. Iam using latest sqoop2 release 1.99.4

      Delete
  9. Hi

    I am getting null pointer at this line
    MConnection newCon = client.newConnection(1);

    Is there anything that I am missing

    ReplyDelete
  10. i am using this version Sqoop 1.4.6.2.6.0.3-8
    i am using Hearton Works (hdp cluster)
    we dont have sqoop ui
    how can i get url connection

    ReplyDelete
  11. any solution for this? I am also unable to connect to using sqoop client

    ReplyDelete
  12. Your posts are always insightful and encouraging. Thank you for motivating me to take positive actions in my life. Find extra info in this article. Feel the rhythm in Geometry Dash Free, where every level’s obstacles and jumps are perfectly timed to dynamic, energetic music.

    ReplyDelete