Important Sqoop Commands:
Validation
Validation is used to verify the data during an import or export by comparing the row counts at the source and the destination. For example:
A basic import of a table named EMPLOYEES in the test database that uses validation to check the row counts:
$ sqoop import --connect jdbc:mysql://db.foo.com/test \
    --table EMPLOYEES --validate
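Validation works for exports as well. A minimal sketch, assuming a hypothetical table bar that was populated from the HDFS directory /results/bar_data (both names are example values, not from the original):

```shell
# Export HDFS data back into a database table, then compare row counts
# between the HDFS directory and the target table.
sqoop export --connect jdbc:mysql://db.foo.com/test \
    --table bar --export-dir /results/bar_data --validate
```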
Sqoop-job
We can save jobs in Sqoop if we want to run the same import or export commands repeatedly, which makes the task easier. A saved job records all the configuration information needed to execute a Sqoop command later.
By default these job descriptions are saved in $HOME/.sqoop/. We can also use a shared metastore, which makes the saved jobs available to many users in a shared cluster.
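To point job commands at a shared metastore instead of the local $HOME/.sqoop/ store, the job tool accepts a --meta-connect argument. A sketch, assuming a metastore running on a hypothetical host metastore.example.com on Sqoop's default metastore port 16000:

```shell
# Create the saved job in a shared metastore so other users on the
# cluster can list and run it. The hostname is an example value.
sqoop job --create myjob \
    --meta-connect jdbc:hsqldb:hsql://metastore.example.com:16000/sqoop \
    -- import --connect jdbc:mysql://example.com/db --table mytable
```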
$ sqoop job --create myjob -- import --connect jdbc:mysql://example.com/db \
    --table mytable
This creates a job named myjob which can be executed later; creating the job does not run it. The job now appears in the list of saved jobs:
$ sqoop job --list
Available jobs:
  myjob
We can inspect the configuration of a job with the show action:
$ sqoop job --show myjob
Job: myjob
Tool: import
Options:
----------------------------
direct.import = false
codegen.input.delimiters.record = 0
hdfs.append.dir = false
db.table = mytable
And if we are satisfied with it, we can run the job with exec:
$ sqoop job --exec myjob
10/08/19 13:08:45 INFO tool.CodeGenTool: Beginning code generation
The exec action allows you to override arguments of the saved job by supplying them after a --. For example, if the database were changed to require a username, we could specify the username and password with:
$ sqoop job --exec myjob -- --username someuser -P
Enter password:
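A saved job that is no longer needed can be removed with the job tool's --delete action:

```shell
# Remove the saved job definition from the job store.
sqoop job --delete myjob
```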
Sqoop-merge
The merge tool is used to combine two datasets, where values from the newer dataset overwrite those in the older one. Normally we use it in combination with incremental imports: after running incremental imports in lastmodified mode we end up with multiple datasets in HDFS, and merge flattens them into one dataset, keeping only the newest record for each primary key.
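For context, an incremental import in lastmodified mode looks like the following sketch; the check column last_update and the last-value timestamp are example values, not from the original:

```shell
# Import only rows whose last_update timestamp is newer than --last-value.
sqoop import --connect jdbc:mysql://example.com/db --table mytable \
    --incremental lastmodified --check-column last_update \
    --last-value "2010-08-19 13:08:45"
```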
The merge tool also runs a MapReduce job that takes two directories as input: the new dataset given by the --new-data argument and the old dataset given by the --onto argument. The output is stored in the HDFS directory given by the --target-dir argument.
This merging happens based on a unique primary key in each record, where the key column is specified with --merge-key. Multiple rows in the same dataset should not have the same primary key, otherwise data loss may occur.
We can use the codegen tool to generate the class that extracts the key column, and we can specify the class name and jar file with the --class-name and --jar-file parameters.
Let us take an example to explain the merge tool. Here we merge two incremental imports with --merge-key set to id, where the older data is in an HDFS directory named older and the newer data is in an HDFS directory named newer:
$ sqoop merge --new-data newer --onto older --target-dir merged \
    --jar-file datatypes.jar --class-name Foo --merge-key id
This will run a MapReduce job where the value of the id column of each row is used to join rows; rows in the newer dataset will be used in preference to rows in the older dataset.
This can be used with SequenceFile-, Avro-, and text-based incremental imports. The file types of the newer and older datasets must be the same.
Sqoop-codegen
The codegen tool is used to regenerate the Java classes Sqoop creates during an import, which is useful if the Java source has been lost or some other problem occurs. It can also create a new version of a class, for example with different delimiters between fields.
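To regenerate a class with different field delimiters, codegen accepts the same output-line formatting arguments as import; a sketch using --fields-terminated-by to produce tab-separated fields:

```shell
# Regenerate the employees record class with tab as the field delimiter.
sqoop codegen --connect jdbc:mysql://db.example.com/corp \
    --table employees --fields-terminated-by '\t'
```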
Recreate the record interpretation code for the employees table of a corporate database:
$ sqoop codegen --connect jdbc:mysql://db.example.com/corp \
    --table employees
Sqoop-eval
The eval tool in Sqoop is used to run simple queries against the database and print the results to the console. It is intended for testing purposes only, so avoid it in a production environment. We can use it, for example, to check that the data we are about to import looks correct.
Select ten records from the employees table:
$ sqoop eval --connect jdbc:mysql://db.example.com/corp \
    --query "SELECT * FROM employees LIMIT 10"
Insert a row into the foo table:
$ sqoop eval --connect jdbc:mysql://db.example.com/corp \
    -e "INSERT INTO foo VALUES(42, 'bar')"
Sqoop-list-databases
It is used to list the database schemas present on a server. This tool works only with MySQL and Oracle, and with Oracle we need DBA privileges.
List database schemas available on a MySQL server:
$ sqoop list-databases --connect jdbc:mysql://database.example.com/
information_schema
employees
Sqoop-list-tables
It is used to list tables present in the database.
List tables available in the “corp” database:
$ sqoop list-tables --connect jdbc:mysql://database.example.com/corp
employees
payroll_checks
job_descriptions
office_supplies