Incremental Imports:
We have an option called Incremental imports which can be used to import only the rows which have been changed since previous Import. Sqoop provides 2 types of import-append and lastmodified.
Append
Here we have 2 arguments,’—check-column’ and ‘—last-value’. We will use append mode when new rows are continually added with increasing row id. First we check the row id with ‘—check-column’, sqoop will import the rows when check column has a value greater than last value.
LastModified
Here the timestamp of the rows is checked, rows having the check column timestamp more than of last value timestamp are imported. We will use this method when rows of the table are updated and each update will set the last value to the current timestamp.
Performing an incremental import of new data, after having already imported 4 rows of a table:
First we will insert some value in employee table and will append that in HDFS using incremental import.
mysql> select * from employee; +------+---------+ | id | name | +------+---------+ | 1 | sai | | 2 | kishore | | 3 | chandu | | 4 | gopal | +------+---------+ 4 rows in set (0.00 sec) mysql> insert into employee values(5,'nanda'); Query OK, 1 row affected (0.05 sec) mysql> select * from employee; +------+---------+ | id | name | +------+---------+ | 1 | sai | | 2 | kishore | | 3 | chandu | | 4 | gopal | | 5 | nanda | +------+---------+ 5 rows in set (0.00 sec)
Now run the incremental import.
hdadmin@ubuntu:~/sqoop-1.4.5-cdh5.3.2$ bin/sqoop import -connect jdbc:mysql://localhost:3306/mysql -username root -password **** --table employee --where "id > 4" --target-dir /sqoop01 --append -m 1 hdadmin@ubuntu:~/sqoop-1.4.5-cdh5.3.2$ hdfs dfs -cat /sqoop01/part-m-00001 5,nanda