Null behaviour in Sqoop and Hive

Sqoop by default imports the null values with string null as output.

So any record having null in data would be shown like below in HDFS

|N|null|Jagat Singh|BigData

The issue with such kind of import is that we cannot write hive queries like show me all records where the column is not null.

To know better we should keep in mind that the Hive default representation for null is

\N

So to make Hive treat the record as null should be imported as \N

Read link below


the corresponding Sqoop query to make such kind of import work would be

sqoop import  …--null-string '\\N' --null-non-string '\\N'

We have to escape the \N parameter properly


Note
 \N (backslash with a capital N)—not to be confused with \n, which is only a single newline character.

Hope this helps

1 comment:

Please share your views and comments below.

Thank You.