How to use the Apache Sqoop Eval and Codegen tool

Last updated on May 30 2022
Swati Dogra

Table of Contents

  • Sqoop – Codegen
  • Sqoop – Eval

Sqoop – Codegen

This blog describes the importance of the ‘codegen’ tool. From an object-oriented point of view, every database table maps to one DAO class that contains ‘getter’ and ‘setter’ methods to initialize its objects. The codegen tool generates this DAO class automatically.

It generates the DAO class in Java, based on the table schema. The Java definition is instantiated as part of the import process, so the main use of running this tool on its own is to regenerate the Java code if the original source file has been lost. In that case it creates a new version of the class, using the default delimiter between fields.
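
As an illustration, here is a minimal sketch of what such a generated class might look like for an emp table with id and name columns. The column names are assumed for this example; the class Sqoop actually generates also implements Hadoop's Writable/DBWritable and Sqoop's SqoopRecord interfaces and contains the serialization code.

// Simplified sketch of a codegen-style DAO class; illustrative only.
public class emp {
    private Integer id;    // maps to the assumed id column
    private String name;   // maps to the assumed name column

    public Integer get_id() { return id; }             // Sqoop-style getter
    public void set_id(Integer id) { this.id = id; }   // Sqoop-style setter

    public String get_name() { return name; }
    public void set_name(String name) { this.name = name; }
}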

Syntax

The following is the syntax for the Sqoop codegen command; the sqoop-codegen script is an alias for sqoop codegen.

$ sqoop codegen (generic-args) (codegen-args)

$ sqoop-codegen (generic-args) (codegen-args)
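
By default, codegen writes the generated source and the compiled jar under /tmp/sqoop-<username>/compile/, as seen later in this example. If you need the artifacts in a predictable location, the codegen tool also accepts code-generation options such as --outdir, --bindir, and --class-name. A hedged example; the paths and class name below are placeholders of our own choosing:

$ sqoop codegen \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
--class-name com.example.Emp \
--outdir ./generated-src \
--bindir ./generated-bin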

Example

Let us take an example that generates Java code for the emp table in the userdb database.

The following command is used to execute the given example.

$ sqoop codegen \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp

If the command executes successfully, then it will produce the following output on the terminal.

14/12/23 02:34:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
14/12/23 02:34:41 INFO tool.CodeGenTool: Beginning code generation
...
14/12/23 02:34:42 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-hadoop/compile/9a300a1f94899df4a9b10f9935ed9f91/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/12/23 02:34:47 INFO orm.CompilationManager: Writing jar file:
/tmp/sqoop-hadoop/compile/9a300a1f94899df4a9b10f9935ed9f91/emp.jar

Verification

Let us take a look at the output. The path in the last line of the output is the location where the Java code for the emp table is generated and stored. Let us verify the files at that location using the following commands.

$ cd /tmp/sqoop-hadoop/compile/9a300a1f94899df4a9b10f9935ed9f91/
$ ls
emp.class
emp.jar
emp.java

If you want to verify in depth, compare the emp table in the userdb database with emp.java in the directory /tmp/sqoop-hadoop/compile/9a300a1f94899df4a9b10f9935ed9f91/.
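
To check what was packaged into the generated jar, you can also list its entries with the JDK's jar tool; the exact contents vary with the table and Sqoop version, but you should at least see emp.class:

$ jar tf emp.jar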

Sqoop – Eval

This blog describes how to use the Sqoop ‘eval’ tool. It allows users to execute user-defined queries against the respective database server and preview the result on the console, so the user can inspect the data before importing it. Using eval, we can evaluate any type of SQL query, whether it is a DDL or a DML statement.

Syntax

The following is the syntax for the Sqoop eval command; as with codegen, sqoop-eval is an alias for sqoop eval.

$ sqoop eval (generic-args) (eval-args)

$ sqoop-eval (generic-args) (eval-args)

Select Query Evaluation

Using the eval tool, we can evaluate any type of SQL query. Let us take an example of selecting a limited number of rows from the employee table of the db database. The following command is used to evaluate the given example using an SQL query.

$ sqoop eval \
--connect jdbc:mysql://localhost/db \
--username root \
--query "SELECT * FROM employee LIMIT 3"

If the command executes successfully, then it will produce the following output on the terminal.

+------+--------------+-------------+-------------------+--------+
| Id   | Name         | Designation | Salary            | Dept   |
+------+--------------+-------------+-------------------+--------+
| 1201 | gopal        | manager     | 50000             | TP     |
| 1202 | manisha      | preader     | 50000             | TP     |
| 1203 | khalil       | php dev     | 30000             | AC     |
+------+--------------+-------------+-------------------+--------+
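
Because eval simply forwards the statement to the database and prints the result, it is also handy for quick sanity checks before an import, such as counting rows. The following query is our own illustration, not part of the original example:

$ sqoop eval \
--connect jdbc:mysql://localhost/db \
--username root \
--query "SELECT COUNT(*) FROM employee"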

Insert Query Evaluation

The Sqoop eval tool is applicable not only for querying but also for modifying data, which means we can use eval for INSERT statements too. The following command is used to insert a new row into the employee table of the db database (-e is the short form of the --query option).

$ sqoop eval \
--connect jdbc:mysql://localhost/db \
--username root \
-e "INSERT INTO employee VALUES(1207,'Raju','UI dev',15000,'TP')"

If the command executes successfully, then it will display the status of the updated rows on the console.

Alternatively, you can verify the employee table on the MySQL console. The following commands verify the rows of the employee table of the db database using a SELECT query.

mysql> use db;
mysql> SELECT * FROM employee;

+------+--------------+-------------+-------------------+--------+
| Id   | Name         | Designation | Salary            | Dept   |
+------+--------------+-------------+-------------------+--------+
| 1201 | gopal        | manager     | 50000             | TP     |
| 1202 | manisha      | preader     | 50000             | TP     |
| 1203 | khalil       | php dev     | 30000             | AC     |
| 1204 | prasanth     | php dev     | 30000             | AC     |
| 1205 | kranthi      | admin       | 20000             | TP     |
| 1206 | satish p     | grp des     | 20000             | GR     |
| 1207 | Raju         | UI dev      | 15000             | TP     |
+------+--------------+-------------+-------------------+--------+
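
Finally, since eval accepts DDL as well as DML, you could use it, for instance, to create a staging table before an export. A minimal sketch, with employee_stage as a table name of our own choosing (MySQL's CREATE TABLE ... LIKE copies the column definitions of employee):

$ sqoop eval \
--connect jdbc:mysql://localhost/db \
--username root \
-e "CREATE TABLE employee_stage LIKE employee"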

 

So, this brings us to the end of this blog. This Tecklearn ‘How to use the Apache Sqoop Eval and Codegen tool’ blog helps you with commonly asked questions if you are looking for a job in Apache Sqoop and as a Big Data Hadoop Developer.

If you wish to learn Sqoop and build a career in the Big Data Hadoop domain, then check out our interactive Big Data Hadoop Spark and Hadoop Developer Training, which comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

https://www.tecklearn.com/course/big-data-spark-and-hadoop-developer/

Big Data Spark and Hadoop Developer Training

About the Course

Big Data analysis is emerging as a key advantage in business intelligence for many organizations. In this Big Data course, you will master MapReduce, Hive, Pig, Sqoop, Oozie and Flume, Spark framework and RDD, Scala and Spark SQL, Machine Learning using Spark, Spark Streaming, etc. It is a comprehensive Hadoop Big Data training course designed by industry experts considering current industry job requirements to help you learn Big Data Hadoop and Spark modules. This Cloudera Hadoop and Spark training will prepare you to clear Cloudera CCA175 Big Data certification.

Why Should you take Spark and Hadoop Developer Training?

  • Average salary for a Spark and Hadoop Developer ranges from approximately $106,366 to $127,619 per annum – Indeed.com.
  • Hadoop Market is expected to reach $99.31B by 2022 growing at a CAGR of 42.1% from 2015 – Forbes.
  • Amazon, Cloudera, DataStax, DELL, EMC2, IBM, Microsoft and other MNCs worldwide use Hadoop.

What you will Learn in this Course?

Introduction to Hadoop and the Hadoop Ecosystem

  • Problems with Traditional Large-scale Systems
  • Hadoop!
  • The Hadoop Ecosystem

Hadoop Architecture and HDFS

  • Distributed Processing on a Cluster
  • Storage: HDFS Architecture
  • Storage: Using HDFS
  • Resource Management: YARN Architecture
  • Resource Management: Working with YARN

Importing Relational Data with Apache Sqoop

  • Sqoop Overview
  • Basic Imports and Exports
  • Limiting Results
  • Improving Sqoop’s Performance
  • Sqoop 2

Introduction to Impala and Hive

  • Introduction to Impala and Hive
  • Why Use Impala and Hive?
  • Comparing Hive to Traditional Databases
  • Hive Use Cases

Modelling and Managing Data with Impala and Hive

  • Data Storage Overview
  • Creating Databases and Tables
  • Loading Data into Tables
  • HCatalog
  • Impala Metadata Caching

Data Formats

  • Selecting a File Format
  • Hadoop Tool Support for File Formats
  • Avro Schemas
  • Using Avro with Hive and Sqoop
  • Avro Schema Evolution
  • Compression

Data Partitioning

  • Partitioning Overview
  • Partitioning in Impala and Hive

Capturing Data with Apache Flume

  • What is Apache Flume?
  • Basic Flume Architecture
  • Flume Sources
  • Flume Sinks
  • Flume Channels
  • Flume Configuration

Spark Basics

  • What is Apache Spark?
  • Using the Spark Shell
  • RDDs (Resilient Distributed Datasets)
  • Functional Programming in Spark

Working with RDDs in Spark

  • A Closer Look at RDDs
  • Key-Value Pair RDDs
  • MapReduce
  • Other Pair RDD Operations

Writing and Deploying Spark Applications

  • Spark Applications vs. Spark Shell
  • Creating the Spark Context
  • Building a Spark Application (Scala and Java)
  • Running a Spark Application
  • The Spark Application Web UI
  • Configuring Spark Properties
  • Logging

Parallel Programming with Spark

  • Review: Spark on a Cluster
  • RDD Partitions
  • Partitioning of File-based RDDs
  • HDFS and Data Locality
  • Executing Parallel Operations
  • Stages and Tasks

Spark Caching and Persistence

  • RDD Lineage
  • Caching Overview
  • Distributed Persistence

Common Patterns in Spark Data Processing

  • Common Spark Use Cases
  • Iterative Algorithms in Spark
  • Graph Processing and Analysis
  • Machine Learning
  • Example: k-means

Preview: Spark SQL

  • Spark SQL and the SQL Context
  • Creating DataFrames
  • Transforming and Querying DataFrames
  • Saving DataFrames
  • Comparing Spark SQL with Impala

 

Got a question for us? Please mention it in the comments section and we will get back to you.

 

 
