How to use the Apache Sqoop Eval and Codegen tool

Last updated on May 30 2022
Swati Dogra

Table of Contents

  • Sqoop – Codegen
  • Sqoop – Eval

Sqoop – Codegen

This blog describes the importance of the ‘codegen’ tool. From an object-oriented point of view, every database table maps to one DAO class that contains ‘getter’ and ‘setter’ methods to initialize its objects. The codegen tool generates this DAO class automatically.

It generates the DAO class in Java, based on the table schema. The Java definition is instantiated as part of the import process, so the main use of running this tool on its own is to regenerate the Java code if the original source file has been lost. In that case it creates a new version of the class, using the default delimiter between fields.
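
As an illustration, here is a minimal sketch of what such a generated class might look like for an emp table with id and name columns. The column names are assumed for this example; the class Sqoop actually generates also implements Hadoop's Writable/DBWritable and Sqoop's SqoopRecord interfaces and contains the serialization code.

// Simplified sketch of a codegen-style DAO class; illustrative only.
public class emp {
    private Integer id;    // maps to the assumed id column
    private String name;   // maps to the assumed name column

    public Integer get_id() { return id; }             // Sqoop-style getter
    public void set_id(Integer id) { this.id = id; }   // Sqoop-style setter

    public String get_name() { return name; }
    public void set_name(String name) { this.name = name; }
}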

Syntax

The following is the syntax for the Sqoop codegen command; the sqoop-codegen script is an alias for sqoop codegen.

$ sqoop codegen (generic-args) (codegen-args)

$ sqoop-codegen (generic-args) (codegen-args)
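
By default, codegen writes the generated source and the compiled jar under /tmp/sqoop-<username>/compile/, as seen later in this example. If you need the artifacts in a predictable location, the codegen tool also accepts code-generation options such as --outdir, --bindir, and --class-name. A hedged example; the paths and class name below are placeholders of our own choosing:

$ sqoop codegen \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp \
--class-name com.example.Emp \
--outdir ./generated-src \
--bindir ./generated-bin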

Example

Let us take an example that generates Java code for the emp table in the userdb database.

The following command is used to execute the given example.

$ sqoop codegen \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table emp

If the command executes successfully, then it will produce the following output on the terminal.

14/12/23 02:34:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
14/12/23 02:34:41 INFO tool.CodeGenTool: Beginning code generation
...
14/12/23 02:34:42 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-hadoop/compile/9a300a1f94899df4a9b10f9935ed9f91/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/12/23 02:34:47 INFO orm.CompilationManager: Writing jar file:
/tmp/sqoop-hadoop/compile/9a300a1f94899df4a9b10f9935ed9f91/emp.jar

Verification

Let us take a look at the output. The path in the last line of the output is the location where the Java code for the emp table is generated and stored. Let us verify the files at that location using the following commands.

$ cd /tmp/sqoop-hadoop/compile/9a300a1f94899df4a9b10f9935ed9f91/
$ ls
emp.class
emp.jar
emp.java

If you want to verify in depth, compare the emp table in the userdb database with emp.java in the directory /tmp/sqoop-hadoop/compile/9a300a1f94899df4a9b10f9935ed9f91/.
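
To check what was packaged into the generated jar, you can also list its entries with the JDK's jar tool; the exact contents vary with the table and Sqoop version, but you should at least see emp.class:

$ jar tf emp.jar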

Sqoop – Eval

This blog describes how to use the Sqoop ‘eval’ tool. It allows users to execute user-defined queries against the respective database server and preview the result on the console, so the user can inspect the data before importing it. Using eval, we can evaluate any type of SQL query, whether it is a DDL or a DML statement.

Syntax

The following is the syntax for the Sqoop eval command; as with codegen, sqoop-eval is an alias for sqoop eval.

$ sqoop eval (generic-args) (eval-args)

$ sqoop-eval (generic-args) (eval-args)

Select Query Evaluation

Using the eval tool, we can evaluate any type of SQL query. Let us take an example of selecting a limited number of rows from the employee table of the db database. The following command is used to evaluate the given example using an SQL query.

$ sqoop eval \
--connect jdbc:mysql://localhost/db \
--username root \
--query "SELECT * FROM employee LIMIT 3"

If the command executes successfully, then it will produce the following output on the terminal.

+------+--------------+-------------+-------------------+--------+
| Id   | Name         | Designation | Salary            | Dept   |
+------+--------------+-------------+-------------------+--------+
| 1201 | gopal        | manager     | 50000             | TP     |
| 1202 | manisha      | preader     | 50000             | TP     |
| 1203 | khalil       | php dev     | 30000             | AC     |
+------+--------------+-------------+-------------------+--------+
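
Because eval simply forwards the statement to the database and prints the result, it is also handy for quick sanity checks before an import, such as counting rows. The following query is our own illustration, not part of the original example:

$ sqoop eval \
--connect jdbc:mysql://localhost/db \
--username root \
--query "SELECT COUNT(*) FROM employee"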

Insert Query Evaluation

The Sqoop eval tool is applicable not only for querying but also for modifying data, which means we can use eval for INSERT statements too. The following command is used to insert a new row into the employee table of the db database (-e is the short form of the --query option).

$ sqoop eval \
--connect jdbc:mysql://localhost/db \
--username root \
-e "INSERT INTO employee VALUES(1207,'Raju','UI dev',15000,'TP')"

If the command executes successfully, then it will display the status of the updated rows on the console.

Alternatively, you can verify the employee table on the MySQL console. The following commands verify the rows of the employee table of the db database using a SELECT query.

mysql> use db;
mysql> SELECT * FROM employee;

+------+--------------+-------------+-------------------+--------+
| Id   | Name         | Designation | Salary            | Dept   |
+------+--------------+-------------+-------------------+--------+
| 1201 | gopal        | manager     | 50000             | TP     |
| 1202 | manisha      | preader     | 50000             | TP     |
| 1203 | khalil       | php dev     | 30000             | AC     |
| 1204 | prasanth     | php dev     | 30000             | AC     |
| 1205 | kranthi      | admin       | 20000             | TP     |
| 1206 | satish p     | grp des     | 20000             | GR     |
| 1207 | Raju         | UI dev      | 15000             | TP     |
+------+--------------+-------------+-------------------+--------+
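
Finally, since eval accepts DDL as well as DML, you could use it, for instance, to create a staging table before an export. A minimal sketch, with employee_stage as a table name of our own choosing (MySQL's CREATE TABLE ... LIKE copies the column definitions of employee):

$ sqoop eval \
--connect jdbc:mysql://localhost/db \
--username root \
-e "CREATE TABLE employee_stage LIKE employee"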

 

So, this brings us to the end of this blog. This Tecklearn ‘How to use the Apache Sqoop Eval and Codegen tool’ blog helps you with commonly asked questions if you are looking for a job in Apache Sqoop and as a Big Data Hadoop Developer.

If you wish to learn Sqoop and build a career in the Big Data Hadoop domain, then check out our interactive Big Data Hadoop Spark and Hadoop Developer Training, which comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

https://www.tecklearn.com/course/big-data-spark-and-hadoop-developer/

Big Data Spark and Hadoop Developer Training

About the Course

Big Data analysis is emerging as a key advantage in business intelligence for many organizations. In this Big Data course, you will master MapReduce, Hive, Pig, Sqoop, Oozie and Flume, Spark framework and RDD, Scala and Spark SQL, Machine Learning using Spark, Spark Streaming, etc. It is a comprehensive Hadoop Big Data training course designed by industry experts considering current industry job requirements to help you learn Big Data Hadoop and Spark modules. This Cloudera Hadoop and Spark training will prepare you to clear Cloudera CCA175 Big Data certification.

Why Should you take Spark and Hadoop Developer Training?

  • Average salary for a Spark and Hadoop Developer ranges from approximately $106,366 to $127,619 per annum – Indeed.com.
  • Hadoop Market is expected to reach $99.31B by 2022 growing at a CAGR of 42.1% from 2015 – Forbes.
  • Amazon, Cloudera, DataStax, DELL, EMC2, IBM, Microsoft and other MNCs worldwide use Hadoop.

What you will Learn in this Course?

Introduction to Hadoop and the Hadoop Ecosystem

  • Problems with Traditional Large-scale Systems
  • Hadoop!
  • The Hadoop Ecosystem

Hadoop Architecture and HDFS

  • Distributed Processing on a Cluster
  • Storage: HDFS Architecture
  • Storage: Using HDFS
  • Resource Management: YARN Architecture
  • Resource Management: Working with YARN

Importing Relational Data with Apache Sqoop

  • Sqoop Overview
  • Basic Imports and Exports
  • Limiting Results
  • Improving Sqoop’s Performance
  • Sqoop 2

Introduction to Impala and Hive

  • Introduction to Impala and Hive
  • Why Use Impala and Hive?
  • Comparing Hive to Traditional Databases
  • Hive Use Cases

Modelling and Managing Data with Impala and Hive

  • Data Storage Overview
  • Creating Databases and Tables
  • Loading Data into Tables
  • HCatalog
  • Impala Metadata Caching

Data Formats

  • Selecting a File Format
  • Hadoop Tool Support for File Formats
  • Avro Schemas
  • Using Avro with Hive and Sqoop
  • Avro Schema Evolution
  • Compression

Data Partitioning

  • Partitioning Overview
  • Partitioning in Impala and Hive

Capturing Data with Apache Flume

  • What is Apache Flume?
  • Basic Flume Architecture
  • Flume Sources
  • Flume Sinks
  • Flume Channels
  • Flume Configuration

Spark Basics

  • What is Apache Spark?
  • Using the Spark Shell
  • RDDs (Resilient Distributed Datasets)
  • Functional Programming in Spark

Working with RDDs in Spark

  • A Closer Look at RDDs
  • Key-Value Pair RDDs
  • MapReduce
  • Other Pair RDD Operations

Writing and Deploying Spark Applications

  • Spark Applications vs. Spark Shell
  • Creating the Spark Context
  • Building a Spark Application (Scala and Java)
  • Running a Spark Application
  • The Spark Application Web UI
  • Configuring Spark Properties
  • Logging

Parallel Programming with Spark

  • Review: Spark on a Cluster
  • RDD Partitions
  • Partitioning of File-based RDDs
  • HDFS and Data Locality
  • Executing Parallel Operations
  • Stages and Tasks

Spark Caching and Persistence

  • RDD Lineage
  • Caching Overview
  • Distributed Persistence

Common Patterns in Spark Data Processing

  • Common Spark Use Cases
  • Iterative Algorithms in Spark
  • Graph Processing and Analysis
  • Machine Learning
  • Example: k-means

Preview: Spark SQL

  • Spark SQL and the SQL Context
  • Creating DataFrames
  • Transforming and Querying DataFrames
  • Saving DataFrames
  • Comparing Spark SQL with Impala

 

Got a question for us? Please mention it in the comments section and we will get back to you.

 

 
