Explanation of Shell and Utility Commands provided by Apache Grunt Shell

Last updated on May 30 2022
Inderjeet Chopra

Table of Contents

Explanation of Shell and Utility Commands provided by Apache Grunt Shell

Apache Pig – Grunt Shell

After invoking the Grunt shell, you can run your Pig scripts directly in it. In addition, the Grunt shell provides a number of useful shell and utility commands, which this blog explains.
Note − In some portions of this blog, commands like Load and Store are used.

Shell Commands

The Grunt shell of Apache Pig is mainly used to write Pig Latin scripts. In addition, it lets us invoke shell and file-system commands using sh and fs.
sh Command
Using the sh command, we can invoke any shell command from the Grunt shell. However, we cannot execute commands that are part of the shell environment itself (ex − cd), because each sh invocation runs in its own subprocess.
Syntax
Given below is the syntax of the sh command.
grunt> sh shell command parameters
Example
We can invoke the ls command of the Linux shell from the Grunt shell using the sh option as shown below. In this example, it lists the files in the current working directory (here, /pig/bin/).
grunt> sh ls

pig
pig_1444799121955.log
pig.cmd
pig.py
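Since the syntax above also accepts parameters, arguments can be passed through sh as well. A minimal sketch, where the -l flag and the /tmp path are illustrative assumptions rather than anything required:
grunt> sh ls -l /tmp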
fs Command
Using the fs command, we can invoke any FsShell commands from the Grunt shell.
Syntax
Given below is the syntax of the fs command.
grunt> fs File System command parameters
Example
We can invoke the ls command of HDFS from the Grunt shell using the fs command. In the following example, it lists the files in the HDFS root directory.
grunt> fs -ls

Found 3 items
drwxrwxrwx - Hadoop supergroup 0 2015-09-08 14:13 Hbase
drwxr-xr-x - Hadoop supergroup 0 2015-09-09 14:52 seqgen_data
drwxr-xr-x - Hadoop supergroup 0 2015-09-08 11:30 twitter_data
In the same way, we can invoke all the other file system shell commands from the Grunt shell using the fs command.
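For instance, a few common FsShell invocations from Grunt might look like the sketch below; the /pig_data path and the student.txt file name are illustrative assumptions.
grunt> fs -mkdir /pig_data
grunt> fs -copyFromLocal student.txt /pig_data/student.txt
grunt> fs -cat /pig_data/student.txt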

Utility Commands

The Grunt shell provides a set of utility commands. These include utility commands such as clear, help, history, quit, and set; and commands such as exec, kill, and run to control Pig from the Grunt shell. Given below is the description of the utility commands provided by the Grunt shell.
clear Command
The clear command is used to clear the screen of the Grunt shell.
Syntax
You can clear the screen of the Grunt shell using the clear command as shown below.
grunt> clear
help Command
The help command gives you a list of Pig commands or Pig properties.
Usage
You can get a list of Pig commands using the help command as shown below.
grunt> help

Commands: <pig latin statement>; - See the PigLatin manual for details:
http://hadoop.apache.org/pig

File system commands: fs <fs arguments> - Equivalent to Hadoop dfs command:
http://hadoop.apache.org/common/docs/current/hdfs_shell.html

Diagnostic Commands: describe <alias>[::<alias>] - Show the schema for the alias.
Inner aliases can be described as A::B.
explain [-script <pigscript>] [-out <path>] [-brief] [-dot|-xml]
[-param <param_name>=<param_value>]
[-param_file <file_name>] [<alias>] -
Show the execution plan to compute the alias or for entire script.
-script - Explain the entire script.
-out - Store the output into directory rather than print to stdout.
-brief - Don't expand nested plans (presenting a smaller graph for overview).
-dot - Generate the output in .dot format. Default is text format.
-xml - Generate the output in .xml format. Default is text format.
-param <param_name> - See parameter substitution for details.
-param_file <file_name> - See parameter substitution for details.
alias - Alias to explain.
dump <alias> - Compute the alias and writes the results to stdout.

Utility Commands: exec [-param <param_name>=param_value] [-param_file <file_name>] <script> -
Execute the script with access to grunt environment including aliases.
-param <param_name> - See parameter substitution for details.
-param_file <file_name> - See parameter substitution for details.
script - Script to be executed.
run [-param <param_name>=param_value] [-param_file <file_name>] <script> -
Execute the script with access to grunt environment.
-param <param_name> - See parameter substitution for details.
-param_file <file_name> - See parameter substitution for details.
script - Script to be executed.
sh <shell command> - Invoke a shell command.
kill <job_id> - Kill the hadoop job specified by the hadoop job id.
set <key> <value> - Provide execution parameters to Pig. Keys and values are case sensitive.
The following keys are supported:
default_parallel - Script-level reduce parallelism. Basic input size heuristics used by default.
debug - Set debug on or off. Default is off.
job.name - Single-quoted name for jobs. Default is PigLatin:<script name>
job.priority - Priority for jobs. Values: very_low, low, normal, high, very_high.
Default is normal
stream.skippath - String that contains the path. This is used by streaming.
any hadoop property.
help - Display this message.
history [-n] - Display the list statements in cache.
-n Hide line numbers.
quit - Quit the grunt shell.
history Command
This command displays a list of the statements executed since the Grunt shell was invoked.
Usage
Assume we have executed three statements since opening the Grunt shell.
grunt> customers = LOAD 'hdfs://localhost:9000/pig_data/customers.txt' USING PigStorage(',');

grunt> orders = LOAD 'hdfs://localhost:9000/pig_data/orders.txt' USING PigStorage(',');

grunt> student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',');

Then, using the history command will produce the following output.
grunt> history

customers = LOAD 'hdfs://localhost:9000/pig_data/customers.txt' USING PigStorage(',');

orders = LOAD 'hdfs://localhost:9000/pig_data/orders.txt' USING PigStorage(',');

student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',');
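The help output above also lists an optional -n flag for history, which hides line numbers; invoking it is a one-liner.
grunt> history -n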

set Command
The set command is used to show or assign values to keys used in Pig.
Usage
Using this command, you can set values to the following keys.

Key − Description and values

default_parallel − You can set the number of reducers for a MapReduce job by passing any whole number as a value to this key.
debug − You can turn the debugging feature in Pig off or on by passing off/on to this key.
job.name − You can set the job name for the required job by passing a string value to this key.
job.priority − You can set the job priority of a job by passing one of the following values to this key −

  • very_low
  • low
  • normal
  • high
  • very_high

stream.skippath − For streaming, you can set the path from where the data is not to be transferred, by passing the desired path in the form of a string to this key.
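
Taken together, typical set invocations might look like the sketch below; the specific values (ten reducers and the job name) are illustrative assumptions, not required settings.
grunt> set default_parallel 10
grunt> set debug off
grunt> set job.name 'my-pig-job'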

quit Command
You can quit from the Grunt shell using this command.
Usage
Quit from the Grunt shell as shown below.
grunt> quit
Let us now take a look at the commands using which you can control Apache Pig from the Grunt shell.
exec Command
Using the exec command, we can execute Pig scripts from the Grunt shell.
Syntax
Given below is the syntax of the utility command exec.
grunt> exec [-param param_name = param_value] [-param_file file_name] [script]
Example
Let us assume there is a file named student.txt in the /pig_data/ directory of HDFS with the following content.
Student.txt
001,Rajiv,Hyderabad
002,siddarth,Kolkata
003,Rajesh,Delhi
And, assume we have a script file named sample_script.pig in the /pig_data/ directory of HDFS with the following content.
Sample_script.pig
student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',')
as (id:int,name:chararray,city:chararray);

Dump student;
Now, let us execute the above script from the Grunt shell using the exec command as shown below.
grunt> exec /pig_data/sample_script.pig
Output
The exec command executes the script sample_script.pig. As directed in the script, it loads the student.txt file into Pig and gives you the result of the Dump operator, displaying the following content.
(1,Rajiv,Hyderabad)
(2,siddarth,Kolkata)
(3,Rajesh,Delhi)
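The help output earlier also shows that exec accepts -param substitutions. As a sketch, assume a hypothetical script filter_script.pig in the same HDFS directory that references a $city parameter:

filter_script.pig (hypothetical)
student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING PigStorage(',')
as (id:int,name:chararray,city:chararray);
by_city = FILTER student BY city == '$city';
Dump by_city;

It could then be invoked with a concrete value for the parameter as shown below.
grunt> exec -param city=Delhi /pig_data/filter_script.pig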
kill Command
You can kill a job from the Grunt shell using this command.
Syntax
Given below is the syntax of the kill command.
grunt> kill JobId
Example
Suppose there is a running Pig job with the id Id_0055; you can kill it from the Grunt shell using the kill command, as shown below.
grunt> kill Id_0055
run Command
You can run a Pig script from the Grunt shell using the run command.
Syntax
Given below is the syntax of the run command.
grunt> run [-param param_name = param_value] [-param_file file_name] script
Example
Let us assume there is a file named student.txt in the /pig_data/ directory of HDFS with the following content.
Student.txt
001,Rajiv,Hyderabad
002,siddarth,Kolkata
003,Rajesh,Delhi
And, assume we have a script file named sample_script.pig in the local filesystem with the following content.
Sample_script.pig
student = LOAD 'hdfs://localhost:9000/pig_data/student.txt' USING
PigStorage(',') as (id:int,name:chararray,city:chararray);
Now, let us run the above script from the Grunt shell using the run command as shown below.
grunt> run /sample_script.pig
You can see the output of the script using the Dump operator as shown below.
grunt> Dump student;

(1,Rajiv,Hyderabad)
(2,siddarth,Kolkata)
(3,Rajesh,Delhi)
Note − The difference between exec and run is that if we use run, the statements from the script are available in the command history, and the aliases they define remain accessible in the Grunt shell.
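To see that difference in action, here is a minimal sketch building on the student alias loaded by the script above:
grunt> filtered = FILTER student BY city == 'Delhi';
grunt> Dump filtered;

For the sample data shown earlier, this should print:
(3,Rajesh,Delhi)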
So, this brings us to the end of the blog. This Tecklearn blog on 'Explanation of Shell and Utility Commands provided by Apache Grunt Shell' helps you with commonly asked questions if you are looking for a job in the Apache Pig and Big Data domain.
If you wish to learn Apache Pig and build a career in the Apache Pig or Big Data domain, then check out our interactive Big Data Hadoop Analyst Training, which comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

Big Data Hadoop Analyst

Big Data Hadoop Analyst Training

About the Course

Big Data analysis is emerging as a key advantage in business intelligence for many organizations. Our Big Data and Hadoop training course lets you deep-dive into the concepts of Big Data, equipping you with the skills required for Hadoop Analyst roles. The course takes into consideration the burgeoning industry demand to process and analyse data at high speeds, and gives you the right skills to deploy various tools and techniques to be a Hadoop Analyst working with Big Data.

Why Should you take Hadoop Analyst Training?

• Average salary for a Big Data Hadoop Analyst is $115,819 – ZipRecruiter.com.
• The Hadoop market is expected to reach $99.31B by 2022, growing at a CAGR of 42.1% from 2015 – Forbes.
• Amazon, Cloudera, DataStax, DELL, EMC2, IBM, Microsoft, and other MNCs worldwide use Hadoop.

What you will Learn in this Course?

Hadoop Fundamentals
• The Motivation for Hadoop
• Hadoop Overview
• Data Storage: HDFS
• Distributed Data Processing: YARN, MapReduce, and Spark
• Data Processing and Analysis: Pig, Hive, and Impala
• Data Integration: Sqoop
• Other Hadoop Data Tools
• Exercise Scenarios Explanation
Introduction to Pig
• What Is Pig?
• Pig’s Features
• Pig Use Cases
• Interacting with Pig
Basic Data Analysis with Pig
• Pig Latin Syntax
• Loading Data
• Simple Data Types
• Field Definitions
• Data Output
• Viewing the Schema
• Filtering and Sorting Data
• Commonly-Used Functions
Processing Complex Data with Pig
• Storage Formats
• Complex/Nested Data Types
• Grouping
• Built-In Functions for Complex Data
• Iterating Grouped Data
Multi-Dataset Operations with Pig
• Techniques for Combining Data Sets
• Joining Data Sets in Pig
• Set Operations
• Splitting Data Sets
Pig Troubleshooting and Optimization
• Troubleshooting Pig
• Logging
• Using Hadoop’s Web UI
• Data Sampling and Debugging
• Performance Overview
• Understanding the Execution Plan
• Tips for Improving the Performance of Your Pig Jobs
Introduction to Hive and Impala
• What Is Hive?
• What Is Impala?
• Schema and Data Storage
• Comparing Hive to Traditional Databases
• Hive Use Cases
Querying with Hive and Impala
• Databases and Tables
• Basic Hive and Impala Query Language Syntax
• Data Types
• Differences Between Hive and Impala Query Syntax
• Using Hue to Execute Queries
• Using the Impala Shell
Data Management
• Data Storage
• Creating Databases and Tables
• Loading Data
• Altering Databases and Tables
• Simplifying Queries with Views
• Storing Query Results
Data Storage and Performance
• Partitioning Tables
• Choosing a File Format
• Managing Metadata
• Controlling Access to Data
Relational Data Analysis with Hive and Impala
• Joining Datasets
• Common Built-In Functions
• Aggregation and Windowing
Working with Impala
• How Impala Executes Queries
• Extending Impala with User-Defined Functions
• Improving Impala Performance
Analyzing Text and Complex Data with Hive
• Complex Values in Hive
• Using Regular Expressions in Hive
• Sentiment Analysis and N-Grams
• Conclusion
Hive Optimization
• Understanding Query Performance
• Controlling Job Execution Plan
• Bucketing
• Indexing Data
Extending Hive
• SerDes
• Data Transformation with Custom Scripts
• User-Defined Functions
• Parameterized Queries
Choosing the Best Tool for the Job
• Comparing MapReduce, Pig, Hive, Impala, and Relational Databases

Got a question for us? Please mention it in the comments section and we will get back to you.
