How to Store data in Apache Pig using Store Operator

Last updated on May 30 2022
Inderjeet Chopra

Table of Contents

How to Store data in Apache Pig using Store Operator

Apache Pig – Storing Data

You can store the loaded data in the file system using the store operator. This blog explains how to store data in Apache Pig using the Store operator.
Syntax
Given below is the syntax of the Store statement.
STORE Relation_name INTO ‘ required_directory_path ‘ [USING function];
Example
Assume we have a file student_data.txt in HDFS with the following content.
001,Rajiv,Reddy,9848022337,Hyderabad
002,siddarth,Battacharya,9848022338,Kolkata
003,Rajesh,Khanna,9848022339,Delhi
004,Preethi,Agarwal,9848022330,Pune
005,Trupthi,Mohanthy,9848022336,Bhuwaneshwar
006,Archana,Mishra,9848022335,Chennai.
And we have read it into a relation student using the LOAD operator as shown below.
grunt> student = LOAD ‘hdfs://localhost:9000/pig_data/student_data.txt’
USING PigStorage(‘,’)
as ( id:int, firstname:chararray, lastname:chararray, phone:chararray,
city:chararray );
Now, let us store the relation in the HDFS directory “/pig_Output/” as shown below.
grunt> STORE student INTO ‘ hdfs://localhost:9000/pig_Output/ ‘ USING PigStorage (‘,’);
Output
After executing the store statement, you will get the following output. A directory is created with the specified name and the data will be stored in it.
2015-10-05 13:05:05,429 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.
MapReduceLau ncher – 100% complete
2015-10-05 13:05:05,429 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats –
Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.6.0 0.15.0 Hadoop 2015-10-0 13:03:03 2015-10-05 13:05:05 UNKNOWN
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime
job_14459_06 1 0 n/a n/a n/a n/a
MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature
0 0 0 0 student MAP_ONLY
OutPut folder
hdfs://localhost:9000/pig_Output/

Input(s): Successfully read 0 records from: “hdfs://localhost:9000/pig_data/student_data.txt”
Output(s): Successfully stored 0 records in: “hdfs://localhost:9000/pig_Output”
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG: job_1443519499159_0006

2015-10-05 13:06:06,192 [main] INFO org.apache.pig.backend.hadoop.executionengine
.mapReduceLayer.MapReduceLau ncher – Success!

Verification

You can verify the stored data as shown below.
Step 1
First of all, list out the files in the directory named pig_output using the ls command as shown below.
hdfs dfs -ls ‘hdfs://localhost:9000/pig_Output/’
Found 2 items
rw-r–r- 1 Hadoop supergroup 0 2015-10-05 13:03 hdfs://localhost:9000/pig_Output/_SUCCESS
rw-r–r- 1 Hadoop supergroup 224 2015-10-05 13:03 hdfs://localhost:9000/pig_Output/part-m-00000
You can observe that two files were created after executing the store statement.
Step 2
Using cat command, list the contents of the file named part-m-00000 as shown below.
$ hdfs dfs -cat ‘hdfs://localhost:9000/pig_Output/part-m-00000’
1,Rajiv,Reddy,9848022337,Hyderabad
2,siddarth,Battacharya,9848022338,Kolkata
3,Rajesh,Khanna,9848022339,Delhi
4,Preethi,Agarwal,9848022330,Pune
5,Trupthi,Mohanthy,9848022336,Bhuwaneshwar
6,Archana,Mishra,9848022335,Chennai

So, this brings us to the end of blog. This Tecklearn ‘How to Store data in Apache Pig using Store Operator’ helps you with commonly asked questions if you are looking out for a job in Apache Pig and Big Data Domain.
If you wish to learn Apache Pig and build a career in Apache Pig or Big Data domain, then check out our interactive, Big Data Hadoop Analyst Training, that comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

Big Data Hadoop Analyst

Big Data Hadoop Analyst Training

About the Course

Big Data analysis is emerging as a key advantage in business intelligence for many organizations. Our Big Data and Hadoop training course lets you deep-dive into the concepts of Big Data, equipping you with the skills required for Hadoop Analyst roles. This course will enable an Analyst to work on Big Data and Hadoop which takes into consideration the burgeoning demands of the industry to process and analyse data at high speeds. This training course will give you the right skills to deploy various tools and techniques to be a Hadoop Analyst working with Big Data.

Why Should you take Hadoop Analyst Training?

• Average salary for a Big Data Hadoop Analyst is $115,819– ZipRecruiter.com.
• Hadoop Market is expected to reach $99.31B by 2022 growing at a CAGR of 42.1% from 2015 – Forbes.
• Amazon, Cloudera, Data Stax, DELL, EMC2, IBM, Microsoft & other MNCs worldwide use Hadoop

What you will Learn in this Course?

Hadoop Fundamentals
• The Motivation for Hadoop
• Hadoop Overview
• Data Storage: HDFS
• Distributed Data Processing: YARN, MapReduce, and Spark
• Data Processing and Analysis: Pig, Hive, and Impala
• Data Integration: Sqoop
• Other Hadoop Data Tools
• Exercise Scenarios Explanation
Introduction to Pig
• What Is Pig?
• Pig’s Features
• Pig Use Cases
• Interacting with Pig
Basic Data Analysis with Pig
• Pig Latin Syntax
• Loading Data
• Simple Data Types
• Field Definitions
• Data Output
• Viewing the Schema
• Filtering and Sorting Data
• Commonly-Used Functions
Processing Complex Data with Pig
• Storage Formats
• Complex/Nested Data Types
• Grouping
• Built-In Functions for Complex Data
• Iterating Grouped Data
Multi-Dataset Operations with Pig
• Techniques for Combining Data Sets
• Joining Data Sets in Pig
• Set Operations
• Splitting Data Sets
Pig Troubleshooting and Optimization
• Troubleshooting Pig
• Logging
• Using Hadoop’s Web UI
• Data Sampling and Debugging
• Performance Overview
• Understanding the Execution Plan
• Tips for Improving the Performance of Your Pig Jobs
Introduction to Hive and Impala
• What Is Hive?
• What Is Impala?
• Schema and Data Storage
• Comparing Hive to Traditional Databases
• Hive Use Cases
Querying with Hive and Impala
• Databases and Tables
• Basic Hive and Impala Query Language Syntax
• Data Types
• Differences Between Hive and Impala Query Syntax
• Using Hue to Execute Queries
• Using the Impala Shell
Data Management
• Data Storage
• Creating Databases and Tables
• Loading Data
• Altering Databases and Tables
• Simplifying Queries with Views
• Storing Query Results
Data Storage and Performance
• Partitioning Tables
• Choosing a File Format
• Managing Metadata
• Controlling Access to Data
Relational Data Analysis with Hive and Impala
• Joining Datasets
• Common Built-In Functions
• Aggregation and Windowing
Working with Impala
• How Impala Executes Queries
• Extending Impala with User-Defined Functions
• Improving Impala Performance
Analyzing Text and Complex Data with Hive
• Complex Values in Hive
• Using Regular Expressions in Hive
• Sentiment Analysis and N-Grams
• Conclusion
Hive Optimization
• Understanding Query Performance
• Controlling Job Execution Plan
• Bucketing
• Indexing Data
Extending Hive
• SerDes
• Data Transformation with Custom Scripts
• User-Defined Functions
• Parameterized Queries
Choosing the Best Tool for the Job
• Comparing MapReduce, Pig, Hive, Impala, and Relational Databases

Got a question for us? Please mention it in the comments section and we will get back to you.

 

0 responses on "How to Store data in Apache Pig using Store Operator"

Leave a Message

Your email address will not be published. Required fields are marked *