String Handling in R Language

Last updated on Dec 14 2021
R Deskmukh

Table of Contents

String Handling in R Language

Any value written within a pair of single quote or double quotes in R is treated as a string. Internally R stores every string within double quotes, even when you create them with single quote.

Rules Applied in String Construction

• The quotes at the beginning and end of a string should be both double quotes or both single quote. They can not be mixed.
• Double quotes can be inserted into a string starting and ending with single quote.
• Single quote can be inserted into a string starting and ending with double quotes.
• Double quotes can not be inserted into a string starting and ending with double quotes.
• Single quote can not be inserted into a string starting and ending with single quote.
Examples of Valid Strings
Following examples clarify the rules about creating a string in R.

a <- 'Start and end with single quote'
print(a)
b <- "Start and end with double quotes"
print(b)
c <- "single quote ' in between double quotes"
print(c)
d <- 'Double quotes " in between single quote'
print(d)

When the above code is run we get the following output −
[1] “Start and end with single quote”
[1] “Start and end with double quotes”
[1] “single quote ‘ in between double quote”
[1] “Double quote \” in between single quote”
Examples of Invalid Strings

e <- 'Mixed quotes" 
print(e)

f <- 'Single quote ' inside single quote'
print(f)

g <- "Double quotes " inside double quotes"
print(g)
When we run the script it fails giving below results.
Error: unexpected symbol in:
"print(e)
f <- 'Single"
Execution halted

String Manipulation

Concatenating Strings – paste() function
Many strings in R are combined using the paste() function. It can take any number of arguments to be combined together.
Syntax
The basic syntax for paste function is −
paste(…, sep = ” “, collapse = NULL)
Following is the description of the parameters used −
• … represents any number of arguments to be combined.
• sep represents any separator between the arguments. It is optional.
• collapse is used to eliminate the space in between two strings. But not the space within two words of one string.
Example

a <- "Hello"
b <- 'How'
c <- "are you? "

print(paste(a,b,c))

print(paste(a,b,c, sep = "-"))

print(paste(a,b,c, sep = "", collapse = ""))
When we execute the above code, it produces the following result −
[1] "Hello How are you? "
[1] "Hello-How-are you? "
[1] "HelloHoware you? "

Formatting numbers & strings – format() function

Numbers and strings can be formatted to a specific style using format() function.
Syntax
The basic syntax for format function is −
format(x, digits, nsmall, scientific, width, justify = c(“left”, “right”, “centre”, “none”))
Following is the description of the parameters used −
• x is the vector input.
• digits is the total number of digits displayed.
• nsmall is the minimum number of digits to the right of the decimal point.
• scientific is set to TRUE to display scientific notation.
• width indicates the minimum width to be displayed by padding blanks in the beginning.
• justify is the display of the string to left, right or center.
Example

# Total number of digits displayed. Last digit rounded off.
result <- format(23.123456789, digits = 9)
print(result)

# Display numbers in scientific notation.
result <- format(c(6, 13.14521), scientific = TRUE)
print(result)

# The minimum number of digits to the right of the decimal point.
result <- format(23.47, nsmall = 5)
print(result)

# Format treats everything as a string.
result <- format(6)
print(result)

# Numbers are padded with blank in the beginning for width.
result <- format(13.7, width = 6)
print(result)

# Left justify strings.
result <- format("Hello", width = 8, justify = "l")
print(result)

# Justfy string with center.
result <- format("Hello", width = 8, justify = "c")
print(result)
When we execute the above code, it produces the following result −
[1] "23.1234568"
[1] "6.000000e+00" "1.314521e+01"
[1] "23.47000"
[1] "6"
[1] " 13.7"
[1] "Hello "
[1] " Hello "

Counting number of characters in a string – nchar() function
This function counts the number of characters including spaces in a string.
Syntax
The basic syntax for nchar() function is −
nchar(x)
Following is the description of the parameters used −
• x is the vector input.
Example

result <- nchar(“Count the number of characters”)
print(result)
When we execute the above code, it produces the following result −
[1] 30

Changing the case – toupper() & tolower() functions

These functions change the case of characters of a string.
Syntax
The basic syntax for toupper() & tolower() function is −
toupper(x)
tolower(x)
Following is the description of the parameters used −
• x is the vector input.
Example

# Changing to Upper case.
result <- toupper(“Changing To Upper”)
print(result)

# Changing to lower case.
result <- tolower(“Changing To Lower”)
print(result)
When we execute the above code, it produces the following result −
[1] “CHANGING TO UPPER”
[1] “changing to lower”

Extracting parts of a string – substring() function
This function extracts parts of a String.
Syntax
The basic syntax for substring() function is −
substring(x,first,last)
Following is the description of the parameters used −
• x is the character vector input.
• first is the position of the first character to be extracted.
• last is the position of the last character to be extracted.
Example

# Extract characters from 5th to 7th position.
result <- substring(“Extract”, 5, 7)
print(result)
When we execute the above code, it produces the following result −
[1] “act”

So, this brings us to the end of blog. This Tecklearn ‘String Handling in R Language’ blog helps you with commonly asked questions if you are looking out for a job in Data Science. If you wish to learn R Language and build a career in Data Science domain, then check out our interactive, Data Science using R Language Training, that comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

https://www.tecklearn.com/course/data-science-training-using-r-language/

Data Science using R Language Training

About the Course

Tecklearn’s Data Science using R Language Training develops knowledge and skills to visualize, transform, and model data in R language. It helps you to master the Data Science with R concepts such as data visualization, data manipulation, machine learning algorithms, charts, hypothesis testing, etc. through industry use cases, and real-time examples. Data Science course certification training lets you master data analysis, R statistical computing, connecting R with Hadoop framework, Machine Learning algorithms, time-series analysis, K-Means Clustering, Naïve Bayes, business analytics and more. This course will help you gain hands-on experience in deploying Recommender using R, Evaluation, Data Transformation etc.

Why Should you take Data Science Using R Training?

• The Average salary of a Data Scientist in R is $123k per annum – Glassdoor.com
• A recent market study shows that the Data Analytics Market is expected to grow at a CAGR of 30.08% from 2020 to 2023, which would equate to $77.6 billion.
• IBM, Amazon, Apple, Google, Facebook, Microsoft, Oracle & other MNCs worldwide are using data science for their Data analysis.

What you will Learn in this Course?

Introduction to Data Science
• Need for Data Science
• What is Data Science
• Life Cycle of Data Science
• Applications of Data Science
• Introduction to Big Data
• Introduction to Machine Learning
• Introduction to Deep Learning
• Introduction to R&R-Studio
• Project Based Data Science
Introduction to R
• Introduction to R
• Data Exploration
• Operators in R
• Inbuilt Functions in R
• Flow Control Statements & User Defined Functions
• Data Structures in R
Data Manipulation
• Need for Data Manipulation
• Introduction to dplyr package
• Select (), filter(), mutate(), sample_n(), sample_frac() & count() functions
• Getting summarized results with the summarise() function,
• Combining different functions with the pipe operator
• Implementing sql like operations with sqldf()
Visualization of Data
• Loading different types of datasets in R
• Arranging the data
• Plotting the graphs
Introduction to Statistics
• Types of Data
• Probability
• Correlation and Co-variance
• Hypothesis Testing
• Standardization and Normalization
Introduction to Machine Learning
• What is Machine Learning?
• Machine Learning Use-Cases
• Machine Learning Process Flow
• Machine Learning Categories
• Supervised Learning algorithm: Linear Regression and Logistic Regression
Logistic Regression
• Intro to Logistic Regression
• Simple Logistic Regression in R
• Multiple Logistic Regression in R
• Confusion Matrix
• ROC Curve
Classification Techniques
• What are classification and its use cases?
• What is Decision Tree?
• Algorithm for Decision Tree Induction
• Creating a Perfect Decision Tree
• Confusion Matrix
• What is Random Forest?
• What is Naive Bayes?
• Support Vector Machine: Classification
Decision Tree
• Decision Tree in R
• Information Gain
• Gini Index
• Pruning
Recommender Engines
• What is Association Rules & its use cases?
• What is Recommendation Engine & it’s working?
• Types of Recommendations
• User-Based Recommendation
• Item-Based Recommendation
• Difference: User-Based and Item-Based Recommendation
• Recommendation use cases
Time Series Analysis
• What is Time Series data?
• Time Series variables
• Different components of Time Series data
• Visualize the data to identify Time Series Components
• Implement ARIMA model for forecasting
• Exponential smoothing models
• Identifying different time series scenario based on which different Exponential Smoothing model can be applied

Got a question for us? Please mention it in the comments section and we will get back to you.

 

0 responses on "String Handling in R Language"

Leave a Message

Your email address will not be published. Required fields are marked *