How to perform pattern matching in Scala and use of Regex expressions

Last updated on May 30 2022
Shakuntala Deskmukh


Scala – Pattern Matching

Pattern matching is the second most widely used feature of Scala, after function values and closures. Scala provides strong support for pattern matching, which is especially useful for processing messages.
A pattern match includes a sequence of alternatives, each starting with the keyword case. Each alternative includes a pattern and one or more expressions, which will be evaluated if the pattern matches. An arrow symbol => separates the pattern from the expressions.
Try the subsequent example program, which shows how to match against an integer value.

Example


object Demo {
  def main(args: Array[String]): Unit = {
    println(matchTest(3))
  }

  def matchTest(x: Int): String = x match {
    case 1 => "one"
    case 2 => "two"
    case _ => "many"
  }
}

Save the above program in Demo.scala. The subsequent commands are used to compile and execute this program.
Command

> scalac Demo.scala
> scala Demo

Output
many

The block with the case statements defines a function, which maps integers to strings. The match keyword gives a convenient way of applying a function (like the pattern matching function above) to an object.
Try the subsequent example program, which matches a value against patterns of different types.
Example

object Demo {
  def main(args: Array[String]): Unit = {
    println(matchTest("two"))
    println(matchTest("test"))
    println(matchTest(1))
  }

  def matchTest(x: Any): Any = x match {
    case 1 => "one"
    case "two" => 2
    case y: Int => "scala.Int"
    case _ => "many"
  }
}

Save the above program in Demo.scala. The subsequent commands are used to compile and execute this program.
Command

> scalac Demo.scala
> scala Demo

Output
2
many
one
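A case can also carry a guard condition, written with if after the pattern; the case matches only when the guard holds. A minimal sketch (the GuardDemo object and describe method names are illustrative, not part of the original example):

```scala
object GuardDemo {
  def describe(x: Any): String = x match {
    case n: Int if n < 0 => "negative int"      // guard: only matches when n < 0
    case n: Int          => "non-negative int"
    case s: String       => "string of length " + s.length
    case _               => "something else"
  }

  def main(args: Array[String]): Unit = {
    println(describe(-5))   // negative int
    println(describe(7))    // non-negative int
    println(describe("hi")) // string of length 2
  }
}
```

Cases are tried top to bottom, so the guarded Int case must come before the general one.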

Matching using Case Classes
Case classes are special classes that are used in pattern matching with case expressions. Syntactically, they are standard classes with a special modifier: case.
Try the following simple pattern-matching example that uses a case class.
Example

object Demo {
  def main(args: Array[String]): Unit = {
    val alice = new Person("Alice", 25)
    val bob = new Person("Bob", 32)
    val charlie = new Person("Charlie", 32)

    for (person <- List(alice, bob, charlie)) {
      person match {
        case Person("Alice", 25) => println("Hi Alice!")
        case Person("Bob", 32) => println("Hi Bob!")
        case Person(name, age) => println(
          "Age: " + age + " year, name: " + name + "?")
      }
    }
  }

  case class Person(name: String, age: Int)
}

Save the above program in Demo.scala. The subsequent commands are used to compile and execute this program.
Command

> scalac Demo.scala
> scala Demo

Output
Hi Alice!
Hi Bob!
Age: 32 year, name: Charlie?

Adding the case keyword causes the compiler to add a number of useful features automatically. The keyword suggests an association with case expressions in pattern matching.
First, the compiler automatically converts the constructor arguments into immutable fields (vals), so the val keyword is optional. If you want mutable fields, use the var keyword instead.
Second, the compiler automatically adds equals, hashCode, and toString implementations to the class, based on the fields specified as constructor arguments, so we no longer need to write our own toString() method.
Finally, the body of the Person class can remain empty because there are no methods that we need to define.
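The generated members are easy to see in action. The following sketch (the CaseClassDemo object name is illustrative) shows the structural equals, the readable toString, and the copy method the compiler derives for a case class:

```scala
object CaseClassDemo {
  case class Person(name: String, age: Int)

  def main(args: Array[String]): Unit = {
    val p1 = Person("Alice", 25)   // no 'new' needed: a factory apply method is generated
    val p2 = Person("Alice", 25)

    println(p1 == p2)              // true: equals compares the fields, not references
    println(p1)                    // Person(Alice,25): generated toString
    println(p1.copy(age = 26))     // Person(Alice,26): copy with one field changed
  }
}
```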

Scala – Regular Expressions

This section explains how Scala supports regular expressions through the Regex class, available in the scala.util.matching package.
Try the following example program, which finds the word Scala in a sentence.

Example

import scala.util.matching.Regex

object Demo {
  def main(args: Array[String]): Unit = {
    val pattern = "Scala".r
    val str = "Scala is Scalable and cool"

    println(pattern findFirstIn str)
  }
}

Save the above program in Demo.scala. The subsequent commands are used to compile and execute this program.
Command

> scalac Demo.scala
> scala Demo

Output
Some(Scala)

We create a String and call the r() method on it. Scala implicitly converts the String to a StringOps and invokes that method to get an instance of Regex. To find the first match of the regular expression, simply call the findFirstIn() method; it returns an Option[String]. To find all occurrences of the matching word instead of only the first, use the findAllIn() method; if there are multiple occurrences of Scala in the target string, it returns an iterator over all the matching words.
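Since findFirstIn() returns an Option[String], the result can itself be pattern matched, tying the two halves of this blog together. A small sketch (the OptionDemo name is illustrative):

```scala
import scala.util.matching.Regex

object OptionDemo {
  def main(args: Array[String]): Unit = {
    val pattern: Regex = "Scala".r
    val str = "Scala is Scalable and cool"

    // findFirstIn returns Some(match) or None
    pattern.findFirstIn(str) match {
      case Some(word) => println("Found: " + word)
      case None       => println("No match")
    }

    // findAllIn returns an iterator over every match
    println(pattern.findAllIn(str).toList)   // List(Scala, Scala)
  }
}
```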
You can use the mkString() method to concatenate the resulting matches, a pipe (|) to match both the lowercase and capitalized forms of Scala, and the Regex constructor instead of the r() method to create the pattern.
Try the following example program.

Example

import scala.util.matching.Regex

object Demo {
  def main(args: Array[String]): Unit = {
    val pattern = new Regex("(S|s)cala")
    val str = "Scala is scalable and cool"

    println((pattern findAllIn str).mkString(","))
  }
}
Save the above program in Demo.scala. The subsequent commands are used to compile and execute this program.
Command
> scalac Demo.scala
> scala Demo

Output

Scala,scala

If you would like to replace matching text, use replaceFirstIn() to replace the first match or replaceAllIn() to replace all occurrences.

Example

object Demo {
  def main(args: Array[String]): Unit = {
    val pattern = "(S|s)cala".r
    val str = "Scala is scalable and cool"

    println(pattern.replaceFirstIn(str, "Java"))
  }
}

Save the above program in Demo.scala. The subsequent commands are used to compile and execute this program.

Command

> scalac Demo.scala
> scala Demo

Output

Java is scalable and cool
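By the same token, replaceAllIn() rewrites every occurrence rather than just the first. A sketch reusing the same pattern (the ReplaceAllDemo name is illustrative; note that the scala inside scalable is replaced too):

```scala
object ReplaceAllDemo {
  def main(args: Array[String]): Unit = {
    val pattern = "(S|s)cala".r
    val str = "Scala is scalable and cool"

    println(pattern.replaceAllIn(str, "Java"))   // Java is Javable and cool
  }
}
```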
Forming Regular Expressions
Scala inherits its regular expression syntax from Java, which in turn inherits most of the features of Perl. Here are some examples that should serve as refreshers.
The following table lists the regular-expression metacharacter syntax available in Java.
Subexpression Matches
^ Matches beginning of line.
$ Matches end of line.
. Matches any single character except newline. With the DOTALL (s) flag it matches newline as well.
[…] Matches any single character in brackets.
[^…] Matches any single character not in brackets.
\\A Matches beginning of entire string.
\\z Matches end of entire string.
\\Z Matches end of entire string except an allowable final line terminator.
re* Matches 0 or more occurrences of preceding expression.
re+ Matches 1 or more occurrences of preceding expression.
re? Matches 0 or 1 occurrence of preceding expression.
re{n} Matches exactly n occurrences of preceding expression.
re{n,} Matches n or more occurrences of preceding expression.
re{n,m} Matches at least n and at most m occurrences of preceding expression.
a|b Matches either a or b.
(re) Groups regular expressions and remembers matched text.
(?:re) Groups regular expressions without remembering matched text.
(?>re) Matches independent pattern without backtracking.
\\w Matches word characters.
\\W Matches nonword characters.
\\s Matches whitespace. Equivalent to [ \t\n\x0B\f\r].
\\S Matches nonwhitespace.
\\d Matches digits. Equivalent to [0-9].
\\D Matches nondigits.
\\G Matches point where last match finished.
\\n Back-reference to capture group number n.
\\b Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets.
\\B Matches nonword boundaries.
\\n, \\t, etc. Matches newlines, carriage returns, tabs, etc.
\\Q Escapes (quotes) all characters up to \\E.
\\E Ends quoting begun with \\Q.
Regular-Expression Examples
Example Description
. Match any character except newline
[Rr]uby Match “Ruby” or “ruby”
rub[ye] Match “ruby” or “rube”
[aeiou] Match any one lowercase vowel
[0-9] Match any digit; equivalent to [0123456789]
[a-z] Match any lowercase ASCII letter
[A-Z] Match any uppercase ASCII letter
[a-zA-Z0-9] Match any of the above
[^aeiou] Match anything other than a lowercase vowel
[^0-9] Match anything other than a digit
\\d Match a digit: [0-9]
\\D Match a nondigit: [^0-9]
\\s Match a whitespace character: [ \t\r\n\f]
\\S Match nonwhitespace: [^ \t\r\n\f]
\\w Match a single word character: [A-Za-z0-9_]
\\W Match a nonword character: [^A-Za-z0-9_]
ruby? Match “rub” or “ruby”: the y is optional
ruby* Match “rub” plus 0 or more ys
ruby+ Match “rub” plus 1 or more ys
\\d{3} Match exactly 3 digits
\\d{3,} Match 3 or more digits
\\d{3,5} Match 3, 4, or 5 digits
\\D\\d+ No group: + repeats \\d
(\\D\\d)+ Grouped: + repeats the \\D\\d pair
([Rr]uby(, )?)+ Match “Ruby”, “Ruby, ruby, ruby”, etc.
Note that every backslash appears twice in the strings above. This is because in Java and Scala a single backslash is an escape character in a string literal, not a regular character that shows up in the string. So instead of ‘\’, you need to write ‘\\’ to get a single backslash in the string.
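Capture groups pair naturally with pattern matching: a Regex used in a case position extracts its groups, provided the whole string matches. A minimal sketch (the GroupDemo object and the keyValue pattern are illustrative):

```scala
object GroupDemo {
  // Two capture groups: a word key and a numeric value
  val keyValue = "(\\w+)=(\\d+)".r

  def parse(s: String): String = s match {
    case keyValue(key, value) => key + " -> " + value
    case _                    => "no match"
  }

  def main(args: Array[String]): Unit = {
    println(parse("timeout=30"))   // timeout -> 30
    println(parse("not a pair"))   // no match
  }
}
```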
Try the subsequent example program.
Example

import scala.util.matching.Regex

object Demo {
  def main(args: Array[String]): Unit = {
    val pattern = new Regex("abl[ae]\\d+")
    val str = "ablaw is able1 and cool"

    println((pattern findAllIn str).mkString(","))
  }
}


Save the above program in Demo.scala. The subsequent commands are used to compile and execute this program.
Command
> scalac Demo.scala
> scala Demo

Output

able1

So, this brings us to the end of the blog. This Tecklearn ‘How to perform pattern matching in Scala and use of Regex expressions’ blog helps you with commonly asked questions if you are looking for a job in Apache Spark and Scala or as a Big Data Developer. If you wish to learn Apache Spark and Scala and build a career in the Big Data Hadoop domain, then check out our interactive Apache Spark and Scala Training, which comes with 24*7 support to guide you throughout your learning period. Please find the link for course details:

Apache Spark and Scala Certification

Apache Spark and Scala Training

About the Course

Tecklearn Spark training lets you master real-time data processing using Spark streaming, Spark SQL, Spark RDD and Spark Machine Learning libraries (Spark MLlib). This Spark certification training helps you master the essential skills of the Apache Spark open-source framework and Scala programming language, including Spark Streaming, Spark SQL, machine learning programming, GraphX programming, and Shell Scripting Spark. You will also understand the role of Spark in overcoming the limitations of MapReduce. Upon completion of this online training, you will hold a solid understanding and hands-on experience with Apache Spark.

Why Should you take Apache Spark and Scala Training?

• The average salary for an Apache Spark developer ranges from approximately $93,486 per year for a Developer to $128,313 per year for a Data Engineer. – Indeed.com
• Wells Fargo, Microsoft, Capital One, Apple, JPMorgan Chase and many other MNCs worldwide use Apache Spark across industries.
• Global Spark market revenue will grow to $4.2 billion by 2022 with a CAGR of 67%. – Marketanalysis.com

What you will Learn in this Course?

Introduction to Scala for Apache Spark

• What is Scala
• Why Scala for Spark
• Scala in other Frameworks
• Scala REPL
• Basic Scala Operations
• Variable Types in Scala
• Control Structures in Scala
• Loop, Functions and Procedures
• Collections in Scala
• Array Buffer, Map, Tuples, Lists

Functional Programming and OOPs Concepts in Scala

• Functional Programming
• Higher Order Functions
• Anonymous Functions
• Class in Scala
• Getters and Setters
• Custom Getters and Setters
• Constructors in Scala
• Singletons
• Extending a Class using Method Overriding

Introduction to Spark

• Introduction to Spark
• How Spark overcomes the drawbacks of MapReduce
• Concept of In Memory MapReduce
• Interactive operations on MapReduce
• Understanding Spark Stack
• HDFS Revision and Spark Hadoop YARN
• Overview of Spark and Why it is better than Hadoop
• Deployment of Spark without Hadoop
• Cloudera distribution and Spark history server

Basics of Spark

• Spark Installation guide
• Spark configuration and memory management
• Driver Memory Versus Executor Memory
• Working with Spark Shell
• Resilient distributed datasets (RDD)
• Functional programming in Spark and Understanding Architecture of Spark
Playing with Spark RDDs
• Challenges in Existing Computing Methods
• Probable Solution and How RDD Solves the Problem
• What is RDD, It’s Operations, Transformations & Actions Data Loading and Saving Through RDDs
• Key-Value Pair RDDs
• Other Pair RDDs and Two Pair RDDs
• RDD Lineage
• RDD Persistence
• Using RDD Concepts Write a Wordcount Program
• Concept of RDD Partitioning and How It Helps Achieve Parallelization
• Passing Functions to Spark

Writing and Deploying Spark Applications

• Creating a Spark application using Scala or Java
• Deploying a Spark application
• Scala built application
• Creating application using SBT
• Deploying application using Maven
• Web user interface of Spark application
• A real-world example of Spark and configuring of Spark

Parallel Processing

• Concept of Spark parallel processing
• Overview of Spark partitions
• File Based partitioning of RDDs
• Concept of HDFS and data locality
• Technique of parallel operations
• Comparing coalesce and Repartition and RDD actions

Machine Learning using Spark MLlib

• Why Machine Learning
• What is Machine Learning
• Applications of Machine Learning
• Face Detection: USE CASE
• Machine Learning Techniques
• Introduction to MLlib
• Features of MLlib and MLlib Tools
• Various ML algorithms supported by MLlib

Integrating Apache Flume and Apache Kafka

• Why Kafka, what is Kafka and Kafka architecture
• Kafka workflow and Configuring Kafka cluster
• Basic operations and Kafka monitoring tools
• Integrating Apache Flume and Apache Kafka

Apache Spark Streaming

• Why Streaming is Necessary
• What is Spark Streaming
• Spark Streaming Features
• Spark Streaming Workflow
• Streaming Context and DStreams
• Transformations on DStreams
• Describe Windowed Operators and Why it is Useful
• Important Windowed Operators
• Slice, Window and ReduceByWindow Operators
• Stateful Operators

Improving Spark Performance

• Learning about accumulators
• The common performance issues and troubleshooting the performance problems

DataFrames and Spark SQL

• Need for Spark SQL
• What is Spark SQL
• Spark SQL Architecture
• SQL Context in Spark SQL
• User Defined Functions
• Data Frames and Datasets
• Interoperating with RDDs
• JSON and Parquet File Formats
• Loading Data through Different Sources

Scheduling and Partitioning in Apache Spark

• Concept of Scheduling and Partitioning in Spark
• Hash partition and range partition
• Scheduling applications
• Static partitioning and dynamic sharing
• Concept of Fair scheduling
• Map partition with index and Zip
• High Availability
• Single-node Recovery with Local File System and High Order Functions

Got a question for us? Please mention it in the comments section and we will get back to you.
