Java – Regular Expressions Tutorial With examples

Regular expressions define string patterns that can be used to search, manipulate, and edit text. They are also known as regex (short for regular expressions).

Let's take an example to understand it better:

The regular expression .*book.* is used in the example below to look for the word “book” anywhere in the text.

import java.util.regex.*;  
class RegexExample1{  
   public static void main(String args[]){  
      String content = "This is Chaitanya " +
	    "from Beginnersbook.com.";

      String pattern = ".*book.*";

      boolean isMatch = Pattern.matches(pattern, content);
      System.out.println("The text contains 'book'? " + isMatch);
   }
}

Output:

The text contains 'book'? true

In this tutorial we will learn how to define patterns and how to use them. The two primary classes in the java.util.regex API (the package we must import when working with regex) are:

1) java.util.regex.Pattern – Used for defining patterns.
2) java.util.regex.Matcher – Used for performing match operations on text using patterns.

java.util.regex.Pattern class:

1) Pattern.matches()

We already saw this method in the example above, where we searched for the word “book” in a given text. This is one of the simplest ways to check a text against a regex.

String content = "This is a tutorial Website!";
String patternString = ".*tutorial.*";
boolean isMatch = Pattern.matches(patternString, content);
System.out.println("The text contains 'tutorial'? " + isMatch);

As you can see, we matched the pattern against the given text using the matches() method of the Pattern class. The pattern .*tutorial.* allows zero or more characters before and after the string “tutorial” (the expression .* stands for zero or more characters).

Limitations: Pattern.matches() only tells us whether the whole text matches the pattern, so it cannot be used to find multiple occurrences of a pattern in a text. To match multiple occurrences, use the Pattern.compile() method (discussed in the next section).
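
Pattern.matches() succeeds only when the entire text matches the pattern, which is why the examples above wrap the word in .* on both sides. Here is a minimal sketch of the difference; the class name WholeInputMatchDemo and the helper matchesWhole are made up for this illustration.

```java
import java.util.regex.Pattern;

class WholeInputMatchDemo {
    // Hypothetical helper: true only if the ENTIRE text matches the regex
    static boolean matchesWhole(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        String content = "This is a tutorial Website!";

        // The bare word does not cover the whole text, so this prints false
        System.out.println(matchesWhole("tutorial", content));

        // .* on both sides lets the pattern cover the whole text, so this prints true
        System.out.println(matchesWhole(".*tutorial.*", content));
    }
}
```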

2) Pattern.compile()

To do a case-insensitive search or to search for multiple occurrences, you first need to compile the pattern using Pattern.compile() before matching it against the text. For example, this is how we can search for the word “tutorial” from the example above regardless of case:

String content = "This is a tutorial Website!";
String patternString = ".*tuToRiAl.*";
Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);

Pattern.CASE_INSENSITIVE is a flag that makes the search case-insensitive. Several other flags are available for other purposes. Refer to the Pattern class documentation to learn more about them.

We have a Pattern instance now, but how do we match it against the text? For that we need a Matcher object, which we can get using the Pattern.matcher() method. Let's discuss it now.

3) Pattern.matcher() method

In the section above, we learned how to use the compile() function to get a Pattern object. Here, we’ll go through how to use the matcher() function to get a Matcher instance from a Pattern object.

String content = "This is a tutorial Website!";
String patternString = ".*tuToRiAl.*";
Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(content);
boolean isMatched = matcher.matches();
System.out.println("Is it a Match?" + isMatched);

Output:

Is it a Match?true
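
The flags passed to Pattern.compile() can be combined with the bitwise OR operator. The sketch below assumes a text that contains a line break and combines CASE_INSENSITIVE with DOTALL (a flag that lets the dot match line terminators as well). The class name FlagsDemo and the helper matchesWithFlags are made up for this illustration.

```java
import java.util.regex.Pattern;

class FlagsDemo {
    // Hypothetical helper: compiles the regex with the given flags
    // and checks it against the whole text
    static boolean matchesWithFlags(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).matches();
    }

    public static void main(String[] args) {
        String content = "This is a\nTUTORIAL Website!";
        String regex = ".*tutorial.*";

        // Case-insensitive only: the dot cannot cross the line break, prints false
        System.out.println(
            matchesWithFlags(regex, Pattern.CASE_INSENSITIVE, content));

        // Both flags: the dot now matches the line break too, prints true
        System.out.println(
            matchesWithFlags(regex, Pattern.CASE_INSENSITIVE | Pattern.DOTALL, content));
    }
}
```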

4) Pattern.split()

To split a text into multiple strings based on a delimiter (here the delimiter is specified using regex), use the Pattern.split() method. This is how it can be done:

import java.util.regex.*;  
class RegexExample2{  
public static void main(String args[]){  
	String text = "ThisIsChaitanya.ItISMyWebsite";
    // Pattern for delimiter
	String patternString = "is";
	Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
	String[] myStrings = pattern.split(text);
	for(String temp: myStrings){
	    System.out.println(temp);
	}
	System.out.println("Number of split strings: "+myStrings.length);
}}

Output:

Th

Chaitanya.It
MyWebsite
Number of split strings: 4

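The second string in the output is an empty string (not null): the text contains “is” immediately followed by “Is”, so there is nothing between these two delimiters.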
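
If the empty string is not wanted, one possible approach is to treat a run of consecutive delimiters as a single delimiter. The sketch below does this with the pattern (?:is)+. This is only one option, and the class name SplitDemo and the helper splitOn are made up for this illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitDemo {
    // Hypothetical helper: case-insensitive split of the text on the given regex
    static String[] splitOn(String regex, String text) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).split(text);
    }

    public static void main(String[] args) {
        String text = "ThisIsChaitanya.ItISMyWebsite";

        // "is" followed directly by "Is" leaves an empty string between them
        // prints [Th, , Chaitanya.It, MyWebsite]
        System.out.println(Arrays.toString(splitOn("is", text)));

        // (?:is)+ treats "isIs" as one delimiter, so no empty string appears
        // prints [Th, Chaitanya.It, MyWebsite]
        System.out.println(Arrays.toString(splitOn("(?:is)+", text)));
    }
}
```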

java.util.regex.Matcher Class

We already touched on the Matcher class above. Let's review a few points:

Creating a Matcher instance

String content = "Some text";
String patternString = ".*somestring.*";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(content);

Main methods

matches() matches the regular expression against the whole text that was passed to Pattern.matcher() when the Matcher object was created.

...
Matcher matcher = pattern.matcher(content);
boolean isMatch = matcher.matches();

lookingAt() is similar to matches(), except that it only matches the regular expression against the beginning of the text, whereas matches() matches it against the whole text.
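
A small sketch of the difference between the two methods. The class name LookingAtDemo and both helper methods are made up for this illustration.

```java
import java.util.regex.Pattern;

class LookingAtDemo {
    // Hypothetical helper: true if the regex matches the WHOLE text
    static boolean matchesWhole(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    // Hypothetical helper: true if the regex matches at the BEGINNING of the text
    static boolean matchesAtStart(String regex, String text) {
        return Pattern.compile(regex).matcher(text).lookingAt();
    }

    public static void main(String[] args) {
        String content = "This is a tutorial Website!";

        // The text continues after "This is", prints false
        System.out.println(matchesWhole("This is", content));

        // The text begins with "This is", prints true
        System.out.println(matchesAtStart("This is", content));

        // "tutorial" occurs in the text but not at the beginning, prints false
        System.out.println(matchesAtStart("tutorial", content));
    }
}
```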

find() searches the text for occurrences of the regular expression. It is mostly used when looking for multiple occurrences.

start() and end() are generally used together with the find() method. They return the start and end indexes of the match found by find(); the end index is the position just after the last matched character.

Let's take an example to find multiple occurrences using the Matcher methods:

package beginnersbook.com;
import java.util.regex.*;  
class RegexExampleMatcher{  
public static void main(String args[]){  
  String content = "ZZZ AA PP AA QQQ AAA ZZ";

  String string = "AA";
  Pattern pattern = Pattern.compile(string);
  Matcher matcher = pattern.matcher(content);

  while(matcher.find()) {
     System.out.println("Found at: "+ matcher.start()
    		+ 
    		" - " + matcher.end());
  }
}
}

Output:

Found at: 4 - 6
Found at: 10 - 12
Found at: 17 - 19

Now that we are acquainted with the Pattern and Matcher classes, we can match a regular expression to the text. Let’s look at the alternatives available to us for defining a regular expression:

1) String Literals

If you just want to search for a particular string in the text, for example “abc”, you can use the string itself as the pattern. In the code below, the regex and the text are the same:

Pattern.matches("abc", "abc")

2) Character Classes

A character class matches a single character of the input text against a set of permitted characters. For instance, [Cc]haitanya would match the string “Chaitanya” whether it starts with an uppercase or a lowercase C. A few more examples:
Pattern.matches("[pqr]", "abcd"); returns false, as there is no p, q, or r in the text
Pattern.matches("[pqr]", "r"); returns true, as r is found
Pattern.matches("[pqr]", "pq"); returns false, as the class matches only one of these characters, not both

The full set of character class constructs is shown below:
[abc]: a, b, or c (simple class); matches exactly one of these characters
[^abc]: any single character other than a, b, or c (^ denotes negation)
[a-zA-Z]: a through z, or A through Z, inclusive (range)
[a-d[m-p]]: a through d, or m through p; same as [a-dm-p] (union)
[a-z&&[def]]: d, e, or f (intersection)
[a-z&&[^bc]]: a through z, except b and c; same as [ad-z] (subtraction)
[a-z&&[^m-p]]: a through z, but not m through p; same as [a-lq-z] (subtraction)
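
The union, intersection, and subtraction constructs can be tried out with single-character inputs, as in the sketch below. The class name CharClassDemo and the helper check are made up for this illustration.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Hypothetical helper: true if the whole text matches the character class
    static boolean check(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        // Union: a through d, or m through p
        System.out.println(check("[a-d[m-p]]", "n"));    // true
        System.out.println(check("[a-d[m-p]]", "e"));    // false

        // Intersection: only d, e, or f
        System.out.println(check("[a-z&&[def]]", "e"));  // true
        System.out.println(check("[a-z&&[def]]", "a"));  // false

        // Subtraction: a through z, except b and c
        System.out.println(check("[a-z&&[^bc]]", "d"));  // true
        System.out.println(check("[a-z&&[^bc]]", "b"));  // false

        // Subtraction: a through z, but not m through p
        System.out.println(check("[a-z&&[^m-p]]", "q")); // true
        System.out.println(check("[a-z&&[^m-p]]", "n")); // false
    }
}
```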

Predefined Character Classes – Metacharacters

These are shorthand codes that can be used when writing regex.

Construct	Description
.   ->	Any character (may or may not match line terminators)
\d  ->	A digit: [0-9]
\D  ->	A non-digit: [^0-9]
\s  ->	A whitespace character: [ \t\n\x0B\f\r]
\S  ->	A non-whitespace character: [^\s]
\w  ->	A word character: [a-zA-Z_0-9]
\W  ->	A non-word character: [^\w]

For example:
Pattern.matches("\\d", "1"); returns true
Pattern.matches("\\D", "z"); returns true
Pattern.matches(".p", "qp"); returns true, the dot (.) represents any character
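
The predefined classes can also be placed one after another to describe a longer text. Note that the backslash has to be doubled inside a Java string literal. The sketch below is only an illustration; the class name PredefinedClassDemo, the helper check, and the sample inputs are made up.

```java
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // Hypothetical helper: true if the whole text matches the regex
    static boolean check(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        // Two digits, one whitespace character, three word characters
        System.out.println(check("\\d\\d\\s\\w\\w\\w", "25 Dec")); // true

        // The underscore counts as a word character, so \W rejects it
        System.out.println(check("\\W", "_"));                     // false

        // A space is a whitespace character, so \S rejects it
        System.out.println(check("\\S", " "));                     // false
    }
}
```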

Boundary Matchers

^	Matches the beginning of a line.
$	Matches the end of a line.
\b	Matches a word boundary.
\B	Matches a non-word boundary.
\A	Matches the beginning of the input text.
\G	Matches the end of the previous match
\Z	Matches the end of the input text except the final terminator if any.
\z	Matches the end of the input text.

For example:
Pattern.matches("^Hello$", "Hello"): returns true, begins and ends with Hello
Pattern.matches("^Hello$", "Namaste! Hello"): returns false, does not begin with Hello
Pattern.matches("^Hello$", "Hello Namaste!"): returns false, does not end with Hello
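
Boundary matchers are most useful together with find(), because Pattern.matches() already requires the whole text to match. The sketch below counts the occurrences found by find() with and without boundaries. The class name BoundaryDemo, the helper countMatches, and the sample text are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Hypothetical helper: counts how many times find() locates the regex in the text
    static int countMatches(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String content = "Beginnersbook has a book.";

        // Without boundaries: matches inside "Beginnersbook" and the separate word
        System.out.println(countMatches("book", content));       // 2

        // \b on both sides: only the separate word "book"
        System.out.println(countMatches("\\bbook\\b", content)); // 1

        // \B in front: only the "book" that is inside a longer word
        System.out.println(countMatches("\\Bbook", content));    // 1

        // ^ anchors to the beginning: the text does not begin with "book"
        System.out.println(countMatches("^book", content));      // 0
    }
}
```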

Quantifiers

Greedy	Reluctant	Possessive	Matches
X?	X??	X?+	Matches X once, or not at all (0 or 1 time).
X*	X*?	X*+	Matches X zero or more times.
X+	X+?	X++	Matches X one or more times.
X{n}	X{n}?	X{n}+	Matches X exactly n times.
X{n,}	X{n,}?	X{n,}+	Matches X at least n times.
X{n,m}	X{n,m}?	X{n,m}+	Matches X at least n times, but at most m times.
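
The three columns differ in how much text the quantifier takes. A greedy quantifier takes as much as possible and gives characters back if the rest of the pattern needs them. A reluctant quantifier takes as little as possible. A possessive quantifier takes as much as possible and never gives anything back. The sketch below shows this on a made-up input; the class name QuantifierDemo and the helper firstMatch are hypothetical, and Matcher.group() is used to return the matched text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: returns the first match found in the text,
    // or null if there is no match
    static String firstMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String content = "<b>bold</b>";

        // Greedy: .+ runs to the end, then backs off to the last '>'
        System.out.println(firstMatch("<.+>", content));  // <b>bold</b>

        // Reluctant: .+? stops at the first '>' it can reach
        System.out.println(firstMatch("<.+?>", content)); // <b>

        // Possessive: .++ takes everything and never backs off,
        // so the final '>' in the pattern can never match
        System.out.println(firstMatch("<.++>", content)); // null
    }
}
```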

A few examples

import java.util.regex.*;  
class RegexExample{  
public static void main(String args[]){  
   // It would return true if string matches exactly "tom"
   System.out.println(
     Pattern.matches("tom", "Tom")); //False
	
   /* returns true if the string matches exactly 
    * "tom" or "Tom"
    */
   System.out.println(
     Pattern.matches("[Tt]om", "Tom")); //True
   System.out.println(
     Pattern.matches("[Tt]om", "tom")); //True
	
   /* Returns true if the string matches exactly "tim" 
    * or "Tim" or "jin" or "Jin"
    */
   System.out.println(
     Pattern.matches("[tT]im|[jJ]in", "Tim"));//True
   System.out.println(
     Pattern.matches("[tT]im|[jJ]in", "jin"));//True
	
   /* returns true if the string contains "abc" at 
    * any place
    */
   System.out.println(
     Pattern.matches(".*abc.*", "deabcpq"));//True
	
   /* returns true if the string does not have a 
    * number at the beginning
    */
   System.out.println(
     Pattern.matches("^[^\\d].*", "123abc")); //False
   System.out.println(
     Pattern.matches("^[^\\d].*", "abc123")); //True
	
   // returns true if the string consists of exactly three letters
   System.out.println(
     Pattern.matches("[a-zA-Z][a-zA-Z][a-zA-Z]", "aPz"));//True
   System.out.println(
     Pattern.matches("[a-zA-Z][a-zA-Z][a-zA-Z]", "aAA"));//True
   System.out.println(
     Pattern.matches("[a-zA-Z][a-zA-Z][a-zA-Z]", "apZx"));//False
	
   // returns true if the string consists only of non-digits (0 or more)
   System.out.println(
     Pattern.matches("\\D*", "abcde")); //True
   System.out.println(
     Pattern.matches("\\D*", "abcde123")); //False
	
   /* Boundary Matchers example
    * ^ denotes start of the line
    * $ denotes end of the line
    */
   System.out.println(
     Pattern.matches("^This$", "This is Chaitanya")); //False
   System.out.println(
     Pattern.matches("^This$", "This")); //True
   System.out.println(
     Pattern.matches("^This$", "Is This Chaitanya")); //False
}
}
