> For the complete documentation index, see [llms.txt](https://labuladong.gitbook.io/algo-en/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://labuladong.gitbook.io/algo-en/dynamic-programming/regular-expression.md).

# regular-expression

::: info Prerequisite Knowledge

Before reading this article, you need to study:

* [Core Framework of Dynamic Programming](https://labuladong.online/en/algo/essential-technique/dynamic-programming-framework/)
* [Classic Dynamic Programming: Edit Distance](https://labuladong.online/en/algo/dynamic-programming/edit-distance/)

:::

Regular expressions are a very powerful tool. In this article, we will look at the basic idea behind regular expressions. LeetCode problem 10: [Regular Expression Matching](https://leetcode.com/problems/regular-expression-matching/) asks us to implement a simple regular matching algorithm, including the "." wildcard and the "\*" wildcard.

These two wildcards are used most often. The dot "." can match any single character. The star "\*" means the character before it can repeat any number of times (including 0 times).

For example, the pattern `".a*b"` can match the text `"zaaab"` or `"cb"`. The pattern `"a..b"` matches `"amnb"`. The pattern `".*"` can match any text.

The problem gives us two strings `s` and `p`. `s` is the text, and `p` is the pattern. You need to check if the pattern `p` can match the text `s`. You can assume the pattern only contains lowercase letters and the two wildcards above. The pattern will always be valid; there won't be cases like `*a` or `b**`.

The function signature is as follows:

```java
boolean isMatch(string s, string p);
```

What is the hard part of this regular expression we need to implement?

The dot wildcard is easy. Any character in `s` can match a `.` in the pattern. The tricky part is the star wildcard. When you see "\*", the character before it can be used any number of times: repeated once, many times, or not at all. What should we do?

The answer is simple: try all possible cases. If any case matches, then `p` matches `s`. Whenever we need to try all cases for two strings, we should think about using dynamic programming.

### 1. Idea Analysis

Let’s think about how `s` and `p` match. We use two pointers `i` and `j` to move along `s` and `p`. If both pointers reach the end, it means the match is successful. Otherwise, the match fails.

**If we ignore the "\*" wildcard, when matching characters `s[i]` and `p[j]`, all we need to do is check if they match:**

```java
boolean isMatch(String s, String p) {
    int i = 0, j = 0;
    while (i < s.length() && j < p.length()) {
        // the '.' wildcard is a versatile match
        if (s.charAt(i) == p.charAt(j) || p.charAt(j) == '.') {
            // match, continue to match s[i+1..] and p[j+1..]
            i++; j++;
        } else {
            // not a match
            return false;
        }
    }
    return i == j;
}
```

Now, if we add the "\*" wildcard, things get a bit more complex, but we just need to consider each case separately.

**When `p[j + 1]` is "\*", let’s look at the cases:**

1. If `s[i] == p[j]`, there are two possibilities:

1.1 `p[j]` might match multiple characters. For example, `s = "aaa", p = "a*"`, then `p[0]` matches all three `"a"`.

1.2 `p[j]` might match 0 characters. For example, `s = "aa", p = "a*aa"`. Since the rest of the pattern can match `s`, `p[0]` can match 0 times.

2. If `s[i] != p[j]`, there is only one case:

`p[j]` can only match 0 times, and we see if the next part can match `s[i]`. For example, `s = "aa", p = "b*aa"`, `p[0]` can only match 0 times.

So, we can update the code for the "\*" wildcard as follows:

```java
if (s.charAt(i) == p.charAt(j) || p.charAt(j) == '.') {
    // match
    if (j < p.length() - 1 && p.charAt(j + 1) == '*') {
        // there is a * wildcard, which can match 0 or more times
    } else {
        // no * wildcard, match exactly once
        i++; j++;
    }
} else {
    // do not match
    if (j < p.length() - 1 && p.charAt(j + 1) == '*') {
        // there is a * wildcard, can only match 0 times
    } else {
        // no * wildcard, the match cannot proceed
        return false;
    }
}
```

The idea is clear. But when we see the "\*" wildcard, should we match 0 times or more? How many times?

This is a "choice" problem. We need to try all possible choices to get the answer. The core of dynamic programming is "state" and "choice". **The "state" is the position of the two pointers `i` and `j`. The "choice" is how many times `p[j]` matches a character.**

### 2. Dynamic Programming Solution

Based on the "state", we can define a `dp` function:

```java
boolean dp(String s, int i, String p, int j);
```

The definition of the `dp` function is as follows:

**If `dp(s, i, p, j) = true`, it means `s[i..]` can match `p[j..]`. If `dp(s, i, p, j) = false`, it means `s[i..]` cannot match `p[j..]`.**

Based on this definition, the answer we want is the result of `dp` when `i = 0` and `j = 0`. So we use the `dp` function like this:

```java
boolean isMatch(String s, String p) {
    // pointers i and j start moving from index 0
    return dp(s, 0, p, 0);
}
```

We can write the main logic of the `dp` function based on the previous code:

```java
// core logic of the dp function (pseudocode)
// definition: the function returns whether s[i..] can match p[j..]
boolean dp(String s, int i, String p, int j) {
    if (s.charAt(i) == p.charAt(j) || p.charAt(j) == '.') {
        // match
        if (j < p.length() - 1 && p.charAt(j + 1) == '*') {
            // 1.1 wildcard matches 0 times or multiple times
            return dp(s, i, p, j + 2) || dp(s, i + 1, p, j);
        } else {
            // 1.2 regular match 1 time
            return dp(s, i + 1, p, j + 1);
        }
    } else {
        // no match
        if (j < p.length() - 1 && p.charAt(j + 1) == '*') {
            // 2.1 wildcard matches 0 times
            return dp(s, i, p, j + 2);
        } else {
            // 2.2 cannot continue matching
            return false;
        }
    }
}
```

**According to the definition of the `dp` function**, these cases are easy to explain:

**1.1 Wildcard matches 0 or more times**

Add 2 to `j` and keep `i` the same. This means we skip `p[j]` and the wildcard after it, which is the case where the wildcard matches 0 times.

Even if `s[i] == p[j]`, this can still happen, as shown below:

Add 1 to `i` and keep `j` the same. This means `p[j]` matches `s[i]`, but `p[j]` can match more, which is the case where the wildcard matches multiple times:

If either case leads to a match, we return true, so we use "or" to combine both.

**1.2 Normal matching once**

This is the case without a `*` wildcard. If `s[i] == p[j]`, just add 1 to both `i` and `j`:

**2.1 Wildcard matches 0 times**

Similar to case 1.1, add 2 to `j` and keep `i` the same:

**2.2 If there is no `*` wildcard and cannot match, it means match failed**

Looking at the pictures makes it easy to understand. Now, let's think about the base cases of the `dp` function:

**One base case is when `j == p.length()`**. According to the definition of the `dp` function, it means the pattern string `p` has been matched. We need to see if the text string `s` has also been matched. If `s` is also matched, then it's a success:

```java
if (j == p.length()) {
    return i == s.length();
}
```

**Another base case is when `i == s.length()`**. According to the `dp` definition, this means the text string `s` has been completely matched. Can we just check if `p` is also matched?

```java
if (i == s.length()) {
    // Is this enough?
    return j == p.length();
}
```

**This is not correct. At this time, we cannot just check if `j` is equal to `p.length()`. As long as `p[j..]` can match an empty string, it counts as a successful match.** For example, for `s = "a", p = "ab*c*"`, when `i` reaches the end of `s`, `j` hasn't reached the end of `p`, but `p` can still match `s`.

So we can write code like this:

```java
int m = s.length(), n = p.length();

if (i == s.length()) {
    // if it can match an empty string, characters and * must appear in pairs
    if ((n - j) % 2 == 1) {
        return false;
    }
    // check if it is in the form of x*y*z*
    for (; j + 1 < p.length(); j += 2) {
        if (p.charAt(j + 1) != '*') {
            return false;
        }
    }
    return true;
}
```

With this idea, we can write the complete code:

```java
class Solution {
    // memoization
    private int[][] memo;

    public boolean isMatch(String s, String p) {
        int m = s.length(), n = p.length();
        memo = new int[m][n];
        for (int[] row : memo) {
            Arrays.fill(row, -1);
        }
        // pointers i and j start moving from index 0
        return dp(s, 0, p, 0);
    }

    // calculate whether p[j..] matches s[i..]
    private boolean dp(String s, int i, String p, int j) {
        int m = s.length(), n = p.length();
        // base case
        if (j == n) {
            return i == m;
        }
        if (i == m) {
            if ((n - j) % 2 == 1) {
                return false;
            }
            for (; j + 1 < n; j += 2) {
                if (p.charAt(j + 1) != '*') {
                    return false;
                }
            }
            return true;
        }

        // check memo to avoid redundant calculations
        if (memo[i][j] != -1) {
            return memo[i][j] == 1;
        }

        boolean res = false;

        if (s.charAt(i) == p.charAt(j) || p.charAt(j) == '.') {
            if (j < n - 1 && p.charAt(j + 1) == '*') {
                res = dp(s, i, p, j + 2)
                        || dp(s, i + 1, p, j);
            } else {
                res = dp(s, i + 1, p, j + 1);
            }
        } else {
            if (j < n - 1 && p.charAt(j + 1) == '*') {
                res = dp(s, i, p, j + 2);
            } else {
                res = false;
            }
        }
        // store the current result in the memo
        memo[i][j] = res ? 1 : 0;
        return res;
    }
}
```

The code uses a hash table called `memo` to remove overlapping subproblems. How to spot overlapping subproblems? You can check [Dynamic Programming Q\&A](https://labuladong.online/en/algo/dynamic-programming/faq-summary/) for techniques. Let's look at the recursive framework of the regex algorithm:

```java
boolean dp(String s, int i, String p, int j) {
    dp(s, i, p, j + 2);     // 1
    dp(s, i + 1, p, j);     // 2
    dp(s, i + 1, p, j + 1); // 3
    dp(s, i, p, j + 2);     // 4
}
```

If you want to get from `dp(s, i, p, j)` to `dp(s, i+2, p, j+2)`, there are at least two paths: `1 -> 2 -> 2` and `3 -> 3`. This means the state `(i+2, j+2)` will be calculated more than once. This shows there are overlapping subproblems, so we need a memo to remove them and make the code more efficient.

The time complexity of dynamic programming is "number of states" \* "time per recursion". In this problem, the number of states is the combinations of `i` and `j`, which is `M * N` (`M` is the length of `s`, `N` is the length of `p`). The `dp` function has no loop (except for the base case, which doesn't happen often), so each recursion takes constant time. Multiply them, and the total time complexity is $O(MN)$.

The space complexity is just the size of the memo, which is $O(MN)$.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://labuladong.gitbook.io/algo-en/dynamic-programming/regular-expression.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
