algo-en
  • Introduction
  • I. Dynamic Programming
    • Dynamic Programming in Details
    • Classic DP: Edit Distance
    • Classic DP: Super Egg
    • Classic DP: Super Egg(Advanced Solution)
    • Class DP: Longest Common Subsequence
    • Classis DP: Game Problems
    • Regular Expression
    • The Strategies of Subsequence Problem
    • Greedy: Interval Scheduling
    • 4 Keys Keyboard
    • What is DP Optimal Substructure
    • Longest Increasing Subsequence
    • KMP Algorithm In Detail
    • House Robber Problems
    • Stock Buy and Sell Problems
  • II. Data Structure
    • Binary Heap and Priority Queue
    • LRU Cache Strategy in Detail
    • Collections of Binary Search Operations
    • Special Data Structure: Monotonic Stack
    • Special Data Structure: Monotonic Stack
    • Design Twitter
    • Reverse Part of Linked List via Recursion
    • What's the Best Algo Book
    • Queue Implement Stack/Stack implement Queue
    • Framework for learning data structures and algorithms
  • III. Algorithmic thinking
    • My Way to Learn Algorithm
    • The Framework for Backtracking Algorithm
    • Binary Search in Detail
    • The Tech of Double Pointer
    • The Key Concept of TwoSum Problems
    • Divide Complicated Problem: Implement a Calculator
    • Prefix Sum Skill
    • FloodFill Algorithm in Detail
    • Interval Scheduling: Interval Merging
    • Interval Scheduling: Intersections of Intervals
    • String Multiplication
    • Pancake Soring Algorithm
    • Sliding Window Algorithm
    • Some Useful Bit Manipulations
    • Russian Doll Envelopes Problem
    • Recursion In Detail
    • Backtracking Solve Subset/Permutation/Combination
    • Several counter-intuitive Probability Problems
    • Shuffle Algorithm
  • IV. High Frequency Interview Problem
    • How to Implement LRU Cache
    • How to Find Prime Number Efficiently
    • How to Calculate Minimium Edit Distance
    • How to Solve Drop Water Problem
    • How to Remove Duplicate From Sorted Sequence
    • How to Find Longest Palindromic Substring
    • How to Reverse Linked List in K Group
    • How to Check the Validation of Parenthesis
    • How to Find Missing Element
    • How to Pick Elements From a Arbitrary Sequence
    • How to use Binary Search
    • How to Scheduling Seats
    • Union-Find Algorithm in Detail
    • Union-Find Application
    • Find Subsequence With Binary Search
    • Problems can be solved by one line
    • How to Find Duplicate and Missing Element
    • How to Check Palindromic LinkedList
  • V. Common Knowledge
    • Difference Between Process and Thread in Linux
    • You Must Know About Linux Shell
    • You Must Know About Cookie and Session
    • Cryptology Algorithm
    • Some Good Online Pratice Platforms
Powered by GitBook
On this page
  • 1.Define the meaning of the dp array:
  • 2.state transition equation:
  • 3.code implementation
  • 4.summary:

Was this helpful?

  1. I. Dynamic Programming

Classis DP: Game Problems

PreviousClass DP: Longest Common SubsequenceNextRegular Expression

Last updated 4 years ago

Was this helpful?

Translator:

Author:

In the last article,we discussed a fun「stone game 」in ,By the constraints of the problem, the game is first to win.But intelligence questions are always intelligence questions,Real algorithmic problems are not solved by cutting corners. So this paper is going to talk about the stone game and assuming that both of these guys are smart enough, who's going to win in the end how do you solve this problem with dynamic programming.

Game problems follow a similar pattern,The core idea is to use tuples to store the game results of two people on the basis of two-dimensional dp array.Once you're mastered this technique,if someone asks you a similar question again,you can take it in stride.

We changed the stone game to be more general:

There is a pile of stones in front of you and your friends,it's represented by an array of piles,and piles[i] is how many stones are there in the ith heap.You take turns with the stones,one pile at a time,but you can only take the left or the right piles.After all the stones have been taken away, the last one who has more stones wins.

The heap number of stones can be any positive integer,and the total number of stones can be any positive integer,That would break the situation in which one must win first.Let's say I have three piles of rocks: piles = [1, 100, 3],Whether it's a 1or a 3,the 100 that's going to make the difference is going to be taken away by the back hand,and the back hand is going to win.

Assuming they are both smart,design an algorithm that returns the difference between the final score of the first hand and the last hand,As in the example above,the first hand gets 4 points,the second hand gets 100 points, and you should return -96.

With this design,this problem is a Hard dynamic programming problem.The difficuty with gaming is that two people have to take turns choosing,and they're both smart.How do we program?

It's the approach that's been emphasized many times,The first step is to define the array,and then,like the stock buying and selling series,once you find the「status」and the「selection」,and then it's easy.

1.Define the meaning of the dp array:

Defining what a dp array means is very tachnical,The dp array of the same problem can be defined in several ways.Different definitions lead to different state transition equations,But as long as there's no logic problem,you end up with the same answer.I recommend that you don't get caught up in what looks like a great short technique,and that you end up with something that's a little bit more stable, something that's the most interpretable, and something that's the easiest to generalize,This paper gives a general design framework of game problem.

Before we introduce what a dp array means,let's take a look at what it ultimately looks like:

As explained below,tuples are considered to be a object containing first and second atrributes,And to save space,these two atrributes are abbreviated to fir and sec. As shown in the figure above,we think dp[1][3].fir = 10,dp[0][1].sec = 3.

Start by answering a few questions that readers might ask:

This is a two-dimensional dp table that stores tuples.How do you represent that?Half of this array is useless,How do you optimize it?Very simple, do not care,first to think out the way to solve the problem again.

Here's an explanation of what a dp array means:

dp[i][j].fir represents the highest score the first hand can get for this section of the pile piles[i...j]
dp[i][j].sec represents the highest score the back hand can get for this section of the pile piles[i...j]

Just to give you an example,Assuming piles = [3, 9, 1, 2],The inedx starts at 0
dp[0][1].fir = 9 means:Facing the pile of stones [3, 9],The first player eventually gets 9 points.
dp[1][3].sec = 2 means:Facing the pile of stones [9, 1, 2],The second player eventually gets 2 points.

The answer we want is the difference between the final score of the first hand and the final score of the second hand,By thisdefinition, that is $dp[0][n-1].fir - dp[0][n-1].sec$ That is,facing the whole piles,the difference between the best score of the first hand and the best score of the second hand.

2.state transition equation:

It's easy to write the transition equation,The first step is to find all the states and the choices you can make for each state,and then pick the best.

From the previous definition of the dp array,there are obviously three states:the starting index i,the ending index j,and the person whose turn it is.

dp[i][j][fir or sec]
range:
0 <= i < piles.length
i <= j < piles.length

For each state of the problem,there are two choices you can make :Choose the pile to the left,or the pile to the right.We can do all the states like this :

n = piles.length
for 0 <= i < n:
    for j <= i < n:
        for who in {fir, sec}:
            dp[i][j][who] = max(left, right)

The pseudocode above is a rough framework for dynamic programming,and there is a similar pseudocode in the stock series problem.The difficulty of this problem is that two people choose alternately,that is to say,the choice of the first hand has effect on the second hand,how can we express this?

According to our definition of dp array,it is easy to solve this difficulty and write down the state transition equation:

dp[i][j].fir = max(piles[i] + dp[i+1][j].sec, piles[j] + dp[i][j-1].sec)
dp[i][j].fir = max(    Select the rock pile on the far left     ,     Select the rock pile on the far right     )
# explanation:I,as a first hand,faced piles[i...j],I had two choices:
# If I choose the pile of rocks on the far left,and  I will face piles[i+1...j]
# But when it came to the other side,I became the back hand.
# If I choose the pile of rocks on the far right,and  I will face piles[i...j-1]
# But when it came to the other side,I became the back hand.

if the first hand select the left:
    dp[i][j].sec = dp[i+1][j].fir
if the first hand select the right:
    dp[i][j].sec = dp[i][j-1].fir
# explanation:I,as a back hand ,have to wait for the first hand to choose,There are two condition:
#  If the first hand choose the pile of rocks on the far left,I will face piles[i+1...j]
# then it's my turn, and i become the first hand.
# If the first hand choose the pile of rocks on the far right,I will face piles[i...j-1]
# then it's my turn, and i become the first hand.

According to the definition of the dp array, we can also find the base case,which is the simplest case:

dp[i][j].fir = piles[i]
dp[i][j].sec = 0
range: 0 <= i == j < n
# explanation:i==j which means just a bunch of rocks piles[i] in the front of us
# So obviously the first hand can get piles[i],
# there are no stones int the back,so his score is 0

One thing to note here is that we found that the base case is tilted in the table,and we need dp[i+1][j] and dp[i][j-1] to compute dp[i][j]:

So the algorithm can not simply traverse the dp array row by row,but traverse the array diagonally.

To be honest,traversing a two-dimensional array diagonally is easier said than done.

3.code implementation

How do you implement this fir and sec tuple?You can either use python,with its own tuple type,or use the c++pair container,or use a three-dimensional array,the last dimension being the tuple,or we can write a pair class ourselves.

class Pair {
    int fir, sec;
    Pair(int fir, int sec) {
        this.fir = fir;
        this.sec = sec;
    }
}

Then we can directly translate our state transition equation into code,and we can pay attention to technique of traversing through array diagonally:

/* Returns the difference between the last first hand and last hand */
int stoneGame(int[] piles) {
    int n = piles.length;
    //Initializes the dp array
    Pair[][] dp = new Pair[n][n];
    for (int i = 0; i < n; i++) 
        for (int j = i; j < n; j++)
            dp[i][j] = new Pair(0, 0);
    // base case
    for (int i = 0; i < n; i++) {
        dp[i][i].fir = piles[i];
        dp[i][i].sec = 0;
    }
    // traverse the array diagonally
    for (int l = 2; l <= n; l++) {
        for (int i = 0; i <= n - l; i++) {
            int j = l + i - 1;
            // The first hand select the left- or right-most pile.
            int left = piles[i] + dp[i+1][j].sec;
            int right = piles[j] + dp[i][j-1].sec;
            // Refer to the state transition equation.
            if (left > right) {
                dp[i][j].fir = left;
                dp[i][j].sec = dp[i+1][j].fir;
            } else {
                dp[i][j].fir = right;
                dp[i][j].sec = dp[i][j-1].fir;
            }
        }
    }
    Pair res = dp[0][n-1];
    return res.fir - res.sec;
}

Dynamic programming ,the most important is to understand the state transition equation,based on the previous detailed explanation,the reader should be able to clearly understand the meaning of this large piece of code.

And notice that the calculation of 'dp[i][j]' only depends on the left and the bottom elements,so there must be room for optimization, for one-dimensional dp,But one-dimensional dp is a little bit more complicated,it's less interpretable,so you don't have to waste time trying to understand it.

4.summary:

This paper presents a dynamic programming method to solve the game problem. The premise of game problems is usually between two smart people. The common way to describe such games is a one-dimensional array of dp, in which tuples represent the optimal decision of two people.

The reason for this design is that when the first hand makes a choice, it becomes the second hand, and when the second hand makes a choice, it becomes the first hand. This role reversal allows us to reuse the previous results, typical dynamic programming flags.

Those of you who have read this should understand how algorithms solve game problems. Learning algorithms, must pay attention to the template framework of the algorithm,rather than some seemingly awesome ideas, do not bend to write an optimal solution.Don't be afraid to use more space,don't try optimization too early, and don't be afraid of multidimensional arrays.A dp array is a way to store information and avoid double counting.

I hope this article has been helpful.

wadegrc
labuladong
several puzzles
1
2
3
4