Prefix Sum Skill
Last updated
Last updated
Translator: youyun
Author: labuladong
Let's talk about a simple but interesting algorithm problem today. Find the number of subarrays which sums to k.
The most intuitive way is using brute force - find all the subarrays, sum up and compare with k.
The tricky part is, how to find the sum of a subarray fast? For example, you're given an array nums
, and asked to implement API sum(i, j)
which returns the sum of nums[i..j]
. Furthermore, the API will be very frequently used. How do you plan to implement this API?
Due to the high frequency, it is very inefficient to traverse through nums[i..j]
each time. Is there a quick method which find the sum in time complexity of O(1)? There is a technique called Prefix Sum.
The idea of Prefix SUm goes like this: for a given array nums
, create another array to store the sum of prefix for pre-processing:
The meaning of preSum
is easy to understand. preSum[i]
is the sum of nums[0..i-1]
. If we want to calculate the sum of nums[i..j]
, we just need to perform preSum[j+1] - preSum[i]
instead of traversing the whole subarray.
Coming back to the original problem. If we want to find the number of subarrays which sums to k respectively, it's straightforward to implement using Prefix Sum technique:
The time complexity of this solution is O(N^2), while the space complexity is O(N). This is not the optimal solution yet. However, we can apply some cool techniques to reduce the time complexity further, after understanding how Prefix Sum and arrays can work together through this solution.
The solution in part 1 has nested for
loop:
What does the inner for
loop actually do? Well, it is used to calculate how many j
can make the difference of sum[i]
and sum[j]
to be k. Whenever we find such j
, we'll increment the result by 1.
We can reorganize the condition of if
clause:
The idea of optimization is, to record down how many sum[j]
equal to sum[i] - k
such that we can update the result directly instead of having inner loop. We can utilize hash table to record both prefix sums and the frequency of each prefix sum.
In the following case, we just need prefix sum of 8 to find subarrays with sum of k. By brute force solution in part 1, we need to traverse arrays to find how many 8 there are. Using the optimal solution, we can directly get the answer through hash table.
This is the optimal solution with time complexity of O(N).
Prefix Sum is not hard, yet very useful, especially in dealing with differences of array intervals.
For example, if we were asked to calculate the percentage of each score interval among all students in the class, we can apply Prefix Sum technique:
Afterwards, for any given score interval, we can find how many students fall in this interval by calculating the difference of prefix sums quickly. Hence, the percentage will be calculated easily.
However, for more complex problems, simple Prefix Sum technique is not enough. Even the original question we discussed in this article requires one step further to optimize. We used hash table to eliminate an unnecessary loop. We can see that if we want to achieve the optimal solution, it is indeed important to understand a problem thoroughly and analyze into details.