binary-search
::: info Prerequisite
Before reading this article, you should first learn:
:::
First, a small joke:
One day, A-Dong borrowed N books from the library. When he was leaving, the alarm went off. The security guard stopped him to check which book was not registered. A-Dong was about to scan each book through the gate one by one to find the problem book, but the guard looked at him with disdain: "You don't even know binary search?"
So the guard split the books into two piles. He let the first pile pass the gate. The alarm sounded, which meant the problem book was in this pile. Then he split this pile into two piles again and let the first pile pass the gate. The alarm sounded again, and he kept splitting into two piles...
After logN checks, the guard finally found the book that triggered the alarm. He looked proud and mocking. Then A-Dong walked away with all the remaining books.
From then on, the library lost N - 1 books.
Binary search is not simple. Knuth (the one who invented the KMP algorithm) said about binary search: the idea is simple, the devil is in the details. Many people like to talk about the integer overflow bug, but the real traps in binary search are not that small detail. The real questions are: should you add one or subtract one from mid? Should the while condition be <= or <?
If you do not really understand these details, writing binary search will feel like magic. You will rely on luck to avoid bugs, and only the writer knows how unreliable it is.
This article will explore the three most common binary search cases: find a number, find the left boundary, and find the right boundary. These three cases use one unified framework: the while condition is always <=, and the boundary updates always use mid ± 1, which is easy to remember.
Binary Search Framework
There are two main styles of binary search:
both ends closed,
[left, right]left closed, right open,
[left, right)
This article uses the “both ends closed” style because it is easier to remember and unify. The left-closed-right-open style is explained in another article: Binary Search: Left-Closed-Right-Open Style.
No matter which style you use, the code follows this general framework:
The parts marked with ... are where details can go wrong. When you read any binary search code, first look at these places. Later we will see how they change in different cases.
One trick to write correct binary search is: do not use plain else. Instead, write all branches as else if, so every case is clear. After you fully understand all the details, you can simplify the code yourself.
Another point: when computing mid, you must avoid overflow. In the code, left + (right - left) / 2 is the same as (left + right) / 2 in result, but it prevents overflow when left and right are very large.
The Idea of Search Interval
To really understand binary search, you must understand the idea of the “search interval” — what left and right mean.
For example, if we set right = nums.length - 1, the index of the last element, then the search interval is a closed interval [left, right]. This is exactly the range we search each time.
If we set right = nums.length, which is last index + 1, then the search interval is left-closed-right-open [left, right).
The essence of binary search is to use a while loop to move left and right, shrinking the search interval again and again, until we lock onto the index of the target value.
Different open/closed choices give different code. For a correct binary search, we must make sure:
When the search interval is empty, the search must stop. Otherwise, the algorithm will run forever.
During the search, we must not skip any element. Otherwise, the answer can be wrong.
Keep these two rules in mind. Next, we will use the both-ends-closed style to handle three cases: find a number, find the left boundary, and find the right boundary.
Find a Number
This is the simplest case: search for a number. If it exists, return its index; otherwise, return -1.
This code solves LeetCode 704 “Binary Search”. Let’s look at the details.
Why is the while condition <= and not <?
<= and not <?For a closed search interval, <= will not miss elements.
The loop while (left <= right) stops when left == right + 1. The search interval is then [right + 1, right]. There is no index that is both >= right + 1 and <= right, so the interval is empty. Stopping here is correct.
If we use while (left < right), the loop stops when left == right. The interval is [left, left], which still has one element, but the while loop has stopped. That element is skipped. If the target is that element, the algorithm will wrongly think the target does not exist.
This shows why binary search is hard to write perfectly: bugs are not always visible. Most test cases might pass, and only some special cases will fail.
You must really understand the search interval to write fully correct code.
Why left = mid + 1 and right = mid - 1?
left = mid + 1 and right = mid - 1?Because mid has already been checked, so we must remove it from the search interval.
Our interval is closed: [left, right]. When we find that index mid is not the target, the next search interval must be either [left, mid - 1] or [mid + 1, right].
What limitation does this algorithm have?
For example, in a sorted array nums = [1,2,2,2,3], with target = 2, this algorithm returns index 2. That is fine. But what if we want the left boundary (index 1), or the right boundary (index 3)? This algorithm cannot do that.
You might say: once we find one target, we can linearly scan left or right. That works, but then we lose the logarithmic time advantage of binary search.
The next algorithms will handle these two “boundary search” cases, which are very common in real problems.
Find the Left Boundary
First, look at the code for finding the left boundary:
Why does this algorithm find the left boundary?
The key is how we handle nums[mid] == target:
We do not return as soon as we find target. Instead, we shrink the right boundary with right = mid - 1 and keep searching in [left, mid - 1]. This keeps moving the interval to the left, so we can lock onto the leftmost position of target.
What if target does not exist?
target does not exist?If target does not exist, left_bound returns the index of the smallest element that is greater than target.
You do not need to memorize this. Just see an example: nums = [2,3,5,7], target = 4. The return value is 2, because the element 5 is the smallest element greater than 4.
If you want to return -1 when target does not exist, you can check nums[left] at the end:
Find the Right Boundary
Now look at the code for finding the right boundary:
Why does this algorithm find the right boundary?
Again, the key is how we handle nums[mid] == target:
We do not return as soon as we find target. Instead, we shrink the left boundary with left = mid + 1 and keep searching in [mid + 1, right]. This keeps moving the interval to the right, so we can lock onto the rightmost position of target.
What if target does not exist?
target does not exist?If target does not exist, right_bound returns the index of the largest element that is less than target.
For example, nums = [2,3,5,7], target = 4. The return value is 1, because the element 3 is the largest element smaller than 4.
If you want to return -1 when target does not exist, you can check nums[right] at the end:
Unified Template
All three cases use the same closed search interval [left, right]. The only difference is what we do when nums[mid] == target:
From this article, you learned:
When you read or write binary search code, avoid plain
else. Use onlyifandelse ifso that all cases are clear.Always think about the “search interval” and the stopping condition of the while loop. Stop when the search interval is empty.
Use the same closed-interval style for all three cases, and only change the
nums[mid] == targetbranch and the return logic.
In the end, this binary search framework is the “technique” level. On the “idea” level, the core of binary search thinking is: use the known information to shrink (halve) the search space as much as possible, so brute-force search becomes much faster.
If you understand this article, you can write correct binary search code. The closed-interval style here is my recommended style: one unified framework for all three cases, easy to remember.
You may also see another style on the internet: left-closed-right-open [left, right), using while (left < right) and right = mid. That style is also common. If you want to deeply understand it, see Binary Search: Left-Closed-Right-Open Style.
In real problems you are rarely asked to just “write binary search code”. I will further explain how to use binary search thinking to improve brute-force solutions in Applications of Binary Search and More Binary Search Exercises.
Last updated