We all know the classic problem: “there is a hidden number and you have to guess it interactively.” It can be solved with binary search when we know the maximum number that can be hidden.

But what if the interactor may lie to us at most once? For example, the hidden number is $3$ and the maximum is $10$, but when we ask whether the number is *greater than* $5$ it replies *yes*, while it answers all other queries correctly. A simple binary search would fail.

How do we proceed in this case? What is the minimum number of queries in the worst case?

A generalization of this class of problems is widely studied. See, e.g., this paper for a survey.

In your particular case, the problem can be solved without any asymptotic change in the computational complexity: run the binary search three times. Since the interactor lies at most once, at most one of the three runs can be corrupted, so at least two of the three results must equal the hidden number. Return the majority result.
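As an illustration, here is a minimal sketch of this majority scheme (the helper names and the oracle model are my own; it assumes at most one wrong answer across all queries):

```python
def binary_search(ask, lo, hi):
    """Standard binary search driven by an ask(x) -> bool oracle
    answering "is the hidden number greater than x?"."""
    while lo < hi:
        mid = (lo + hi) // 2
        if ask(mid):
            lo = mid + 1
        else:
            hi = mid
    return lo

def guess_with_majority(ask, n):
    """Run the search three times; at most one run can be corrupted
    by the single lie, so the majority result is the hidden number."""
    results = [binary_search(ask, 1, n) for _ in range(3)]
    return max(set(results), key=results.count)
```

This triples the query count but needs no bookkeeping about where the lie occurred.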

There are other elegant ways to handle up to $k$ lies using only $O(\log n + k)$ queries (where $k$ might be a function of $n$).

If normal binary search would take $k$ questions, then you can solve this with $2k+1$ questions: ask each question twice. If you get the same answer both times, it was the truth. If not, a third question reveals the truth; this can happen at most once.
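A sketch of this ask-twice scheme (names are mine; again assuming the interactor gives at most one wrong answer overall):

```python
def ask_reliably(ask, x):
    """Ask the same question twice; matching answers must be truthful,
    since at most one of them can be the single lie. On a mismatch,
    a third ask settles it (the lie budget is already spent)."""
    a, b = ask(x), ask(x)
    return a if a == b else ask(x)

def guess(ask, n):
    """Binary search over [1, n] using doubled questions:
    ask(x) answers "is the hidden number greater than x?"."""
    lo, hi = 1, n
    while lo < hi:
        mid = (lo + hi) // 2
        if ask_reliably(ask, mid):
            lo = mid + 1
        else:
            hi = mid
    return lo
```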

I suspect you can do better. If the number is from 1 to 100 and I check the numbers 40 and 60, then knowing that at least *one* answer is true gives me *some* information.

It is more difficult if the interactor can lie once but can repeat the lie (so asking the same question ten times reveals nothing). For example, if the number is from 1 to 100 and the liar insists that it is both equal to 87 *and* equal to 88 (one answer is true, one is a repeated lie), you have no chance of finding out what the number is.

If it answered wrongly one time but correctly every time **after** that, then notice that after the wrong answer it will always return the same response, **and** that response is the negation of the wrong one. In this example, it would mean that one time it said “above” and was wrong, but every time after that it answers “below”.

Thus, you can just trace this back to the last time it “changed its mind” and continue normal binary search.

The key to this question is understanding that it is impossible to find out *only* what the unknown number is. All you can find out is *both* what the unknown number is *and* which question was answered with a lie.

To take a concrete example, suppose that the number is between 1 and 10, and suppose that we **know** that one answer is a lie. We ask six questions. At the end, we will have distinguished between $10$ possibilities (which number it is) and also between $6$ possibilities (which question was answered falsely). $10\times 6=60$, and since $2^6=64$, six yes-or-no questions will indeed have identified not only the number but also the false answer.

To be precise, the number $k$ of questions needed to distinguish between $n$ numbers, where one answer is definitely a lie, satisfies $$k\geq\log_2 n+\log_2 k$$

On the other hand, if there only *might* be a lying answer, then we have to allow for the “no lie” possibility, and the criterion is $$k\geq\log_2 n+\log_2(k+1)$$
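These bounds are easy to evaluate numerically; here is a small helper (the function name is mine). Note that it gives $6$ for $n=10$ in the definitely-a-lie case, matching the worked example below, and $7$ when the interactor only *might* lie:

```python
def min_queries(n, exactly_one_lie=False):
    """Smallest k satisfying the information-theoretic bound:
    each of the n values must be pairable with which of the k answers
    was the lie (plus the 'no lie' outcome when lying is optional),
    i.e. 2^k >= n*k, respectively 2^k >= n*(k+1)."""
    k = 1
    while 2 ** k < n * (k if exactly_one_lie else k + 1):
        k += 1
    return k
```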

Can we achieve this bound, or do real algorithms need more? This depends on whether or not you are expected to ask all the questions before you receive any answers. If you aren’t, then life is a little easier, and here is a sketch of how you would proceed in the case of $n=10$ and $k=6$.

**Question A** *Is the number 1, 2, 3, 4, or 5?*

If the answer is Yes, then the possibilities are 1B, 1C, 1D, 1E, 1F, 2B, 2C, 2D, 2E, 2F, 3B, 3C, 3D, 3E, 3F, 4B, 4C, 4D, 4E, 4F, 5B, 5C, 5D, 5E, 5F, 6A, 7A, 8A, 9A and 10A, where the number denotes the value of the unknown number and the letter denotes which answer was a lie. There are 30 possibilities, so we have done a perfect binary chop on the original space of 60.

Supposing that the answer was Yes:

**Question B** *Is the number 1, 2, 3 or 10?*

If the answer is Yes, then the possibilities are 1C, 1D, 1E, 1F, 2C, 2D, 2E, 2F, 3C, 3D, 3E, 3F, 4B, 5B, or 10A. There are 15 possibilities, so we have done a perfect binary chop on the previous space of 30.

If the answer to question A had been No, then question B would have been different (for instance, “1, 8, 9 or 10?”).

Supposing that the answer to question B was also Yes:

**Question C** *Is the number 1 or 2?*

If the answer is Yes, the possibilities are 1D, 1E, 1F, 2D, 2E, 2F, and 3C. That makes 7 for a Yes or 8 for a No, which is the closest to a bisection that we can get. Question D will therefore have to distinguish between either 7 or 8 possibilities.

The rest of this process is left as an exercise for the reader.

Note that the choice of each question does depend on the answers to the previous questions. If this is **not** what you want, then the field of error-correcting codes comes into its own, since an error-correcting code effectively transmits a whole batch of answers to a set of predetermined questions, and the receiver’s job is to deduce the value on the basis of those answers.

In this answer it’s assumed there’s at most one lie, in the sense that if you ask the same question twice and get the same answer both times, you know for sure that it’s not a lie.

Using the observation in Nir Shahar’s answer, an algorithm can be constructed that performs at most $\lceil\log_2 n\rceil + 2\lceil\sqrt{\lceil\log_2 n\rceil}\rceil + 1$ (or slightly fewer) comparisons.

The observation is that if a binary search has repeatedly made the same decision (say, “bigger than” every time) during its last several steps, the lie can only have occurred at the last time the other decision (“smaller than”) was made, or the lie was the very last decision, or the lie is still to come. More generally, while doing a normal binary search, if the lie has already occurred, it was either the last decision or the last time the decision opposite to the most recent one was made.

Fix a constant $c = \lfloor\sqrt{\lfloor\log_2 n\rfloor}\rfloor$

and let the search depth at each point be $d$.

Just perform the normal binary search until $c$ identical decisions have been made in a row (since the last time the opposite decision was made). At this point, check whether the decision at the last change was correct (that is, the decision at depth $d-c$). If it was not a lie, we know there was no lie up to this point, using only one extra comparison per $c$ steps. If it was a lie, we have found the lie using only about $\frac{\log_2 n}{c} + c \approx 2\sqrt{\log_2 n}$ extra comparisons.
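One way this idea can be implemented is sketched below (the function names and the exact bookkeeping are mine, not from the answer above). After $c$ identical decisions in a row, the question at the last direction change is asked again; a mismatch means one of those two answers was the single lie, so rolling back and finishing with a plain search is safe either way. The final candidate is also verified, because the very last answer may itself be the lie:

```python
import math

def plain_search(ask, lo, hi):
    """Ordinary binary search on [lo, hi]; assumes all answers are truthful."""
    while lo < hi:
        mid = (lo + hi) // 2
        if ask(mid):                  # "is the hidden number greater than mid?"
            lo = mid + 1
        else:
            hi = mid
    return lo

def search_with_one_lie(ask, n):
    """Find the hidden number in [1, n] when at most one answer is a lie."""
    c = max(2, math.isqrt(max(1, n.bit_length() - 1)))
    lo, hi = 1, n
    history = []                      # (question, answer, lo_before, hi_before)
    suspect = None                    # index of the last unverified change
    streak, last_ans = 0, None
    while lo < hi:
        mid = (lo + hi) // 2
        ans = ask(mid)
        if ans == last_ans:
            streak += 1
        else:                         # direction change: the previous answer
            if history:               # becomes the only older lie candidate
                suspect = len(history) - 1
            streak, last_ans = 1, ans
        history.append((mid, ans, lo, hi))
        if ans:
            lo = mid + 1
        else:
            hi = mid
        if streak == c and suspect is not None:
            q, a, plo, phi = history[suspect]
            if ask(q) != a:           # one of the two answers is the lie;
                return plain_search(ask, plo, phi)   # rollback covers both
            suspect = None            # confirmed truthful
    v0 = lo
    # The very last answer (or an unverified change) may still be the lie:
    def looks_right():
        return (not ask(v0)) and (v0 == 1 or ask(v0 - 1))
    if looks_right() or looks_right():   # a second round exposes a lying check
        return v0
    idx = suspect if suspect is not None else len(history) - 1
    return plain_search(ask, history[idx][2], history[idx][3])
```

The rollback is correct in both mismatch cases: if the old answer was the lie, the lie budget is spent and the saved interval still contains the hidden number; if the re-ask was the lie, the saved interval is a superset of the true interval and all further answers are truthful.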

I think this solution is optimal for 1 potential lie, but I’m not sure. A proof that no better solution exists would require complex reasoning along the lines of https://cs.stackexchange.com/a/51499/28999

A similar problem, named “Black Hole”, appears as one of the problems of the 2019 All-Russian Olympiad of schoolchildren in computer science.

The problem asks for a program that interacts with a jury program simulating probe sensors and determines the radiation level of each black hole. The sensor mounted on the probe can answer queries of the form: for a given value x, is it true that the radiation level is greater than or equal to x? Unfortunately, due to a software error, the sensor’s response may be incorrect. Fortunately, after the first incorrect answer, the sensor of this probe changes its state and gives only correct answers to all subsequent queries.

The following section is the solution to that problem given in the link provided by user my pronoun is monicareinstate. It is translated from Russian into English by Google.

Note that if we are given the same answer twice for the same query, this answer must be correct. Therefore, for subproblem 1 ($n \le 1000$, $q \le 30$), we can perform a regular binary search, repeating each query three times and trusting the answer that occurs at least twice. For subproblem 2 ($n \le 1000$, $q \le 21$), note that a query needs to be repeated a third time only if the first two answers differed, and after that the answers to all queries are guaranteed to be correct. Thus, the number of queries is $3\lceil\log_2 n\rceil$ and $2\lceil\log_2 n\rceil + 1$, respectively.

In all other subtasks, it is required to meet the minimum number of queries sufficient against any strategy of answering the queries for a given $n$. The first few subtasks ($n \le 12$ or $n \le 25$) can be solved by enumerating possible strategies. As possible optimizations, one can represent the enumeration state by the multiset of all received answers, and use the fact that the number of allowed queries is small (at most 9 for $n \le 25$).
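To illustrate the strategy-enumeration idea for small $n$, here is a brute-force sketch (my own, not from the editorial). It represents the state by two sets: values consistent with all answers so far, and values consistent with all but one answer; the adversary picks the worse reply, and we pick the best query:

```python
from functools import lru_cache

def ulam(n):
    """Minimum worst-case number of comparison queries to find a number
    in [1, n] when at most one answer may be a lie. State (a, b):
    `a` = values consistent with every answer,
    `b` = values consistent with all answers but one (the lie)."""
    @lru_cache(maxsize=None)
    def solve(a, b):
        if len(a | b) <= 1:
            return 0
        best = None
        for q in range(1, n):            # query: "is the number > q?"
            # answer "yes": values <= q in `a` must have been lied about
            ay = frozenset(v for v in a if v > q)
            by = (frozenset(v for v in b if v > q)
                  | frozenset(v for v in a if v <= q))
            # answer "no": symmetric
            an = frozenset(v for v in a if v <= q)
            bn = (frozenset(v for v in b if v <= q)
                  | frozenset(v for v in a if v > q))
            if (ay, by) == (a, b) or (an, bn) == (a, b):
                continue                 # useless query: adversary stalls
            cost = 1 + max(solve(ay, by), solve(an, bn))
            if best is None or cost < best:
                best = cost
        return best
    return solve(frozenset(range(1, n + 1)), frozenset())
```

For example, `ulam(2)` is 3, matching the triple-ask scheme and the lower bound $2^k \ge n(k+1)$. This exponential-looking search stays small in practice because `a` is always an interval and `b` a union of at most two intervals, but it is only meant for the tiny subtasks.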

To obtain a polynomial-time solution, note the following. Suppose the answers tell us that the number belongs to prefixes of lengths $p_1 \le p_2 \le \dots$ and to suffixes of lengths $s_1 \le s_2 \le \dots$. Then the answer about the prefix of length $p_2$ cannot be wrong, since then the answer about the prefix of length $p_1$ would also be incorrect; similarly, the answer about the suffix of length $s_2$ is certainly correct. Therefore, the search state is uniquely defined by the numbers $p_1, p_2, s_1, s_2$. We compute the values $ans_{p_1,p_2,s_1,s_2}$, the required number of queries to guess the number in this state. The number $x$ must belong to the union of ranges $[n - s_1 + 1, p_2] \cup [n - s_2 + 1, p_1]$; if the length of this union is 1, the value is 0. Otherwise, an arbitrary query “? x” moves us into one of two states whose parameters are easily calculated (denote these states $L(x)$ and $R(x)$); the optimal query $x$ must minimize $\max(ans_{L(x)}, ans_{R(x)})$. To compute $ans$ we use dynamic programming. This solution has $O(n^4)$ states with $O(n)$ possible transitions in each, so the total complexity is $O(n^5)$. Such a solution earns 30–35 points (in addition to 15 points for subtasks 1 and 2).

Consider several ways to optimize this solution:

• Suppose the prefix of length $p_1$ and the suffix of length $s_1$ do not intersect. This means that by this moment one of the answers was definitely incorrect, and ordinary binary search can be used on the remaining range of possible values. For the case when this is not so, we switch to more convenient notation: let $b$ be the length of the intersection $[1, p_1] \cap [n - s_1 + 1, n]$, $a = p_1 - b$, $c = s_1 - b$.

• Note that the values $ans_{p_1,p_2,s_1,s_2}$ and $ans_{p_1+d,\,p_2+d,\,s_1-d,\,s_2-d}$ coincide, and the strategies for these states differ only by shifting all queries by $d$. This allows us to define the state by the numbers $p_2 - p_1$, $p_1 + s_1$, $p_2 + s_1$; the complexity of the solution with this optimization is $O(n^4)$ (35–40 points).

• Note that, in any state, the optimal number of queries after the answer “$< x$” to the query “? x” does not decrease as $x$ increases; likewise, the number of queries after an answer “$\geq x$” does not increase. This means that the optimal $\max(ans_{L(x)}, ans_{R(x)})$ can be found by binary search on $x$, which reduces the complexity to $O(n^3\log n)$ (together with the previous optimization, 40–48 points).

• It is easy to show that, for example, as $c$ increases, the position of the optimal query does not decrease; this allows us to find the optimal transition in amortized $O(1)$. Together with the previous optimizations, we get $O(n^3)$ complexity (55–60 points).

• Note that the value of the answer is $O(\log n)$. We swap the answer value with one of the DP parameters: let $maxc_{k,a,b}$ be the maximum value of $c$ for which, in the state $(a, b, c)$, it is possible to guess the number in $k$ queries, or $-\infty$ if this cannot be done for any $c$. Then the following transitions are possible:

- if $maxc_{k-1,a,b} \ne -\infty$, then $maxc_{k,a,b} \geq maxc_{k-1,a,b} + 2^{k-1} - b$. Indeed, in the state $(a, b, c + 2^{k-1} - b)$ we make a query leading to the states $(a, b, c)$ and $(b, 0, 2^{k-1} - b)$; in the second of them, ordinary binary search is used.
- if $a \geq 2^{k-1} - b$, then, by a similar argument, $maxc_{k,a,b} \geq maxc_{k-1,\,a-(2^{k-1}-b),\,b}$.
- let us make a query that splits the middle part of length $b$ into parts of length $x$ and $b - x$; then transitions are made to the states $(a, x, b - x)$ and $(x, b - x, c)$. So if $maxc_{k-1,a,x} \geq b - x$, then $maxc_{k,a,b} \geq maxc_{k-1,x,b-x}$. Together with all the previous optimizations, we obtain the complexity $O(n^2\log n)$ (60–70 points, +15 for the first two subtasks).

In order to pass the remaining subtasks, you need to compute the strategy offline and carefully store it in the program code. Let $f(k)$ be the maximum $n$ for which one can guess the number in $k$ queries. Note that the strategy for $f(k)$ also allows guessing the number in $k$ queries for any smaller $n$. Then, to solve the problem, it is required to find strategies for $f(1), f(2), \ldots, f(k_{\max} = 19)$, with $n_{\max} = 30{,}000$.

For specific $k$ and $n$, the strategy can be represented as a decision tree of depth $k$. Such a tree can be computed locally on your own machine, even if that computation would not fit into the time limit of the testing system.

To prevent the tree from getting too large for large k, we note the following:

• tree vertices corresponding to the same search state need to be stored only once;

• if several strategies have to be stored at once, states shared between these strategies also need to be stored only once;

• suppose $b = 0$ in some search state (that is, the smallest prefix and suffix do not intersect). Then ordinary binary search can be used, and such a branch of the tree obviously does not need to be stored.

Using these optimizations, the jury’s reference solution builds a compressed decision tree with fewer than 2000 vertices. The generated code takes 72 kilobytes, and building it takes 3 minutes and uses less than 6 gigabytes of memory.

XXXI All-Russian Olympiad of schoolchildren in computer science, final stage, all tasks

Innopolis, April 11-17, 2019

If some symbols or statements in the above translation are not clear enough, the original text in Russian could help you decipher the meanings.