## Server Bug Fix: Find the number using binary search against one possible lie

We all know this classic problem, “there is some hidden number and you have to interactively guess it.”, which could be solved using binary search when we know that maximum number that we can guess.

But what if the interactor may lie to us in one case? For example, the number is $$3$$ and the maximum number we can guess is $$10$$, but when we ask whether it is greater than $$5$$ and it replies yes and for the rest of queries it answers correctly. A simple binary search would fail.

How to proceed in this case? What is the minimum number of query in the worst case?

A generalization of this class of problems is widely studied. See, e.g., this paper for a survey.

In your particular case, the problem can be easily solved without any asymptotic change in the computational complexity. Run the binary search three times. At least two of the three results must be equal to the hidden number. Return the majority result.

There are other elegant ways to handle up to $$k$$ of lies by only using $$O(log n + k)$$ time (where $$k$$ might be a function of $$n$$).

If normal binary search would take k questions, then you can solve this with 2k+1 questions: Ask each question twice. If you get the same answer, it was the truth. If not, a third question reveals the truth, this happens only once.

I suspect you can do better. If the number is from 1 to 100 and I check the numbers 40 and 60 then knowing that one answer is true will give me some information.

More difficult if he can lie once, but can repeat the lie (so asking the same question ten times reveals nothing). For example if the number is from 1 to 100, and the liar insists that it is equal to 87 and equal to 88 (one is true, one is a repeated lie) you have no chance finding out what the number is.

If it was wrong one time, but right all of the other times after it, then we notice it will return you always the same thing after that wrong decision, and it would be the negation of them. In this example, it would mean that one time it said “above” and was wrong, but all of the other times after it, it would always answer “below”.

Thus, you can just trace this back to the last time it “changed it’s mind” and continue normal binary search

The key to this question is understanding that it is impossible to find out what the unknown number is and that is all. All you can find out is both what the unknown number is and which question was answered with a lie.

To take a concrete example, suppose that the number is between 1 and 10, and suppose that we know that one answer is a lie. We ask six questions. At the end, we will have distinguished between $$10$$ possibilities (which number it is) and also between $$6$$ possibilities (which question was answered falsely). $$10times 6=60$$, and since $$2^6=64$$, six yes-or-no-questions will indeed have identified not only the number but also the false answer.

To be precise, the number $$k$$ of questions needed to distinguish between $$n$$ numbers, where one answer is definitely a lie, satisfies $$kgeqlog_2n+log_2k$$

On the other hand, if there only might be a lying answer, then we have to allow for the “no lie” possibility, and the criterion is $$kgeqlog_2n+log_2(k+1)$$

Can we achieve this bound, or do real algorithms need more? This depends on whether or not you are expected to ask all the questions before you receive any answers. If you aren’t, then life is a little easier, and here is a sketch of how you would proceed in the case of $$n=10$$ and $$k=6$$.

Question A Is the number 1, 2, 3, 4, or 5?
If the answer is Yes, then the possibilities are 1B, 1C, 1D, 1E, 1F, 2B, 2C, 2D, 2E, 2F, 3B, 3C, 3D, 3E, 3F, 4B, 4C, 4D, 4E, 4F, 5B, 5C, 5D, 5E, 5F, 6A, 7A, 8A, 9A and 10A, where the number denotes the value of the unknown number and the letter denotes which answer was a lie. There are 30 possibilities, so we have done a perfect binary chop on the original space of 60.

Supposing that the answer was Yes:

Question B Is the number 1, 2, 3 or 10?
If the answer is Yes, then the possibilities are 1C, 1D, 1E, 1F, 2C, 2D, 2E, 2F, 3C, 3D, 3E, 3F, 4B, 5B, or 10A. There are 15 possibilities, so we have done a perfect binary chop on the previous space of 30.

If the answer to question A had been No, then question B would have been different (for instance, “1, 8, 9 or 10?”).

Supposing that the answer to question B was also Yes.

Question C Is the number 1 or 2?
If the answer is Yes, the possibilities are 1D, 1E, 1F, 2D, 2E, 2F, and 3C. That makes 7 for a Yes or 8 for a No, which is the closest to a bisection that we can get. Question D will therefore have to distinguish between either 7 or 8 possibilities.

The rest of this process is left as an exercise for the reader.

Note that the choice of each question does depend on the answers to the previous questions. If this is not what you want, then the field of error-correcting codes comes into its own, since an error-correcting code effectively transmits a whole batch of answers to a set predetermined questions, and the receiver’s job is to deduce the value on the basis of those answers.

In this answer it’s assumed there’s at most 1 lie in the sense that if you ask the same question twice and get the same answer twice, you know for sure that it’s not a lie.

Using the observation in Nir Shahar’s answer, an algorithm can be constructed that performs at most $$lceillog_2{n}rceil + 2lceilsqrt{lceillog_2nrceil}rceil + 1$$ (or slightly less) comparisons.

The observation is that when a binary search repeatedly made the same decision (say “bigger than” each time) during the last steps up to the current point, the lie can only be right before that, so at the last time it made the other decision (in that case “smaller than”) (or the lie still has to come or the lie was the last decision). More generally while doing a normal binary search, if the lie has come already the lie was the last time the other decision was made than the last one (or the lie was the last decision).

fix a constant $$c = lfloor sqrt{lfloor log_2nrfloor}rfloor$$
and let the search depth at each point be $$d$$.

Just perform the normal binary search until c the same decisions are made in a row (since the last time we made this same check). At this point, check if the decision at the last change was correct (that is, at depth $$d-c$$). If that was not a lie, we know there was no lie up to this point using only 1 extra comparison per $$c$$ steps. If it was a lie, we found the lie while doing less than about $$frac{log_2 n} c + c approx 2 sqrt{log_2 n}$$ extra comparisons.

I think this solution is optimal for 1 potential lie, but I’m not sure. A proof that no better solution exists would require a complex reasoning along the lines of https://cs.stackexchange.com/a/51499/28999

A similar problem, named “Black Hole”, appears as one of the problems of 2019 Russian Olympiad of schoolchildren in computer science.

The problem asks for a program that interacts with a jury program simulating probe sensors and determines the radiation level of each black hole. The sensor mounted on the probe can answer the following queries: determine the x value by the value of x, whether it is true that the radiation level is greater than or equal to x. Unfortunately, due to a software error, the sensor response may not be correct. Fortunately, after the first incorrect answer, the sensor of this probe changes its state and gives only correct answers to all subsequent requests.

The following section is the solution to that problem given in the link provided by user my pronoun is monicareinstate. It is translated from Russian into English by Google.

Note that if we were given the same answer twice for the same request, this answer must be correct. Therefore, for subproblem 1 (n ≤ 1000, q ≤ 30), we can perform a regular binary search, repeating each query three times and believing that the answer is repeated twice. For subproblem 2 (n ⩽ 1000, q ⩽ 21), we note that the query should be repeated the third time only if the first two answers were different, and after that the answers to all the queries will be sure to be correct. Thus, the number of requests will be 3⌈log2 n⌉ and 2⌈log2 n⌉ + 1, respectively.

In all other subtasks, it is required to meet the minimum number of requests, sufficient for any strategy for responding to requests for a given n. The first few subtasks (n ⩽ 12 or n ⩽ 25) can be completed by enumerating possible strategies. As possible optimizations, one can set the enumeration state with a multiset of all received answers, and also use the fact that the number of allowed queries is small (no more than 9 for n ⩽ 25).

To obtain a polynomial-time solution, we note the following. Let us be told as answers that the answer belongs to prefixes of length p1 ⩽ p2 ⩽. . . and suffixes of length s1 ⩽ s2 ⩽. . .. Then the answer about the prefix of length p2 could not be wrong, since then the answer about the prefix of length p1 would also be incorrect; similarly, the answer about the suffix of length s2 is also exactly correct. Therefore, the search state can be uniquely defined by the numbers p1, p2, s1, s2. We will calculate the values ​​ans p1, p2, s1, s2 – the required number of queries to guess the number in this state. The number x must belong to the union of the ranges [n − s1 + 1, p2] ∪ [n − s2 + 1, p1]; if the length of this union is 1, then the value is 0. Otherwise, for an arbitrary query? x we go into one of two states whose parameters are easily calculated (we denote these states by L (x) and R (x)); the optimal query x must minimize max (ansL (x), ansR (x). To calculate ans … we use dynamic programming. In this solution, we have O (n4) states, in each of which O (n) transitions are possible, therefore the total difficulty is O (n5). Such a solution gains 30–35 points (in addition to 15 points for subtasks 1 and 2).

Consider several ways to optimize this solution:

• Let the prefix of length p1 and the suffix of length s1 do not intersect. This means that by this moment one of the answers was definitely incorrect, and in the remaining range of possible values you can use the usual binary search. We turn to more convenient notation for the case when this is not true: let b be the length of the intersection [1, p1] ∩ [n – s1 + 1, n], a = p1 – b, c = s1 – b.

• Note that the values ​​of ans p1, p2, s1, s2 and ans p1 + d, p2 + d, s1 − d, s2 − d coincide, and the strategies for these states differ by shifting all requests to d. This allows you to set the state by numbers
p2 – p1, p1 + s1, p2 + s1, and the complexity of the solution with this optimization is O (n4) (35–40 points).

• Note that in any state, the optimal number of requests after the response <to the request? x does not decrease with increasing x; likewise, the number of requests after an answer> = x does not increase. This means that the optimum max (ansL (x), ansR (x)) can be searched by binary search on x, which reduces complexity to O (n3log n) (along with the previous optimization
40–48 points).

• It is easy to show that, for example, with increasing c, the position of the optimal response does not decrease; this allows us to search for the optimal transition amortized over O (1). Together with previous optimizations, we get O (n3) complexity (55-60 points).

• Note that the response value is O (log n). We swap the value of the answer and one of the DP parameters: let maxc k, a, b be equal to the maximum value of c at which in state (a, b, c) it is possible to guess the number for k queries, or −∞ if this cannot be done either which c. Then the following transitions are possible:

• if maxc k − 1, a, b ̸ = −∞, then maxc k, a, b ⩾ maxc k − 1, a, b + 2k − 1 – b. Indeed, in the state (a, b, c + 2k − 1 – b), we make a query leading to the states (a, b, c) and (b, 0, 2k − 1 – b), in the second of them the usual bin search .
• if a ⩾ 2k − 1 – b, then, by a similar argument, maxck, a, b ⩾ maxc k − 1, a− (2k − 1 − b), b.
• let us make a query that breaks the middle part of length b into parts of length x and b – x, then the transitions are made to the states (a, x, b − x) and (x, b − x, c). Then if satisfied maxc k − 1, a, x ⩾ b – x, then maxc k, a, b ⩾ maxc k − 1, x, b − x. Together with all previous optimizations, we obtain the complexity O (n
2log n) (60–70 points, +15 for the first two subtasks).

In order to go through the remaining subtasks, you need to find the strategy locally and carefully save it in the program code. Let f (k) be equal to the maximum n at which one can guess the number in k queries. Note that the strategy for f (k) allows us to guess the number in k queries and for any smaller n. Then, to solve the problem, it is required to find strategies for f (1), f (2),. . . , f (maxk = 19) and maxn = 30,000.

For a specific k and n, the strategy can be represented as a decision tree of depth k. Such a tree can be obtained by local calculations on your computer, even if the solution does not fit in time in the testing system.

To prevent the tree from getting too large for large k, we note the following:

• tree vertices corresponding to the same search conditions can be saved only once.
• if we need to save several strategies at once, overlapping states between these strategies can also be saved only once.
• let b = 0 in any search state (that is, the smallest prefix and suffix do not intersect). Then we can use the usual bin search, and such a tree branch can obviously not be saved.

Using these optimizations, the jury’s reference solution builds a compressed decision tree with <2000 vertices. The constructed code takes 72 kilobytes, and the construction takes 3 minutes and uses <6 gigabytes of memory.

XXXI All-Russian Olympiad of schoolchildren in computer science, final stage, all tasks
Innopolis, April 11-17, 2019

If some symbols or statements in the above translation are not clear enough, the original text in Russian could help you decipher the meanings.

## Math Genius: MDP tabular setting.

I would be very curious to know if in the tabular MDP setting we have:
$$E_{tau_1,cdots, tau_{N} sim P^{pi}_{mu}} (1_{A}) = E_{s_1,cdots,s_{N times H} sim d^{pi}_{mu}} E_{a_1 sim pi(.|s_1) cdots a_{N times H} sim pi(.|s_{N times H})}(1_A)$$

With the usual notations: $$tau$$ are paths of length $$H$$, $$pi$$ is a policy, $$mu$$ is the initial distribution, $$d^{pi}_{mu}$$ is the state probability of appearance (ie $$P(s)$$).
$$A$$ is an event involving all the data points. (like for example observing $$(s,a)$$ $$k$$ times). It seems intuitively true, but not sure how to show it.

Thank you!!

## Code Bug Fix: Algorithms : Pizza Cutting

I’m trying to solve https://leetcode.com/problems/number-of-ways-of-cutting-a-pizza/

Given a rectangular pizza represented as a rows x cols matrix
containing the following characters: ‘A’ (an apple) and ‘.’ (empty
cell) and given the integer k. You have to cut the pizza into k pieces
using k-1 cuts.

For each cut you choose the direction: vertical or horizontal, then
you choose a cut position at the cell boundary and cut the pizza into
two pieces. If you cut the pizza vertically, give the left part of the
pizza to a person. If you cut the pizza horizontally, give the upper
part of the pizza to a person. Give the last piece of pizza to the
last person.

Return the number of ways of cutting the pizza such that each piece
contains at least one apple. Since the answer can be a huge number,
return this modulo 10^9 + 7.

The pizza consists of at most 50 rows and columns, and k <= 10.

For now, I have a very basic, brute force solution, which I plan to optimize later on.

Here’s my code :

``````class Solution {
long combos;
public int ways(String[] pizza, int k) {
Set<String> remaining = new HashSet<String>();
this.combos = 0;
for(int i = 0; i < pizza.length; i++) {
for(int j = 0; j < pizza[i].length(); j++) {
char ch = pizza[i].charAt(j);
if(ch == 'A') remaining.add(i + "," + j);
}
}

getCombos(remaining, k - 1);
return (int)(combos % (1000000007));
}

public void getCombos(Set<String> remaining, int k) {
if(remaining.isEmpty()) return;
if(k == 0) {
combos++;
return;
}

Set<Integer> seenCol = new HashSet<Integer>();
Set<Integer> seenRow = new HashSet<Integer>();
for(String apple : remaining) {
String[] posStr = apple.split(",");

int i = Integer.parseInt(posStr);
int j = Integer.parseInt(posStr);

// cut below this
Set<String> below = getBelow(i, j, remaining);

// cut right
Set<String> right = getRight(i, j, remaining);
}
}

public Set<String> getBelow(int i, int j, Set<String> remaining) {
Set<String> result = new HashSet<String>();
for(String apple : remaining) {
String[] coords = apple.split(",");
}
return result;
}

public Set<String> getRight(int i, int j, Set<String> remaining) {
Set<String> result = new HashSet<String>();
for(String apple : remaining) {
String[] coords = apple.split(",");
}
return result;
}
}
``````

This doesn’t pass the following test case : [“.A..A”,”A.A..”,”A.AA.”,”AAAA.”,”A.AA.”]
5

The expected result is 153, however, my code returns 141.

I can’t figure out why and would appreciate any help.

Thank you!

You are only cutting adjacent to an apple, but cutting between apples can also be valid.

For example, consider that bottom row “A.AA.”, so you cut after columns 1,3 and 4 (although 4 turns out to be invalid). However, you are missing that you can also cut after column 2 !

## Code Bug Fix: Factorial of a number using user defined function giving garbage value in c

I was trying to construct a program to calculate factorial using user defined function but it is giving a garbage value this program was an assignment for c-cat preparation

``````#include<stdio.h>
#include<conio.h>
#include<stdlib.h>
int accept_num();             // Function call
int fact();                   // Function call
void display_num();           // Function call
int main()
{
int num;

num = fact();        //calling factorial

display_num(num);    //calling printf

}
//take input
int accept_num()
{
int n;
printf("Enter number: ");
scanf("%d",&n);
return n;
}
//calculate factorial
int fact()
{
int i,num1;
unsigned long long factr=1;

num1 = accept_num();

for(i=1; i<=num1; i++)
{
factr *= i;
}
return factr;
}

void display_num(int num2)
{
}
``````

Mismatching types often leads to garbage value. After calculating factorial, you stored it to `int` type and in `display_num()` function you are printing `unsigned long long int` which has more number of bits reserved than `int`. Also if you use `int i` it will result into false value of factorial after loop iterates after limit of `int`.

Below is the refined code, replace every `int` with `unsigned long long int`:

``````#include<stdio.h>
#include<stdlib.h>
unsigned long long int accept_num();             // Function call
unsigned long long int fact();                   // Function call
void display_num();           // Function call
int main()
{
//changed the datatype of num
unsigned long long int num;

num = fact();        //calling factorial

display_num(num);    //calling printf

}
//take input
unsigned long long int accept_num()
{
//changed datatype of n
unsigned long long int n;
printf("Enter number: ");
scanf("%llu",&n);
return n;
}
//calculate factorial
unsigned long long int fact()
{
unsigned long long int i;

//changed datatype of num1
unsigned long long int num1;
unsigned long long int factr=1;

num1 = accept_num();

for(i=1; i<=num1; i++)
{
factr *= i;
}
return factr;
}

//changed datatype of argument num2
void display_num(unsigned long long int num2)
{
}
``````

Also, there are more redundant variables which can be reduced. Refer to this code:

``````#include<stdio.h>
long int multiplyNumbers(int n);

int main() {
int inputNumber;
printf("Enter a positive integer: ");
scanf("%d",&inputNumber);
printf("Factorial of %d = %ld", inputNumber, multiplyNumbers(inputNumber));
return 0;
}

//calculate factorial using recursion
long int multiplyNumbers(int n) {
if (n>=1)
return n*multiplyNumbers(n-1);
else
return 1;
}
``````

Tagged : / / /

## Code Bug Fix: 0-1 Knapsack for large weights

I am trying to solve the 0-1 Knapsack for the large weights problem. Constraints for the problem are :

1. *All values in input are integers.
2. \$\$1 leq N leq 100\$\$
3. \$\$1 leq W leq 10^9\$\$
4. \$\$1 leq w_i leq W\$\$
5. \$\$1 leq v_i leq 10^3*\$\$

Below is the code which I have written with comments.

``````#include <vector>
#include <algorithm>
#include <climits>
using namespace std;

const int V = 1e5;
const int N = 100;

int main() {
int n, w, inp1, inp2, max_val = -1;
vector<pair<int, int>> arr;
cin >> n >> w; // n - no of items, w - total weight that the knapsack can hold
for(int i = 0; i < n; i++){
cin >> inp1 >> inp2; // ith input : w_i v_i
arr.push_back({inp1, inp2});
max_val = max(max_val, inp2); // storing maximum value of an item from the inputs
}
unsigned long long int dp[N + 1][V + 1]; // dp[i][j] = min weight that can be obtained from items 0..(i - 1) having total value atmost j
for(int i = 0; i <= n; i++){
for(int j = 0; j <= (max_val * n); j++){
if(i == 0 || j == 0)
dp[i][j] = ULLONG_MAX;
else if(j >= arr[i].second)
dp[i][j] = min(dp[i - 1][j], dp[i - 1][j - arr[i - 1].second] + arr[i - 1].first);
else
dp[i][j] = dp[i - 1][j];
}
}
for(int i = 0; i <= n; i++){
for(int j = 0; j <= (max_val * n); j++)
cout << dp[i][j] << " ";
cout << endl;
}
int i;
for(i = (max_val * n); i >= 1; i--){
if(dp[n][i] != ULLONG_MAX && dp[n][i] <= w)
break;
}
cout << i;
return 0;
}
``````

But I am not getting correct answer for all inputs. For example, for input

``````3 8
3 30
4 50
5 60
``````

Tagged : / /

## Server Bug Fix: Why is update rule of the value function different in policy evaluation and policy iteration?

In the textbook “Reinforcement Learning: An Introduction”, by Richard Sutton and Andrew Barto, the pseudo code for Policy Evaluation is given as follows: The update equation for $$V(s)$$ comes from the Bellman equation for $$v_{pi}(s)$$ which is mentioned below (the update equation) for your convenience:
$$v_{k+1}(s) = sum_{a} pi(a|s)sum_{s’,r}p(s’,r|s,a)[r+gamma v_{k}(s’)]$$

Now, in Policy Iteration, the Policy Evaluation comes in stage 2, as mentioned in the following pseudo code: Here, in the policy Evaluation stage, $$V(s)$$ is updated using a different equation:
begin{align} v_{k+1}(s) = sum_{s’,r}p(s’,r|s,pi (s))[r + gamma v_{k}(s)] end{align} where $$a = pi(s)$$ is used.

Yes, the two update equations are equivalent. As an aside, technically the equation you give is not the Bellman equation, but the update step re-written as an equation – in the Bellman equation instead of $$v_{k+1}(s)$$ or $$v_{k}(s)$$ (showing iterations of approximate value functions), you would have $$v_{pi}(s)$$ (representing the true value of a state under policy $$pi$$).

The difference between the equations is that

• In the first case of Policy Evaluation, in order to be general, a stochastic policy $$pi(a|s): mathcal{S} times mathcal{A} rightarrow mathbb{R} = Pr{A_t = a|S_t =s}$$ is used. That means to get the expected value, you must sum over all possible actions $$a$$ and weight them by the policy function output.

• In the case of Policy Iteration, a deterministic policy $$pi(s): mathcal{S} rightarrow mathcal{A}$$ is used. For that, you don’t need to know all possible values of $$a$$ for probabilities, but use the output of the policy function directly as the action that is taken by the agent. That action therefore has a probability of $$1$$ of being chosen by the policy in the given state.

The equation used in Policy Iteration is simplified for a deterministic policy. If you want you could represent the policy using $$pi(a|s)$$ and use the same equation as for Policy Evaluation. If you do that, you would also need to alter the Policy Improvement policy update step to something like:

$$a_{max} leftarrow text{argmax}_asum_{r,s’}p(r,s’|s,a)[r + gamma V(s’)]$$

$$text{ for each } a in mathcal{A(s)}$$:

$$qquad pi(a|s) leftarrow 1 text{ if } a = a_{max}, 0 text{ otherwise }$$

Doing this will result in exactly the same value function and policy as before. The only reason to do this would be to show the equivalence between the two sets of update equations when dealing with a deterministic policy.

## Math Genius: How can i find the solution of this NP-hard optimization problem?

begin{align} & min {{sumlimits_{i=1}^{M}{{{a}_{i}}}}_{{}}} \ & s.t{{.}::::_{{}}}{{_{{}}}_{{}}}sumlimits_{i=1}^{M}{{{a}_{i}}{{a}_{i+k}}}>0:: : :{{,}_{{}}}{{_{{}}}_{{}}}forall k=0,1,ldots ,M-1 \ & :::::::::::{{a}_{i}}in left{ 0,1 right}::::::::::{{,}_{{}}}i=1,ldots ,M. \ end{align}
You can linearize the quadratic constraint by introducing a binary variable $$b_{i,k}$$ to represent the product $$a_i a_{i+k}$$. The new constraints are:
begin{align} sum_i b_{i,k} &ge 1 &&text{for all k}\ b_{i,k} &le a_i &&text{for all i and k}\ b_{i,k} &le a_{i+k} &&text{for all i and k} end{align}