## Math Genius: Constrained maximin problem

Let

i) $$\mu = [\mu_1,\mu_2,\mu_3]\in\mathbb{R}^3$$ fixed, such that $$\mu_2 > \mu_1$$ and $$\mu_2 > \mu_3$$,

ii) $$\lambda = [\lambda_1,\lambda_2,\lambda_3] \in \mathbb{R}^3$$ such that $$\lambda_1 \geq \lambda_2 \geq \lambda_3$$,

iii) $$\alpha\in\Sigma_3$$, where $$\Sigma_3$$ is the probability simplex in $$\mathbb{R}^3$$, i.e. $$\Sigma_3 = \{\alpha : \alpha_i \geq 0,\ \alpha_1+\alpha_2+\alpha_3 = 1\}$$.

I am stuck in solving the following maximin problem:

$$\sup_{\alpha} \inf_{\lambda} f(\alpha,\lambda) = \sup_{\alpha} \inf_{\lambda}\sum_{i=1}^{3}\alpha_i(\mu_i -\lambda_i)^2.$$

What I have done: I framed the first (inner, inf) part as a constrained optimisation problem,

$$\begin{aligned}\inf_\lambda\ & f(\alpha,\lambda) \\ \text{s.t. } & \lambda_2 -\lambda_1 \leq 0, \\ & \lambda_3 -\lambda_2 \leq 0,\end{aligned}$$

formed the Lagrangian $$\mathcal{L}(\alpha,\lambda,\rho) = f(\alpha,\lambda) - \rho_1( \lambda_2 -\lambda_1)-\rho_2( \lambda_3 -\lambda_2)$$, and worked through the solution by imposing the KKT conditions. I get two different solutions, depending on the value of $$\alpha$$:

$$\begin{cases}\lambda_1 = \lambda_2 = \lambda_3 = \bar{\mu} = \sum_{i} \alpha_i\mu_i & \text{if }\mu_1 \geq \bar{\mu} \text{ and } \mu_3 \leq \bar{\mu}, \\ \lambda_1 = \lambda_2 = \bar{\mu}_{12} = \dfrac{\alpha_1\mu_1 + \alpha_2\mu_2}{\alpha_1 + \alpha_2},\ \lambda_3 = \mu_3 & \text{if } \mu_2 \geq \bar{\mu}_{12} \text{ and } \mu_1 \leq \bar{\mu}_{12}. \end{cases}$$

When I move on to the outer optimisation problem (sup), I follow the same approach: I treat the two cases independently, carry the KKT conditions over as constraints of the sup problem, and add the equality constraint for the probability simplex on $$\alpha$$. However, I cannot find any feasible solution to this problem.
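As a cross-check on the case analysis, here is a minimal Python sketch; the function names `inner_inf` and `maximin_grid` are introduced here, and the outer sup is only approximated by a grid search over the interior of the simplex. For fixed $$\alpha$$ the inner problem is a weighted projection of $$\mu$$ onto the non-increasing cone (isotonic regression), which the pool-adjacent-violators algorithm solves. Since $$\mu_2 > \mu_1$$ is the only violated ordering, it first pools indices 1 and 2 into $$\bar{\mu}_{12}$$; if $$\bar{\mu}_{12} \geq \mu_3$$ the answer is $$(\bar{\mu}_{12}, \bar{\mu}_{12}, \mu_3)$$, otherwise all three are pooled into $$\bar{\mu}$$. The sketch assumes every $$\alpha_i > 0$$.

```python
def inner_inf(alpha, mu):
    """Minimise sum_i alpha_i (mu_i - lam_i)^2 subject to lam_1 >= lam_2 >= lam_3.

    Weighted isotonic (non-increasing) regression via pool-adjacent-violators.
    Assumes every alpha_i > 0. Returns (lam, optimal value).
    """
    blocks = []  # each block: [weighted mean, total weight, number of indices]
    for a, m in zip(alpha, mu):
        blocks.append([m, a, 1])
        # Merge while the last two blocks violate the non-increasing order.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2, c1 + c2])
    lam = [mean for mean, _weight, count in blocks for _ in range(count)]
    value = sum(a * (m - l) ** 2 for a, m, l in zip(alpha, mu, lam))
    return lam, value


def maximin_grid(mu, n=200):
    """Approximate the sup over the simplex on a grid with every alpha_i >= 1/n."""
    best_val, best_alpha, best_lam = -1.0, None, None
    for i in range(1, n):
        for j in range(1, n - i):
            alpha = (i / n, j / n, (n - i - j) / n)
            lam, val = inner_inf(alpha, mu)
            if val > best_val:
                best_val, best_alpha, best_lam = val, alpha, lam
    return best_val, best_alpha, best_lam


val, alpha, lam = maximin_grid((0.0, 1.0, 0.0))
print(round(val, 4), alpha, lam)
```

For $$\mu = (0, 1, 0)$$ one can check by hand that the inner value is $$\frac{\alpha_1\alpha_2}{\alpha_1+\alpha_2}$$, so the sup is $$\frac14$$, approached as $$\alpha_3 \to 0$$ with $$\alpha_1 = \alpha_2$$; the grid value creeps up to it but cannot reach it, because the grid stays inside the simplex.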

Since I am new to optimisation and essentially self-taught, I was wondering whether I am following the right approach to solve this maximin problem, or whether there is an easier way to proceed. Any help from more knowledgeable people would be much appreciated, since I have been stuck on this problem for a while now.

Thanks in advance to anyone who tries to help.

## Math Genius: Convexity analysis

I know that the sum of two convex functions is convex, but does the analogous statement hold for non-convex functions, i.e. is the sum of two non-convex functions always non-convex?

No. The sum of two non-convex functions can be convex. Consider the following:
$$f(x)=x^4-2x^3, \quad g(x)=2x^3.$$
Neither is convex, since $$f''(x)=12x(x-1) < 0$$ on $$(0,1)$$ and $$g''(x)=12x < 0$$ for $$x<0$$, yet $$h(x) = f(x)+g(x)=x^4$$ is a convex function.
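A small numerical check of this example, sketched in Python: it searches a grid for a violation of the midpoint inequality $$\varphi\left(\frac{x+y}{2}\right) \le \frac{\varphi(x)+\varphi(y)}{2}$$. Finding a violation proves non-convexity; finding none on a finite grid is only consistent with convexity, not a proof. The name `midpoint_violation` and the grid on $$[-2, 2]$$ are choices made for this illustration.

```python
def midpoint_violation(fn, xs, tol=1e-12):
    """Return a pair (x, y) with fn((x + y)/2) > (fn(x) + fn(y))/2, or None if none is found."""
    for x in xs:
        for y in xs:
            if fn((x + y) / 2) > (fn(x) + fn(y)) / 2 + tol:
                return (x, y)
    return None


def f(x):
    return x**4 - 2 * x**3


def g(x):
    return 2 * x**3


def h(x):
    return f(x) + g(x)  # equals x**4


xs = [k / 10 for k in range(-20, 21)]  # grid on [-2, 2]; contains the pairs (0, 1) and (-1, 0)

print(midpoint_violation(f, xs))  # some pair, so f is not convex
print(midpoint_violation(g, xs))  # some pair, so g is not convex
print(midpoint_violation(h, xs))  # None on this grid
```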

## Math Genius: Practical Applications of the Fréchet Derivative

I understand the definition of the Fréchet derivative. However, outside of functions on $$\mathbb R^n$$, I’ve never encountered an application where it was particularly useful.

Can anyone share examples of (or references to) uses of the Fréchet derivative, especially ones where other methods would not have worked?

Optimisation problems are preferred, but nice applications elsewhere would also be appreciated. As suggested above, I am looking for examples where the domain is not $$\mathbb R^n$$.

Some Background

For example, on the odd occasion where I’ve had to find the optimizer of some functional defined on a Banach space, I always try to calculate the Fréchet derivative to see if it would help. I usually end up with a condition on variations that I don’t really know how to make use of, and end up resorting to other methods to solve my problem.

It doesn’t help that the references I turn to give examples of the Fréchet derivative almost exclusively on finite-dimensional spaces. When they don’t, they are actually optimising over a space of suitably smooth functions, so the Euler-Lagrange equations would have done the trick.

I suppose this is just a lack of education on my part, but I do think it’s noteworthy that the Wikipedia article I link to above only features examples in the form of calculating the derivative of specific functions. Apart from passing references in the introduction, it says very little about why I might want to calculate it, or how I might make use of it once I’ve calculated it.

The books Introduction to Mechanics and Symmetry and Manifolds, Tensor Analysis and Applications discuss some applications of the Fréchet derivative outside $$\mathbb{R}^{n}$$ to physics. The latter uses Fréchet derivatives in the context of classical field theory and variational problems. The former uses some of these concepts in classical mechanics.

## Math Genius: How to solve KKT with constraint that matrix must have all nonnegative elements?

I’m trying to solve an optimization problem where I want to minimize
$$(s-\Pi \vec{1})^{T}K(s-\Pi \vec{1}) + (t-\Pi^{T} \vec{1})^{T}K(t-\Pi^{T} \vec{1})$$ subject to the constraint that $$\Pi$$ is a square matrix with all nonnegative entries summing up to $$1$$. I know how to work with the latter constraint (let $$\vec{1}^{T}\Pi \vec{1}=1$$), but how do I work with the former?

$$(s-\Pi \vec{1})^{T}K(s-\Pi \vec{1}) + (t-\Pi^{T} \vec{1})^{T}K(t-\Pi^{T} \vec{1}) + \lambda(1-\vec{1}^{T}\Pi \vec{1}) + \textrm{something for nonnegative entries}.$$

My question is “what would that something for nonnegative entries be?”
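One standard way to handle elementwise nonnegativity (sketched here assuming $$K$$ is symmetric) is to give every entry its own multiplier: collect them in a matrix $$M$$ with nonnegative entries and subtract $$\langle M,\Pi\rangle = \sum_{i,j} M_{ij}\Pi_{ij} = \operatorname{tr}(M^{T}\Pi)$$. Writing $$J(\Pi)$$ for the objective,

$$\mathcal{L}(\Pi,\lambda,M) = J(\Pi) + \lambda(1-\vec{1}^{T}\Pi \vec{1}) - \operatorname{tr}(M^{T}\Pi),$$

and the KKT conditions read

$$\begin{aligned} &\nabla_\Pi J(\Pi) - \lambda\,\vec{1}\vec{1}^{T} - M = 0, \qquad \nabla_\Pi J(\Pi) = -2K(s-\Pi\vec{1})\vec{1}^{T} - 2\,\vec{1}(t-\Pi^{T}\vec{1})^{T}K, \\ &\Pi_{ij} \geq 0,\quad M_{ij} \geq 0,\quad M_{ij}\Pi_{ij} = 0 \ \text{ for all } i,j, \qquad \vec{1}^{T}\Pi\vec{1} = 1. \end{aligned}$$

Complementary slackness $$M_{ij}\Pi_{ij} = 0$$ says that a multiplier can be positive only where the corresponding entry of $$\Pi$$ is zero.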


## Math Genius: Pouring water from bottles

There are three buckets of size $$x_1, x_2$$, and $$x_3$$ liters (positive, but not necessarily integers), and some bottles, possibly of different sizes, containing a total of $$x_1+x_2+x_3$$ liters of water. We want to pour water from the bottles into the buckets. A bottle is split if it is poured into more than one bucket. Is it true that for any $$x_1,x_2,x_3$$, there are bottle sizes such that we need to split two bottles?

By pouring water into the first bucket until it is full, then into the second, and then into the third, we never need to split more than two bottles.

For the case $$(x_1,x_2,x_3)=(1,3,6)$$, @WhatsUp’s example with three bottles of capacity $$10/3$$ shows that we do need to split two bottles (that is, splitting one bottle does not suffice).
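The greedy strategy can be simulated by laying the bottles end to end on $$[0, x_1+x_2+x_3]$$: a bottle is split exactly when one of the two interior bucket boundaries falls strictly inside its interval, so at most two bottles are ever split. The sketch below (the name `greedy_splits` is introduced here) only measures this greedy upper bound; it does not by itself show that a cleverer arrangement could not do with fewer splits.

```python
from fractions import Fraction
from itertools import accumulate


def greedy_splits(buckets, bottles):
    """Pour bottles in order, filling bucket 1, then 2, then 3; count split bottles.

    Bottle k occupies the interval [start, end] of the total volume; it is split
    exactly when an interior bucket boundary lies strictly inside that interval.
    """
    assert sum(buckets) == sum(bottles), "total water must equal total bucket size"
    cuts = list(accumulate(buckets))[:-1]  # boundaries between consecutive buckets
    splits, start = 0, 0
    for b in bottles:
        end = start + b
        if any(start < c < end for c in cuts):
            splits += 1
        start = end
    return splits


# The (1, 3, 6) example with three bottles of 10/3 litres each (exact arithmetic).
buckets = [Fraction(1), Fraction(3), Fraction(6)]
bottles = [Fraction(10, 3)] * 3
print(greedy_splits(buckets, bottles))  # 2
```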

This proof uses the same basic idea as @mathworker21’s proof (having $$N$$ bottles, each with $$\frac1N$$ of the volume) and merely provides a more elementary proof of the existence of suitable fractional parts, one that doesn’t require ergodic theory or linear independence over $$\mathbb Q$$.

Without loss of generality, rescale the bucket sizes so that $$x_1+x_2+x_3=1$$. Consider $$N$$ bottles with volume $$\frac1N$$ each. The fractional parts of $$Nx_i$$ can add up to $$0$$, $$1$$ or $$2$$. If we can choose $$N$$ such that they add up to $$2$$, it will be necessary to split $$2$$ bottles (since one bottle can only fill fractional parts that add up to $$1$$). They add up to $$2$$ if and only if two of them add up to more than $$1$$. So we want to show that, e.g., $$(\{Nx_1\},\{Nx_2\})$$ lies in the upper right half of $$[0,1]^2$$ for some $$N$$. It would be slightly surprising if it required advanced concepts to show that we can hit an entire half of the square.

If all $$x_i$$ are rational with common denominator $$d$$, choose $$N=d-1$$. Since $$\{dx_i\}=0$$, we have $$\{Nx_i\}=\{dx_i-x_i\}=\{-x_i\}=1-x_i$$, and thus $$\sum_i\{Nx_i\}=3-\sum_ix_i=2$$.
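With exact rational arithmetic, the fractional-part criterion can be checked directly. This sketch assumes rational bucket sizes, represented with Python's `fractions.Fraction`; `frac_part_sum` and `smallest_good_N` are names introduced here. For the $$(1,3,6)$$ example it confirms the choice $$N = d-1 = 9$$ from the rational case and also finds the smaller $$N = 3$$, i.e. the three bottles of $$10/3$$ litres mentioned above.

```python
from fractions import Fraction
from math import floor


def frac_part_sum(xs, N):
    """Sum of the fractional parts {N x_i} after rescaling so that sum(xs) == 1."""
    total = sum(xs)
    return sum(N * x / total - floor(N * x / total) for x in xs)


def smallest_good_N(xs, limit=1000):
    """Smallest N <= limit whose fractional parts sum to 2 (N equal bottles force two splits)."""
    for N in range(1, limit + 1):
        if frac_part_sum(xs, N) == 2:
            return N
    return None


xs = [Fraction(1), Fraction(3), Fraction(6)]
print(frac_part_sum(xs, 9))  # 2, the choice N = d - 1 with d = 10
print(smallest_good_N(xs))   # 3
```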

Otherwise, at least two of the $$x_i$$ must be irrational; assume without loss of generality that $$x_1$$ and $$x_2$$ are, and that $$x_1 < x_2$$. If $$x_2 < \frac12$$, some multiple $$kx_2$$ lies in $$\left[\frac12,1\right]$$, and either $$\sum_i\{k x_i\}=2$$ and we are done, or $$\sum_i\{k x_i\}=1$$ and we can replace the $$x_i$$ by $$\{k x_i\}$$; so we can assume $$x_2 > \frac12$$.

Since $$x_1+x_2 < 1$$, by Dirichlet’s approximation theorem (which can be proved by an elementary application of the pigeonhole principle), there is $$M\in\mathbb N$$ such that $$\{M x_1\} > x_1+x_2$$ and thus $$\{(M-1)x_1\} > x_2$$. At least one of $$\{Mx_2\}$$ and $$\{(M-1)x_2\}$$ is at least $$1-x_2$$. (This is where $$x_2 > \frac12$$ is needed.) Thus, for at least one of $$N=M$$ and $$N=M-1$$ we have $$\{Nx_1\}+\{Nx_2\} > x_2+1-x_2=1$$.

Yes. Take $$N$$ as in Lemma $$1$$ with $$\alpha = \frac{x_1}{x_1+x_2+x_3}$$ and $$\beta = \frac{x_2}{x_1+x_2+x_3}$$, and let the bottle sizes be $$b_1 = \dots = b_N = \frac{x_1+x_2+x_3}{N}$$. Suppose we could split only one bottle, giving $$\delta_1,\delta_2$$ of it to the buckets of $$x_1,x_2$$ liters respectively, and possibly some to bucket $$x_3$$. Then there are $$m_1,m_2 \in \mathbb{N}$$ with $$m_1\frac{x_1+x_2+x_3}{N} = x_1-\delta_1$$ and $$m_2\frac{x_1+x_2+x_3}{N} = x_2-\delta_2$$. Then $$\frac{Nx_1}{x_1+x_2+x_3} = m_1+\frac{N\delta_1}{x_1+x_2+x_3}$$ and $$\frac{Nx_2}{x_1+x_2+x_3} = m_2+\frac{N\delta_2}{x_1+x_2+x_3}$$. Since $$\delta_1,\delta_2 < \frac{x_1+x_2+x_3}{N}$$ (if one of them were equal to $$\frac{x_1+x_2+x_3}{N}$$, then one of $$\left\{\frac{Nx_1}{x_1+x_2+x_3}\right\},\left\{\frac{Nx_2}{x_1+x_2+x_3}\right\}$$ would be $$0$$, contradicting that their sum is more than $$1$$), we see $$\frac{N\delta_1}{x_1+x_2+x_3}+\frac{N\delta_2}{x_1+x_2+x_3} = \left\{\frac{Nx_1}{x_1+x_2+x_3}\right\}+\left\{\frac{Nx_2}{x_1+x_2+x_3}\right\} > 1$$, meaning $$\delta_1+\delta_2 > \frac{x_1+x_2+x_3}{N}$$, a contradiction to having split a bottle properly.


Lemma 1: Given any $$\alpha,\beta > 0$$ with $$\alpha+\beta < 1$$, there is some $$N \in \mathbb{N}$$ with $$\{\alpha N\}+\{\beta N\} > 1$$.

Proof: If $$1,\alpha,\beta$$ are $$\mathbb{Q}$$-linearly independent, then $$\{(\{\alpha N\},\{\beta N\}) : N \ge 1\}$$ is dense in $$\mathbb{T}^2$$, so clearly a desired $$N$$ exists. Otherwise, $$\beta = \frac{c}{d}\alpha+\frac{p}{q}$$ for some $$c,d,p,q \in \mathbb{Z}^{\ge 0}$$. Then $$\{\alpha N\}+\{\beta N\} = \{\alpha N\}+\left\{\alpha\frac{c}{d} N+\frac{Np}{q}\right\}$$. First suppose $$\alpha$$ is irrational. Then since $$\{\{\alpha N'dq\} : N' \ge 1\}$$ is dense in $$\mathbb{T}$$, we get a desired $$N$$ by taking $$N = N'dq$$ with $$\{\alpha N'dq\} > 1-\frac{1}{c^2d^2}$$, since then $$\alpha cqN' = \frac{(k+1)c}{d}-\frac{c}{d}\epsilon$$ for some $$k \in \mathbb{Z}$$ and $$0 < \epsilon < \frac{1}{c^2d^2}$$, meaning $$\{\alpha cqN'\}$$ is either at least $$1-\frac{c}{d}\epsilon$$ or at least $$\frac{1}{d}-\frac{c}{d}\epsilon$$, both large enough. Now suppose $$\alpha = \frac{m}{n}$$ is rational, with $$\gcd(m,n) = 1$$. Then write $$\frac{m}{n}\frac{c}{d}+\frac{p}{q} = \frac{m'}{n'}$$ with $$\gcd(m',n') = 1$$. We wish to show $$\left\{\frac{m}{n}N\right\}+\left\{\frac{m'}{n'}N\right\} > 1$$ for some $$N \in \mathbb{N}$$. WLOG suppose $$n \ge n'$$. We can take $$N$$ so that $$Nm \equiv -1 \pmod{n}$$ and $$n' \nmid N$$; indeed, if $$n' \mid n$$, then clearly $$Nm \equiv -1 \pmod{n'}$$ as well, and otherwise, $$N = kn+m^*$$ for an appropriate $$k$$ works, where $$m^*m \equiv -1 \pmod{n}$$. For this $$N$$, we have $$\left\{\frac{m}{n}N\right\}+\left\{\frac{m'}{n'}N\right\} \ge \frac{n-1}{n}+\frac{1}{n'} \ge 1$$, with equality only if $$n' = n$$ and $$mN \equiv -1 \pmod{n}$$ and $$m'N \equiv 1 \pmod{n}$$. But if we had equality, then $$(m+m')N \equiv 0 \pmod{n}$$, meaning $$\alpha+\beta = \frac{m}{n}+\frac{m'}{n} = 1$$, which is false.

Fact: If $$1,\alpha,\beta$$ are $$\mathbb{Q}$$-linearly independent, then $$\{(\{\alpha N\},\{\beta N\}) : N \ge 1\}$$ is dense in $$\mathbb{T}^2$$.

Proof: Define $$T: \mathbb{T}^2 \to \mathbb{T}^2$$ by $$T(x,y) = (x+\alpha,y+\beta)$$. Since $$T$$ is a rotation of the torus, it suffices to show that $$T$$ is ergodic w.r.t. the Lebesgue measure (for rotations, ergodicity implies that every orbit is dense). Suppose $$f \in L^2(\mathbb{T}^2)$$ is $$T$$-invariant. By basic Fourier analysis, $$f(x_1,x_2) = \sum_{k_1,k_2 \in \mathbb{Z}} c_{k_1,k_2}e^{2\pi i (k_1x_1+k_2x_2)}$$. Then
$$\sum_{k_1,k_2} c_{k_1,k_2} e^{2\pi i (k_1x_1+k_2x_2)} = f(x_1,x_2) = f\circ T(x_1,x_2) = \sum_{k_1,k_2} c_{k_1,k_2} e^{2\pi i (k_1(x_1+\alpha)+k_2(x_2+\beta))} = \sum_{k_1,k_2} e^{2\pi i(k_1\alpha+k_2\beta)}c_{k_1,k_2}e^{2\pi i (k_1x_1+k_2x_2)}.$$
Therefore, $$c_{k_1,k_2} = c_{k_1,k_2}e^{2\pi i (k_1\alpha+k_2\beta)}$$ for each $$k_1,k_2 \in \mathbb{Z}$$. Since $$1,\alpha,\beta$$ are $$\mathbb{Q}$$-linearly independent, $$k_1\alpha+k_2\beta \notin \mathbb{Z}$$ for $$(k_1,k_2) \neq (0,0)$$, so $$c_{k_1,k_2} = 0$$ for all $$(k_1,k_2) \neq (0,0)$$. It follows that $$f$$ is a.e. constant, as desired.


## Server Bug Fix: Can Operations Research applications scale?

What do I mean by scaling?

Let’s say you developed software for an operations research application for an industry client. Now you want to turn it into a product and sell it to other clients.

They may all share the same basic problem. However, in my experience each new client brings additional constraints or a different data landscape, which creates a lot of overhead for each new customer.

In contrast, selling a solver (like Gurobi, CPLEX, …) can scale pretty well, because new customers do not necessarily create much overhead.

So the question is: can we build software for Operations Research applications that scales? If so, how, and are there examples that have achieved this?

Yes, we can, depending on the context and objective, and several companies do this successfully with hundreds of customers across the globe. An important caveat is how specific the problems are that you want your solution to solve. There’s a tradeoff between developing general OR software and doing consulting or creating custom-made models that might only apply to a specific context¹ and will most likely need adapting for another case.

Some examples of general OR software with some problem-specific capabilities (for common constraints and models needed by organizations, like scheduling or routing) are:

1. Google OR-Tools has several specific modelling capabilities implemented, such as routing, scheduling, network flows and bin packing.
2. IBM CPLEX Optimization Studio provides the ability to model specific constraints useful for applications, such as precedence constraints or sequence constraints.

There are also lots of examples of problem-specific software, especially for vehicle routing:

1. Llamasoft sells a couple of solutions for the supply chain (simulation, capacity planning, demand modeling among others).
2. Optibus provides planning and scheduling capabilities for mass transit.

There will always be a sort of 80-20 situation here: if you want to model the full context of a real-world process, you might need to extend these APIs (which may be impossible for proprietary software) or write your own code from scratch instead of using prepackaged software. For example, you might want to route periodically, coupling several days instead of planning each day separately, or according to additional rules, and not all commercial routing software has such features. How much each company invests in R&D teams, customer support and implementation consulting certainly matters here. Similarly, if you develop a staff scheduling solution for a specific industry, you will almost certainly need to adapt it, or activate/deactivate a set of constraints or parameters, just because of legislation or contract dynamics. That makes software design and modularity an important consideration.

¹ Like many of the case studies on Operations Research practice published in the INFORMS Journal on Applied Analytics, formerly known as Interfaces.


## Math Genius: Linear Quadratic optimal control in feedback form

Consider the LTI system
$$\dot{\mathbf{x}}(t) = A \mathbf{x}(t)+B \mathbf{u}(t)$$
Assume that the system is controllable. It is well known that, if we want to steer the system from $$\mathbf{x}(0)=0$$ to a certain target state $$\mathbf{x}(t_f)=\mathbf{x}_f$$, the control $$\mathbf{u}(t)$$ that does so while minimizing the energy functional
$$E=\int_{0}^{t_{f}} \|\mathbf{u}(t)\|_2^2 \, \mathrm{d} t$$
is given by $$\mathbf{u}^{*}(t)=B^{T} \mathrm{e}^{A^{T}\left(t_{f}-t\right)} W^{-1}\mathbf{x}_{f}$$
where $$W=\displaystyle\int_{0}^{t_{f}} \mathrm{e}^{A\tau} B B^{T} \mathrm{e}^{A^{T}\tau} \, \mathrm{d} \tau$$ is the controllability Gramian matrix.

Now, I would like to write the optimal control $$mathbf{u}^{*}(t)$$ in a feedback form, i.e. something like:
$$\mathbf{u}^{*}(t) = -K(t)\mathbf{x}(t)$$

Can anyone show me how to do it? What is the gain $$K(t)$$ in this case? Can I write it in terms of the Gramian matrix?

To do this, you can use a slightly more general expression, namely the input that drives the state from $$x(t_i) = x_i$$ to $$x(t_f) = x_f$$, which is given by

\begin{align} W(t) &= \int_0^t e^{A\tau} B\,B^\top e^{A^\top \tau}\, d\tau, \tag{1} \\ u(t) &= B^\top e^{A^\top (t_f - t)}\,W(t_f-t_i)^{-1}\left(x_f - e^{A\,(t_f-t_i)} x_i\right). \tag{2} \end{align}

Note that one gets your equation when one uses $$t_i = 0$$ and $$x_i = 0$$ in $$(2)$$.

In order to get a feedback policy one can replace $$t_i$$ and $$x_i$$ in $$(2)$$ with $$t$$ and $$x(t)$$ respectively, yielding

$$u(t) = B^\top e^{A^\top (t_f - t)}\,W(t_f-t)^{-1}\left(x_f - e^{A\,(t_f-t)} x(t)\right). \tag{3}$$
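Expanding $$(3)$$ makes the requested gain explicit: the law is a time-varying state feedback plus a feedforward term that depends only on the target,

$$u(t) = -\underbrace{B^\top e^{A^\top (t_f - t)}\,W(t_f-t)^{-1} e^{A\,(t_f-t)}}_{K(t)}\,x(t) + \underbrace{B^\top e^{A^\top (t_f - t)}\,W(t_f-t)^{-1} x_f}_{v(t)}.$$

For $$x_f = 0$$ the feedforward term $$v(t)$$ vanishes and the control is pure state feedback, $$u(t) = -K(t)x(t)$$.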

The expression from $$(3)$$ can be seen as evaluating $$(2)$$ using $$t_i$$ equal to the current value of $$t$$ and only evaluating $$u(t)$$ at this current time.

Note that $$W(0) = 0$$, so $$W(t_f-t)^{-1}$$ in $$(3)$$ blows up as $$t \to t_f$$.
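As a numerical sanity check of $$(3)$$, the sketch below assumes a double integrator, $$A = \begin{bmatrix}0&1\\0&0\end{bmatrix}$$, $$B = \begin{bmatrix}0\\1\end{bmatrix}$$ (not part of the question), because its matrix exponential $$e^{As} = \begin{bmatrix}1&s\\0&1\end{bmatrix}$$ and Gramian $$W(T) = \begin{bmatrix}T^3/3&T^2/2\\T^2/2&T\end{bmatrix}$$ have closed forms, so no linear-algebra library is needed. It computes the open-loop minimum-energy control from $$x(0) = 0$$ and checks that evaluating the feedback law $$(3)$$ at the open-loop state gives the same input. All helper names are hypothetical.

```python
# Double integrator: x1' = x2, x2' = u, i.e. A = [[0, 1], [0, 0]], B = [0, 1]^T.
# For this A, e^{A s} = [[1, s], [0, 1]], hence B^T e^{A^T s} = [s, 1].

def gramian(T):
    """W(T) = int_0^T e^{A tau} B B^T e^{A^T tau} dtau for the double integrator."""
    return [[T**3 / 3, T**2 / 2], [T**2 / 2, T]]


def solve2(M, v):
    """Solve the 2x2 linear system M y = v by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(v[0] * M[1][1] - M[0][1] * v[1]) / det,
            (M[0][0] * v[1] - v[0] * M[1][0]) / det]


def feedback_u(t, x, tf, xf):
    """Law (3): B^T e^{A^T (tf-t)} W(tf-t)^{-1} (xf - e^{A (tf-t)} x). Needs t < tf."""
    T = tf - t
    drift = [x[0] + T * x[1], x[1]]  # e^{A T} x
    y = solve2(gramian(T), [xf[0] - drift[0], xf[1] - drift[1]])
    return T * y[0] + y[1]           # B^T e^{A^T T} y = [T, 1] . y


def open_loop(t, tf, xf):
    """Open-loop control u*(t) from x(0) = 0 and the state it produces at time t."""
    c = solve2(gramian(tf), xf)      # c = W(tf)^{-1} xf
    a, b = c[0] * tf + c[1], -c[0]   # u*(t) = [tf - t, 1] . c = a + b t
    u = a + b * t
    x = [a * t**2 / 2 + b * t**3 / 6, a * t + b * t**2 / 2]  # integrate x2' = u, x1' = x2
    return u, x


tf, xf = 1.0, [1.0, 0.0]
for t in (0.0, 0.5, 0.9):
    u_ol, x = open_loop(t, tf, xf)
    print(t, u_ol, feedback_u(t, x, tf, xf))
```

The two printed inputs should agree at every time; evaluating closer to $$t_f$$ makes the solve increasingly ill-conditioned, which is the blow-up noted above.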

## Math Genius: Graphing a function and finding the global minimum

Consider the following function. Here y is the dependent variable, x is the independent variable and e is the error term.

$$y − xβ_1 = e$$,

Is there a way to graph this for the case when $$x = 1$$ and find the global minimum for $$β_1$$?

## Math Genius: Why is the Farkas Lemma so popular?

The Farkas Lemma comes up constantly when studying optimization. It is used to obtain a solvability criterion for linear programs, since it states that exactly one of two systems is solvable. I am currently doing some research on linear programs, and I am a little confused about why it is so popular.

Why can’t I always project out all variables with Fourier-Motzkin elimination and check whether the resulting inequalities of the form $$b \geq 0$$ hold, where $$b$$ is a known vector? This seems simpler to me.

Farkas’ Lemma comes in different variants. I prefer the following variant.

The system (1) $$A\mathbf x \leq \mathbf b$$ has no solution if and only if (2) there exists some $$\mathbf u \geq \mathbf 0$$ with $$\mathbf u^TA = \mathbf 0^T$$ and $$\mathbf b^T\mathbf u < 0$$.

But what does this lemma say? It says that (1) has no solution if and only if it implies the inequality $$0 < 0$$.

Explanation: The vector $$\mathbf u$$ consists of non-negative coefficients, and (3) $$\mathbf u^TA\mathbf x \leq \mathbf u^T\mathbf b$$ is a non-negative combination of the inequalities of (1). Moreover, (3) is implied by (1), as any solution of (1) is also a solution of (3). In (2) we additionally stipulate that $$\mathbf u^TA = \mathbf 0^T$$ and $$\mathbf u^T\mathbf b < 0$$, so (3) becomes $$\mathbf 0^T\mathbf x \leq \mathbf u^T\mathbf b < 0$$. Since $$\mathbf 0^T\mathbf x = 0$$, we have shown that (1) implies the inequality $$0 < 0$$.
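This also connects to the Fourier-Motzkin question: elimination only ever adds non-negative multiples of the inequalities, so if the multipliers are recorded, an infeasible system ends with a row $$0 \leq r$$ with $$r < 0$$ whose multiplier vector is exactly a certificate $$\mathbf u$$ as in (2). The sketch below illustrates this with exact `Fraction` arithmetic; the name `fourier_motzkin` is introduced here, and it makes no attempt to prune redundant rows, whose number can grow very quickly during elimination.

```python
from fractions import Fraction


def fourier_motzkin(A, b):
    """Decide feasibility of A x <= b by Fourier-Motzkin elimination.

    Each row carries the non-negative multipliers u that produced it from the
    original rows, so an infeasible system yields a Farkas certificate u with
    u >= 0, u^T A = 0 and u^T b < 0. Returns None if the system is feasible.
    """
    m = len(A)
    n = len(A[0])
    rows = []
    for i in range(m):
        u = [Fraction(0)] * m
        u[i] = Fraction(1)
        rows.append(([Fraction(a) for a in A[i]], Fraction(b[i]), u))
    for j in range(n):  # eliminate x_j
        pos = [r for r in rows if r[0][j] > 0]
        neg = [r for r in rows if r[0][j] < 0]
        new_rows = [r for r in rows if r[0][j] == 0]
        for ap, bp, up in pos:
            for an, bn, un in neg:
                s, t = -an[j], ap[j]  # both positive, so s*row_p + t*row_n cancels x_j
                coeffs = [s * p + t * q for p, q in zip(ap, an)]
                new_rows.append((coeffs, s * bp + t * bn,
                                 [s * x + t * y for x, y in zip(up, un)]))
        rows = new_rows
    for _, rhs, u in rows:  # every remaining row reads 0 <= rhs
        if rhs < 0:
            return u
    return None


# x + y <= 1, x >= 1, y >= 1 is infeasible.
A = [[1, 1], [-1, 0], [0, -1]]
b = [1, -1, -1]
print(fourier_motzkin(A, b))  # [Fraction(1, 1), Fraction(1, 1), Fraction(1, 1)]
```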

In practice, one uses the following linear program to check the solvability of $$A \mathbf x \leq \mathbf b$$:
$$\text{maximize } \mathbf 0^T\mathbf x \text{ s.t. } A\mathbf x \leq \mathbf b.$$
By duality, this linear program is infeasible if and only if the dual program $$\text{minimize } \mathbf b^T\mathbf u \text{ s.t. } A^T\mathbf u = \mathbf 0 \text{ and } \mathbf u \geq \mathbf 0$$
is unbounded.
But the latter is equivalent to Farkas’ Lemma, as a simple exercise shows.

I also found the following answers:
Linear programs can be solved with cutting-plane algorithms, and many of these are based on the Farkas Lemma. In the integer case the Farkas Lemma has to be modified, but even this modified version can be used to prove that the algorithms terminate in finite time. Furthermore, the Farkas Lemma is important for the theory of so-called totally dual integral systems, a technique that is very helpful for developing solution algorithms for integer programs.

## Math Genius: Constrained optimization with Lagrangian for optimal contract

I am trying to maximize the following with respect to the values of $$t$$:

$$\pi_1S+(1-\pi_1)F-\pi_1p_1t_{HG}-\pi_1(1-p_1)t_{HB}-(1-\pi_1)p_1t_{LB}-(1-\pi_1)(1-p_1)t_{LG}$$ s.t.

$$\pi_1p_1u(t_{HG})+\pi_1(1-p_1)u(t_{HB})+(1-\pi_1)p_1u(t_{LB})+(1-\pi_1)(1-p_1)u(t_{LG})-\phi \geq \underline{U}$$ and

$$\pi_1p_1u(t_{HG})+\pi_1(1-p_1)u(t_{HB})+(1-\pi_1)p_1u(t_{LB})+(1-\pi_1)(1-p_1)u(t_{LG})-\phi \geq \pi_0p_0u(t_{HG})+\pi_0(1-p_0)u(t_{HB})+(1-\pi_0)p_0u(t_{LB})+(1-\pi_0)(1-p_0)u(t_{LG})$$

I have written the Lagrangian as:

$$\begin{aligned}\mathcal{L} ={}& \pi_1S+(1-\pi_1)F-\pi_1p_1t_{HG}-\pi_1(1-p_1)t_{HB}-(1-\pi_1)p_1t_{LB}-(1-\pi_1)(1-p_1)t_{LG} \\ &+ \lambda\Big(\pi_1p_1u(t_{HG})+\pi_1(1-p_1)u(t_{HB})+(1-\pi_1)p_1u(t_{LB})+(1-\pi_1)(1-p_1)u(t_{LG})-\phi-\underline{U}\Big) \\ &+ \mu\Big(\pi_1p_1u(t_{HG})+\pi_1(1-p_1)u(t_{HB})+(1-\pi_1)p_1u(t_{LB})+(1-\pi_1)(1-p_1)u(t_{LG})-\phi \\ &\qquad -\pi_0p_0u(t_{HG})-\pi_0(1-p_0)u(t_{HB})-(1-\pi_0)p_0u(t_{LB})-(1-\pi_0)(1-p_0)u(t_{LG})\Big)\end{aligned}$$

And the first-order conditions (FOCs) as:

$$t_{HG}: -\pi_1p_1 + \lambda\pi_1p_1u'(t_{HG})+ \mu\pi_1p_1u'(t_{HG})-\mu\pi_0p_0u'(t_{HG}) = 0$$
$$t_{HB}: -\pi_1(1-p_1) + \lambda\pi_1(1-p_1)u'(t_{HB})+ \mu\pi_1(1-p_1)u'(t_{HB})-\mu\pi_0(1-p_0)u'(t_{HB}) = 0$$
$$t_{LB}: -(1-\pi_1)p_1 + \lambda(1-\pi_1)p_1u'(t_{LB})+ \mu(1-\pi_1)p_1u'(t_{LB})-\mu(1-\pi_0)p_0u'(t_{LB}) = 0$$
$$t_{LG}: -(1-\pi_1)(1-p_1) + \lambda(1-\pi_1)(1-p_1)u'(t_{LG})+ \mu(1-\pi_1)(1-p_1)u'(t_{LG})-\mu(1-\pi_0)(1-p_0)u'(t_{LG}) = 0$$
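A common way to make these tractable (assuming $$u' > 0$$, so the division is allowed) is to set each derivative of the Lagrangian $$\mathcal{L}$$ with respect to a transfer to zero and divide by the corresponding probability weight times $$u'(t)$$. This gives the "inverse marginal utility" form, in which each transfer depends only on $$\lambda$$, $$\mu$$ and a likelihood ratio:

$$\begin{aligned} \frac{1}{u'(t_{HG})} &= \lambda + \mu\left(1 - \frac{\pi_0p_0}{\pi_1p_1}\right), & \frac{1}{u'(t_{HB})} &= \lambda + \mu\left(1 - \frac{\pi_0(1-p_0)}{\pi_1(1-p_1)}\right), \\ \frac{1}{u'(t_{LB})} &= \lambda + \mu\left(1 - \frac{(1-\pi_0)p_0}{(1-\pi_1)p_1}\right), & \frac{1}{u'(t_{LG})} &= \lambda + \mu\left(1 - \frac{(1-\pi_0)(1-p_0)}{(1-\pi_1)(1-p_1)}\right). \end{aligned}$$

With a concrete, strictly concave $$u$$ (so that $$u'$$ is invertible), these four equations together with complementary slackness for $$\lambda$$ and $$\mu$$ can be solved for the transfers and multipliers.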

But I don’t know the next steps and how to make this tractable. Any feedback on how to proceed would be appreciated.
