Math Genius: Prove that $n\cdot\min\{T_1,\dots,T_n\}$ isn't allowable as an estimator of $\mu$

Suppose we have an electronic device whose lifetime follows an exponential distribution with unknown mean $$\mu$$. A research team wants to estimate $$\mu$$ using a sample of $$n$$ devices. They use the estimator $$T=n\cdot \min\{T_1,\dots,T_n\}$$, that is, $$n$$ times the time at which the first device fails when all $$n$$ devices are switched on at the same time.

I need to prove that this estimator $$T$$ is not allowable. That would mean that $$T$$ isn’t asymptotically centered or that $$T$$ isn’t consistent.

I've found the distribution function of $$\min\{T_1,\dots,T_n\}$$ (I'll call it $$F$$):
$$F(t)=P(\min\{T_1,\dots,T_n\}\leq t)=1-P(\min\{T_1,\dots,T_n\}> t) =$$
$$= 1-P(T_1>t,T_2>t,\dots,T_n>t)=1-(P(T_1>t))^n=1-e^{-nt/\mu}.$$
I'm stuck here. I need to prove it's NOT allowable.
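The CDF above can be checked quickly by Monte Carlo. The sketch below uses the Python standard library. The values $$\mu=2$$, $$n=5$$, $$t=0.3$$ and the helper name `empirical_min_cdf` are arbitrary illustrative choices and are not part of the problem. Note that `random.expovariate` takes the rate $$1/\mu$$, not the mean.

```python
import math
import random

def empirical_min_cdf(mu, n, t, reps=50_000, seed=1):
    """Estimate P(min{T_1,...,T_n} <= t) for iid Exponential(mean mu) lifetimes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # expovariate expects the rate 1/mu, not the mean mu
        first_failure = min(rng.expovariate(1.0 / mu) for _ in range(n))
        hits += first_failure <= t
    return hits / reps

mu, n, t = 2.0, 5, 0.3
simulated = empirical_min_cdf(mu, n, t)
exact = 1 - math.exp(-n * t / mu)  # F(t) = 1 - e^{-nt/mu}
print(simulated, exact)
```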

You are on the right track but you need to keep going a bit further.

The distribution of the first order statistic $$T_{(1)} = \min (T_1, T_2, \ldots, T_n)$$ is indeed exponential with CDF $$\Pr[T_{(1)} \le t] = 1 - e^{-nt/\mu}.$$ The next step is to compute the distribution of the scale-transformed first order statistic, which is your actual $$T$$: $$T = n T_{(1)} = n \min (T_1, T_2, \ldots, T_n).$$ This is not hard to do; I leave it as an exercise to show $$T \sim \operatorname{Exponential}(\mu)$$ with CDF $$\Pr[T \le t] = 1 - e^{-t/\mu}.$$
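For readers who want to check the exercise, it amounts to a one-line change of variable using the CDF of $$T_{(1)}$$ above. For $$t \ge 0$$:

```latex
\Pr[T \le t] = \Pr[n T_{(1)} \le t] = \Pr\!\left[T_{(1)} \le \tfrac{t}{n}\right]
= 1 - e^{-n (t/n)/\mu} = 1 - e^{-t/\mu}.
```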
What this tells us is that $$n$$ times the minimum observation is exponentially distributed with the same mean parameter. Our intuition should then lead us to observe that this statistic is no better than simply observing a single device. In fact, because the CDF of $$T$$ does not depend on $$n$$ at all, its properties as an estimator of $$\mu$$ are also independent of the sample size! So, for instance, the asymptotic variance does not tend to $$0$$; we explicitly have $$\operatorname{Var}[T] = \mu^2$$, hence $$\lim_{n \to \infty} \operatorname{Var}[T] = \mu^2 > 0.$$ This is an undesirable characteristic of an estimator because it says that the precision of the estimate does not improve with increasing sample size. There is no information to be gained about the parameter by collecting more data if you use this estimator.
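A small simulation illustrates this. It is a sketch using only the Python standard library. The value $$\mu = 2$$, the sample sizes, and the name `simulate_T` are illustrative assumptions. The empirical mean and variance of $$T$$ stay near $$\mu$$ and $$\mu^2$$ however large $$n$$ is.

```python
import random
import statistics

def simulate_T(mu, n, reps=10_000, seed=2):
    """Draw `reps` realizations of T = n * min(T_1, ..., T_n)."""
    rng = random.Random(seed)
    return [n * min(rng.expovariate(1.0 / mu) for _ in range(n)) for _ in range(reps)]

mu = 2.0
results = {}
for n in (1, 10, 50):
    draws = simulate_T(mu, n)
    results[n] = (statistics.fmean(draws), statistics.variance(draws))
    print(n, results[n])  # mean stays near mu = 2, variance near mu^2 = 4
```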

Stepping back, ask yourself why this happens. It is true that the variance of the first order statistic $$T_{(1)}$$ decreases with increasing sample size. The problem is that this decrease is not fast enough to offset the increase in variance that occurs when we multiply $$T_{(1)}$$ by $$n$$. As you know, for a scalar constant $$c$$ and a random variable $$X$$ with finite variance, we have $$\operatorname{Var}[cX] = c^2 \operatorname{Var}[X].$$ This means that scaling a random variable by $$c$$ multiplies its variance by $$c^2$$. So if we must scale the first order statistic by $$n$$ in order to get an estimate, then the variance of $$T_{(1)}$$ must decrease faster than $$1/n^2$$ to compensate for the increase due to the scaling. This does not occur; in fact, the two effects balance each other out exactly in this case.
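Concretely, $$T_{(1)}$$ is exponential with mean $$\mu/n$$, so its variance is the squared mean, and the two factors of $$n^2$$ cancel:

```latex
\operatorname{Var}[T_{(1)}] = \left(\frac{\mu}{n}\right)^2 = \frac{\mu^2}{n^2},
\qquad
\operatorname{Var}[n T_{(1)}] = n^2 \cdot \frac{\mu^2}{n^2} = \mu^2 .
```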

I should point out here that “not allowable” is a bit strong. I would prefer to characterize the estimator $$T$$ as “poor” or “undesirable.” After all, it is an estimator of $$\mu$$, just not a good one.

Math Genius: Which of these two estimators of $\mu$ is better (Exponential distribution)?

The problem goes like this:

“Suppose we have an electronic device whose lifetime follows an exponential distribution with unknown mean $$\mu$$. We want to estimate $$\mu$$, and two teams will take care of it. Each team tests $$n$$ different devices:

• The first team measures the time $$T$$ at which the first device fails and provides the estimate $$T_1=nT$$.

• The second team measures each device's lifetime and takes $$T_2$$ to be the mean of these lifetimes.

Which team uses a better estimator of $$mu$$?”

I know that the second team is obviously using a better estimator of $$\mu$$, but I'm not sure how to prove it. I've tried using the definition of an allowable estimator, which says that every allowable estimator is asymptotically centered. Here $$(\hat{\theta}_n)$$ is asymptotically centered as an estimator of $$\theta$$ if $$\lim\limits_{n \rightarrow \infty}\tau_n(\theta)=\theta$$, where $$\tau_n(\theta)=\text{E}[\hat{\theta}_n(X_i)]$$.

Calculating that limit for the second team's estimator gives me $$\mu$$, because the sample mean converges to $$\mu$$ as the number of sampled devices goes to infinity.

My real problem is with $$T_1$$. I don't see why it multiplies the time $$T$$ of the first device by $$n$$, and if I'm not wrong, evaluating that limit gives $$\infty$$:

$$\lim\limits_{n \rightarrow \infty}\tau_n(n\cdot T)=\text{E}[\infty \cdot T]=\infty.$$

Is my reasoning correct, or what am I doing wrong?

You asked a similar question more recently, but I am going to provide a separate answer to this question since the comments attached to this question are mistaken.

What the other commenters in this question have misunderstood is that the $$n$$ devices are being tested simultaneously and not sequentially. Therefore, the time to failure of the first device to fail is exponential with mean $$\mu/n$$, which is what you calculated in the more recent linked question. Multiplying this by $$n$$ yields an estimator for the mean time to failure. However, as I have pointed out there, the asymptotic variance of this estimator is not a decreasing function of the sample size. This makes it an inferior estimator compared to the one used by the second team.
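To make the comparison concrete, here is a simulation sketch of both teams' estimators. The values $$\mu = 2$$ and $$n = 20$$ and the function name `compare_teams` are illustrative assumptions. Both estimators come out centered near $$\mu$$. Only the sample mean has a variance that shrinks, like $$\mu^2/n$$; the first team's variance stays near $$\mu^2$$.

```python
import random
import statistics

def compare_teams(mu, n, reps=20_000, seed=3):
    """Simulate both teams' estimators on the same batches of n lifetimes."""
    rng = random.Random(seed)
    team1, team2 = [], []
    for _ in range(reps):
        lifetimes = [rng.expovariate(1.0 / mu) for _ in range(n)]
        team1.append(n * min(lifetimes))           # T_1 = n * (first failure time)
        team2.append(statistics.fmean(lifetimes))  # T_2 = sample mean
    return team1, team2

mu, n = 2.0, 20
t1, t2 = compare_teams(mu, n)
print(statistics.variance(t1), statistics.variance(t2))  # roughly mu^2 vs mu^2 / n
```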

It is worth noting that $$\lim_{n \to \infty} \operatorname{E}[nT_{(1)}] \ne \operatorname{E}[\infty \cdot T_{(1)}].$$ Writing it that way is an invalid interchange of limit and expectation. It also overlooks the fact that $$T_{(1)}$$ has a limiting expectation of $$0$$. Instead, $$\operatorname{E}[n T_{(1)}] = n \operatorname{E}[T_{(1)}] = n (\mu/n) = \mu$$ for every $$n$$.

Math Genius: Finding the gradient of the objective function in a differential equation parameter identification problem

I am dealing with a parameter identification problem with a system of differential equations $$\frac{dy}{dt}=\mathbf{F}(t,\vec{y}(t,\vec{p}),\vec{p}) \tag{1}$$ I need to find the parameters in the vector $$\vec{p}$$ that solve $$\min_{\vec{p}}\, C(\vec{p}), \qquad C(\vec{p})=\|\vec{y}(t,\vec{p})-\vec{y}_m\|^2_2 \tag{2}$$ where $$\vec{y}_m$$ is some measured data.

As I am performing the optimisation with a gradient-based algorithm, I need the gradient of the objective function. As I understand it, for this I need to use the sensitivity (or variational) equations. For notational simplicity, the $$\vec{\cdot}$$ is omitted: $$\frac{d}{dt}\frac{\partial y}{\partial p}=\frac{\partial \mathbf{F}}{\partial y}\cdot\frac{\partial y}{\partial p}+\frac{\partial \mathbf{F}}{\partial p} \tag{3}$$
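Here is a sketch of the chain-rule route. It assumes $$\mathbf{F}$$ is continuously differentiable, so that the $$t$$- and $$p$$-derivatives of $$y$$ may be interchanged. Differentiate (1) with respect to $$p$$, keeping in mind that $$p$$ enters both directly and through $$y$$. The notation $$t_0$$ and $$y_0(p)$$ for the initial time and initial state is introduced here and is not part of the original question.

```latex
\frac{d}{dt}\frac{\partial y}{\partial p}
= \frac{\partial}{\partial p}\frac{dy}{dt}
= \frac{\partial}{\partial p}\mathbf{F}\big(t, y(t,p), p\big)
= \frac{\partial \mathbf{F}}{\partial y}\cdot\frac{\partial y}{\partial p} + \frac{\partial \mathbf{F}}{\partial p},
\qquad
\frac{\partial y}{\partial p}(t_0) = \frac{\partial y_0}{\partial p}.
```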

This way the gradient of my objective function would be: $$\vec{G}=\left[\frac{\partial y}{\partial p}\right]^T\cdot\left(y-y_m\right)$$
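The sketch below shows the forward-sensitivity approach in Python on a hypothetical scalar model $$y' = -p\,y$$, $$y(0)=1$$. This model was chosen only because its sensitivity $$s = \partial y/\partial p$$ obeys the simple equation $$s' = -p\,s - y$$, $$s(0)=0$$. That equation is (3) with $$\partial F/\partial y = -p$$ and $$\partial F/\partial p = -y$$.

The state and its sensitivity are integrated together with a hand-written RK4 step, and the gradient is compared with a central finite difference. The synthetic data from $$p = 0.7$$ and all names are illustrative assumptions. Differentiating $$\|y - y_m\|_2^2$$ produces a factor of $$2$$ in front of $$[\partial y/\partial p]^T(y-y_m)$$. The sketch keeps that factor; dropping it, or putting $$\tfrac12$$ in front of $$C$$, only rescales the gradient.

```python
import math

def rk4_step(f, t, state, h):
    """One classical Runge-Kutta (RK4) step for a system stored as a list of floats."""
    k1 = f(t, state)
    k2 = f(t + h / 2, [s + h / 2 * k for s, k in zip(state, k1)])
    k3 = f(t + h / 2, [s + h / 2 * k for s, k in zip(state, k2)])
    k4 = f(t + h, [s + h * k for s, k in zip(state, k3)])
    return [s + h / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

T_END, STEPS = 2.0, 200

def simulate(p, y0=1.0):
    """Integrate y' = -p*y together with s = dy/dp, where s' = -p*s - y and s(0) = 0."""
    def rhs(t, state):
        y, s = state
        return [-p * y, -p * s - y]
    h = T_END / STEPS
    state, t = [y0, 0.0], 0.0
    ys, ss = [y0], [0.0]
    for _ in range(STEPS):
        state = rk4_step(rhs, t, state, h)
        t += h
        ys.append(state[0])
        ss.append(state[1])
    return ys, ss

def cost_and_gradient(p, y_meas):
    """C(p) = sum of squared residuals on the grid; dC/dp = 2 * sum(s * residual)."""
    ys, ss = simulate(p)
    residual = [y - m for y, m in zip(ys, y_meas)]
    cost = sum(r * r for r in residual)
    grad = 2.0 * sum(s * r for s, r in zip(ss, residual))
    return cost, grad

# Hypothetical "measurements": exact solution for p = 0.7 on the same time grid
y_meas = [math.exp(-0.7 * T_END * k / STEPS) for k in range(STEPS + 1)]
cost, grad = cost_and_gradient(1.0, y_meas)
eps = 1e-6
fd = (cost_and_gradient(1.0 + eps, y_meas)[0]
      - cost_and_gradient(1.0 - eps, y_meas)[0]) / (2 * eps)
print(grad, fd)  # sensitivity-based and finite-difference gradients
```

Because the same RK4 scheme is applied to the augmented system $$(y, s)$$, the sensitivity gradient matches the derivative of the discretized cost, so the two printed numbers should agree to many digits.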

There are three questions I have:

1. Is the expression for the gradient $$\vec{G}$$ above correct?
2. How is equation (3) derived? This post seems to do it with the chain rule, whereas Hemker (1971), on page 75, seems to perform a linear approximation. The two approaches seem fairly different, with different assumptions made, particularly in Hemker's work. Hemker's approach also seems to result in a different form of the objective function's gradient, which I do not fully understand.
3. For my case I have the differential equations in the form of

$$\begin{aligned} &\mathbf{M}(y)\,y'' + \mathbf{C}y' + \mathbf{K}y + \mathbf{A}(y,y',p) = \mathbf{0} \\ \Rightarrow\ &\mathbf{F} = \begin{bmatrix} y' \\ y'' \end{bmatrix}=\begin{bmatrix} \mathbf{0} & \mathbf{I} \\ -[\mathbf{M}(y)]^{-1}\mathbf{K} & -[\mathbf{M}(y)]^{-1}\mathbf{C} \end{bmatrix}\begin{bmatrix} y \\ y' \end{bmatrix}+ \begin{bmatrix} \mathbf{0} \\ -[\mathbf{M}(y)]^{-1}\mathbf{A}(y,y',p) \end{bmatrix} \end{aligned} \tag{4}$$

where $$\mathbf{M}(y)$$ is a diagonal matrix that depends on $$y$$, $$\mathbf{A}(y,y',p)$$ is a vector that depends on $$y$$, $$y'$$ and $$p$$, and $$\mathbf{C}$$, $$\mathbf{K}$$ are constant tridiagonal matrices. So
\begin{align} \frac{\partial \mathbf{F}}{\partial y} = & \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ [\mathbf{M}(y)]^{-1}\frac{\partial M}{\partial y}[\mathbf{M}(y)]^{-1}(\mathbf{K}y+ \mathbf{C}y')-[\mathbf{M}(y)]^{-1}\mathbf{K} & \mathbf{0} \end{bmatrix}+\\ & \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ [\mathbf{M}(y)]^{-1}\frac{\partial M}{\partial y}[\mathbf{M}(y)]^{-1}\text{diag}(\mathbf{A}(y,y',p))-[\mathbf{M}(y)]^{-1}\frac{\partial \mathbf{A}(y,y',p)}{\partial y} & \mathbf{0} \end{bmatrix} \tag{5} \end{align}

However, when I use equation (5) to solve equation (3) and then compute $$\vec{G}$$, I do not obtain the gradient of the objective function. I know this because I have the correct answer from numerical differentiation, and my $$C$$ is very smooth. What does return an answer extremely close to the correct one (and I can't explain why) is the following:

\begin{align} \frac{\partial \mathbf{F}}{\partial y} = & \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ -[\mathbf{M}(y)]^{-1}\frac{\partial M}{\partial y}[\mathbf{M}(y)]^{-1}(\mathbf{K}y+ \mathbf{C}y')-[\mathbf{M}(y)]^{-1}\mathbf{K} & \mathbf{0} \end{bmatrix}-\\ & \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ [\mathbf{M}(y)]^{-1}\frac{\partial M}{\partial y}[\mathbf{M}(y)]^{-1}\text{diag}(\mathbf{A}(y,y',p)) & \mathbf{0} \end{bmatrix} \tag{6} \end{align}

So, to conclude with my third question: based on the above, am I missing part of the theory here?
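One way to test a hand-derived $$\partial \mathbf{F}/\partial y$$ such as (5) or (6) is to compare it entry by entry with a finite-difference Jacobian of $$\mathbf{F}$$ with respect to the full state $$[y,\ y']$$. The sketch below does this for a hypothetical one-degree-of-freedom instance of (4) with $$M(y) = 1 + y^2$$ and $$A(y,y',p) = p\,y^3$$. These choices, the constants, and the function names are illustrative assumptions only. Note that `C` here is the damping constant, not the objective function.

```python
def jacobian_fd(f, x, eps=1e-6):
    """Central finite-difference Jacobian of f: R^n -> R^m at x (lists of floats)."""
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp), f(xm)
        cols.append([(a - b) / (2 * eps) for a, b in zip(fp, fm)])
    # transpose the columns into rows so that J[i][j] = dF_i / dx_j
    return [list(row) for row in zip(*cols)]

# Hypothetical scalar instance of (4): M(y) = 1 + y^2, A(y, y', p) = p * y^3
K, C, p = 2.0, 0.5, 0.3

def F(x):
    y, v = x  # state [y, y']
    return [v, -(K * y + C * v + p * y**3) / (1 + y**2)]

def jacobian_exact(x):
    """Hand-derived Jacobian of F with respect to [y, y']."""
    y, v = x
    M, dM = 1 + y**2, 2 * y
    N = K * y + C * v + p * y**3   # K y + C y' + A
    dN_dy = K + 3 * p * y**2       # K + dA/dy
    return [[0.0, 1.0],
            [N * dM / M**2 - dN_dy / M, -C / M]]

x0 = [0.8, -0.4]
J_fd = jacobian_fd(F, x0)
J_ex = jacobian_exact(x0)
print(J_fd)
print(J_ex)
```

If a hand-derived Jacobian disagrees with the finite-difference one at a few random states, the sensitivity equation (3) will produce a wrong gradient even when the integration itself is accurate.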

Math Genius: Minimum variance unbiased estimator for scale parameter of a certain gamma distribution

Let $$X_1, X_2, \dots, X_n$$ be a random sample from a distribution with p.d.f.
$$f(x;\theta)=\theta^2xe^{-x\theta}, \quad 0<x<\infty,\ \theta>0.$$ Obtain the minimum variance unbiased estimator of $$\theta$$ and examine whether the Cramér-Rao lower bound is attained.

MY WORK:

Using maximum likelihood, I have found the estimator $$\hat{\theta}=\frac{2}{\bar{x}}$$.
Alternatively, since $$X\sim \operatorname{Gamma}(2, \theta)$$,
$$E(X)=2\theta$$ and $$E(\frac{X}{2})=\theta$$,
so can I take $$\frac{X}{2}$$ as an unbiased estimator of $$\theta$$?
I'm stuck and confused and need some help. Thank you.

If one is familiar with the concepts of sufficiency and completeness, then this problem is not too difficult. Note that $f(x; \theta)$ is the density of a $\operatorname{Gamma}(2, \theta)$ random variable (shape $2$, rate $\theta$). The gamma distribution belongs to the exponential family of distributions, for which sufficiency and completeness provide strong tools for constructing uniformly minimum variance unbiased estimators.

The distribution of a random sample of size $n$ from this distribution is
$$
g(x_1,\ldots,x_n; \theta) = \theta^{2n} \exp\Big(-\theta \sum_{i=1}^n x_i + \sum_{i=1}^n \log x_i\Big)
$$
which, again, belongs to the exponential family.

From this we can conclude that $S_n = \sum_{i=1}^n X_i$ is a complete sufficient statistic for $\theta$. Operationally, this means that if we can find some function $h(S_n)$ that is unbiased for $\theta$, then we know immediately via the Lehmann-Scheffé theorem that $h(S_n)$ is the unique uniformly minimum variance unbiased (UMVU) estimator.

Now, $S_n$ has distribution $\operatorname{Gamma}(2n, \theta)$ by standard properties of the gamma distribution. (This can easily be checked via the moment-generating function.)

Furthermore, straightforward calculus shows that
$$
\mathbb{E} S_n^{-1} = \int_0^\infty s^{-1} \frac{\theta^{2n} s^{2n-1}e^{-\theta s}}{\Gamma(2n)} \,\mathrm{d}s = \frac{\theta}{2n-1} \,.
$$

Hence, $h(S_n) = \frac{2n-1}{S_n}$ is unbiased for $\theta$ and must, therefore, be the UMVU estimator.
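A Monte Carlo sketch, using only the Python standard library, contrasts $h(S_n)$ with the MLE $2/\bar{x} = 2n/S_n$ from the question. The values $\theta = 1.5$ and $n = 5$ and the helper name are illustrative assumptions. By the same integral, the MLE has mean $2n\theta/(2n-1)$ and is therefore biased upward. Note that `random.gammavariate(alpha, beta)` uses shape `alpha` and *scale* `beta`, so the rate $\theta$ is passed as the scale $1/\theta$.

```python
import random
import statistics

def simulate_estimators(theta, n, reps=40_000, seed=0):
    """Draws of the UMVUE (2n-1)/S_n and of the MLE 2n/S_n under Gamma(2, rate theta)."""
    rng = random.Random(seed)
    umvue, mle = [], []
    for _ in range(reps):
        s_n = sum(rng.gammavariate(2.0, 1.0 / theta) for _ in range(n))
        umvue.append((2 * n - 1) / s_n)
        mle.append(2 * n / s_n)
    return umvue, mle

theta, n = 1.5, 5
umvue, mle = simulate_estimators(theta, n)
print(statistics.fmean(umvue), theta)                      # close to theta
print(statistics.fmean(mle), 2 * n * theta / (2 * n - 1))  # biased upward
print(statistics.variance(umvue), theta**2 / (2 * (n - 1)))
```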

Addendum: Using the fact that $\mathbb{E} S_n^{-2} = \frac{\theta^2}{(2n-1)(2n-2)}$, we conclude that $\mathbb{V}\mathrm{ar}(h(S_n)) = \frac{\theta^2}{2(n-1)}$. On the other hand, the information $I(\theta)$ from a sample of size one is readily computed to be $-\mathbb{E}\, \frac{\partial^2 \log f}{\partial \theta^2} = 2 \theta^{-2}$, and so the Cramér-Rao lower bound for a sample of size $n$ is
$$
\mathrm{CRLB}(\theta) = \frac{1}{n I(\theta)} = \frac{\theta^2}{2n} \,.
$$
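For reference, here are the two computations in the addendum written out. They use $\mathbb{E} S_n^{-2}$ as stated above and $\log f = 2\log\theta + \log x - x\theta$:

```latex
\mathbb{V}\mathrm{ar}(h(S_n)) = (2n-1)^2\,\mathbb{E} S_n^{-2} - \theta^2
= \frac{(2n-1)\,\theta^2}{2n-2} - \theta^2 = \frac{\theta^2}{2(n-1)},
\qquad
\frac{\partial^2 \log f}{\partial \theta^2} = -\frac{2}{\theta^2}
\;\Rightarrow\; I(\theta) = \frac{2}{\theta^2}.
```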

Hence, $h(S_n)$ does not achieve the bound, though it comes close and indeed achieves it asymptotically.

However, if we reparametrize the density by taking $\beta = \theta^{-1}$ so that
$$
f(x;\beta) = \beta^{-2} x e^{-x/\beta},\quad x > 0,
$$
then the UMVU estimator for $\beta$ can be shown to be $\tilde{h}(S_n) = \frac{S_n}{2n}$. (Just check that it's unbiased!) The variance of this estimator is $\mathbb{V}\mathrm{ar}(\tilde{h}(S_n)) = \frac{\beta^2}{2n}$, and this coincides with the CRLB for $\beta$.
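A similar short check for the $\beta$ parametrization uses $\mathbb{E} S_n = 2n\beta$, $\mathbb{V}\mathrm{ar}(S_n) = 2n\beta^2$ and $\log f = -2\log\beta + \log x - x/\beta$:

```latex
\mathbb{E}\,\tilde{h}(S_n) = \frac{2n\beta}{2n} = \beta,
\qquad
\mathbb{V}\mathrm{ar}(\tilde{h}(S_n)) = \frac{2n\beta^2}{4n^2} = \frac{\beta^2}{2n},
\qquad
I(\beta) = -\mathbb{E}\left[\frac{2}{\beta^2} - \frac{2X}{\beta^3}\right] = \frac{2}{\beta^2}
\;\Rightarrow\; \mathrm{CRLB}(\beta) = \frac{\beta^2}{2n}.
```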

The point of the addendum is that the ability to achieve (or not) the CRLB depends on the particular parametrization used. Even when there is a one-to-one correspondence between two parametrizations, an unbiased estimator for one may achieve the Cramér-Rao lower bound while the corresponding estimator for the other does not.

Math Genius: UMVUE of $\frac1\theta$ when $X_i\sim f_\theta(x)=\frac{1}{24}x^4\theta^{-5}e^{-\frac{x}{\theta}}$

Let
$$f_\theta(x_1,\dots,x_n)=\frac{1}{24^n}\,\theta^{-5n}\prod_{i=1}^n x_i^4\,e^{\sum\limits_{i=1}^n\frac{-x_i}{\theta}},\qquad x_i\in\mathbb{R}^+\ \forall i=1,\dots,n;\ \theta\in\mathbb{R}^+$$

The distribution belongs to a regular exponential family because

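$$f_\theta(X_1,\dots,X_n)=\frac{1}{24^n}\,\theta^{-5n}\prod_{i=1}^nX_i^4\,e^{\sum\limits_{i=1}^n\frac{-X_i}{\theta}}=h(X_1,\dots,X_n)\,c(\theta)\,e^{Q(\theta)T(X_1,\dots,X_n)}$$
with $$h=\frac{1}{24^n}\prod_{i=1}^n X_i^4$$, $$c(\theta)=\theta^{-5n}$$, $$Q(\theta)=\frac{1}{\theta}$$ and $$T=\sum_{i=1}^n(-X_i)$$.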

And

$$-\frac{c'(\theta)}{c(\theta)Q'(\theta)}=-\frac{-5n\theta^{-5n-1}}{-\theta^{-2}\theta^{-5n}}=-5n\theta$$

So $$E[T]=-5n\theta$$, where $$T=\sum_{i=1}^n(-X_i)$$.

Then $$G=-\frac{T}{5n}$$ is the UMVUE for $$\theta$$.

How can I find the UMVUE for $$\frac{1}{\theta}$$?

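Switch to the positive statistic $$T(X_1,\dots,X_n)=\sum\limits_{i=1}^nX_i$$. It is the negative of the $$T$$ above, so it is still complete and sufficient. It satisfies $$T\sim\text{Gamma}(5n,\theta)$$ with shape $$5n$$ and scale $$\theta$$.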

Then

\begin{align} E\left[\frac{1}{T}\right]&=\int_0^\infty\frac{1}{x}\cdot\frac{x^{5n-1}e^{-\frac{x}{\theta}}}{\Gamma(5n)\theta^{5n}}\;dx \\&=\frac{1}{\Gamma(5n)\theta^{5n}}\int_0^\infty x^{5n-2}e^{-\frac{x}{\theta}}\;dx \\&=\frac{\theta^{5n-1}}{\Gamma(5n)\theta^{5n}}\cdot\int_0^\infty \frac{x^{5n-2}}{\theta^{5n-1}}e^{-\frac{x}{\theta}}\;dx \\&=\frac{1}{\Gamma(5n)\theta}\cdot\Gamma(5n-1) \\&=\frac{1}{\theta(5n-1)} \end{align}

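So
$$H(X_1,\dots,X_n)=\frac{5n-1}{T}$$ is unbiased for $$\frac{1}{\theta}$$. Being a function of the complete sufficient statistic $$T$$, it is the UMVUE by the Lehmann-Scheffé theorem.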
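Here is a numerical sanity check of the integral above, sketched with the Python standard library. It integrates $$x^{-1}$$ times the $$\text{Gamma}(5n,\theta)$$ density (shape $$5n$$, scale $$\theta$$) with Simpson's rule and compares the result with $$1/(\theta(5n-1))$$. The truncation point, the step count, the test values $$n=2$$, $$\theta=2$$, and the helper names are assumptions chosen for illustration.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma(shape, scale) density: x^(k-1) e^(-x/scale) / (Gamma(k) scale^k), for x > 0."""
    return math.exp((shape - 1) * math.log(x) - x / scale
                    - math.lgamma(shape) - shape * math.log(scale))

def expected_inverse(shape, scale, upper_mult=20.0, steps=20_000):
    """Simpson's rule for E[1/T] = integral of f(x)/x over (0, upper_mult * mean]."""
    upper = upper_mult * shape * scale
    h = upper / steps  # steps must be even for Simpson's rule
    def g(x):
        # integrand behaves like x^(shape-2) near 0, so it is 0 at x = 0 when shape > 2
        return 0.0 if x == 0.0 else gamma_pdf(x, shape, scale) / x
    total = g(0.0) + g(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3

n, theta = 2, 2.0
approx = expected_inverse(5 * n, theta)
print(approx, 1 / (theta * (5 * n - 1)))  # numerical vs closed form 1/(theta(5n-1))
```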

Math Genius: Optimal estimators of Gaussian under certain conditions

You can ignore this context, but I think it adds a little interest to the question.

In finance, pricing information is often proprietary: firms do not want other firms to know their prices, but of course they want to know everyone else's. This information is potentially valuable and useful for benchmarking. A common compromise is that a middleman collects the prices from all firms and then releases some anonymised statistics to all contributors, who can compare them against their own prices. Some financial markets are very illiquid and pricing them is difficult, so there can be a lot of variation in the results.

Question

Assume that there is an underlying random variable $$X \sim \mathcal{N}(\mu, \sigma^2)$$, which represents, in this case, the market price of a financial instrument as measured by each firm.

We are interested in finding optimal estimators $$\hat{\mu}$$ and $$\hat{\sigma}$$ given the following statistics about samples taken from $$X$$, that is, estimating the distribution from the anonymised data released by the middleman:

• The mean of the samples from $$X$$: $$\bar{x}$$
• The range of the samples from $$X$$: $$\max\{s_i\} - \min\{s_i\}$$
• The number of samples, $$n$$, i.e. $$s_1, \dots, s_n$$

Thoughts

I'm going to assume that $$\hat{\mu} = \bar{x}$$ is the optimal estimator here and that it is unbiased, which is fairly easy to show.

Using some intensive computational statistics, I found that for the standard normal distribution ($$Z \sim \mathcal{N}(0,1)$$):

$$E[\max(Z) - \min(Z)] = [1.69, 2.06, 2.32, 2.53, 2.70]$$
for $$n=[3,4,5,6,7]$$

I then reverse-engineer the estimator $$\hat{\sigma} = \frac{\max\{s_i\} - \min\{s_i\}}{t_n}$$, where $$t_n$$ is the corresponding value from the lookup table above.
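The lookup table can also be reproduced without simulation. The sketch below assumes the standard identity $$E[\max(Z)-\min(Z)] = \int_{-\infty}^{\infty}\left[1-\Phi(x)^n-(1-\Phi(x))^n\right]dx$$ for $$n$$ iid standard normals, where $$\Phi$$ is the standard normal CDF. It evaluates the integral with Simpson's rule on a truncated interval, and the helper names are illustrative. For $$n=3,\dots,7$$ it should give approximately $$1.693, 2.059, 2.326, 2.534, 2.704$$. These are the constants usually tabulated as $$d_2$$ in quality control, and they closely match the table above.

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_range(n, lo=-10.0, hi=10.0, steps=4000):
    """t_n = E[max(Z) - min(Z)] for n iid N(0,1), via Simpson's rule on
    the integrand 1 - Phi(x)^n - (1 - Phi(x))^n over [lo, hi]."""
    h = (hi - lo) / steps
    def g(x):
        p = std_normal_cdf(x)
        return 1.0 - p**n - (1.0 - p)**n
    total = g(lo) + g(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def sigma_hat(samples):
    """Range-based estimate of sigma: (max - min) / t_n."""
    return (max(samples) - min(samples)) / expected_range(len(samples))

for n in range(3, 8):
    print(n, round(expected_range(n), 3))
```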

Is this appropriate? Is it also an unbiased estimator?

For $$\mu$$, this is indeed the optimal estimator (in the UMVU sense). This is even the case if you have all the information in the sample, so it is also the case with limited information.

For $$\sigma$$, this is indeed unbiased. Letting $$Z$$ be a sample of iid standard normals, we have
$$\mathbb{E}[\hat{\sigma}] = \frac{\mathbb{E}[\max(X) - \min(X)]}{t_n} = \frac{\sigma\,\mathbb{E}[\max(Z)-\min(Z)]}{t_n} = \sigma.$$
This holds because the sample from $$X$$ has the same (joint) distribution as $$\mu + \sigma Z$$. Hence $$(\max(X), \min(X))$$ has the same (joint) distribution as $$\mu + \sigma(\max(Z), \min(Z))$$, and the shift $$\mu$$ cancels in the range.
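Here is a quick Monte Carlo check of this argument, sketched in Python. It uses the hypothetical values $$\mu = 5$$, $$\sigma = 3$$, $$n = 5$$ and the tabulated constant $$t_5 \approx 2.326$$. That constant is an assumption taken from standard $$d_2$$ tables rather than from the question's rounded 2.32. The nonzero mean drops out of the range, and the average of $$\hat\sigma$$ should land near $$\sigma$$.

```python
import random
import statistics

def mean_sigma_hat(mu, sigma, n, t_n, reps=40_000, seed=4):
    """Average of the range-based estimator over many simulated samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        estimates.append((max(sample) - min(sample)) / t_n)  # range / t_n
    return statistics.fmean(estimates)

avg = mean_sigma_hat(mu=5.0, sigma=3.0, n=5, t_n=2.326)
print(avg)  # should be close to sigma = 3, despite mu = 5
```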

It's also consistent, since $$\text{Var}(\hat{\sigma})\to 0$$ as $$n \to \infty$$; see, for example, https://stats.stackexchange.com/questions/229073/variance-of-maximum-of-gaussian-random-variables.

Math Genius: Estimate size of population using hyper geometric distribution and maximum likelihood estimator

I saw the following in my notes, but I can’t quite remember why the argument holds… could someone help me please?

Suppose there are $$N$$ fish in a lake and we want to estimate $$N$$. We catch $$m=1000$$ fish, mark them and release them back.

Then we catch $$n=1000$$ fish and count the number $$X$$ of marked fish. We know that $$X\sim\text{Hypergeometric}(N,m,n)$$ and $$\Pr(X=x\mid N)=\frac{{m \choose x}{N-m \choose n-x}}{N \choose n}$$.

Let $$N'=\arg\max_N \Pr(X=x\mid N)$$. To find $$N'$$, we can maximise

$$\frac{\Pr(X=x\mid N)}{\Pr(X=x\mid N-1)}=\frac{N(N-n-m)+mn}{N(N-n-m)+Nx} \qquad (\star)$$

Then the argument said that $$(\star)$$ is maximised when it is equal to $$1$$, so we have $$mn=N'x$$. I don't understand why $$(\star)$$ is maximised when it is equal to $$1$$.

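The argument as you present it is wrong. $$(\star)$$ is not maximized when it is equal to $$1$$; rather, $$\mathsf{Pr}(X=x\mid N)$$ is maximized where the ratio $$(\star)$$ crosses $$1$$. Comparing the numerator and denominator of $$(\star)$$ shows that the ratio is at least $$1$$ exactly when $$mn \ge Nx$$, i.e. when $$N \le mn/x$$. So $$\mathsf{Pr}(X=x\mid N)$$ is unimodal in $$N$$. It increases as long as $$(\star)$$ is greater than $$1$$ and starts decreasing once $$(\star)$$ drops below $$1$$. Setting $$(\star)=1$$ gives $$mn = N'x$$; more precisely, $$N' = \lfloor mn/x \rfloor$$ when $$mn/x$$ is not an integer.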
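A brute-force check of this reasoning can be sketched in Python. It evaluates $$\log \mathsf{Pr}(X=x\mid N)$$ with `math.lgamma`, which avoids huge binomial coefficients, over a range of $$N$$ and takes the argmax. The observed count $$x = 97$$ and the search limit are hypothetical values. They were chosen so that $$mn/x$$ is not an integer, in which case the maximizer should be $$\lfloor mn/x \rfloor$$.

```python
import math

def log_comb(a, b):
    """log of the binomial coefficient C(a, b), via lgamma to avoid huge integers."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)

def log_likelihood(N, m, n, x):
    """log Pr(X = x | N) for X ~ Hypergeometric(N, m, n)."""
    return log_comb(m, x) + log_comb(N - m, n - x) - log_comb(N, n)

def mle_brute_force(m, n, x, N_max):
    """Search N = N_min..N_max for the maximizer of Pr(X = x | N)."""
    N_min = max(m, n, m + n - x)  # smallest N with Pr(X = x | N) > 0
    return max(range(N_min, N_max + 1), key=lambda N: log_likelihood(N, m, n, x))

m, n, x = 1000, 1000, 97
best = mle_brute_force(m, n, x, N_max=20_000)
print(best, (m * n) // x)  # brute-force argmax vs floor(mn / x)
```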