## Math Genius: Prove that \$n \cdot \min\{T_1,\dots,T_n\}\$ isn’t allowable as an estimator of \$\mu\$

Suppose we have an electronic device whose lifetime follows an Exponential distribution with unknown mean $$\mu$$. A research team wants to estimate $$\mu$$ using a sample of $$n$$ devices. They use the estimator $$T = n\cdot\min\{T_1,\dots,T_n\}$$, that is, $$n$$ times the time at which the first device fails when all of them are put into operation at the same time.

I need to prove that this estimator $$T$$ is not allowable. That would mean that $$T$$ isn’t asymptotically centered or that $$T$$ isn’t consistent.

I’ve found the distribution function of $$\min\{T_1,\dots,T_n\}$$ (I’ll call it $$F$$):
$$F(t)=P(\min\{T_1,\dots,T_n\}\leq t)=1-P(\min\{T_1,\dots,T_n\}> t) = 1-P(T_1>t,T_2>t,\dots,T_n>t)=1-(P(T_1>t))^n=1-e^{-nt/\mu}.$$
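
As a quick sanity check of this CDF (not part of the original post), here is a small Monte Carlo sketch using Python's standard `random` module. The values $$n = 5$$, $$\mu = 2$$, $$t = 0.3$$ are arbitrary illustrative choices, and the function names are hypothetical. Note that `random.expovariate` is parameterized by the rate $$1/\mu$$, not by the mean.

```python
import math
import random

def empirical_min_cdf(n, mu, t, reps=100_000, seed=0):
    """Fraction of simulated samples whose minimum is <= t.

    Each sample consists of n i.i.d. Exponential lifetimes with mean mu.
    random.expovariate takes the *rate* 1/mu, not the mean.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        m = min(rng.expovariate(1 / mu) for _ in range(n))
        if m <= t:
            hits += 1
    return hits / reps

def exact_min_cdf(n, mu, t):
    # F(t) = 1 - exp(-n t / mu), as derived above
    return 1 - math.exp(-n * t / mu)

if __name__ == "__main__":
    n, mu, t = 5, 2.0, 0.3
    print(empirical_min_cdf(n, mu, t), exact_min_cdf(n, mu, t))
```
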
I’m stuck here. How do I prove that it is NOT allowable?

You are on the right track but you need to keep going a bit further.

The distribution of the first order statistic $$T_{(1)} = \min(T_1, T_2, \ldots, T_n)$$ is indeed exponential, with CDF $$\Pr[T_{(1)} \le t] = 1 - e^{-nt/\mu}.$$ The next step is to compute the distribution of the scale-transformed first order statistic, which is your actual $$T$$: $$T = n T_{(1)} = n \min(T_1, T_2, \ldots, T_n).$$ This is not hard to do; I leave it as an exercise to show that $$T \sim \operatorname{Exponential}(\mu)$$ with CDF $$\Pr[T \le t] = 1 - e^{-t/\mu}.$$
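
One way to fill in that exercise (a sketch that uses only the CDF of $$T_{(1)}$$ found above) is to rescale the event $$\{T \le t\}$$. For $$t \ge 0$$:

```latex
\Pr[T \le t] = \Pr[n T_{(1)} \le t] = \Pr\!\left[T_{(1)} \le \tfrac{t}{n}\right]
             = 1 - e^{-n (t/n)/\mu} = 1 - e^{-t/\mu},
```

so $$T \sim \operatorname{Exponential}(\mu)$$, and in particular $$E[T] = \mu$$ and $$\operatorname{Var}[T] = \mu^2$$ for every $$n$$.
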
What this tells us is that $$n$$ times the minimum observation is exponentially distributed with the same mean parameter $$\mu$$, so this statistic is no better than observing a single device. In fact, because the CDF of $$T$$ does not depend on $$n$$ at all, neither do its properties as an estimator of $$\mu$$: they are independent of the sample size! For instance, the variance does not tend to $$0$$ as $$n$$ grows; we explicitly have $$\operatorname{Var}[T] = \mu^2$$, hence $$\lim_{n \to \infty} \operatorname{Var}[T] = \mu^2 > 0.$$ The same observation settles the question directly: $$E[T] = \mu$$ for every $$n$$, so $$T$$ is unbiased (and hence asymptotically centered), but for any $$\varepsilon > 0$$ the probability $$\Pr[|T - \mu| > \varepsilon]$$ is the same positive number for every $$n$$, so $$T$$ does not converge in probability to $$\mu$$; that is, $$T$$ is not consistent. This is an undesirable property for an estimator because it means that the precision of the estimate does not improve with increasing sample size: if you use this estimator, there is no information to be gained about the parameter by collecting more data.
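
A small simulation sketch (standard-library Python; $$\mu = 1$$, $$\varepsilon = 0.2$$ and the sample sizes are arbitrary choices, and the helper names are hypothetical) makes the lack of consistency concrete. The estimated $$\Pr[|T - \mu| > \varepsilon]$$ should stay near $$1 - e^{-0.8} + e^{-1.2} \approx 0.85$$ for every $$n$$, while for the sample mean, included only as a point of comparison, it shrinks as $$n$$ grows.

```python
import random

def simulate(n, mu, reps=10_000, seed=1):
    """Return (T draws, sample-mean draws) for samples of size n.

    T = n * min(T_1, ..., T_n) is the estimator from the question;
    the sample mean is computed from the same samples for comparison.
    """
    rng = random.Random(seed)
    t_vals, mean_vals = [], []
    for _ in range(reps):
        xs = [rng.expovariate(1 / mu) for _ in range(n)]  # rate = 1/mu
        t_vals.append(n * min(xs))
        mean_vals.append(sum(xs) / n)
    return t_vals, mean_vals

def miss_rate(draws, mu, eps):
    # Empirical estimate of P(|estimator - mu| > eps)
    return sum(abs(d - mu) > eps for d in draws) / len(draws)

if __name__ == "__main__":
    mu, eps = 1.0, 0.2
    for n in (1, 10, 100):
        t_vals, mean_vals = simulate(n, mu)
        print(n, round(miss_rate(t_vals, mu, eps), 3),
              round(miss_rate(mean_vals, mu, eps), 3))
```
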

Stepping back, ask yourself why this happens. While it is true that the variance of the first order statistic $$T_{(1)}$$ decreases with increasing sample size, the problem is that this decrease is not fast enough to offset the increase in variance that occurs when we multiply $$T_{(1)}$$ by $$n$$. As you know, for a scalar constant $$c$$ and a random variable $$X$$ with finite variance, we have $$\operatorname{Var}[cX] = c^2 \operatorname{Var}[X].$$ This means that scaling a random variable by $$c$$ multiplies its variance by $$c^2$$. So if we must scale the first order statistic by $$n$$ in order to get an estimate, then the variance of $$T_{(1)}$$ must decrease faster than $$1/n^2$$ to compensate for the increase due to scaling. That does not happen; in fact, the two effects cancel exactly here, since $$\operatorname{Var}[T_{(1)}] = \mu^2/n^2$$ and therefore $$\operatorname{Var}[n T_{(1)}] = n^2 \cdot \mu^2/n^2 = \mu^2.$$
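
The exact cancellation can also be seen numerically. This sketch (standard-library Python, with an arbitrary $$\mu = 2$$ and a hypothetical helper name) estimates $$\operatorname{Var}[T_{(1)}]$$ by simulation for a few values of $$n$$; multiplying by $$n^2$$ should give values near $$\mu^2 = 4$$ regardless of $$n$$.

```python
import random
import statistics

def var_of_min(n, mu, reps=50_000, seed=2):
    """Empirical variance of T_(1) = min(T_1, ..., T_n), T_i ~ Exponential(mean mu)."""
    rng = random.Random(seed)
    mins = [min(rng.expovariate(1 / mu) for _ in range(n)) for _ in range(reps)]
    return statistics.variance(mins)

if __name__ == "__main__":
    mu = 2.0
    for n in (1, 5, 20):
        v = var_of_min(n, mu)
        # Var[T_(1)] should be near (mu/n)^2, so n^2 * Var stays near mu^2
        print(n, round(v, 4), round(n * n * v, 3))
```
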

I should point out here that “not allowable” is a bit strong. I would prefer to characterize the estimator $$T$$ as “poor” or “undesirable.” After all, it is an estimator of $$\mu$$, just not a good one.

## Math Genius: How do I find the variance of an estimator?

If the estimator were simply the sample mean $$s=\frac{\sum x}{n}$$ of a sample taken from a binomial distribution (just as an example), how would I calculate its variance? I am trying to use the variance as a difference of expectations, $$E(s^2) - (E(s))^2$$, but I'm not sure what the expectation of the sum would be.

If $$X_1, X_2, \dots, X_n$$ is a random sample from a population with mean $$\mu$$ and
variance $$\sigma^2,$$ let $$T = \sum_{i=1}^n X_i.$$

Then

$$E(T) = E\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n E(X_i) = \sum_{i=1}^n \mu = n\mu.$$

Also, elements of a random sample are independent, so we have

$$V(T) = V\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n V(X_i) = \sum_{i=1}^n \sigma^2 = n\sigma^2.$$

Also, $$\bar X = \frac{1}{n}\sum_{i=1}^n X_i = \frac{1}{n}T,$$
so that

$$E(\bar X) = E\left(\frac{1}{n}T\right) = \frac{1}{n}E(T) = \frac{1}{n}n\mu = \mu.$$

Thus, the expected value of the sample mean $$\bar X$$ is the population mean $$\mu.$$ (We say that $$\bar X$$ is an unbiased estimator of $$\mu$$.)

Moreover,

$$V(\bar X) = V\left(\frac{1}{n}T\right) = \left(\frac{1}{n}\right)^2V(T) = \left(\frac{1}{n}\right)^2n\sigma^2 = \frac{1}{n}\sigma^2 = \sigma^2/n.$$
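
For the binomial example in the question: if each $$X_i \sim \operatorname{Binomial}(m, p)$$, then $$\sigma^2 = mp(1-p)$$, so the formula above gives $$V(s) = mp(1-p)/n$$. Below is a simulation sketch with hypothetical values $$m = 8$$, $$p = 0.3$$, $$n = 10$$ (so $$\sigma^2/n = 0.168$$); binomial draws are generated as sums of Bernoulli trials to stay within the standard library, and the helper names are illustrative only.

```python
import random
import statistics

def binomial_draw(rng, m, p):
    # One Binomial(m, p) draw as a sum of m Bernoulli(p) trials
    return sum(rng.random() < p for _ in range(m))

def sample_mean_variance(n, m, p, reps=20_000, seed=3):
    """Empirical variance of s = (X_1 + ... + X_n) / n with X_i ~ Binomial(m, p)."""
    rng = random.Random(seed)
    means = [sum(binomial_draw(rng, m, p) for _ in range(n)) / n
             for _ in range(reps)]
    return statistics.variance(means)

if __name__ == "__main__":
    n, m, p = 10, 8, 0.3
    sigma2 = m * p * (1 - p)  # population variance of Binomial(m, p)
    print(sample_mean_variance(n, m, p), sigma2 / n)
```

Doubling or quadrupling `n` in the usage lines should shrink the simulated variance roughly in proportion, as $$\sigma^2/n$$ predicts.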

Notes: (1) In the first displayed equation, the expected value of a sum of random variables is the sum of their expected values, whether or not the random variables are independent.

(2) However, the variance of the sum of random variables is not necessarily equal to the sum of the variances, unless the random variables are independent.

[As a trivial case, if all $$n \ge 2$$ of the $$X_i$$ equal the same $$X$$ with $$V(X) > 0,$$ then the $$X_i$$ are not independent
and we have $$V\left(\sum_{i=1}^n X_i\right) = V(nX) = n^2V(X) \ne nV(X).$$ As another example, if $$X_1 = -X_2$$ with $$V(X_1)=V(X_2) > 0,$$ then $$V(X_1+X_2) = V(0) = 0 \ne V(X_1)+V(X_2).$$]

(3) For the standard deviation of the mean of a random sample, we can take square roots to get $$SD(\bar X) = \sigma/\sqrt{n}.$$ (This is sometimes called the ‘standard error’ of $$\bar X$$.)
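
The two counterexamples in note (2) can be checked exactly on a small discrete distribution. In this sketch, $$X$$ is taken to be uniform on a hypothetical set of four values, so `statistics.pvariance` computes $$V(X)$$ exactly when the data are `Fraction`s; $$n = 3$$ is arbitrary.

```python
from fractions import Fraction
from statistics import pvariance

# Treat X as uniform on these hypothetical values; pvariance then gives V(X) exactly.
values = [Fraction(v) for v in (1, 2, 4, 7)]
n = 3

v_x = pvariance(values)
# Case 1: X_1 = ... = X_n = X, so the sum equals n*X for each outcome.
v_sum_same = pvariance([n * x for x in values])
# Case 2: X_2 = -X_1, so X_1 + X_2 = 0 for each outcome.
v_sum_opposite = pvariance([x + (-x) for x in values])

print(v_x, v_sum_same, n * n * v_x, n * v_x, v_sum_opposite)
```
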


## Math Genius: Constrained form for the M-estimator

I am trying to self-study some material on empirical processes and came across this penalized M-estimator question.

Question:

Let the observation matrix $$Y$$ be defined as $$Y=\theta^{*}+\epsilon$$, where $$\epsilon$$ is an $$n \times n$$ matrix of i.i.d. $$N(0, \sigma)$$ entries. Let $$\|A\|_{1}$$ denote the nuclear norm of the matrix $$A$$ (the sum of the singular values of $$A$$). Let $$L^{*}=\left\|\theta^{*}\right\|_{1}$$ and let $$L>0$$ be a tuning parameter such that $$L \geq L^{*}.$$ Let
$$\hat{\theta}=\underset{\theta:\|\theta\|_{1} \leq L}{\operatorname{argmin}}\|Y-\theta\|^{2}.$$
Show that
$$\mathbb{E} \frac{1}{n^{2}}\left\|\hat{\theta}-\theta^{*}\right\|^{2} \leq C \sigma \frac{L}{n^{3 / 2}} \sqrt{\log n}.$$
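
Computationally, $$\hat{\theta}$$ is the Frobenius-norm projection of $$Y$$ onto the nuclear-norm ball of radius $$L$$. A standard fact (not stated in the question) is that this projection can be obtained from the SVD of $$Y$$ by projecting its singular values onto $$\{s \ge 0 : \sum_i s_i \le L\}$$. Here is a minimal NumPy sketch, assuming the unsubscripted $$\|\cdot\|$$ in the objective is the Frobenius norm; the function names and the rank-one $$\theta^{*}$$ in the usage lines are hypothetical.

```python
import numpy as np

def project_l1_ball_nonneg(s, radius):
    """Euclidean projection of a nonnegative vector s onto {w >= 0 : sum(w) <= radius}.

    Uses the standard sort-and-threshold projection onto the simplex.
    """
    if s.sum() <= radius:
        return s.copy()                       # already feasible
    u = np.sort(s)[::-1]                      # values in descending order
    css = np.cumsum(u)
    j = np.arange(1, len(u) + 1)
    rho = np.nonzero(u - (css - radius) / j > 0)[0][-1]
    tau = (css[rho] - radius) / (rho + 1)     # soft-threshold level
    return np.maximum(s - tau, 0.0)

def nuclear_ball_lse(Y, L):
    """Hypothetical helper: argmin of ||Y - theta||_F^2 over ||theta||_* <= L."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_proj = project_l1_ball_nonneg(s, L)
    return (U * s_proj) @ Vt                  # U @ diag(s_proj) @ Vt

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, sigma = 20, 1.0
    theta_star = np.outer(rng.normal(size=n), rng.normal(size=n))  # rank-one truth
    L = np.linalg.norm(theta_star, "nuc")                          # choose L = L*
    Y = theta_star + sigma * rng.normal(size=(n, n))
    theta_hat = nuclear_ball_lse(Y, L)
    print(np.linalg.norm(theta_hat - theta_star) ** 2 / n**2)
```
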

I have a couple of questions about this:

1. Is there a specific name for this estimator?

2. I am trying to figure out the upper bound for $$\mathbb{E} \frac{1}{n^{2}}\left\|\hat{\theta}-\theta^{*}\right\|^{2}$$.

My sketch of proof:
Using empirical process notation, I rewrite $$\hat{\theta}$$ as
$$\hat{\theta}=\underset{\theta:\|\theta\|_{1} \leq L}{\operatorname{argmax}}\left(2\left\langle\epsilon, \theta-\theta^{*}\right\rangle-\left\|\theta-\theta^{*}\right\|^{2}\right)=\underset{\theta \in \Theta}{\operatorname{argmax}}\, M_{n}(\theta),$$
where $$\Theta:=\left\{\theta \in \mathbb{R}^{n \times n}:\|\theta\|_{1} \leq L\right\}$$ and
$$M_{n}(\theta):=2\left\langle\epsilon, \theta-\theta^{*}\right\rangle-\left\|\theta-\theta^{*}\right\|^{2}.$$
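
The rewriting above follows from $$Y-\theta = \epsilon-(\theta-\theta^{*})$$, which gives $$\|Y-\theta\|^{2} = \|\epsilon\|^{2} - 2\langle\epsilon, \theta-\theta^{*}\rangle + \|\theta-\theta^{*}\|^{2} = \|\epsilon\|^{2} - M_{n}(\theta)$$; since $$\|\epsilon\|^{2}$$ does not depend on $$\theta$$, minimizing the left side over $$\Theta$$ is the same as maximizing $$M_n$$. A quick NumPy check of this identity, with random $$\theta^{*}$$, $$\epsilon$$, $$\theta$$ and $$\langle\cdot,\cdot\rangle$$ taken as the Frobenius inner product (an assumption about the intended norm):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
theta_star = rng.normal(size=(n, n))   # arbitrary "true" matrix for the check
eps = rng.normal(size=(n, n))          # noise matrix epsilon
Y = theta_star + eps

def inner(A, B):
    """Frobenius inner product <A, B> = sum_ij A_ij * B_ij."""
    return float(np.sum(A * B))

def M_n(theta):
    d = theta - theta_star
    return 2 * inner(eps, d) - inner(d, d)

def identity_gap(theta):
    """||Y - theta||^2 - (||eps||^2 - M_n(theta)); zero up to rounding."""
    return inner(Y - theta, Y - theta) - (inner(eps, eps) - M_n(theta))

for _ in range(5):
    print(abs(identity_gap(rng.normal(size=(n, n)))))
```
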
Also, I let $$M(\theta):=-\left\|\theta-\theta^{*}\right\|^{2}$$ and $$d\left(\theta, \theta^{*}\right):=\left\|\theta-\theta^{*}\right\|$$; then by the M-estimator rate theorem, I have the following:
$$\mathbb{E} \sup_{\theta:\|\theta\|_{1} \leq L,\ d(\theta, \theta^{*}) \leq u}\left(M_{n}-M\right)\left(\theta-\theta^{*}\right) \leq 2\, \mathbb{E} \sup_{\theta:\|\theta\|_{1} \leq L,\ \|\theta-\theta^{*}\| \leq u}\left\langle\epsilon, \theta-\theta^{*}\right\rangle = 2\, \mathbb{E} \sup_{v \in V}\langle\epsilon, v\rangle,$$
where $$V$$ consists of all matrices $$v \in \mathbb{R}^{n \times n}$$ with $$\|v\| \leq u$$ that satisfy $$v=\theta-\theta^{*}$$ for some $$\theta$$ with $$\|\theta\|_{1} \leq L$$. Then I try to use Dudley’s entropy bound to control the expected supremum:

$$\mathbb{E} \sup_{v \in V}\langle\epsilon, v\rangle \leq C \int_{0}^{\operatorname{diam}(V) / 2} \sqrt{\log M(\epsilon, V)}\, d\epsilon,$$
where $$M(\epsilon, V)$$ and $$\operatorname{diam}(V)$$ denote the packing number and diameter of $$V$$ in the usual Euclidean (Frobenius) metric on $$\mathbb{R}^{n \times n}$$.

Then I got stuck: I cannot bound $$M(\epsilon, V)$$, and I know of no direct result I can use for this packing number. How can I bound this packing number, and is my argument so far reasonable? Many thanks.