## Math Genius: Distribution of maximum of a moving average sequence

Let $$Z_n \sim WN(0,\sigma^2)$$ and $$a \in \mathbb{R}$$. Then $$X_t = Z_t + aZ_{t-1}, \qquad t = 0, \pm1, \pm 2,\ldots$$ defines an $$MA(1)$$ sequence. I need to prove that this sequence has an independent extremal index. To do so, I need to prove that as $$n \rightarrow \infty$$, for each $$\tau > 0$$,

$$P(n(1-F(M_n)) \geq \tau) \rightarrow e^{-\tau}$$

with $$M_n = \max_{1 \le t \le n} X_t$$ and $$F$$ the continuous cdf of $$X_t$$. I am trying to make a start by working with $$M_n = \max_{1 \le t \le n}(Z_t + aZ_{t-1})$$ and getting to know its distribution…
I already have the proof for the case where $$X_t$$ is an iid sequence.
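A quick numerical sanity check can help before attempting the proof. The sketch below is a Monte Carlo estimate of $$P(n(1-F(M_n)) \geq \tau)$$. It assumes, which the question does not (it only says white noise), that the $$Z_t$$ are iid standard normal, so that $$X_t \sim N(0, 1+a^2)$$ and $$F$$ is known in closed form. The function names are hypothetical, and the simulation only illustrates the claimed limit; it proves nothing.

```python
import math
import random


def normal_sf(x):
    """Upper tail P(N(0, 1) > x), computed with the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def estimate_tail_prob(n, a, tau, reps=1000, seed=0):
    """Monte Carlo estimate of P(n(1 - F(M_n)) >= tau) for X_t = Z_t + a*Z_{t-1}.

    Assumption (not from the question): Z_t iid N(0, 1), so X_t ~ N(0, 1 + a^2)
    and F is that normal CDF.
    """
    rng = random.Random(seed)
    sd_x = math.sqrt(1.0 + a * a)
    hits = 0
    for _ in range(reps):
        z_prev = rng.gauss(0.0, 1.0)            # Z_0
        m_n = -math.inf
        for _ in range(n):                      # t = 1, ..., n
            z = rng.gauss(0.0, 1.0)
            m_n = max(m_n, z + a * z_prev)      # running maximum of X_t
            z_prev = z
        # 1 - F(M_n) is the normal upper tail at the standardized maximum
        if n * normal_sf(m_n / sd_x) >= tau:
            hits += 1
    return hits / reps
```

For example, `estimate_tail_prob(500, 0.5, 1.0)` can be compared with `math.exp(-1.0)`, about 0.368. With `a = 0.0` the sequence is iid and the exact probability is $$(1-\tau/n)^n$$.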

## Math Genius: Expected value of sum of different geometric random variables

$$n$$ players play a game in which they flip fair coins simultaneously. If a player gets a different result from all the others, that player is out, and the remaining $$n - 1$$ players continue until there are two players left; those two are the winners. For example, for $$n=3$$, the result $$(H,T,T)$$ makes the first player lose and the other two win, while $$(H,H,H)$$ makes them toss again.

I’ll define the variable $$Y = \text{number of rounds until there are two players left out of } n$$

I’m looking for $$E(Y)$$ and $$\operatorname{Var}(Y)$$. What I did was:

Define a random variable $$X_i = \text{number of rounds until one player out of } i \text{ is out}$$

so it follows that $$X_i \sim \mathrm{Geo}\left(2\cdot\frac{i}{2^{i}} = \frac{i}{2^{i-1}}\right)$$, since we have to choose a player and a value for the coin. Then
$$Y = \sum_{i=3}^{n} X_i$$
$$E(Y) = E\left(\sum_{i=3}^{n} X_i\right) = \sum_{i=3}^{n} E(X_i) = \sum_{i=3}^{n} \frac{2^{i-1}}{i}$$
$$\operatorname{Var}(Y) = \sum_{i=3}^{n} \operatorname{Var}(X_i) = \sum_{i=3}^{n} \dfrac{\dfrac{2^{i-1}-i}{2^{i-1}}}{\dfrac{i^2}{2^{2(i-1)}}} = \sum_{i=3}^{n} \dfrac{2^{i-1}(2^{i-1}-i)}{i^2}$$

Is there a closed form solution to this problem?
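To check the sums numerically, here is a sketch that uses exact rational arithmetic (`fractions.Fraction`) for the stage-by-stage formulas, plus a hypothetical simulator of the game to compare against. It assumes, as in the derivation, that with $$i \ge 3$$ players a round removes a player exactly when one coin differs from all the others, which happens with probability $$p_i = i/2^{i-1}$$, so that $$E(X_i) = 1/p_i = 2^{i-1}/i$$ and $$\operatorname{Var}(X_i) = (1-p_i)/p_i^2$$.

```python
import random
from fractions import Fraction


def stage_success_prob(i):
    """P(exactly one of i >= 3 players differs from the rest in one round) = 2*i / 2**i."""
    return Fraction(2 * i, 2 ** i)


def mean_var_rounds(n):
    """Exact E(Y) and Var(Y), summing independent geometric stages i = 3..n."""
    mean, var = Fraction(0), Fraction(0)
    for i in range(3, n + 1):
        p = stage_success_prob(i)
        mean += 1 / p                # E(X_i) = 1/p = 2**(i-1)/i
        var += (1 - p) / p ** 2      # Var(X_i) = (1-p)/p^2
    return mean, var


def simulate_rounds(n, rng):
    """Play the game once with n fair coins; return the number of rounds until 2 remain."""
    rounds, players = 0, n
    while players > 2:
        rounds += 1
        heads = sum(rng.random() < 0.5 for _ in range(players))
        if heads in (1, players - 1):    # exactly one player is the odd one out
            players -= 1
    return rounds
```

For instance, `mean_var_rounds(4)` returns `(Fraction(10, 3), Fraction(22, 9))`. This only evaluates the partial sums; it does not provide a closed form.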

## Math Genius: Find \$P(Z>X+Y)\$ where \$X,Y,Z \sim U(0,1)\$ independently

I’m trying to follow a line in a derivation for \$P(Z>X+Y)\$ where \$X,Y,Z\$ are independent continuous random variables distributed uniformly on \$(0,1)\$.

I’ve already derived the pdf of \$X+Y\$ using the convolution theorem, but there’s a line in the answer that says:

\$P(Z>X+Y) = \mathbb{E}[ P(Z>X+Y \mid X+Y ) ]\$, where \$\mathbb{E}\$ denotes expectation.

I’m not familiar with this result. Could anyone give a pointer to a similar result if one exists?

Thanks.

$$\mathbb{P}(Z>X+Y)=\mathbb{E}[\mathbf{1}(Z>X+Y)]=\mathbb{E}[\mathbb{E}[\mathbf{1}(Z>X+Y)\mid X+Y]]=\mathbb{E}[\mathbb{P}(Z>X+Y\mid X+Y)],$$
where the second equality is the following property of conditional expectation (the tower property):
$$\mathbb{E}[\mathbb{E}[X\mid Y]]=\mathbb{E}[X].$$
Intuitively, now that you know the distribution of $$X+Y$$, you just need to “range”$$^1$$ through the values of $$X+Y$$ and find the probability of $$Z>X+Y$$ for each such value. Averaging these probabilities over the distribution of $$X+Y$$ is exactly the expectation of the conditional probability.

$$^1$$Integrate against the density, i.e. $$\int_0^2\mathbb{P}(Z>v)\,f_{X+Y}(v)\,dv$$.
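The footnote's integral can be evaluated numerically. A sketch, assuming the triangular density of $$X+Y$$ on $$[0,2]$$ and $$\mathbb{P}(Z>v) = 1-v$$ on $$[0,1]$$ (zero beyond); a plain midpoint rule is used and the helper names are hypothetical.

```python
def surv_uniform(v):
    """P(Z > v) for Z ~ U(0, 1), clipped to [0, 1]."""
    return min(1.0, max(0.0, 1.0 - v))


def triangular_pdf(v):
    """Density of X + Y for independent X, Y ~ U(0, 1): triangular on [0, 2]."""
    if 0.0 <= v <= 1.0:
        return v
    if 1.0 < v <= 2.0:
        return 2.0 - v
    return 0.0


def prob_by_conditioning(steps=100_000):
    """Midpoint-rule approximation of the integral of P(Z > v) f_{X+Y}(v) over [0, 2]."""
    h = 2.0 / steps
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * h
        total += surv_uniform(v) * triangular_pdf(v)
    return total * h
```

The result should agree with $$1/6$$ to many decimal places, since only the part of the integral over $$[0,1]$$ contributes.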

… equation that is puzzling you, but I think the geometrical method described below for solving the problem may give you a different insight into the calculation of the desired probability \$P\{Z > X+Y\}\$.

The random point \$(X,Y,Z)\$ is uniformly distributed in the interior of the unit cube with diagonally opposite vertices \$(0,0,0)\$ and \$(1,1,1)\$. The cube has unit volume, so the probability that \$(X,Y,Z)\$ is in some region is just the volume of that region. Thus, \$P\{Z > X+Y\}\$ is the volume of the tetrahedron with vertices \$(0,0,0)\$, \$(1,0,1)\$, \$(0,1,1)\$ and \$(0,0,1)\$. If we think of this
as an inverted pyramid whose base is the right triangle with vertices
\$(1,0,1)\$, \$(0,1,1)\$ and \$(0,0,1)\$, with apex \$(0,0,0)\$ at
altitude \$1\$ “above” the base,
then since the base has area \$\frac{1}{2}\$, we get
the volume as
\$\$P\{Z > X+Y\} = \frac{1}{3}\times (\text{area of base})\times(\text{altitude})
= \frac{1}{3}\times \frac{1}{2}\times 1 = \frac{1}{6}.\$\$
Of course, if you have already computed the density of \$X+Y\$, then it is
straightforward to use the result given by Artiom Fiodorov to get (using \$P(Z>v)=0\$ for \$v \geq 1\$ and \$f_{X+Y}(v)=v\$ on \$[0,1]\$)
\$\$P\{Z > X+Y\}= \int_0^2 P(Z>v)\,f_{X+Y}(v)\,dv
= \int_0^1(1-v)\cdot v\,dv =
\left.\frac{v^2}{2}-\frac{v^3}{3}\right|_0^1 = \frac{1}{6}.\$\$
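Since the probability equals a volume inside the unit cube, a Monte Carlo sketch can check it: sample points uniformly in the cube and count the fraction with \$Z > X+Y\$. The function name and sample size are illustrative choices; the estimate should land near \$1/6 \approx 0.1667\$.

```python
import random


def mc_prob_z_exceeds_sum(samples=200_000, seed=0):
    """Fraction of uniform random points (X, Y, Z) in the unit cube with Z > X + Y."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y, z = rng.random(), rng.random(), rng.random()
        if z > x + y:
            hits += 1
    return hits / samples
```

With 200,000 samples the standard error is below 0.001, so agreement to about two decimal places is what to expect.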

Placing the question in a larger context might help. Here is a result:

For every event \$A\$ in \$(\Omega,\mathcal F,\mathbb P)\$ and every sigma-algebra \$\mathcal G\subseteq\mathcal F\$, \$\mathbb P(A)=\mathbb E(\mathbb P(A\mid \mathcal G))\$.

To see this, recall that \$U=\mathbb P(A\mid \mathcal G)\$ is the unique (up to null events) \$\mathcal G\$-measurable random variable such that \$\mathbb E(U;B)=\mathbb P(A\cap B)\$ for every \$B\$ in \$\mathcal G\$. In particular, \$B=\Omega\$ yields \$\mathbb E(U)=\mathbb P(A)\$, as claimed above.

In your setting, \$A=[Z\gt X+Y]\$ and \$\mathcal G\$ is the sigma-algebra generated by the random variable \$X+Y\$, hence \$\mathbb P(\,\cdot \mid \mathcal G)=\mathbb P(\,\cdot \mid X+Y)\$ by definition.

A partial justification can be found in the Wikipedia entry on the Law of Total Probability.

I think your question is best understood using two discrete random variables. Suppose you have two random variables \$X\$ and \$Y\$ taking values \$0,1,2,\ldots\$. Now you are asked to compute the probability of the event \$A = \{X > Y\}\$.

So,
\$\$
P(A) = P(X>Y)
\$\$
Here both \$X\$ and \$Y\$ are random. To compute this probability we need the notion of conditional probability. Here it is:
\$\$
P(A \cap B) = P(A\mid B) \times P(B)
\$\$

Now, back to the original problem. We first fix the value of one of the random variables, say \$Y = y\$. Clearly, \$y\$ can be any value in \$0,1,2,\ldots\$, but \$Y\$ can’t take these values simultaneously. Now we compute \$P(X > y\mid Y = y)\$ and \$P(Y = y)\$.

Hence \$P(X > Y, Y = y)\$ is nothing but \$P(X > y\mid Y = y) \times P(Y = y)\$. But we have probabilities of such events for each and every possible value of \$y\$, and these events are mutually exclusive, because the occurrence of any one, say \$Y = 1\$, prevents the occurrence of the others, i.e. \$Y = i,\ i \neq 1\$. Therefore, to get the required probability, we need to sum the probabilities of these mutually exclusive events. Thus finally we get
\$\$
P(X > Y) = \sum_{y = 0}^{\infty} \left[P(X > y\mid Y = y) \times P(Y = y)\right]
\$\$

If you are familiar with the basic definition of the expectation of a random variable, then the previous expression is actually
\$\$
\begin{eqnarray}
P(X > Y) &=& \sum_{y = 0}^{\infty} \left[\text{value} \times \text{corresponding probability}\right]\\
P(X > Y) &=& E\left[P(X > Y\mid Y)\right]
\end{eqnarray}
\$\$

Now, to make this result suitable for continuous variables, just replace the sum by integration with respect to \$y\ (0 \leq y < \infty)\$ and \$P(Y = y)\$ by \$f_Y(y)\$, i.e. the density function of \$Y\$ at the point \$y\$.
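To make the discrete argument concrete, here is a sketch that evaluates \$\sum_y P(X > y\mid Y = y)\,P(Y = y)\$ for one assumed example that is not from the question: \$X\$ and \$Y\$ independent, each geometric on \$\{0,1,2,\ldots\}\$ with \$P(Y=y) = p(1-p)^y\$. Then \$P(X>y) = (1-p)^{y+1}\$, the sum is a geometric series, and its closed form is \$(1-p)/(2-p)\$, which the truncated sum should match.

```python
def prob_x_greater_y(p, terms=10_000):
    """P(X > Y) computed as sum_y P(X > y | Y = y) P(Y = y).

    Assumption: X, Y independent, each Geometric on {0, 1, 2, ...} with
    P(Y = y) = p (1-p)^y, so P(X > y) = (1-p)^(y+1).
    """
    q = 1.0 - p
    total = 0.0
    for y in range(terms):
        p_y = p * q ** y             # P(Y = y)
        p_x_gt_y = q ** (y + 1)      # P(X > y | Y = y) = P(X > y) by independence
        total += p_x_gt_y * p_y
    return total
```

For \$p = 1/2\$ this gives \$1/3\$, consistent with symmetry: \$P(X>Y) = P(Y>X)\$ and \$P(X=Y) = 1/3\$ in that case.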

I don’t know if this helps since Dilip has given the answer, but the distribution of X+Y is triangular on [0,2] (isosceles with peak at X+Y = 1). So P(Z>X+Y) is the probability that a uniform random variable on [0,1] is larger than the triangular random variable on [0,2]. If X+Y>1 then Z cannot be >X+Y, and the probability that X+Y is greater than 1 is 1/2. Now this is where taking the expectation of the conditional probability helps in my proof.
P(Z>X+Y) = E[P(Z>X+Y|X+Y)] = ∫u P(Z>u|X+Y=u)du = ∫u P(Z>u)du, where u is integrated from 0 to 1 and the factor u is the density of X+Y on [0,1]. The condition X+Y=u gets dropped because Z is independent of X+Y. P(Z>u) = 1-u for 0<=u<=1.
Hence P(Z>X+Y) = ∫u(1-u)du = 1/6, just as Dilip showed.

## Math Genius: Convolution of Mixed Variables over Unique Domains

Question: I have two independent random variables (say $$X$$ and $$Y$$) such that $$X \sim U[0,1]$$ and $$Y \sim \mathrm{Exp}(1)$$, and I want to find the PDF of $$Z=X+Y$$.

My attempt:
I know $$f_X(x)=1$$ for $$x \in[0,1]$$, and $$f_Y(y)=e^{-y}$$ for $$y \in [0, \infty)$$.

I also know that, due to their independence, $$f_Z(z)=(f_X * f_Y)(z)$$ where $$(f_X * f_Y)(z)$$ is the convolution of $$f_X$$ and $$f_Y$$.

Furthermore, $$(f_X * f_Y)(z)=\int^{\infty}_{-\infty} f_X(z-y)f_Y(y)\, dy = \int^{\infty}_{-\infty} f_Y(z-x)f_X(x)\, dx$$.

However, I am unsure of a few things:

• Can I use a convolution approach even though $$f_X$$ and $$f_Y$$ are not defined for all real numbers?
• If I can, how would I determine the bounds of the integral given $$f_X$$ and $$f_Y$$ are defined for different subsets of the real numbers ($$x \in[0,1]$$ and $$y \in [0, \infty)$$ respectively)?

Context: Ultimately, I need to compute $$P(Z>z)$$ for two different cases (when $$z\in[0,1]$$ and when $$z>1$$), so I planned on integrating $$f_Z(z)$$ to get the CDF for $$Z$$.

Any help would be greatly appreciated.

When you say $$f_X(x)=1$$ for $$0<x<1$$, what you really mean is $$f_X(x)=1$$ for $$0<x<1$$ and $$f_X(x)=0$$ for all other $$x$$. All density functions are defined on the entire real line, so there is no problem in using the convolution formula.

In this case $$(f_X*f_Y)(z)=\int_{-\infty}^{\infty} f_X(z-y)f_Y(y)\, dy$$ [this is the general formula for convolution]. Let $$z >0$$. Note that $$f_Y(y)=0$$ if $$y <0$$ and $$f_X(z-y)=0$$ if $$z-y \notin (0,1)$$, i.e., if $$y \notin (z-1,z)$$. Hence the integration is over all positive $$y$$ satisfying $$z-1<y<z$$. In order to carry out this integration you have to consider two cases: $$z >1$$ and $$z <1$$. In the first case the integration is from $$z-1$$ to $$z$$. In the second case it is from $$0$$ to $$z$$.
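Carrying out the two cases gives $$f_Z(z) = 1 - e^{-z}$$ for $$0 < z \le 1$$ and $$f_Z(z) = e^{-(z-1)} - e^{-z} = (e-1)e^{-z}$$ for $$z > 1$$; integrating once more gives $$P(Z>z) = 2 - z - e^{-z}$$ on $$[0,1]$$ and $$(e-1)e^{-z}$$ for $$z > 1$$. These closed forms are my own evaluation of the integrals described above, not part of the original answer. The sketch below codes them and cross-checks them against a midpoint-rule integral; the helper names are hypothetical.

```python
import math


def pdf_z(z):
    """Density of Z = X + Y, X ~ U[0,1], Y ~ Exp(1): integral of e^{-y} over max(0, z-1) < y < z."""
    if z <= 0.0:
        return 0.0
    lower = max(0.0, z - 1.0)
    return math.exp(-lower) - math.exp(-z)


def survival_z(z):
    """P(Z > z), obtained by integrating pdf_z over (z, infinity)."""
    if z <= 0.0:
        return 1.0
    if z <= 1.0:
        return 2.0 - z - math.exp(-z)         # 1 - integral_0^z (1 - e^{-t}) dt
    return (math.e - 1.0) * math.exp(-z)      # integral_z^inf (e - 1) e^{-t} dt


def integrate(f, a, b, steps=100_000):
    """Midpoint rule, used only to cross-check the closed forms."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))
```

Both branches of `survival_z` agree at $$z = 1$$ (value $$1 - e^{-1}$$), as they must for a continuous CDF.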

## Math Genius: How to add skew to a normal CDF curve

I know that the normal CDF is defined as

$$\frac{1}{2} \left[1 + \text{erf}\left( \frac{x}{\sqrt{2}} \right) \right]$$

What I want is to define a similar curve that approaches the upper asymptote more slowly than it moves away from the lower asymptote, something that I can write in Python. I’ve heard of the skew normal distribution (https://en.wikipedia.org/wiki/Skew_normal_distribution), but its CDF is very complicated. I’m looking to modify the normal CDF equation to give me this type of shape. One thing I thought of was taking the square root of x:

$$\frac{1}{2} \left[1 + \text{erf}\left( \frac{\sqrt{x}}{\sqrt{2}} \right) \right]$$

This way, the erf will take much longer to reach the asymptote, and reaching higher values will require a much larger $$x$$. Can anybody tell me whether my method is correct, or suggest a different method that works more easily?
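Since the goal is something to write in Python, here is a minimal sketch of both curves using only `math.erf`. One property is easy to see from it: the square-root version is only real-valued for $$x \ge 0$$, and at $$x = 0$$ it equals $$\tfrac12$$ rather than starting near the lower asymptote $$0$$, because it is just the normal CDF evaluated at $$\sqrt{x}$$. The function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal CDF: 0.5 * (1 + erf(x / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sqrt_variant(x):
    """The proposed curve 0.5 * (1 + erf(sqrt(x) / sqrt(2))); real-valued only for x >= 0."""
    if x < 0:
        raise ValueError("sqrt(x) is not real for x < 0")
    return normal_cdf(math.sqrt(x))
```

Plotting both over a range of `x` values is a quick way to compare how fast each approaches 1.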

## Math Genius: Conditional and joint distribution of the sum of exponential RVs

Let $$X_1,X_2,\ldots,X_n$$ be i.i.d. $$\mathrm{Exp}(\lambda)$$ random variables and $$Y_k =\sum^{k}_{i=1}X_i$$, $$k = 1,2,\ldots,n$$.

a) Find the joint PDF of $$Y_1,\ldots,Y_n$$.

b) Find the conditional PDF of $$Y_k$$ conditioned on $$Y_1,\ldots,Y_{k-1}$$, for $$k = 2,3,\ldots,n$$.

c) Show that $$Y_1,\ldots,Y_k$$ conditioned on $$Y_{k+1},\ldots,Y_n$$ is uniformly distributed over a subset of $$\mathbb{R}^k$$, for $$k = 1,2,\ldots,n-1$$. Find this subset.

My attempt:

For $$\lambda_i = \lambda$$, $$\sum^{n}_{i=1}X_i \sim \mathrm{Erlang}(n,\lambda)$$, thus $$Y_k\sim \mathrm{Erlang}(k,\lambda)$$.

From here I need to find the CDF first to find the PDF. But I don’t understand how.

Begin here:

Since for all $$2 \leq k\leq n$$ we have $$X_k=Y_k-Y_{k-1}$$, and the $$(X_k)$$ are iid exponentially distributed with pdf $$f_{\small X}(x)=\lambda\exp(-\lambda x)\cdotp\mathbf 1_{0\leq x}$$ … therefore (the Jacobian of this linear change of variables is $$1$$)
\begin{align}f_{\small Y_1,Y_2,\ldots,Y_n}(y_1,y_2,\ldots, y_n)&=f_{\small X_1,X_2,\ldots,X_n}(y_1,y_2{-}y_1,\ldots,y_n{-}y_{n-1})\\[1ex]&= f_{\small X}(y_1)\prod_{k=2}^n f_{\small X}(y_k-y_{k-1})\\[2ex]&=\lambda^n\exp\left(-\lambda\left(y_1+\sum_{k=2}^n(y_k-y_{k-1})\right)\right)\cdot\mathbf 1_{0\leq y_1\leq y_2\leq\ldots\leq y_n}\\[2ex]&=\phantom{\lambda^n\exp(-\lambda y_n)\cdot\mathbf 1_{0\leq y_1\leq y_2\leq\ldots\leq y_n}}\end{align}
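For the smallest case of part (c), $$n = 2$$ and $$k = 1$$, the claim is that given $$Y_2 = y_2$$, $$Y_1$$ is uniform on $$(0, y_2)$$; one consequence is that $$Y_1/Y_2 \sim U(0,1)$$. The simulation sketch below only checks that consequence (mean $$1/2$$, variance $$1/12$$) for an arbitrarily chosen $$\lambda$$. It is an illustration, not a proof, and the function name is hypothetical.

```python
import random


def ratio_samples(lam, count, seed=0):
    """Draw Y_1 / Y_2 where Y_1 = X_1, Y_2 = X_1 + X_2 and X_1, X_2 ~ Exp(lam) iid."""
    rng = random.Random(seed)
    out = []
    for _ in range(count):
        x1 = rng.expovariate(lam)
        x2 = rng.expovariate(lam)
        out.append(x1 / (x1 + x2))   # Y_1 / Y_2
    return out
```

Changing `lam` should not change the distribution of the ratio, since $$\lambda$$ only rescales both $$Y_1$$ and $$Y_2$$.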

## Math Genius: Finding the Joint Distribution of Two Normally Distributed Random Variables

Question: Suppose $$X_1$$, $$X_2$$ and $$X_3$$ are independent random variables such that $$X_1 \sim N(0,1)$$, $$X_2 \sim N(1,4)$$ and $$X_3 \sim N(-1,2)$$. Let $$Y_1=X_1+X_3$$ and $$Y_2=X_1+X_2-2X_3$$. Give the joint distribution of $$Y_1$$ and $$Y_2$$.

My attempt: So far I have calculated that $$Y_1 \sim N(-1, 3)$$ and $$Y_2 \sim N(3, 13)$$, using the fact that, since the $$X_i$$ are independent normal random variables, any linear combination of them is also normally distributed. However, I am unsure how to find their joint distribution.

Theory: Consider the transformation $$Z=\mu+BX$$, where:

• $$B$$ is some $$m \times n$$ matrix
• $$X$$ is a vector of iid standard normal variables
• $$Z$$ has expectation vector $$\mu$$ and covariance matrix $$\Sigma = BB^T$$

Then we have $$Z \sim N(\mu, \Sigma)$$.

Application of theory: I presume in my case I would let $$Y$$ denote the column vector $$[Y_1, Y_2]^T$$. Then, from my above working, the expectation vector would be $$\mu = [-1, 3]^T$$. However, I am not sure how to go from here to determine what $$X$$ would be or how to calculate the covariance matrix $$\Sigma$$.
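One way to connect the theory to this problem: write $$Y = AX$$ with $$A = \begin{pmatrix}1&0&1\\1&1&-2\end{pmatrix}$$ and $$X = (X_1,X_2,X_3)^T$$. Since $$X = \mu_X + DW$$ with $$W$$ iid standard normal and $$D = \operatorname{diag}(1, 2, \sqrt{2})$$, we get $$B = AD$$ and $$\Sigma = BB^T = A\Sigma_X A^T$$ with $$\Sigma_X = \operatorname{diag}(1,4,2)$$. The sketch below does this arithmetic with plain lists; the helper names are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


# Y = A X: rows encode Y_1 = X_1 + X_3 and Y_2 = X_1 + X_2 - 2 X_3
A = [[1, 0, 1],
     [1, 1, -2]]
mu_x = [[0], [1], [-1]]                        # column vector of means of X_1, X_2, X_3
sigma_x = [[1, 0, 0], [0, 4, 0], [0, 0, 2]]    # independence -> diagonal covariance

mu_y = [row[0] for row in matmul(A, mu_x)]            # A mu_X
cov_y = matmul(matmul(A, sigma_x), transpose(A))      # A Sigma_X A^T
```

It gives $$\mu_Y = (-1, 3)^T$$ and $$\Sigma_Y = \begin{pmatrix}3&-3\\-3&13\end{pmatrix}$$. The diagonal reproduces the variances $$3$$ and $$13$$ from the attempt, and the off-diagonal entry is $$\operatorname{Cov}(Y_1,Y_2) = \operatorname{Var}(X_1) - 2\operatorname{Var}(X_3) = -3$$, so $$(Y_1,Y_2)^T \sim N(\mu_Y, \Sigma_Y)$$.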

## Math Genius: Coin toss problem

A fair coin is repeatedly tossed until a head is obtained. Calculate the probability that this experiment will not end after the first $$6194$$ attempts, given that it has not ended after the first $$6192$$ attempts.

I know we can solve this using geometric distribution, but I’m having some trouble applying it correctly. Do I need to find $$P(X > r)$$, for $$r = 6194$$?

We know that the experiment has not ended after $$6192$$ tosses, so the probability we’re looking for must be multiplied by $$\left(\dfrac{1}{2}\right)^{6192}.$$ Where do we go from here? I’m not sure what to do on the $$6193^{\text{rd}}$$ toss. Can someone please explain how to solve this? Thank you for your time.

Given that the experiment has not ended after 6192 attempts, the only way it will not have ended after 6194 attempts is that it does not end on either the 6193rd or the 6194th attempt.

Since these two trials are independent of all the prior trials, the outcomes of the prior trials do not influence the results of these two trials.

So the (conditional) probability we seek is merely the probability that these two trials fail.

$$\mathsf P(X>6194\mid X>6192)=\left({\tfrac 1 2}\right)^2$$

Alternatively, since you have already calculated $$\mathsf P(X>6192)$$, you can do the same for $$\mathsf P(X>6194)$$.

\begin{align}\mathsf P(X>6194\mid X>6192)&=\dfrac{\mathsf P(X>6194\cap X>6192)}{\mathsf P(X>6192)}\\[2ex]&=\dfrac{\mathsf P(X>6194)}{\mathsf P(X>6192)}&&{\tiny\{X>a+2\}\cap\{X>a\}\iff \{X>a+2\}}\\[2ex]&=\dfrac{1/2^{6194}}{1/2^{6192}} = \frac{1}{4}\end{align}
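The ratio of survival probabilities can be checked exactly with Python integers and `fractions.Fraction`, which handle $$2^{6194}$$ without overflow. A minimal sketch with hypothetical function names; it also shows that the answer does not depend on 6192 (memorylessness of the geometric distribution).

```python
from fractions import Fraction


def survival(r):
    """P(X > r) = (1/2)^r, where X is the toss on which the first head appears (fair coin)."""
    return Fraction(1, 2 ** r)


def conditional_survival(a, b):
    """P(X > b | X > a) for b >= a, using {X > b} being a subset of {X > a}."""
    return survival(b) / survival(a)
```

For example, `conditional_survival(6192, 6194)` is `Fraction(1, 4)`, the same as `conditional_survival(0, 2)`.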


# Questions

### 1. Explaining the derivation of the hazard function

A bit of context

The hazard function is often found stated in brevity as:

$$h(t)=\frac{f(t)}{S(t)}$$

where $$f(\cdot)$$ is the probability density function, and $$S(\cdot)$$ is the survival function. Throughout this question I will be referring to the descriptions given by Rodríguez and Tian.

Traditionally the survival and hazard functions come into play when the random variable $$T$$ is non-negative and continuous. In this sense, at least the concept of the survival function is remarkably straightforward, being the probability that $$T$$ is greater than $$t$$.

$$S(t)= 1-F(t) = P(T>t)$$

This is especially intuitive when put in context, e.g. the time following diagnosis of a disease until death.

From the definition of the hazard function above, it is clear that it is not a probability distribution as it allows for values greater than one.

My confusion comes in at Rodríguez’s definition:

$$h(t) = \lim\limits_{dt\rightarrow0}\frac{P(t\leq T<t+dt\mid T\geq t)}{dt}$$

This to me, really only reads in a manner that makes sense in context, e.g. the numerator being the probability that the diagnosed person dies in some increment of time ($$dt$$) following some passage of time $$t$$, given that they have lived at least so long as the passage of time $$t$$ (or simpler, if it has been $$t$$ time since diagnosis, the probability that you’ll die within the next $$dt$$ time).

Confusion starts here:

Prior to the definition of equation (7.3) he states:

“The conditional probability in the numerator may be written as the ratio of the joint probability that $$T$$ is in the interval $$[t,t+dt)$$ and $$Tgeq t$$ (which is, of course, the same as the probability that $$t$$ is in the interval), to the probability of the condition $$Tgeq t$$. The former may be written as $$f(t)dt$$ for small $$dt$$, while the latter is $$S(t)$$ by definition”

My confusion comes from the following:

1. In my exposure, joint distributions come from two random variables, not one as is the case here ($$T$$).

If I just accept that can be the case, I can then use a rule from conditional probability $$P(Acap B)=P(A|B)P(B)$$ to restructure the numerator:

$$P(t \leq T < t+dt \mid T \geq t) = \frac{P(\{t \leq T < t+dt\} \cap \{T\geq t\})}{P(T\geq t)}$$

then substitute back in to get:
$$h(t) = \lim\limits_{dt\rightarrow0} \frac{P(\{t \leq T < t+dt\} \cap \{T\geq t\})}{P(T\geq t)\,dt}$$

2. It is stated as a matter of fact that $$P(\{t \leq T < t+dt\} \cap \{T\geq t\})$$ may be written as $$f(t)dt$$ for small $$dt$$. How?

3. What does passing to the limit mean?

4. The claim is made that $$h(t) = -\frac{d}{dt}\log{S(t)}$$; while possibly trivial, I would appreciate seeing this calculation.

5. What are the units of the hazard function (other than a vaguely defined likelihood)?

### 2. Time-independent random variables for hazard functions?

Since the hazard function is often used in a time-dependent manner, can one use it for a time-independent continuous random variable?

1. You are correct that most usage of the word “joint” comes from the joint distribution of multiple random variables. Here the author uses “joint probability” to describe the probability of the intersection of events. I do see this usage on the web and in other texts, but I am not sure whether it is very frequent.

2. By definition
\$\$ f_T(t) = \frac {d} {dt} F_T(t)
= \lim_{\Delta t \to 0} \frac {F_T(t+\Delta t) - F_T(t)} {\Delta t}
= \lim_{\Delta t \to 0} \frac {\Pr\{t < T \leq t + \Delta t\}} {\Delta t}\$\$
Therefore you can claim that \$\Pr\{t < T \leq t + \Delta t\} \approx f_T(t)\Delta t\$ when \$\Delta t\$ is small. Also note
\$\$\Pr\{\{t < T \leq t + \Delta t\} \cap \{T > t\}\} = \Pr\{t < T \leq t + \Delta t\}\$\$
as \$\{t < T \leq t + \Delta t\}\$ is a subset of \$\{T > t\}\$.

3. Passing to the limit means taking limit (after some calculations). You need to learn the definition of limit of sequence / limit of function if you are not sure about the concept.

4. It depends on your fundamental definition of \$h(t)\$:
\$\$ h(t) = \frac {f(t)} {S(t)}
= \frac {1} {S(t)} \frac {d} {dt} F(t)
= \frac {1} {S(t)} \frac {d} {dt} [1 - S(t)]
= -\frac {1} {S(t)} \frac {d} {dt} S(t)
= - \frac {d} {dt} \ln S(t)\$\$

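5. From the definition: the survival function is just a probability and so is dimensionless, while the pdf is the derivative of the CDF with respect to \$t\$ and so has units of \$1/\text{time}\$. Hence \$h(t)\$ has units of inverse time (a rate), which is also why it can exceed one.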
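Items 2 and 4 can be checked numerically on a concrete distribution. The sketch below assumes, purely for illustration, a Weibull lifetime with shape \$K = 2\$ and scale \$1\$ (the constant `K` is a hypothetical choice), so \$S(t) = e^{-t^2}\$ and the exact hazard is \$h(t) = 2t\$. It compares \$f/S\$, \$-\frac{d}{dt}\ln S\$ (central difference), and the conditional-probability ratio for a small \$dt\$.

```python
import math

K = 2.0  # hypothetical Weibull shape parameter, scale fixed at 1 (illustration only)


def survival(t):
    """S(t) = P(T > t) = exp(-t^K) for a Weibull(K, scale 1) lifetime."""
    return math.exp(-t ** K)


def pdf(t):
    """f(t) = K t^(K-1) exp(-t^K), the derivative of F(t) = 1 - S(t)."""
    return K * t ** (K - 1) * math.exp(-t ** K)


def hazard_ratio(t):
    """h(t) = f(t) / S(t)."""
    return pdf(t) / survival(t)


def hazard_from_log_survival(t, eps=1e-5):
    """h(t) = -d/dt ln S(t), approximated by a central difference."""
    return -(math.log(survival(t + eps)) - math.log(survival(t - eps))) / (2 * eps)


def hazard_from_limit(t, dt):
    """P(t <= T < t + dt | T >= t) / dt = (S(t) - S(t + dt)) / (S(t) dt) for continuous T."""
    return (survival(t) - survival(t + dt)) / (survival(t) * dt)
```

At \$t = 1\$ all three should be close to \$2\$, and `hazard_from_limit` approaches it as `dt` shrinks, which is what "passing to the limit" means here. Values above one are expected, since \$h\$ is a rate rather than a probability.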

Not sure about your last question. The hazard function is often used in time-to-event modelling in survival analysis, but inherently there is nothing preventing it from being used elsewhere.

## Math Genius: Expectation value of the reciprocal of a sum of geometrical

I was studying statistical inference when I had a problem with the following probability problem.

## Problem:

Suppose I have that $$X_i \sim \mathrm{Geom}(p)$$ where $$p \in [0,1]$$.

Let us define $$Y= \frac{1}{\sum_{1 \leq i \leq n} X_i}$$.

I would like to find the value of $$mathbb{E}[Y]$$.

## My attempt:

I tried some manipulations using the Beta function, but I could not solve anything.
I tried computing the sum involved explicitly, but it is not easy.
I used “How to compute the sum of random variables of geometric distribution” to compute the distribution of the sum of geometric random variables.
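No closed form is claimed here, but the expectation can at least be evaluated numerically. A sketch, assuming the $$X_i$$ are iid with support $$\{1,2,\ldots\}$$ (the question does not say which geometric convention) and $$0 < p \le 1$$: the sum $$S = \sum_i X_i$$ is then negative binomial with $$P(S=s) = \binom{s-1}{n-1}p^n(1-p)^{s-n}$$ for $$s \ge n$$, so $$\mathbb{E}[Y] = \sum_{s \ge n} P(S=s)/s$$. Two special cases serve as checks: for $$n = 1$$ the series equals $$-p\ln p/(1-p)$$, and for $$n = 2,\ p = 1/2$$ it equals $$1 - \ln 2$$. The function name and tolerances are hypothetical.

```python
def expected_reciprocal_sum(n, p, tol=1e-16, max_terms=10**7):
    """E[1 / (X_1 + ... + X_n)] for iid X_i ~ Geometric(p) on {1, 2, ...}.

    S = X_1 + ... + X_n is negative binomial: P(S = s) = C(s-1, n-1) p^n (1-p)^(s-n), s >= n.
    The series sum_s P(S = s) / s is truncated once the terms are past the mode and tiny.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("need 0 < p <= 1")
    prob = p ** n          # P(S = n)
    s = n
    total = 0.0
    for _ in range(max_terms):
        total += prob / s
        # ratio P(S = s+1) / P(S = s) = (1-p) * s / (s - n + 1)
        prob *= (1.0 - p) * s / (s - n + 1)
        s += 1
        # terms decrease once s > (n-1)/p; stop only after that and when negligible
        if prob < tol and s > n / p:
            break
    return total
```

For example, `expected_reciprocal_sum(1, 0.5)` should be close to $$\ln 2 \approx 0.6931$$, and `expected_reciprocal_sum(n, 1.0)` returns $$1/n$$ since then $$S = n$$ surely.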