Math Genius: Conditional and joint distribution of the sum of exponential RVs

Let $X_1,X_2,\ldots,X_n$ be i.i.d. $\mathrm{Exp}(\lambda)$ random variables and $Y_k =\sum^{k}_{i=1}X_i$, $k = 1,2,\ldots,n$.

a) Find the joint PDF of $Y_1,…,Y_n$.

b) Find the conditional PDF of $Y_k$ conditioned on $Y_1,\ldots,Y_{k-1}$, for $k = 2,3,\ldots,n$.

c) Show that $Y_1,\ldots,Y_k$ conditioned on $Y_{k+1},\ldots,Y_n$ is uniformly distributed over a subset in $\Bbb{R}^k$, for $k = 1,2,\ldots,n-1$. Find this subset.

My attempt:

For $\lambda_i = \lambda$, $\sum^{n}_{i=1}X_i \sim \mathrm{Erlang}(n,\lambda)$, thus $Y_k\sim \mathrm{Erlang}(k,\lambda)$

From here I need to find the CDF first to find the PDF. But I don’t understand how.

Begin here:

Since for all $2 \leq k\leq n$ we have $X_k=Y_k-Y_{k-1}$, the $(X_k)$ are i.i.d. exponentially distributed with pdf $f_{\small X}(x)=\lambda\exp(-\lambda x)\cdot\mathbf 1_{0\leq x}$, and the map $(y_1,\ldots,y_n)\mapsto(y_1,y_2-y_1,\ldots,y_n-y_{n-1})$ is linear with Jacobian determinant $1$, therefore (the sum telescopes in the last step) $$\begin{align}f_{\small Y_1,Y_2,\ldots,Y_n}(y_1,y_2,\ldots, y_n)&=f_{\small X_1,X_2,\ldots,X_n}(y_1,y_2{-}y_1,\ldots,y_n{-}y_{n-1})\\[1ex]&= f_{\small X}(y_1)\prod_{k=2}^n f_{\small X}(y_k-y_{k-1})\\[2ex]&=\lambda^n\exp\left(-\lambda\left(y_1+\sum_{k=2}^n(y_k-y_{k-1})\right)\right)\cdot\mathbf 1_{0\leq y_1\leq y_2\leq\ldots\leq y_n}\\[2ex]&=\lambda^n\exp(-\lambda y_n)\cdot\mathbf 1_{0\leq y_1\leq y_2\leq\ldots\leq y_n}\end{align}$$
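As an aside, the claim $Y_k\sim \mathrm{Erlang}(k,\lambda)$ from the attempt can be checked numerically. A minimal Python sketch ($\lambda=2$ and $k=3$ are arbitrary values chosen for illustration):

```python
import random

random.seed(0)
lam, k, trials = 2.0, 3, 200_000  # arbitrary illustrative values

# Simulate Y_k = X_1 + ... + X_k with X_i i.i.d. Exp(lam)
samples = [sum(random.expovariate(lam) for _ in range(k)) for _ in range(trials)]

mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials

# Erlang(k, lam) has mean k/lam and variance k/lam^2
print(mean, var)  # should be close to 1.5 and 0.75
```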

Math Genius: Question about probability in a gacha game

There is a gacha game I play that has a banner with 11 rainbow units. The pool has 116 units, all with an equal chance to be pulled (dupes are allowed).
Let’s say I am interested in 11 units out of those 116, and if I pull on the banner I get 11 units.

If I want to calculate the chance to get at least one unit I’m interested in across the 11 pulls, would it be $(105/116)^{11} = 0.334 = 33\%$? (105 of the 116 units are the ones I’m not interested in, and the exponent 11 is the number of units/chances I’ll get on the banner.) Is this correct?

But my main question is: what if I want to know the probability of getting at least 3 units out of the 11 units I’m interested in, in 11 pulls? I have no idea how to calculate this.

TLDR: 11 units out of 116 are okay. Dupes are allowed (in all 11 pulls the total number of units in the pool will remain 116). Out of the 11 units I want, what’s the chance to get at least 3 of those in 11 pulls?

I would appreciate any help.
Sorry for the long question and thank you in advance!
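Since dupes are allowed, each of the 11 pulls is an independent draw with probability $11/116$ of being a wanted unit, so the number of wanted units follows a binomial distribution. A quick numeric sketch of that model (note that $(105/116)^{11}\approx 0.334$ is the chance of getting *none* of the wanted units, so "at least one" is its complement):

```python
from math import comb

n_pulls, wanted, pool = 11, 11, 116
p = wanted / pool  # chance a single pull is a wanted unit

# P(exactly k wanted units in n independent pulls) -- binomial model,
# valid because dupes keep every pull's odds the same
def p_exactly(k):
    return comb(n_pulls, k) * p**k * (1 - p) ** (n_pulls - k)

p_none = p_exactly(0)            # (105/116)^11, about 0.334
p_at_least_one = 1 - p_none      # complement of "none"
p_at_least_3 = 1 - sum(p_exactly(k) for k in range(3))
print(p_at_least_one, p_at_least_3)
```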

Code Bug Fix: How to calculate 95% confidence interval for a proportion in R?

Assume I own a factory that produces 150 screws a day and there is a 22% error rate. Now I am going to estimate how many screws are faulty each day for a year (365 days) with

    rbinom(n = 365, size = 150, prob = 0.22)

which generates 365 values in this way

45 31 35 31 34 37 33 41 37 37 26 32 37 38 39 35 44 36 25 27 32 25 30 33 25 37 36 31 32 32 43 42 32 33 33 38 26 24 ...................

Now for each of the value generated, I am supposed to calculate a 95% confidence interval for the proportion of faulty screws in each day.

I am not sure how I can do this. Are there any built-in functions for this (I am not supposed to use any packages), or should I write my own function?

If the number of trials per day is large enough and the probability of failure is not too extreme, then you can use the normal approximation:

# number of failures, for each of the 365 days
f <- rbinom(365, size = 150, prob = 0.22)

# failure rates
p <- f/150

# confidence interval for the failure rate, for each day
p + 1.96*sqrt((p*(1-p)/150))
p - 1.96*sqrt((p*(1-p)/150))
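As a cross-check outside R, the same normal-approximation (Wald) interval can be sketched in Python for a single day's count (the count 33 is just an example value):

```python
import math

def wald_ci(f, n, z=1.96):
    """95% normal-approximation (Wald) CI for a binomial proportion."""
    p = f / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

# e.g. 33 faulty screws out of 150 on one day
lo, hi = wald_ci(33, 150)
print(lo, hi)
```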

Math Genius: Finding the Joint Distribution of Two Normally Distributed Random Variables

Question: Suppose $X_1$, $X_2$ and $X_3$ are independent random variables such that $X_1 \sim N(0,1)$, $X_2 \sim N(1,4)$ and $X_3 \sim N(-1,2)$. Let $Y_1=X_1+X_3$ and $Y_2=X_1+X_2-2X_3$. Give the joint distribution of $Y_1$ and $Y_2$.

My attempt: So far I have calculated that $Y_1 \sim N(-1, 3)$ and $Y_2 \sim N(3, 13)$, using the fact that independence of the $X_i$ implies that any linear combination of them is also a normally distributed random variable. However, I am unsure how to find their joint distribution.

Theory: Considering the transformation $Z=\mu+BX$ where:

  • $B$ is some $m \times n$ matrix
  • $X$ is a vector of iid standard normal variables
  • $Z$ has expectation vector $\mu$ and covariance matrix $\Sigma = BB^T$

Then we have $Z \sim N(\mu, \Sigma)$.

Application of theory: I presume in my case I would let $Y$ denote the column vector $[Y_1, Y_2]^T$. Then, from my above working, the expectation vector would be $\mu = [-1, 3]^T$. However, I am not sure how to go from here to determine what $X$ would be or how to calculate the covariance matrix $\Sigma$.
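One way to organize the computation described above: work directly with the given (not necessarily standard) $X_i$. Independence makes the covariance matrix of $X$ diagonal, so $\Sigma_Y = B\,\mathrm{diag}(1,4,2)\,B^T$. A small sketch (only the numbers stated in the question are used):

```python
# Y1 = X1 + X3, Y2 = X1 + X2 - 2*X3,
# with independent X1 ~ N(0,1), X2 ~ N(1,4), X3 ~ N(-1,2)
B = [[1, 0, 1],
     [1, 1, -2]]
mu_X = [0, 1, -1]
var_X = [1, 4, 2]  # variances of X1, X2, X3

# mean vector: mu_Y = B @ mu_X
mu_Y = [sum(B[i][j] * mu_X[j] for j in range(3)) for i in range(2)]

# covariance: Sigma_Y = B @ diag(var_X) @ B^T (cross terms vanish by independence)
Sigma_Y = [[sum(B[i][k] * var_X[k] * B[j][k] for k in range(3)) for j in range(2)]
           for i in range(2)]

print(mu_Y, Sigma_Y)
```

The diagonal entries reproduce the marginal variances already computed (3 and 13); the off-diagonal entry is the covariance of $Y_1$ and $Y_2$.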

Math Genius: Coin toss problem

A fair coin is repeatedly tossed until a head is obtained. Calculate the probability that this experiment will not end after the first $6194$ attempts, given that it has not ended after the first $6192$ attempts.

I know we can solve this using geometric distribution, but I’m having some trouble applying it correctly. Do I need to find $P(X > r)$, for $r = 6194$?

We know that the experiment has not ended after $6192$ tosses, so the probability we’re looking for must be multiplied by $\left(\dfrac{1}{2}\right)^{6192}$. Where do we go from here? I’m not sure what to do on the $6193^{\text{rd}}$ toss. Can someone please explain how to solve this? Thank you for your time.

Given that the experiment has not ended after the first 6192 attempts, the only way it will still not have ended after 6194 attempts is that it does not end on either the 6193rd or the 6194th attempt.

Since these two trials are independent of all the prior trials, the earlier results do not influence them.

So the (conditional) probability we seek is merely the probability these two trials fail.

$$\mathsf P(X>6194\mid X>6192)=\left({\tfrac 1 2}\right)^2$$

Alternatively, since you have already calculated $\mathsf P(X>6192)$, you can do the same for $\mathsf P(X>6194)$.

$$\begin{align}\mathsf P(X>6194\mid X>6192)&=\dfrac{\mathsf P(X>6194\cap X>6192)}{\mathsf P(X>6192)}\\[2ex]&=\dfrac{\mathsf P(X>6194)}{\mathsf P(X>6192)}&&{\tiny\{X>a+2\}\cap\{X>a\}\iff \{X>a+2\}}\\[2ex]&=\dfrac{1/2^{6194}}{1/2^{6192}}\end{align}$$
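Both routes can be checked with exact rational arithmetic; a small Python sketch, assuming nothing beyond the geometric model above:

```python
from fractions import Fraction

# P(X > r) for a fair coin: r tails in a row
def p_gt(r):
    return Fraction(1, 2) ** r

# conditional probability P(X > 6194 | X > 6192)
cond = p_gt(6194) / p_gt(6192)
print(cond)  # 1/4
```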

Math Genius: Out of 8 books, you want 3. You pick 4 at random. What are the chances you got at least two you wanted?

In a bag there are 8 books. 3 of them are desirable. You pick 4 at random. What are the chances you get at least 2 of the 3 desired ones?

My thoughts

Presumably $P(\text{at least 2}) = P(2) + P(3)$.

The total number of combinations of books you can end up with is $\binom84 = 70$.

Let’s label the books with letters A-H, and let the three desired ones be A, B, C.

The possible combinations where you get all of them, would be ABCD, ABCE, ABCF, ABCG, ABCH. So a total of 5 combinations.

This gets us $P(3) = \frac5{70} = \frac1{14}$.

But I’m struggling with $P(2)$.

Ah of course. We want 2 from the 3 desirables, and 2 from the 5 non-desirables.


$P(3)$ is the probability for obtaining 3 from 3 favoured books and 1 from 5 unfavoured, when selecting any 4 from all 8 books. This is what you were manually counting.

$$P(3)=\dfrac{\dbinom 33\dbinom 51}{\dbinom 84}=\dfrac{5}{70}$$

Likewise, $P(2)$ is the probability for obtaining 2 from 3 favoured books and 2 from 5 unfavoured, when selecting any 4 from all 8 books.

$$P(2)=\dfrac{\dbinom 32\dbinom 52}{\dbinom 84}=\dfrac{30}{70}$$


Hint: the number of desirable books you get is given by a hypergeometric random variable.
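That hint can be verified numerically; a small Python sketch of the hypergeometric computation above:

```python
from math import comb

total = comb(8, 4)                      # all ways to pick 4 of the 8 books
p3 = comb(3, 3) * comb(5, 1) / total    # all 3 desired + 1 of the 5 others
p2 = comb(3, 2) * comb(5, 2) / total    # 2 desired + 2 others

print(p2 + p3)  # P(at least 2) = 35/70 = 0.5
```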

Math Genius: The Hazard Function: derivation and assumptions of random variable

1. Explaining the derivation of the hazard function

A bit of context

The hazard function is often found stated in brevity as:

$$h(t)=\frac{f(t)}{S(t)}$$

where $f(\cdot)$ is the probability density function, and $S(\cdot)$ is the survival function. Throughout this question I will be referring to the descriptions given by Rodríguez and Tian.

Traditionally the survival and hazard functions come into play when the random variable $T$ is non-negative and continuous. In this sense, at least the concept of the survival function is remarkably straightforward, being the probability that $T$ is greater than $t$.

$$S(t)= 1-F(t) = P(T>t)$$

This is especially intuitive when put in context, e.g. the time following diagnosis of a disease until death.

From the definition of the hazard function above, it is clear that it is not a probability distribution as it allows for values greater than one.

My confusion comes in at Rodríguez‘s definition:

$$ h(t) = \lim\limits_{dt\rightarrow0}\frac{P(t\leq T<t+dt\mid T\geq t)}{dt}$$

This to me, really only reads in a manner that makes sense in context, e.g. the numerator being the probability that the diagnosed person dies in some increment of time ($dt$) following some passage of time $t$, given that they have lived at least so long as the passage of time $t$ (or simpler, if it has been $t$ time since diagnosis, the probability that you’ll die within the next $dt$ time).

Confusion starts here:

Prior to the definition of equation (7.3) he states:

“The conditional probability in the numerator may be written as the ratio of the joint probability that $T$ is in the interval $[t,t+dt)$ and $T\geq t$ (which is, of course, the same as the probability that $t$ is in the interval), to the probability of the condition $T\geq t$. The former may be written as $f(t)\,dt$ for small $dt$, while the latter is $S(t)$ by definition”

My confusion comes from the following:

  1. in my exposure, joint distributions come from two random variables, not one as is the case here, $T$.

If I just accept that this can be the case, I can then use a rule from conditional probability $$P(A\cap B)=P(A\mid B)P(B)$$ to restructure the numerator:

$$P(t \leq T < t+dt \mid T \geq t) = \frac{P(t \leq T < t+dt \cap T\geq t)}{P(T\geq t)}$$

then substitute back in to get:
$$h(t) = \lim\limits_{dt\rightarrow0} \frac{P(t \leq T < t+dt \cap T\geq t)}{P(T\geq t)\,dt}$$

  2. it is stated matter-of-factly that $P(t \leq T < t+dt \cap T\geq t)$ may be written as $f(t)\,dt$ for small $dt$. How?

  3. What does passing to the limit mean?

  4. The claim is made that $h(t) = -\frac{d}{dt}\log{S(t)}$; while possibly trivial, I would appreciate seeing this calculation.

  5. What are the units of the hazard function (other than a vaguely defined likelihood)?

2. Time-independent random variables for hazard functions?

Since the hazard function is often used in a time-dependent manner, can one use it for a time-independent continuous random variable?

  1. You are correct that most of the usage of the word “joint” comes from the joint distribution of multiple random variables. Evidently the author uses “joint probability” to describe the probability of an intersection of events. I do see some usage on the web and in other texts, but whether it is a very frequent usage I am not sure.

  2. By definition
    $$ f_T(t) = \frac {d} {dt} F_T(t)
    = \lim_{\Delta t \to 0} \frac {F_T(t+\Delta t) - F_T(t)} {\Delta t}
    = \lim_{\Delta t \to 0} \frac {\Pr\{t < T \leq t + \Delta t\}} {\Delta t}$$
    Therefore, as you claim, $\Pr\{t < T \leq t + \Delta t\} \approx f_T(t)\Delta t$ when $\Delta t$ is small. Also note
    $$\Pr\{t < T \leq t + \Delta t \cap T > t\} = \Pr\{t < T \leq t + \Delta t\}$$
    as $\{t < T \leq t + \Delta t\}$ is a subset of $\{T > t\}$.

  3. Passing to the limit means taking a limit (after some calculations). You need to learn the definition of the limit of a sequence / limit of a function if you are not sure about the concept.

  4. It depends on your fundamental definition of $h(t)$:
    $$ h(t) = \frac {f(t)} {S(t)}
    = \frac {1} {S(t)} \frac {d} {dt} F(t)
    = \frac {1} {S(t)} \frac {d} {dt} [1 - S(t)]
    = -\frac {1} {S(t)} \frac {d} {dt} S(t)
    = - \frac {d} {dt} \ln S(t)$$

  5. You can read the units off the definition: the survival function is a unitless probability, but the pdf is the derivative of the CDF with respect to time, so $f(t)$, and hence $h(t)=f(t)/S(t)$, carries units of inverse time.

Not sure about your last question. The hazard function is often used in time modelling for survival analysis, but inherently there is nothing prohibiting the hazard function from being used in other places.
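As a quick sanity check of the identities above, consider the standard textbook case $T\sim \mathrm{Exp}(\lambda)$:

$$S(t)=e^{-\lambda t},\qquad f(t)=\lambda e^{-\lambda t},\qquad h(t)=\frac{f(t)}{S(t)}=\lambda=-\frac{d}{dt}\ln e^{-\lambda t},$$

so the hazard is the constant $\lambda$, whose value carries units of inverse time (e.g. events per year), consistent with the answer to question 5.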

Math Genius: Please help me with the Expected-Value notation

I am looking at the machine learning paper, and I came across expected-value in the following notation:

$E_{s_t \sim \rho^\beta,\, a_t \sim \beta,\, r_t \sim E}[(Q(s_t, a_t \mid\theta^Q)-y_t)^2]$

(eqn 4) on

My Question is, Does it mean that the expectation is taken over $s_t, alpha, $ and $ r_t$?
what is that $_{s_t sim rho^B, alpha sim beta, r_t sim E}$ means? Does it mean that the expectation is taken over $s_t, alpha, $ and $ r_t$?


Yes; it means that you take the expectation over those variables. The $\sim$ sign shows you what “space” they live in; so the variable $s_t$ comes from the space $\rho^\beta$. So, when you take an expectation over those, they will disappear, and the remaining quantity will be a function of $\theta$. (Note that $y_t$ is also a function of $\theta$, as they mention in their paper; the other parameters it depends on also disappear after taking the expectation.)

Math Genius: Why is it that $\mathbb{E}[X\mid A]\mathbb{P}(A)=\int x\, \mathbb{I}_{A}\, dP$?

I’d like to prove that:

$$\mathbb{E}[X\mid A]\,\mathbb{P}(A)=\int x\, \mathbb{I}_{A}\, dP$$

This came to me because the “Law of total expectation” would be way easier to prove if I could argue that:

$$\mathbb{E}[X] = \sum_{i=1}^\infty \int x\, \mathbb{I}_{A_i}\, dP=\sum_{i=1}^\infty\mathbb{E}[X\mid A_i]\,\mathbb{P}(A_i)$$

with $A_i$ a partition of $\Omega$.

Assuming that $A$ is an event, $Z=\mathbb{E}[X\mid A]$ is a constant and thus:

$$\mathbb{E}[X\mid A]\,\mathbb{P}(A)=Z\,\mathbb{E}[\mathbb{I}_A]=\mathbb{E}[Z\,\mathbb{I}_A]=\mathbb{E}\big[\mathbb{I}_A\,\mathbb{E}[X\mid A]\big]=\mathbb{E}\big[\mathbb{E}[X\mathbb{I}_A\mid A]\big]=\mathbb{E}[X\mathbb{I}_A]=\int x\, \mathbb{I}_{A}\, dP$$

where we used that $\mathbb{I}_A \in \sigma(A)$ (therefore $\mathbb{E}[X\mathbb{I}_A\mid A]=\mathbb{I}_A\mathbb{E}[X\mid A]$), and the step $\mathbb{E}\big[\mathbb{E}[X\mathbb{I}_A\mid A]\big]=\mathbb{E}[X\mathbb{I}_A]$ is the tower law.
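The identity is easy to see in a tiny discrete case; a Python sketch with $X$ a fair die roll and $A$ the (arbitrarily chosen) event that the roll is even:

```python
from fractions import Fraction

# X = fair die roll, A = {X is even}
omega = [1, 2, 3, 4, 5, 6]
P = Fraction(1, 6)  # probability of each outcome

p_A = sum(P for x in omega if x % 2 == 0)                    # P(A) = 1/2
e_X_given_A = sum(x * P for x in omega if x % 2 == 0) / p_A  # E[X|A] = 4
e_X_indicator = sum(x * P for x in omega if x % 2 == 0)      # E[X 1_A] = 2

print(e_X_given_A * p_A == e_X_indicator)  # True
```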
