Let $X_1,\ldots,X_n$ be random variables.

Then the formula

$$\mathbb{P}\left(\begin{pmatrix}X_1\\ \vdots\\ X_n\end{pmatrix} = z\right)$$

can have two meanings, depending on how $\mathbb{P}$ is defined:

It can mean either $\mathbb{P}\left(\left\{\omega\in\Omega \mid \begin{pmatrix}X_1(\omega)\\ \vdots\\ X_n(\omega)\end{pmatrix} = z\right\}\right)$, if we define our probability space as $(\Omega,\mathcal{F},\mathbb{P})$,

or $\mathbb{P}\left(\left\{\begin{pmatrix}\omega_1\\ \vdots\\ \omega_n\end{pmatrix} \in \times_{i=1}^n \Omega_i \mid \begin{pmatrix}X_1\\ \vdots\\ X_n\end{pmatrix}\begin{pmatrix}\omega_1\\ \vdots\\ \omega_n\end{pmatrix} = z\right\}\right)$,

if we define our probability space as $(\Omega^n,\mathcal{F},\mathbb{P})$.
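To make the two readings concrete, here is a small finite sketch; the sample space, measure, and variables below are made up purely for illustration:

```python
# Two readings of P((X1, X2) = z) on a toy finite space (illustrative only).
from fractions import Fraction

# Reading 1: one sample space Omega; both coordinates take the SAME omega.
Omega = ["a", "b", "c"]
P = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
X1 = {"a": 0, "b": 1, "c": 1}
X2 = {"a": 0, "b": 0, "c": 1}

def prob_same_omega(z):
    """P({omega in Omega : (X1(omega), X2(omega)) = z})"""
    return sum(P[w] for w in Omega if (X1[w], X2[w]) == z)

# Reading 2: the product space Omega x Omega; coordinate k takes omega_k,
# with the product measure P x P.
def prob_product(z):
    """P({(w1, w2) : (X1(w1), X2(w2)) = z})"""
    return sum(P[w1] * P[w2]
               for w1 in Omega for w2 in Omega
               if (X1[w1], X2[w2]) == z)

print(prob_same_omega((1, 1)))  # -> 1/4  (only omega = "c" qualifies)
print(prob_product((1, 1)))     # -> 1/8  (= P(X1 = 1) * P(X2 = 1))
```

The two readings give different numbers here, which is exactly why the question matters.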

However, defining the probability space is often skipped, and the domain of the $X_i$ isn't always stated either.

If I stumble upon such a case and can't deduce which reading is meant: is there a difference between the two cases, and which should I assume?

So, in general: when you deal with a function $f:\mathbb{R} \to \mathbb{R}^n$, you can write it as $f=(f_1,\ldots,f_n)$, where each $f_k:\mathbb{R} \to \mathbb{R}$ is a single-variable real-valued function for $k \in \{1,\ldots,n\}$. In that case $f(t) = (f_1(t),\ldots,f_n(t))$, and you probably won't have any doubt that this is the way we should look at it.

The same goes if our function has a more abstract domain. Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space, and let $X:\Omega \to \mathbb{R}^n$ be a random variable. It can be shown that, writing $X=(X_1,\ldots,X_n)$, each $X_k : \Omega \to \mathbb{R}$ is a random variable for $k \in \{1,\ldots,n\}$. In that case $X(\omega) = (X_1(\omega),\ldots,X_n(\omega))$, and the proper way to look at it is $\mathbb{P}(X \in A) = \mathbb{P}(\{\omega \in \Omega : (X_1(\omega),\ldots,X_n(\omega)) \in A\})$, since $\mathbb{P}$ is a measure on $(\Omega,\mathcal{F})$.

So when you have just a random vector, you shouldn't think that every coordinate takes a different argument. Every coordinate is a function on the whole space (that is, $\Omega$), no matter what $\Omega$ looks like.
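The preimage formula $\mathbb{P}(X \in A) = \mathbb{P}(\{\omega : (X_1(\omega),\ldots,X_n(\omega)) \in A\})$ can be sketched on a finite space; everything below (the space, measure, and coordinate functions) is an illustrative assumption:

```python
# Sketch: P(X in A) computed as the measure of the preimage X^{-1}(A),
# where every coordinate is a function of the SAME omega.
from fractions import Fraction

Omega = [1, 2, 3, 4]                      # toy finite sample space
P = {w: Fraction(1, 4) for w in Omega}    # uniform probability measure

# An R^2-valued random variable X and its coordinate functions.
def X1(w): return w % 2
def X2(w): return w % 3
def X(w):  return (X1(w), X2(w))

# Both coordinates take the same argument omega.
assert all(X(w) == (X1(w), X2(w)) for w in Omega)

def prob(A):
    """P(X in A) = P({omega in Omega : X(omega) in A})"""
    return sum(P[w] for w in Omega if X(w) in A)

print(prob({(1, 1), (0, 0)}))  # -> 1/4  (only omega = 1 lands in A)
```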

Okay, that was the case where we had an $\mathbb{R}^n$-valued random variable and turned it into a vector of $\mathbb{R}$-valued random variables. However, there are other possibilities. Instead of having a vector and looking at its coordinates, we can have a lot of random variables and form a new vector from them. However, it isn't as simple as it might look. If you have probability spaces $(\Omega_1,\mathcal{F}_1,\mathbb{P}_1),\ldots,(\Omega_n,\mathcal{F}_n,\mathbb{P}_n)$ and real-valued random variables defined on them, $X_k : \Omega_k \to \mathbb{R}$, you can define a new set, call it $\Omega$, as $\Omega = \Omega_1 \times \ldots \times \Omega_n$ (so every $\omega \in \Omega$ is of the form $(\omega_1,\ldots,\omega_n)$ with $\omega_k \in \Omega_k$), and a function (call it just a function for now, since we haven't specified sigma-fields) $X:\Omega \to \mathbb{R}^n$ given by $X(\omega) = (X_1(\omega_1),\ldots,X_n(\omega_n))$.

That is obviously another proper way to look at it when considering $X$ just as a function, but it is more subtle when considering it as a random, measurable function! As for measurability, you can always define a new sigma-field $\mathcal{F} = \mathcal{F}_1 \otimes \ldots \otimes \mathcal{F}_n := \sigma(A_1 \times \ldots \times A_n : A_k \in \mathcal{F}_k,\ k \in \{1,\ldots,n\})$ (loosely speaking, you take every "rectangle" of the form $A_1 \times \ldots \times A_n$ with $A_k \in \mathcal{F}_k$ and close the collection under the operations needed to form a $\sigma$-field). Now, the problem of measuring (that is, calculating probability) isn't as easy; it requires the concept of a product measure (which you can google).

Again, loosely speaking, it defines a measure $\mathbb{P}$ on our new measurable space $(\Omega,\mathcal{F})$ by $\mathbb{P}(A_1 \times \ldots \times A_n) = \mathbb{P}_1(A_1) \cdot \ldots \cdot \mathbb{P}_n(A_n)$ for every rectangle $A_1 \times \ldots \times A_n$. In the case of your random variable, it would mean that $\mathbb{P}(X \in B_1 \times \ldots \times B_n) = \mathbb{P}(\{\omega \in \Omega : X(\omega) \in B_1 \times \ldots \times B_n\}) = \mathbb{P}_1(\{\omega_1 \in \Omega_1 : X_1(\omega_1) \in B_1\}) \cdot \ldots \cdot \mathbb{P}_n(\{\omega_n \in \Omega_n : X_n(\omega_n) \in B_n\}) = \mathbb{P}_1(X_1 \in B_1)\cdots\mathbb{P}_n(X_n \in B_n)$ (note that we're calculating the probability for each $X_k$ on a different space and with respect to a different probability measure, since the variables $X_1,\ldots,X_n$ aren't defined on $\Omega$ but on $\Omega_1,\ldots,\Omega_n$ respectively). It can be shown (again, google product measure) that when the spaces are $\sigma$-finite, this measure is uniquely determined.
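On finite spaces the product-measure identity above can be checked by direct summation; the concrete spaces, measures, and variables below are illustrative assumptions:

```python
# Sketch: on finite spaces, P(X in B1 x B2) = P1(X1 in B1) * P2(X2 in B2)
# under the product measure (all concrete data here is made up).
from fractions import Fraction
from itertools import product

Omega1, P1 = ["h", "t"], {"h": Fraction(1, 3), "t": Fraction(2, 3)}
Omega2, P2 = [0, 1, 2], {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
X1 = lambda w1: 1 if w1 == "h" else 0   # defined on Omega1
X2 = lambda w2: w2 * w2                 # defined on Omega2

B1, B2 = {1}, {0, 1}

# Left side: sum the product measure over the preimage in Omega1 x Omega2.
lhs = sum(P1[w1] * P2[w2] for w1, w2 in product(Omega1, Omega2)
          if X1(w1) in B1 and X2(w2) in B2)

# Right side: each factor is computed on ITS OWN space with ITS OWN measure.
rhs = (sum(P1[w] for w in Omega1 if X1(w) in B1)
       * sum(P2[w] for w in Omega2 if X2(w) in B2))

assert lhs == rhs
print(lhs)  # -> 1/4
```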

It was a long story, but the important point is that when you define the vector on the product space, the coordinates are INDEPENDENT! (Note that the measure $\mathbb{P}$ is defined so that $\mathbb{P}_k(X_k \in B_k) = \mathbb{P}(X_k \in B_k,\ X_1,\ldots,X_{k-1},X_{k+1},\ldots,X_n \in \mathbb{R})$.) That means that by reading a random vector as if its coordinates took different arguments, you make them independent, and we're losing a lot of possible random vectors, where the coordinates need not be independent.
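This loss of dependence can be seen in the smallest possible example; the two-point space and the specific vectors below are assumptions for illustration:

```python
# Sketch: a common-argument vector can be fully dependent, while the
# product-space construction forces its coordinates to be independent.
from fractions import Fraction
from itertools import product

Omega = [0, 1]
P0 = {0: Fraction(1, 2), 1: Fraction(1, 2)}   # fair coin

# Common argument: X(w) = (w, w). The coordinates are perfectly dependent.
joint_common = sum(P0[w] for w in Omega if (w, w) == (0, 1))   # P(X1=0, X2=1)
marginals = (sum(P0[w] for w in Omega if w == 0)
             * sum(P0[w] for w in Omega if w == 1))            # P(X1=0)P(X2=1)
assert joint_common == 0 and marginals == Fraction(1, 4)       # NOT equal

# Product space: Y((w1, w2)) = (w1, w2) under the product measure.
joint_product = sum(P0[w1] * P0[w2] for w1, w2 in product(Omega, Omega)
                    if (w1, w2) == (0, 1))
assert joint_product == marginals                              # independent
```

The first vector (perfect dependence) simply cannot be represented with a different argument per coordinate.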

So the lesson should be: in general, when dealing with a random vector $X$ on $(\Omega,\mathcal{F},\mathbb{P})$, you can write it as $X=(X_1,\ldots,X_n)$, where every $X_k : \Omega \to \mathbb{R}$ is a random variable and $X(\omega) = (X_1(\omega),\ldots,X_n(\omega))$, since it is simply defined this way. However, when you know that $X_1,\ldots,X_n$ are independent, you can REDEFINE it (it won't be exactly the same random variable, just a similar one with the same distribution) on the product space, taking every $\Omega_k = \Omega$, so that $X:\Omega^n \to \mathbb{R}^n$ is given by $X((\omega_1,\ldots,\omega_n)) = (X_1(\omega_1),\ldots,X_n(\omega_n))$. That is sometimes a useful approach when one is interested only in things concerning the distribution of $X$.
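When the coordinates really are independent, the product-space redefinition does preserve the distribution; a finite sketch (the space and the particular independent pair are illustrative assumptions):

```python
# Sketch: for INDEPENDENT X1, X2 on (Omega, F, P), the product-space
# redefinition has the same distribution as the original vector.
from fractions import Fraction
from itertools import product
from collections import Counter

Omega = ["a", "b", "c", "d"]
P = {w: Fraction(1, 4) for w in Omega}
X1 = {"a": 0, "b": 0, "c": 1, "d": 1}
X2 = {"a": 0, "b": 1, "c": 0, "d": 1}   # independent of X1 under uniform P

def law(points):
    """Distribution of a random vector as {value: probability}."""
    dist = Counter()
    for value, p in points:
        dist[value] += p
    return dist

# Original vector: both coordinates take the same omega.
original = law(((X1[w], X2[w]), P[w]) for w in Omega)
# Redefined vector on Omega^2: coordinate k takes omega_k, product measure.
redefined = law(((X1[w1], X2[w2]), P[w1] * P[w2])
                for w1, w2 in product(Omega, Omega))

assert original == redefined   # same distribution, different underlying space
```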

Q.1. Is there a difference between the two cases?

A.1 Yes, there is a difference between the two cases.

Q.2 If I can’t deduce which set of assumptions to use, which set of assumptions should I use?

A.2 Based on the law of parsimony, unless indicated otherwise, you should first assume the case that requires fewer words. Notice that $\Omega$ is more parsimonious than $\Omega^n$. Therefore, first go with $\Omega$.

Q.3 Can you give me a more precise answer?

A.3 I think the way to treat this would be to consider something like the partition function. You can use the following process.

By $m$, I denote the least upper bound of $n$. For $n \leq m$, the entropy will be

$$\sigma^{(n)}_{X} = \sum_{i=1}^{n} p_i \log(p_i)$$

So you obtain a sequence

$$\left\{\sigma^{(1)}_{X}, \sigma^{(2)}_{X}, \ldots, \sigma^{(m)}_{X}\right\}$$

Let's get a sense of things by looking at the entropies' expected values under equipartition. These are

$$\left\{\left\langle \sigma^{(1)}_{X}\right\rangle, \left\langle\sigma^{(2)}_{X}\right\rangle, \ldots, \left\langle\sigma^{(m)}_{X}\right\rangle\right\} = \left\{-\log(1), -\log(2), \ldots, -\log(m)\right\}.$$
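The equipartition values can be checked numerically; this is a sketch, and note that $\sigma$ here follows the sign convention above ($\sum_i p_i \log p_i$, i.e. minus the usual Shannon entropy):

```python
# Numeric check: under equipartition p_i = 1/n, sigma^(n) = -log(n).
import math

def sigma(p):
    """sum_i p_i log(p_i) over the nonzero probabilities p_i."""
    return sum(pi * math.log(pi) for pi in p if pi > 0)

for n in range(1, 6):
    equi = [1.0 / n] * n   # equipartition over n outcomes
    assert abs(sigma(equi) - (-math.log(n))) < 1e-12
print("sigma^(n) = -log(n) under equipartition, verified for n = 1..5")
```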

Observe that this sequence diverges (to $-\infty$) as $m \to \infty$.

One asks: which of these gives the best and most correct indicator? Notice for $q<p$ that $\left\langle\sigma^{(q)}\right\rangle > \left\langle\sigma^{(p)}\right\rangle$. Also ponder the following: in getting $\left\langle\sigma^{(p)}\right\rangle$, I had to average over **all** possibilities, including those where only $q$ of the random variables were non-zero.

This suggests that you should look at your expected results for all $n\in \mathbb{Z}^+$, and also in the limit as $n$ goes to infinity. My guess is that, though the partition function diverges with increasing $n$, the thing you would actually care about and be able to observe would converge.