Math Genius: MDP tabular setting.


I would be very curious to know whether, in the tabular MDP setting, we have:
$$ \mathbb{E}_{\tau_1,\cdots,\tau_{N} \sim P^{\pi}_{\mu}} (\mathbb{1}_{A}) = \mathbb{E}_{s_1,\cdots,s_{N \times H} \sim d^{\pi}_{\mu}} \, \mathbb{E}_{a_1 \sim \pi(\cdot|s_1),\,\cdots,\, a_{N \times H} \sim \pi(\cdot|s_{N \times H})}(\mathbb{1}_A) $$

With the usual notation: each $\tau$ is a path (trajectory) of length $H$, $\pi$ is a policy, $\mu$ is the initial-state distribution, and $d^{\pi}_{\mu}$ is the state-visitation distribution (i.e. $P(s)$).
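To make the two sides concrete, one common convention (an assumption here; the question only describes $d^{\pi}_{\mu}$ as $P(s)$) averages the per-step state distributions over the horizon. The step index $h$ and the count $n(s,a)$ are notation introduced only for this sketch.

```latex
d^{\pi}_{\mu}(s) = \frac{1}{H}\sum_{h=1}^{H} \Pr{}^{\pi}_{\mu}(s_h = s),
\qquad
n(s,a) = \sum_{i=1}^{N \times H} \mathbb{1}\{(s_i, a_i) = (s,a)\},
\qquad
\mathbb{E}[n(s,a)] = N \times H \cdot d^{\pi}_{\mu}(s)\,\pi(a|s).
```

On the left-hand side the $N \times H$ pairs are the concatenation of $N$ independent trajectories, each with $s_1 \sim \mu$, $a_h \sim \pi(\cdot|s_h)$ and $s_{h+1} \sim P(\cdot|s_h,a_h)$; on the right-hand side the states are drawn i.i.d. from $d^{\pi}_{\mu}$. Under this convention, linearity of expectation gives the same mean count $\mathbb{E}[n(s,a)]$ on both sides, so for count-based events the question hinges on the joint distribution of the data points (states within one trajectory are not independent), not on their average frequencies.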
$A$ is an event involving all the data points (for example, observing the pair $(s,a)$ $k$ times). It seems intuitively true, but I am not sure how to show it.
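One way to probe the conjecture numerically is a Monte Carlo sketch that estimates both sides for a small MDP. Everything concrete below is a hypothetical choice for illustration: the two-state deterministic chain (`MU`, `P`, `PI`), the horizon `H = 2`, the averaged-occupancy definition of $d^{\pi}_{\mu}$, and taking $A$ to be "the pair $(0, 0)$ is observed exactly $k$ times".

```python
import random
from collections import Counter

# Hypothetical toy tabular MDP (illustrative assumption, not from the question).
# P[s][a] lists next-state probabilities; PI[s] lists action probabilities pi(a|s).
S, N_ACTIONS, H = 2, 1, 2
MU = [1.0, 0.0]           # initial distribution mu: always start in state 0
P = [[[0.0, 1.0]],        # state 0, action 0 -> state 1
     [[0.0, 1.0]]]        # state 1, action 0 -> state 1 (absorbing)
PI = [[1.0], [1.0]]       # single action, taken with probability 1


def sample(dist, rng):
    """Draw an index according to the probability vector `dist`."""
    return rng.choices(range(len(dist)), weights=dist)[0]


def occupancy():
    """Assumed definition: d(s) = (1/H) * sum_h P(s_h = s), by forward propagation."""
    d = [0.0] * S
    p = MU[:]
    for _ in range(H):
        for s in range(S):
            d[s] += p[s] / H
        nxt = [0.0] * S
        for s in range(S):
            for a in range(N_ACTIONS):
                for s2 in range(S):
                    nxt[s2] += p[s] * PI[s][a] * P[s][a][s2]
        p = nxt
    return d


def trajectory_pairs(n_traj, rng):
    """Left-hand side: concatenate n_traj independent length-H trajectories."""
    pairs = []
    for _ in range(n_traj):
        s = sample(MU, rng)
        for _ in range(H):
            a = sample(PI[s], rng)
            pairs.append((s, a))
            s = sample(P[s][a], rng)
    return pairs


def iid_pairs(n_traj, d, rng):
    """Right-hand side: n_traj*H i.i.d. states from d, each with an action from pi."""
    pairs = []
    for _ in range(n_traj * H):
        s = sample(d, rng)
        pairs.append((s, sample(PI[s], rng)))
    return pairs


def event_A(pairs, sa=(0, 0), k=1):
    """Hypothetical event A: the pair sa is observed exactly k times."""
    return Counter(pairs)[sa] == k


def estimate(n_traj=1, trials=20000, seed=0):
    """Monte Carlo estimates of P(A) under both sampling schemes."""
    rng = random.Random(seed)
    d = occupancy()
    lhs = sum(event_A(trajectory_pairs(n_traj, rng)) for _ in range(trials)) / trials
    rhs = sum(event_A(iid_pairs(n_traj, d, rng)) for _ in range(trials)) / trials
    return lhs, rhs


if __name__ == "__main__":
    print(estimate())
```

For this particular toy chain every trajectory visits $(0, 0)$ exactly once, so the trajectory-side estimate is exactly 1.0, while the i.i.d. side is a Binomial$(2, 1/2)$ probability and comes out close to 0.5. Other choices of `MU`, `P`, `PI`, `H` or `event_A` can be swapped in to test further cases.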

Thank you!!

