This iteration of UDIA is:

$$P(s', r \mid s, a) \doteq P(S_t=s', R_t=r \mid S_{t-1}=s, A_{t-1}=a)$$

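The dynamics over states, actions, and rewards: the probability of next state $s'$ and reward $r$, given the previous state $s$ and action $a$. From these we define: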


$$q_\pi(s, a) \doteq \mathbb{E}_\pi\Big[\sum_{k=0}^\infty \gamma^k R_{t+k+1} \,\Big|\, S_t=s, A_t=a\Big]$$

The value of taking action $a$ in state $s$ and following policy $\pi$ thereafter (the action-value function)

$$V_\pi(s) \doteq \mathbb{E}_\pi\Big[\sum_{k=0}^\infty \gamma^k R_{t+k+1} \,\Big|\, S_t=s\Big]$$

The value of a state $s$ when following policy $\pi$ (the state-value function)
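The sketch below computes $V_\pi$ and $q_\pi$ from the dynamics $P(s', r \mid s, a)$ using iterative policy evaluation. It assumes a small finite MDP. This is a standard technique and not necessarily how UDIA computes values. The two-state MDP, its rewards, the uniform policy, and the names `dynamics`, `policy`, `q_from_v`, and `evaluate_policy` are all hypothetical. The code repeatedly applies the Bellman expectation equation until the values stop changing. That equation is $V_\pi(s) = \sum_a \pi(a \mid s)\, q_\pi(s, a)$, where $q_\pi(s, a) = \sum_{s', r} P(s', r \mid s, a)\,[r + \gamma V_\pi(s')]$.

```python
# Hypothetical two-state MDP. dynamics[(s, a)] lists the nonzero entries of
# P(s', r | s, a) as (probability, s_next, reward) triples; a stochastic
# transition would simply list several triples whose probabilities sum to 1.
dynamics = {
    ("A", "stay"): [(1.0, "A", 0.0)],
    ("A", "go"): [(1.0, "B", 1.0)],
    ("B", "stay"): [(1.0, "B", 2.0)],
    ("B", "go"): [(1.0, "A", 0.0)],
}

# Hypothetical policy pi(a | s): pick each action with probability 0.5.
policy = {
    "A": {"stay": 0.5, "go": 0.5},
    "B": {"stay": 0.5, "go": 0.5},
}


def q_from_v(dynamics, v, s, a, gamma):
    """q_pi(s, a) = sum over (s', r) of P(s', r | s, a) * (r + gamma * V_pi(s'))."""
    return sum(p * (r + gamma * v[s_next]) for p, s_next, r in dynamics[(s, a)])


def evaluate_policy(dynamics, policy, gamma=0.9, tol=1e-10):
    """Sweep V(s) = sum_a pi(a|s) * q(s, a) over all states until V stops changing."""
    v = {s: 0.0 for s in policy}
    while True:
        delta = 0.0
        for s, action_probs in policy.items():
            new_v = sum(prob * q_from_v(dynamics, v, s, a, gamma)
                        for a, prob in action_probs.items())
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v  # in-place update reuses the freshest estimates
        if delta < tol:
            return v


v = evaluate_policy(dynamics, policy, gamma=0.9)
q = {(s, a): q_from_v(dynamics, v, s, a, 0.9) for (s, a) in dynamics}
print({s: round(x, 4) for s, x in v.items()})  # {'A': 7.25, 'B': 7.75}
```

For this toy MDP you can check the result by hand. Solving the two linear Bellman equations gives $V_\pi(A) = 7.25$ and $V_\pi(B) = 7.75$. Averaging $q_\pi(s, \cdot)$ under the policy recovers $V_\pi(s)$.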

$$G_t \doteq \sum_{k=0}^\infty \gamma^k R_{t+k+1}$$

The return, a singular goal: the discounted sum of all future rewards
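The function below is a minimal sketch of computing $G_t$ for a finite episode. It assumes the infinite sum simply stops when the episode ends, which is equivalent to treating all later rewards as zero. The function name `discounted_return` is hypothetical. It evaluates the sum backwards using the recursion $G_t = R_{t+1} + \gamma G_{t+1}$, which follows directly from the definition. Both $q_\pi$ and $V_\pi$ are expectations of this quantity.

```python
def discounted_return(rewards, gamma):
    """Compute G_t = sum_k gamma^k * R_{t+k+1} for a finite episode.

    rewards[k] holds R_{t+k+1}; the sum is truncated where the episode ends.
    """
    g = 0.0
    # Walk backwards so each step applies G_t = R_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # 1.5  (= 1 + 0.5*0 + 0.25*2)
```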

We are all agents of the universal dream.