Math Notation Cheatsheet

In this section I outline the meaning of the mathematical notation I use. When appropriate (and possible), I also describe the meaning in simple Python.

General math and statistics

$f(x) \triangleq mx + b$

The $\triangleq$ indicates that the expression on the left is defined to be the expression on the right, rather than an equivalence that is derived from mathematical rules.

$(a, b)$

The set of real numbers between $a$ and $b$, excluding those values.

$[a, b]$

The set of real numbers between $a$ and $b$, including those values.

$(a, b]$ and $[a, b)$

The set of real numbers between $a$ and $b$, excluding the left or right bound, respectively.

$\{ a, b, c \}$

A set containing the elements $a$, $b$, and $c$.

$x \in X$

Indicates that value $x$ is an element in set $X$.

$x_i$

The $i$th element of the indexed set (list) of elements $x$. In Python, x[i].

$\sum_i^N x_i$

The sum of the first $N$ elements of an indexed set (list) of elements $x$. Could also be written as $x_1 + x_2 + \dots + x_N$.

total = 0
for i in range(N):  # loop over the first N elements
    total += x[i]

Whether the starting index is 0 or 1 depends on the context of the variable. If $i$ is assigned a value, as in $\sum_{i=3}^N$, the sum starts at that element (here, $x_3$). When the starting index would otherwise be ambiguous, it is sometimes assigned explicitly.

$\sum_{x \in X} x$

The sum of the elements in $X$.

total = 0
for x in X:
    total += x

$\prod_i^N x_i$

The product of the first $N$ elements of an indexed set (list) of elements $x$. Could also be written as $x_1 \cdot x_2 \cdot \dots \cdot x_N$.

prod = 1
for i in range(N):
    prod *= x[i]

As with sums, whether the starting index is 0 or 1 depends on the context of the variable, and an explicit assignment such as $\prod_{i=3}^N$ means the product starts at that element.

$x_{t+1} \gets g(x_t)$

The $\gets$ arrow indicates that the value of $x$ is updated to be the result of some function/operation on the previous value of $x$, defined on the right-hand side (it does not have to be $g$ and can be any expression). The subscript $t$ indicates the value of $x$ after the $t$th update.
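
In code, this is just repeatedly overwriting a variable with the result of applying the update rule to its previous value. A minimal sketch, where x0, num_updates, and the update function g are placeholder names:

x = x0
for t in range(num_updates):
    x = g(x)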

$E_p[X]$

The expected value of random variable $X$ drawn from probability distribution $p$. For discrete random variables this is defined as $E_p[X] \triangleq \sum_{x \in X} p(x) x$. For continuous random variables it is the integral $E_p[X] \triangleq \int_X p(x) x \, dx$.

expected_x = 0.0
for x in X:
    expected_x += p(x) * x

$x \sim p$

Random variable $x$ is drawn from probability distribution $p$.
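
For a discrete distribution, drawing a sample in code might look like the following sketch (values and probs are placeholder names for the possible outcomes and their probabilities):

import numpy as np

# values[i] is a possible outcome and probs[i] is its probability
x = np.random.choice(values, p=probs)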

$E_{x \sim p} [ g(x) ]$

The expected value of applying some operation (here, evaluating the function $g$) to an $x$ drawn from probability distribution $p$. For discrete random variables, $E_{x \sim p} [ g(x) ] \triangleq \sum_{x \in X} p(x) g(x)$.
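
Following the same pattern as the code for $E_p[X]$, a sketch of the discrete case (treating $p$ and $g$ as callable functions):

expected_g = 0.0
for x in X:
    expected_g += p(x) * g(x)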

$p(x | y)$

The probability (or probability density) of $x$ given the value $y$, from the conditional probability (mass/density) function $p$.

$p( \cdot | y)$

The conditional probability distribution itself (rather than a specific probability/density value), conditioned on $y$, implied by the probability (mass/density) function $p$.
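
One way to see the distinction in code: $p(x | y)$ is a single number, while $p(\cdot | y)$ is the whole distribution over the first argument for a fixed $y$. A sketch, assuming the conditional distribution is stored as a hypothetical nested dictionary p_table keyed first by $y$ and then by $x$:

prob_x_given_y = p_table[y][x]   # p(x | y): a single probability value
dist_given_y = p_table[y]        # p(. | y): the full distribution over x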

RL-specific variables and notation choices

$S$

State space

$A$

Action space

$R(s, a)$ or $R(s, a, s')$

A reward function on state-action pairs or state-action-next-state triples. In practice, reward functions are usually defined in terms of the next state $s' \in S$, while theoretical analysis usually assumes the simpler $R(s, a)$ form. Doing analysis with $R(s, a)$ does not limit the validity of the analysis, because for any MDP with an $R(s, a, s')$ function you can define an equivalent MDP with an $R(s, a)$ function equal to the expected value of the $R(s, a, s')$ values (averaging over the next-state probabilities).
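
Concretely, that equivalent reward function is the transition-weighted average $R(s, a) \triangleq \sum_{s'} T(s' | s, a) R(s, a, s')$. A minimal sketch of the conversion, assuming a finite state set S, a transition function T(s_next, s, a) that returns a probability, and a three-argument reward function R3 (all hypothetical names):

def R(s, a):
    # expected reward: average R(s, a, s') over the next-state probabilities
    return sum(T(s_next, s, a) * R3(s, a, s_next) for s_next in S)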

$T(s' | s, a)$

The probability (density) that the environment transitions to state $s' \in S$ after the agent takes action $a \in A$ from state $s \in S$.

$\gamma$

A geometric discount factor $\gamma \in [0, 1)$.

$\pi(s)$

A deterministic policy that maps state $s \in S$ to an action.

$\pi(a | s)$

The probability (mass/density) of stochastic policy $\pi$ selecting action $a \in A$ from state $s \in S$.

$\pi(\cdot | s)$

The distribution over actions of stochastic policy $\pi$, conditioned on state $s$ (rather than the probability of a specific action).
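
In code, selecting an action from a stochastic policy amounts to sampling from this distribution. A minimal sketch, assuming a hypothetical function pi(s) that returns an array whose entry a is $\pi(a | s)$:

import numpy as np

action_probs = pi(s)  # the distribution pi(. | s) as an array of probabilities
a = np.random.choice(len(action_probs), p=action_probs)  # sample an action index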