Final Assignment of Selected Topics in Probability

The Existence and Uniqueness of Regular Conditional Probability

Author

Liyve

Published

June 23, 2025

Modified

July 2, 2025

Abstract
This assignment explores the concept of regular conditional probability, focusing on its existence and uniqueness. The lecture notes [1] state these results without proof, so rigorously establishing them became the central focus of the assignment.

1 Foreword

During last semester’s Probability Theory course (taught bilingually), students at Qiushi College learned probability theory from the textbook [2]. This textbook does not introduce probability theory in the language of measure theory, nor does it cover conditional expectation. However, in a proof during the second week of this semester’s Mathematical Statistics course, conditional expectation appeared in the textbook [3] as an integral with respect to a conditional probability. This led me to conjecture that the conditional expectation in [3] must be the same as the one we studied last semester [1]. Discussions with classmates did not lead to a satisfactory conclusion. It was not until the fourth week of Selected Topics in Probability that the concept of regular conditional probability was briefly introduced; unfortunately, [1] states that this topic has little relevance to the subsequent course content and does not discuss it further. Consequently, I decided to explore this topic as the focus of my final assignment.

2 Regular Conditional Probability

In our in-class discussion, we observed that the conditional probability \[ P[A\mid \mathcal{A}_0]:=\mathbb{E}[\mathbb{I}_A\mid\mathcal{A}_0],\quad A\in\mathcal{A}, \] satisfies the following properties, each of which holds \(P\)-almost surely: \[ 0\leq P[A\mid\mathcal{A}_0]\leq 1; \]

\[ P[\emptyset\mid\mathcal{A}_0]=0,\quad P[\Omega\mid\mathcal{A}_0]=1; \]

and, for pairwise disjoint \(A_n\in\mathcal{A}\), \(n\in\mathbb{N}\), \[ P[\bigcup_{n=1}^{\infty}A_n\mid \mathcal{A}_0]=\sum_{n=1}^{\infty}P[A_n\mid \mathcal{A}_0]. \]
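To make these properties concrete, consider a finite sample space on which \(\mathcal{A}_0\) is generated by a finite partition into atoms of positive probability; a version of \(P[A\mid\mathcal{A}_0]\) is then given atom by atom by \(P(A\cap B)/P(B)\). The Python sketch below, with hypothetical names and toy weights, checks the three properties at every \(\omega\). In this discrete setting no exceptional null sets are needed at all, so the difficulty described next is invisible here.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical toy example: Omega = {0,...,5} with point masses P({w}),
# and A0 generated by the partition {0,1}, {2,3,4}, {5} (all atoms have P > 0).
P = {0: Fraction(1, 6), 1: Fraction(1, 12), 2: Fraction(1, 4),
     3: Fraction(1, 6), 4: Fraction(1, 12), 5: Fraction(1, 4)}
PARTITION = [frozenset({0, 1}), frozenset({2, 3, 4}), frozenset({5})]
OMEGA = frozenset(P)


def prob(A):
    """P(A) for a subset A of Omega."""
    return sum((P[w] for w in A), Fraction(0))


def atom_of(w):
    """The atom of the partition containing w."""
    return next(atom for atom in PARTITION if w in atom)


def cond_prob(A, w):
    """A version of P[A | A0](w): P(A & B) / P(B), where B is the atom containing w."""
    B = atom_of(w)
    return prob(A & B) / prob(B)


def subsets(s):
    """All subsets of the finite set s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def check_properties():
    """Check the three properties at every w.

    On a finite space a disjoint sequence has only finitely many nonempty
    members, so finite additivity suffices.
    """
    for w in OMEGA:
        assert cond_prob(frozenset(), w) == 0 and cond_prob(OMEGA, w) == 1
        for A in subsets(OMEGA):
            assert 0 <= cond_prob(A, w) <= 1
            for A1 in subsets(A):  # split A into the disjoint pieces A1 and A - A1
                assert cond_prob(A, w) == cond_prob(A1, w) + cond_prob(A - A1, w)
    return True
```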

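However, each of these identities holds only outside an exceptional null set that depends on \(A\) (or on the sequence \((A_n)\)). Since \(\mathcal{A}\) is in general uncountable, the union of all these exceptional sets need not be a null set, nor even measurable, so in general a single null set outside of which \(A\mapsto P[A\mid\mathcal{A}_0](\omega)\) is a probability measure need not exist. This motivates a more refined description of \(P[A\mid \mathcal{A}_0]\) in a stronger sense.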

Definition 1 (Probability Kernel) Let \((\Omega,\mathcal{A})\) and \((\Omega',\mathcal{A}')\) be measurable spaces. A function \(K:\Omega\times\mathcal{A}'\to[0,1]\) satisfying

  • \(\omega\mapsto K(\omega,A')\) is \(\mathcal{A}\)-measurable for every \(A'\in \mathcal{A}'\);

  • \(A'\mapsto K(\omega,A')\) is a probability measure on \((\Omega',\mathcal{A}')\) for every \(\omega\in\Omega\),

is called a probability kernel from \((\Omega,\mathcal{A})\) to \((\Omega',\mathcal{A}')\).
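On finite spaces, Definition 1 can be checked mechanically: a probability kernel is a table of probability vectors indexed by \(\omega\), and \(\mathcal{A}\)-measurability of \(\omega\mapsto K(\omega,A')\) means constancy on the atoms of \(\mathcal{A}\). The sketch below assumes that \(\mathcal{A}\) is generated by a finite partition and that \(\mathcal{A}'\) is the power set of a finite \(\Omega'\); the function name and data layout are hypothetical.

```python
from fractions import Fraction


def is_probability_kernel(kernel, partition, omega_prime):
    """Check Definition 1 for finite spaces (illustrative sketch).

    kernel:      dict w -> dict w' -> K(w, {w'}); a missing w' means mass 0.
    partition:   atoms generating the sigma-algebra on Omega (every w lies in one).
    omega_prime: finite set Omega', equipped with its power set.
    """
    # Second condition: every K(w, .) is a probability measure on Omega'.
    for row in kernel.values():
        if not set(row) <= set(omega_prime):
            return False
        if any(v < 0 for v in row.values()) or sum(row.values()) != 1:
            return False
    # First condition: w -> K(w, A') is measurable, i.e. constant on each atom.
    # K(w, A') is a finite sum over singletons, so checking singletons suffices.
    for atom in partition:
        rows = [tuple(kernel[w].get(wp, 0) for wp in sorted(omega_prime)) for w in atom]
        if any(r != rows[0] for r in rows):
            return False
    return True


# Hypothetical kernel from Omega = {0, 1, 2} to Omega' = {"a", "b"}.
K = {0: {"a": Fraction(1, 2), "b": Fraction(1, 2)},
     1: {"a": Fraction(1, 2), "b": Fraction(1, 2)},
     2: {"a": Fraction(1)}}
```

With the partition \(\{0,1\},\{2\}\) this table is a probability kernel; with the trivial partition \(\{0,1,2\}\) it is not, because \(K(\cdot,\{a\})\) is not constant.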

To obtain a conditional probability that is a genuine probability kernel, we restrict attention to a class of well-behaved measurable spaces in which this construction can be carried out rigorously.

Definition 2 (Standard Borel Space) A measurable space \((\Omega,\mathcal{A})\) is called a standard Borel space if there exists a metric \(d\) on \(\Omega\) such that \((\Omega,d)\) is complete and separable and \(\mathcal{A}\) is the Borel \(\sigma\)-algebra of \((\Omega,d)\), i.e., the \(\sigma\)-algebra generated by its open sets.

Proposition 1 (Regular Conditional Probability) cf. [1, Proposition 5.4.3]

Let \((\Omega,\mathcal{A})\) be a standard Borel space and \(P\) a probability measure on \((\Omega,\mathcal{A})\). Then for each \(\sigma\)-algebra \(\mathcal{A}_0\subseteq\mathcal{A}\) there exists a probability kernel \(K_{\mathcal{A}_0}\) from \((\Omega,\mathcal{A}_0)\) to \((\Omega,\mathcal{A})\) such that for every \(A\in\mathcal{A}\):

\[ K_{\mathcal{A}_0}(\omega,A)=P[A\mid\mathcal{A}_0](\omega)\text{ for }P\text{-a.e. }\omega\in\Omega, \] where the exceptional set may depend on \(A\).

If \(\tilde{K}_{\mathcal{A}_0}\) is another probability kernel from \((\Omega,\mathcal{A}_0)\) to \((\Omega,\mathcal{A})\) with the same property, then there exists a \(P\)-null set \(N\in\mathcal{A}\) such that for every \(\omega\in\Omega\setminus N\):

\[ K_{\mathcal{A}_0}(\omega,A)=\tilde{K}_{\mathcal{A}_0}(\omega,A)\text{ for all }A\in\mathcal{A}. \]

The kernel \(K_{\mathcal{A}_0}\) is called a regular conditional probability given \(\mathcal{A}_0\).
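In the finite setting the kernel of Proposition 1 can be written down explicitly, and the example also shows why uniqueness holds only outside a \(P\)-null set: on an atom of probability zero the kernel may be any probability measure. The Python sketch below is a hypothetical illustration under these finite assumptions, not a construction for general standard Borel spaces.

```python
from fractions import Fraction

# Hypothetical finite setting: Omega = {0,...,4}, A = power set of Omega,
# A0 generated by PARTITION; the atom {4} has probability zero.
P = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 3), 3: Fraction(1, 6), 4: Fraction(0)}
PARTITION = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4})]


def prob(A):
    return sum((P[w] for w in A), Fraction(0))


def make_kernel(fallback):
    """Build K(w, .) as a dict of point masses for every w in Omega.

    On atoms B with P(B) > 0 we use P({w'} & B) / P(B); on null atoms the
    conditional probability is not determined, and we use the probability
    measure `fallback` supplied by the caller.
    """
    K = {}
    for B in PARTITION:
        pB = prob(B)
        if pB > 0:
            row = {wp: (P[wp] / pB if wp in B else Fraction(0)) for wp in P}
        else:
            row = dict(fallback)
        for w in B:
            K[w] = row
    return K


def kernel_measure(K, w, A):
    """K(w, A) for a subset A of Omega."""
    return sum((K[w].get(wp, Fraction(0)) for wp in A), Fraction(0))


# Two versions that differ only through the choice made on the null atom {4}.
K1 = make_kernel({0: Fraction(1)})
K2 = make_kernel({4: Fraction(1)})
disagree = {w for w in P if K1[w] != K2[w]}
```

Here the two versions disagree exactly on \(\{4\}\), a \(P\)-null set, in line with the uniqueness statement.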

3 The Existence of Regular Conditional Probability

Before proceeding with the proof, we first state and prove some lemmas and theorems that will be essential for the following arguments.

Lemma 1 (Doob-Dynkin lemma) Let \((\Omega, \mathcal{A}, P)\) be a probability space, and let \(X: \Omega \to S\) be a measurable function into a measurable space \((S, \mathcal{S})\). If \(f: \Omega \to \mathbb{R}\) is \(\sigma(X)\)-measurable, then there exists a measurable function \(g: S \to \mathbb{R}\) such that \(f = g \circ X\) on all of \(\Omega\) (in particular, almost surely).

Proof (Proof of Lemma 1). Let \(f: \Omega \to \mathbb{R}\) be \(\sigma(X)\)-measurable, where \(X: \Omega \to S\) is measurable.

Step 1: Structure of \(\sigma(X)\)

The \(\sigma\)-algebra generated by \(X\) is \[ \sigma(X) = \{ X^{-1}(B) : B \in \mathcal{S} \}. \] Since \(f\) is \(\sigma(X)\)-measurable, for every Borel set \(B \subseteq \mathbb{R}\) we have \[ f^{-1}(B) \in \sigma(X), \] so there exists a set \(A_B \in \mathcal{S}\) such that \[ f^{-1}(B) = X^{-1}(A_B). \] The set \(A_B\) is in general not unique (it is determined only on the range \(X(\Omega)\)), so a choice \(B \mapsto A_B\) need not preserve set operations. The construction below does not need this: it only uses, for each rational \(r\), one set \(A_B\) with \(B = (-\infty, r]\).

Step 2: Constructing \(g\) via rational intervals

For each rational \(r \in \mathbb{Q}\), fix a set \[ A_r = A_{(-\infty, r]} \in \mathcal{S} \quad \text{with} \quad f^{-1}((-\infty, r]) = X^{-1}(A_r). \] For \(s \in S\), define \[ g_0(s) = \inf \{ r \in \mathbb{Q} : s \in A_r \} \in [-\infty, \infty], \] with the convention \(\inf \emptyset = +\infty\), and set \(g(s) = g_0(s)\) if \(g_0(s) \in \mathbb{R}\) and \(g(s) = 0\) otherwise. The infimum behaves as follows:

  • If \(s = X(\omega)\) for some \(\omega \in \Omega\), then \(s \in A_r\) if and only if \(\omega \in X^{-1}(A_r) = f^{-1}((-\infty, r])\), that is, if and only if \(f(\omega) \leq r\). Hence \(\{ r \in \mathbb{Q} : s \in A_r \} = \mathbb{Q} \cap [f(\omega), \infty)\), which is nonempty and bounded below by \(f(\omega)\).

  • If \(s \notin X(\Omega)\), the set \(\{ r \in \mathbb{Q} : s \in A_r \}\) may be empty or unbounded below, so \(g_0(s) = \pm\infty\) is possible; this is why \(g\) is set to \(0\) there.

Step 3: Verifying \(f = g \circ X\)

Let \(\omega \in \Omega\). By the first case of Step 2 and the density of \(\mathbb{Q}\) in \(\mathbb{R}\), \[ g_0(X(\omega)) = \inf\big( \mathbb{Q} \cap [f(\omega), \infty) \big) = f(\omega). \] Explicitly:

  • If \(r \in \mathbb{Q}\) and \(f(\omega) \leq r\), then \(\omega \in f^{-1}((-\infty, r]) = X^{-1}(A_r)\), so \(X(\omega) \in A_r\) and \(g_0(X(\omega)) \leq r\). Letting \(r \downarrow f(\omega)\) through \(\mathbb{Q}\) gives \(g_0(X(\omega)) \leq f(\omega)\).

  • If \(r \in \mathbb{Q}\) and \(r < f(\omega)\), then \(\omega \notin f^{-1}((-\infty, r]) = X^{-1}(A_r)\), so \(X(\omega) \notin A_r\). Thus every \(r\) with \(X(\omega) \in A_r\) satisfies \(r \geq f(\omega)\), and \(g_0(X(\omega)) \geq f(\omega)\).

Since \(g_0(X(\omega)) = f(\omega)\) is finite, \(g(X(\omega)) = f(\omega)\) for every \(\omega \in \Omega\); no exceptional null set arises.

Step 4: Measurability of \(g\)

For every \(a \in \mathbb{R}\), \[ \{ s \in S : g_0(s) < a \} = \bigcup_{\substack{r \in \mathbb{Q} \\ r < a}} A_r \in \mathcal{S}, \] because an infimum is strictly less than \(a\) if and only if some element of the set is strictly less than \(a\). Hence \(g_0: S \to [-\infty, \infty]\) is \(\mathcal{S}\)-measurable, so \(\{ g_0 \in \mathbb{R} \} \in \mathcal{S}\), and \(g\), which equals \(g_0\) on this set and \(0\) on its complement, is \(\mathcal{S}\)-measurable.

Step 5: Uniqueness up to null sets

If \(g'\) is another measurable function satisfying \(f = g' \circ X\) almost surely, then \(g(X(\omega)) = g'(X(\omega))\) for \(P\)-almost every \(\omega\). In terms of the pushforward measure \(P_X = P \circ X^{-1}\), this reads \(P_X(\{ g \neq g' \}) = P(\{ g \circ X \neq g' \circ X \}) = 0\), so \(g = g'\) almost surely with respect to \(P_X\).

We have constructed a measurable function \(g: S \to \mathbb{R}\) such that \(f = g \circ X\) everywhere on \(\Omega\), in particular almost surely, completing the proof.
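The construction in Steps 2 to 4 can be traced on a finite example. The Python sketch below assumes finite \(\Omega\) and \(S\), a function \(f\) that is constant on the fibres of \(X\), and a finite grid of rationals containing all values of \(f\) in place of \(\mathbb{Q}\); the function `doob_dynkin` and the toy data are hypothetical.

```python
from fractions import Fraction


def doob_dynkin(omega, X, f, S, grid):
    """Construct g: S -> R with f = g o X, mirroring Steps 2-4 of the proof.

    Illustrative assumptions: omega and S are finite; f is constant on each fibre
    X^{-1}({s}) (which is what sigma(X)-measurability means here); the finite
    list `grid` of Fractions contains every value of f and stands in for Q.
    """
    # One admissible choice of A_r: the image under X of f^{-1}((-inf, r]).
    A = {r: {X[w] for w in omega if f[w] <= r} for r in grid}
    g = {}
    for s in S:
        candidates = [r for r in grid if s in A[r]]
        # With this choice of A_r, for s outside the range of X the set is
        # empty (infimum +inf), so g(s) is set to 0 as in Step 2.
        g[s] = min(candidates) if candidates else Fraction(0)
    return g


omega = [0, 1, 2, 3]
X = {0: "a", 1: "a", 2: "b", 3: "c"}
f = {0: Fraction(1, 2), 1: Fraction(1, 2), 2: Fraction(-1), 3: Fraction(2)}
grid = [Fraction(-1), Fraction(0), Fraction(1, 2), Fraction(1), Fraction(2)]
g = doob_dynkin(omega, X, f, {"a", "b", "c", "d"}, grid)
```

Here \(g(X(\omega)) = f(\omega)\) for every \(\omega\), and the point "d" outside the range of \(X\) receives the value \(0\).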

Remark

Remark. The Doob-Dynkin lemma establishes that any \(\sigma(X)\)-measurable function \(f\) can be expressed as \(f = g \circ X\) for some measurable \(g\). In the context of regular conditional probabilities, it shows that the conditional expectation \(\mathbb{E}[1_A \mid \mathcal{A}_0]\), being \(\mathcal{A}_0\)-measurable, is a function of any random element \(\eta\) with \(\sigma(\eta) = \mathcal{A}_0\), for example the identity map \(\eta: (\Omega,\mathcal{A}) \to (\Omega,\mathcal{A}_0)\), \(\eta(\omega) = \omega\). This allows the kernel \(K_{\mathcal{A}_0}(\omega, A)\) to be constructed as an \(\mathcal{A}_0\)-measurable function of \(\omega\), as required.

Theorem 1 (Carathéodory extension theorem) Let \(\mathcal{C}\) be an algebra of subsets of a set \(\Omega\), and let \(\mu_0: \mathcal{C} \to [0, \infty]\) be a countably additive pre-measure. Then there exists a measure \(\mu\) on the \(\sigma\)-algebra \(\mathcal{A} = \sigma(\mathcal{C})\) such that \(\mu|_{\mathcal{C}} = \mu_0\). Moreover, this extension is unique if \(\mu_0\) is \(\sigma\)-finite.

Proof (Proof of Theorem 1). Let \(\mathcal{C}\) be an algebra of subsets of \(\Omega\), and \(\mu_0: \mathcal{C} \to [0, \infty]\) a countably additive pre-measure.

Step 1: Outer measure construction

Define the outer measure \(\mu^*\) on all subsets \(E \subseteq \Omega\) by \[ \mu^*(E) = \inf\left\{ \sum_{n=1}^\infty \mu_0(C_n) : E \subseteq \bigcup_{n=1}^\infty C_n,\, C_n \in \mathcal{C} \right\}. \] We verify that \(\mu^*\) is an outer measure:

  • Monotonicity: If \(A \subseteq B\), then any cover of \(B\) is also a cover of \(A\), so \(\mu^*(A) \leq \mu^*(B)\).

  • Countable subadditivity: For any sequence \(\{A_n\}\) (if some \(\mu^*(A_n) = \infty\) there is nothing to prove), choose covers \(\{C_{n,k}\}_{k=1}^\infty \subseteq \mathcal{C}\) of \(A_n\) with \(\sum_{k=1}^\infty \mu_0(C_{n,k}) \leq \mu^*(A_n) + \epsilon/2^n\). The union \(\bigcup_{n,k} C_{n,k}\) covers \(\bigcup_n A_n\), and \(\sum_{n,k} \mu_0(C_{n,k}) \leq \sum_n \mu^*(A_n) + \epsilon\). Letting \(\epsilon \to 0\) gives \(\mu^*(\bigcup_n A_n) \leq \sum_n \mu^*(A_n)\).

  • Empty set: \(\mu^*(\emptyset) = 0\), since the cover \(C_n = \emptyset\) (\(n \geq 1\)) has \(\sum_n \mu_0(C_n) = 0\).

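Step 2: Carathéodory measurability

A set \(A \subseteq \Omega\) is called \(\mu^*\)-measurable if for all \(E \subseteq \Omega\), \[ \mu^*(E) = \mu^*(E \cap A) + \mu^*(E \setminus A). \] By subadditivity the inequality "\(\leq\)" always holds, so only "\(\geq\)" needs to be checked. Let \(\mathcal{M}\) be the collection of all \(\mu^*\)-measurable sets. We show \(\mathcal{M}\) is a \(\sigma\)-algebra:

  • Contains \(\emptyset\) and \(\Omega\): the defining identity is trivial for these sets.

  • Closed under complements: If \(A \in \mathcal{M}\), then \(A^c\) satisfies the same condition by symmetry.

  • Closed under countable unions: First, prove closure under finite unions (it suffices to treat two sets and then induct). Together with closure under complements, this allows any countable union of sets in \(\mathcal{M}\) to be written as a disjoint union \(B = \bigcup_{n=1}^\infty A_n\) with \(A_n \in \mathcal{M}\). Set \(B_N = \bigcup_{n=1}^N A_n\). Applying the Carathéodory condition for \(A_N\) to the test set \(E \cap B_N\) and inducting gives \(\mu^*(E \cap B_N) = \sum_{n=1}^N \mu^*(E \cap A_n)\), hence \[ \mu^*(E) = \mu^*(E \cap B_N) + \mu^*(E \setminus B_N) \geq \sum_{n=1}^N \mu^*(E \cap A_n) + \mu^*(E \setminus B). \] Letting \(N \to \infty\) and using countable subadditivity yields \(\mu^*(E) \geq \mu^*(E \cap B) + \mu^*(E \setminus B)\), so \(B \in \mathcal{M}\).

Step 3: Restriction to \(\mathcal{M}\) is a measure

The restriction \(\mu^*|_{\mathcal{M}}\) is a measure. To verify countable additivity:

  • Let \(\{A_n\} \subseteq \mathcal{M}\) be pairwise disjoint. By countable subadditivity, \(\mu^*(\bigcup_n A_n) \leq \sum_n \mu^*(A_n)\).

  • For the reverse inequality, fix \(N \in \mathbb{N}\) and apply the Carathéodory condition for the measurable set \(A_{N+1}\) to the test set \(E = \bigcup_{n=1}^{N+1} A_n\); by induction this shows \(\mu^*(\bigcup_{n=1}^{N+1} A_n) = \sum_{n=1}^{N+1} \mu^*(A_n)\). By monotonicity, \(\mu^*(\bigcup_n A_n) \geq \sum_{n=1}^{N} \mu^*(A_n)\) for every \(N\), and letting \(N \to \infty\) gives \(\mu^*(\bigcup_n A_n) \geq \sum_n \mu^*(A_n)\).

Step 4: Extension property

For \(C \in \mathcal{C}\), we prove \(C\) is \(\mu^*\)-measurable and \(\mu^*(C) = \mu_0(C)\):

  • Measurability: For any \(E \subseteq \Omega\), let \(\{C_n\}\) be a cover of \(E\). Then \(C_n \cap C\) and \(C_n \setminus C\) belong to \(\mathcal{C}\) (since \(\mathcal{C}\) is an algebra), and \(\mu_0(C_n) = \mu_0(C_n \cap C) + \mu_0(C_n \setminus C)\). Summing over \(n\) gives \(\sum_n \mu_0(C_n) \geq \mu^*(E \cap C) + \mu^*(E \setminus C)\). Taking infima over covers yields \(\mu^*(E) \geq \mu^*(E \cap C) + \mu^*(E \setminus C)\); the reverse inequality holds by subadditivity.

  • Equality: By definition, \(\mu^*(C) \leq \mu_0(C)\) (take the cover \(C_1 = C\), \(C_n = \emptyset\) for \(n \geq 2\)). For the reverse inequality, suppose \(C \subseteq \bigcup_n C_n\) with \(C_n \in \mathcal{C}\). The sets \(D_n = (C \cap C_n) \setminus \bigcup_{k<n} C_k\) belong to \(\mathcal{C}\), are pairwise disjoint, and have union \(C\), so by countable additivity of \(\mu_0\), \(\mu_0(C) = \sum_n \mu_0(D_n) \leq \sum_n \mu_0(C_n)\). Taking infima gives \(\mu^*(C) \geq \mu_0(C)\).

Hence every set in \(\mathcal{C}\) is \(\mu^*\)-measurable. Since the \(\mu^*\)-measurable sets form a \(\sigma\)-algebra (Step 2), they contain \(\sigma(\mathcal{C})\), and the restriction of \(\mu^*\) to \(\sigma(\mathcal{C})\) is a measure extending \(\mu_0\).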

Step 5: Uniqueness

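If \(\mu_0\) is \(\sigma\)-finite, the extension is unique. Let \(\nu\) be another measure on \(\mathcal{A}\) with \(\nu|_{\mathcal{C}} = \mu_0\):

  • \(\sigma\)-finiteness: Write \(\Omega = \bigcup_n \Omega_n\) with \(\Omega_n \in \mathcal{C}\) and \(\mu_0(\Omega_n) < \infty\). Replacing \(\Omega_n\) by \(\Omega_1 \cup \dots \cup \Omega_n\), which still lies in \(\mathcal{C}\) and has finite measure, we may assume \(\Omega_n \uparrow \Omega\).

  • Apply the \(\pi\)-\(\lambda\) theorem on each \(\Omega_n\):

    • \(\mathcal{C}\) is a \(\pi\)-system (closed under finite intersections).

    • For fixed \(n\), the set \(\{A \in \mathcal{A} : \mu(A \cap \Omega_n) = \nu(A \cap \Omega_n)\}\) contains \(\mathcal{C}\) and is a \(\lambda\)-system; finiteness of \(\mu(\Omega_n) = \nu(\Omega_n)\) is needed to pass to complements.

    • Hence it contains \(\sigma(\mathcal{C}) = \mathcal{A}\).

  • Conclusion: For each \(A \in \mathcal{A}\), continuity from below gives \(\mu(A) = \lim_{n \to \infty} \mu(A \cap \Omega_n) = \lim_{n \to \infty} \nu(A \cap \Omega_n) = \nu(A)\).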
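A finite example shows Steps 1, 2 and 4 at work: when \(\mathcal{C}\) is the algebra generated by finitely many atoms, \(\mu^*\) charges a set with the full mass of every atom it meets, and the \(\mu^*\)-measurable sets turn out to be exactly the unions of atoms. The Python sketch below assumes this finite setting; the names and masses are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations

# Illustrative finite setting: Omega = {0,1,2,3}; the algebra C consists of all
# unions of the atoms {0,1} and {2,3}, and mu0 gives them masses 1/3 and 2/3.
ATOMS = {frozenset({0, 1}): Fraction(1, 3), frozenset({2, 3}): Fraction(2, 3)}
OMEGA = frozenset().union(*ATOMS)


def outer(E):
    """Outer measure mu*(E) from Step 1.

    For a finite algebra the cheapest cover of E is the union of the atoms
    meeting E, so the infimum reduces to a finite sum.
    """
    return sum((m for atom, m in ATOMS.items() if atom & E), Fraction(0))


def subsets(s):
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def is_caratheodory_measurable(A):
    """Caratheodory condition of Step 2, tested against every E in Omega."""
    return all(outer(E) == outer(E & A) + outer(E - A) for E in subsets(OMEGA))


measurable = {A for A in subsets(OMEGA) if is_caratheodory_measurable(A)}
```

A set such as \(\{0\}\), which splits an atom, fails the test with \(E = \{0, 1\}\), since \(\mu^*(\{0\}) + \mu^*(\{1\}) = 2/3 \neq 1/3 = \mu^*(\{0,1\})\).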

Remark

Remark. The Carathéodory extension theorem ensures that a countably additive pre-measure on an algebra extends to a measure on the generated \(\sigma\)-algebra, and that the extension is unique when the pre-measure is \(\sigma\)-finite. In the proof of the existence of regular conditional probabilities, it is applied, for fixed \(\omega\), to the map \(A \mapsto f_A(\omega)\), where \(f_A\) is a version of \(\mathbb{E}[1_A \mid \mathcal{A}_0]\) and \(A\) ranges over a countable algebra \(\mathcal{C}\) generating \(\mathcal{A}\). Finite additivity of this map is not enough: it must be shown to be countably additive on \(\mathcal{C}\) for \(P\)-almost every \(\omega\), and then it extends uniquely to a probability measure \(K_{\mathcal{A}_0}(\omega, \cdot)\) on \(\mathcal{A}\). This step is critical for constructing the regular conditional probability kernel rigorously.

Theorem 2 [4, Theorem 6.3]

For any Borel space \(S\) (in the sense of [4]) and measurable space \(T\), let \(\xi\) and \(\eta\) be random elements in \(S\) and \(T\), respectively. Then there exists a probability kernel \(\mu\) from \(T\) to \(S\) satisfying \[ \mathbb{P}[\xi \in \cdot \mid \eta] = \mu(\eta, \cdot) \text{ a.s.}, \] and \(\mu\) is unique a.e. \(\mathcal{L}(\eta)\).

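Proof (Proof of Theorem 2). Since every Borel space is Borel isomorphic to a Borel subset of \(\mathbb{R}\), we may assume that \(S \in \mathcal{B}(\mathbb{R})\). For every \(r \in \mathbb{Q}\), since \(\mathbb{P}[\xi \leq r \mid \eta]\) is \(\sigma(\eta)\)-measurable, Lemma 1 allows us to choose some measurable function \(f_r = f(\cdot, r): T \to [0, 1]\) such that \[ f(\eta, r) = \mathbb{P}[\xi \leq r \mid \eta] \quad \text{a.e.}, \quad r \in \mathbb{Q}. \tag{1}\]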

Let \(A\) be the set of all \(t\in T\) such that \(f(t, r)\) is nondecreasing in \(r \in \mathbb{Q}\) with limits \(0\) at \(-\infty\) and \(1\) at \(+\infty\). Since \(A\) is specified by countably many measurable conditions, each of which holds a.e. at \(t = \eta\), we have \(A \in \mathcal{T}\) and \(\eta \in A\) a.e. Now define \[ F(t, x) = \mathbf{1}_A(t) \inf_{r > x} f(t, r) + \mathbf{1}_{A^c}(t) \mathbf{1}\{x \geq 0\}, \quad x \in \mathbb{R},\ t \in T, \] and note that \(F(t, \cdot)\) is a distribution function on \(\mathbb{R}\) for every \(t \in T\). Hence, by the correspondence between distribution functions and probability measures on \(\mathbb{R}\) (cf. [4, Proposition 2.14]), there exist probability measures \(m(t, \cdot)\) on \(\mathbb{R}\) with \[ m(t, (-\infty, x]) = F(t, x), \quad x \in \mathbb{R},\ t \in T. \] The function \(F(t, x)\) is clearly measurable in \(t\) for each \(x\), and by a monotone class argument it follows that \(m\) is a kernel from \(T\) to \(\mathbb{R}\).

By Equation 1 and the monotone convergence property of the conditional expectation \(\mathbb{E}[\,\cdot \mid \eta]\), we have \[ m(\eta, (-\infty, x]) = F(\eta, x) = \mathbb{P}[\xi \leq x \mid \eta] \quad \text{a.e.}, \quad x \in \mathbb{R}. \] Using a monotone class argument based on the a.e. monotone convergence property, we may extend the last relation to \[ m(\eta, B) = \mathbb{P}[\xi \in B \mid \eta] \quad \text{a.e.}, \quad B \in \mathcal{B}(\mathbb{R}). \tag{2}\]

In particular, we get \(m(\eta, S^c) = 0\) a.e., and so Equation 2 remains true on \(\mathcal{S} = \mathcal{B} \cap S\) with \(m\) replaced by the kernel \[ \mu(t, \cdot) = m(t, \cdot) \mathbf{1}\{m(t, S) = 1\} + \delta_s \mathbf{1}\{m(t, S) < 1\}, \quad t \in T, \] where \(s \in S\) is arbitrary. If \(\mu'\) is another kernel with the stated property, then \[ \mu(\eta, (-\infty, r]) = \mathbb{P}[\xi \leq r \mid \eta] = \mu'(\eta, (-\infty, r]) \quad \text{a.e.}, \quad r \in \mathbb{Q}, \] and a monotone class argument yields \(\mu(\eta, \cdot) = \mu'(\eta, \cdot)\) a.e.
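The passage from the countably many values \(f(t, r)\), \(r \in \mathbb{Q}\), to a genuine distribution function \(F(t, \cdot)\) is the heart of the proof. The Python sketch below imitates it for a single \(t\), under the hypothetical simplification that only finitely many rational \(r\) are given, so the limit conditions at \(\pm\infty\) are replaced by conditions at the end points of the grid; `regularized_cdf` is an illustrative name.

```python
from fractions import Fraction


def regularized_cdf(f_t, x):
    """F(t, x) from the proof of Theorem 2, for one fixed t, on a finite grid.

    f_t maps finitely many rationals r (standing in for Q) to f(t, r).  As a
    finite stand-in for "limits 0 at -inf and 1 at +inf", we require the value
    at the smallest grid point to be 0 and the value at the largest to be 1.
    """
    grid = sorted(f_t)
    values = [f_t[r] for r in grid]
    in_A = (all(a <= b for a, b in zip(values, values[1:]))
            and values[0] == 0 and values[-1] == 1)
    if not in_A:
        # t outside A: use the distribution function of the point mass at 0.
        return 1 if x >= 0 else 0
    # inf_{r > x} f(t, r); f(t, .) is nondecreasing, so take the first grid point above x.
    above = [f_t[r] for r in grid if r > x]
    return above[0] if above else 1


good = {Fraction(-1): 0, Fraction(0): Fraction(1, 4), Fraction(1): Fraction(3, 4), Fraction(2): 1}
bad = {Fraction(0): Fraction(1, 2), Fraction(1): Fraction(1, 5)}
```

Taking the infimum over \(r > x\) (rather than \(r \geq x\)) is what makes \(F(t,\cdot)\) right-continuous; for the non-monotone table `bad`, the sketch falls back to the distribution function of \(\delta_0\), as in the definition of \(F\).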

Proof (Proof of Existence of Regular Conditional Probability). Let \((\Omega, \mathcal{A})\) be a standard Borel space and \(P\) a probability measure on \((\Omega, \mathcal{A})\). Let \(\mathcal{A}_0 \subseteq \mathcal{A}\) be a sub-\(\sigma\)-algebra.

Step 1: Countable Generator and Conditional Expectation

Since \((\Omega, \mathcal{A})\) is standard Borel, there exists a Polish topology on \(\Omega\) whose Borel \(\sigma\)-algebra is \(\mathcal{A}\). In particular, \(\mathcal{A}\) is countably generated: if \(D \subseteq \Omega\) is a countable dense set, the open balls with centers in \(D\) and rational radii form a countable base of the topology and hence generate \(\mathcal{A}\).

Let \(\mathcal{C}\) be a countable algebra generating \(\mathcal{A}\); for instance, the algebra generated by a countable base of the Polish topology, which is again countable. In particular, \(\mathcal{C}\) is a \(\pi\)-system. For each \(A \in \mathcal{C}\), the conditional expectation \(\mathbb{E}[1_A \mid \mathcal{A}_0]\) exists as an \(\mathcal{A}_0\)-measurable function, unique up to \(P\)-null sets (by the definition of conditional expectation), and since \(0 \leq \mathbb{E}[1_A \mid \mathcal{A}_0] \leq 1\) almost surely we may fix an \(\mathcal{A}_0\)-measurable version \(f_A: \Omega \to [0,1]\). In the language of the Doob-Dynkin lemma (see Lemma 1), \[ \mathbb{E}[1_A \mid \mathcal{A}_0] = f_A \circ \eta \quad \text{a.e. } P, \] where \(\eta: (\Omega, \mathcal{A}) \to (\Omega, \mathcal{A}_0)\), \(\eta(\omega) = \omega\), is a measurable map generating \(\mathcal{A}_0\), i.e., \(\sigma(\eta) = \mathcal{A}_0\).

Step 2: Construction of the Kernel via Extension

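For fixed \(\omega \in \Omega\), consider the map \(A \mapsto f_A(\omega)\) on \(\mathcal{C}\). Each of the identities below holds outside a \(P\)-null set in \(\mathcal{A}_0\) by linearity and monotonicity of conditional expectation, and since \(\mathcal{C}\) is countable there are only countably many of them. Hence there is a single \(P\)-null set \(N_0 \in \mathcal{A}_0\) such that for every \(\omega \notin N_0\) the map is:

  • Finitely additive: For disjoint \(A_1, A_2 \in \mathcal{C}\), \(f_{A_1 \cup A_2}(\omega) = f_{A_1}(\omega) + f_{A_2}(\omega)\).

  • Non-negative: \(f_A(\omega) \geq 0\).

  • Normalized: \(f_\Omega(\omega) = 1\).

To extend \(A \mapsto f_A(\omega)\) to a probability measure on \((\Omega, \mathcal{A})\) by the Carathéodory extension theorem (see Theorem 1), this map must in addition be countably additive on the algebra \(\mathcal{C}\); the extension is then unique because it is finite and \(\mathcal{C}\) is a \(\pi\)-system generating \(\mathcal{A}\). Countable additivity is the delicate point. For each fixed sequence of pairwise disjoint sets in \(\mathcal{C}\) whose union lies in \(\mathcal{C}\), the monotone convergence theorem for conditional expectations yields countable additivity outside a \(P\)-null set; but there are uncountably many such sequences, so these null sets cannot simply be collected into one. This is where the standard Borel structure is needed, for instance through Theorem 2, which reduces the problem to countably many conditions on distribution functions.

Granting countable additivity outside an \(\mathcal{A}_0\)-measurable \(P\)-null set \(N \supseteq N_0\), for every \(\omega \notin N\) there exists a unique probability measure \(K_{\mathcal{A}_0}(\omega, \cdot)\) on \((\Omega, \mathcal{A})\) such that \[ K_{\mathcal{A}_0}(\omega, A) = f_A(\omega) \quad \text{for all } A \in \mathcal{C}, \] and for \(\omega \in N\) we set \(K_{\mathcal{A}_0}(\omega, \cdot) := P\), so that \(K_{\mathcal{A}_0}(\omega, \cdot)\) is a probability measure for every \(\omega \in \Omega\).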

Step 3: Measurability of the Kernel

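For each \(A \in \mathcal{A}\), the map \(\omega \mapsto K_{\mathcal{A}_0}(\omega, A)\) must be \(\mathcal{A}_0\)-measurable. Since \(\mathcal{C}\) generates \(\mathcal{A}\), we use the \(\pi\)-\(\lambda\) theorem:

  • Let \(\mathcal{L} = \{ A \in \mathcal{A} : K_{\mathcal{A}_0}(\cdot, A) \text{ is } \mathcal{A}_0\text{-measurable} \}\).

  • \(\mathcal{L}\) contains \(\mathcal{C}\) by construction, since for \(A \in \mathcal{C}\) the function \(K_{\mathcal{A}_0}(\cdot, A)\) is built from the \(\mathcal{A}_0\)-measurable function \(f_A\) (modified, if necessary, on an \(\mathcal{A}_0\)-measurable null set).

  • \(\mathcal{L}\) is a \(\lambda\)-system: \(\Omega \in \mathcal{L}\) since \(K_{\mathcal{A}_0}(\cdot, \Omega) = 1\); if \(A \in \mathcal{L}\), then \(K_{\mathcal{A}_0}(\cdot, A^c) = 1 - K_{\mathcal{A}_0}(\cdot, A)\) is \(\mathcal{A}_0\)-measurable; and for pairwise disjoint \(A_n \in \mathcal{L}\), \(K_{\mathcal{A}_0}(\cdot, \bigcup_n A_n) = \sum_n K_{\mathcal{A}_0}(\cdot, A_n)\) is a pointwise limit of \(\mathcal{A}_0\)-measurable functions.

  • Since \(\mathcal{C}\) is a \(\pi\)-system, the \(\pi\)-\(\lambda\) theorem gives \(\mathcal{L} \supseteq \sigma(\mathcal{C}) = \mathcal{A}\), hence \(\mathcal{L} = \mathcal{A}\).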

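Step 4: Countable Additivity via Theorem 2

Apply Theorem 2 with \(S = (\Omega, \mathcal{A})\), which is a Borel space in the sense of [4] because it is standard Borel, with \(T = (\Omega, \mathcal{A}_0)\), and with the random elements \(\xi: (\Omega, \mathcal{A}) \to (\Omega, \mathcal{A})\) and \(\eta: (\Omega, \mathcal{A}) \to (\Omega, \mathcal{A}_0)\) both given by \(\omega \mapsto \omega\). Since \(\sigma(\eta) = \mathcal{A}_0\), we have \(\mathbb{P}[\xi \in A \mid \eta] = P[A \mid \mathcal{A}_0]\) a.s. for every \(A \in \mathcal{A}\), and Theorem 2 yields a probability kernel \(\mu\) from \((\Omega, \mathcal{A}_0)\) to \((\Omega, \mathcal{A})\) such that, for every \(A \in \mathcal{A}\), \[ \mu(\omega, A) = \mu(\eta(\omega), A) = P[A \mid \mathcal{A}_0](\omega) \quad \text{for } P\text{-a.e. } \omega. \] In particular, for each \(A \in \mathcal{C}\) we have \(\mu(\cdot, A) = f_A\) outside an \(\mathcal{A}_0\)-measurable \(P\)-null set. Since \(\mathcal{C}\) is countable, outside the union of these null sets the map \(A \mapsto f_A(\omega)\) agrees on \(\mathcal{C}\) with the probability measure \(\mu(\omega, \cdot)\) and is therefore countably additive on \(\mathcal{C}\). This is the countable additivity required in Step 2, and by the uniqueness of the extension, \(K_{\mathcal{A}_0}(\omega, \cdot) = \mu(\omega, \cdot)\) for all such \(\omega\).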

Step 5: Verification of Conditional Probability

For all \(A \in \mathcal{A}\), \(K_{\mathcal{A}_0}(\omega, A)\) satisfies:

  • Measurability: \(K_{\mathcal{A}_0}(\cdot, A)\) is \(\mathcal{A}_0\)-measurable.

  • Integration: For any \(B \in \mathcal{A}_0\),

    \[ \int_B K_{\mathcal{A}_0}(\omega, A) \, dP(\omega) = P(A \cap B). \] This holds for \(A \in \mathcal{C}\) by construction and extends to all \(A \in \mathcal{A}\) via the \(\pi\)-\(\lambda\) theorem.

Thus, \(K_{\mathcal{A}_0}\) is a regular conditional probability kernel.

Hence a regular conditional probability exists for every sub-\(\sigma\)-algebra \(\mathcal{A}_0\) of a standard Borel space; cf. [5] and [4].
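In the finite setting, the integration identity of Step 5 can be checked exhaustively: for every \(A \subseteq \Omega\) and every \(B \in \mathcal{A}_0\), \(\int_B K(\omega, A)\, dP(\omega) = P(A \cap B)\). The Python sketch below does so for a hypothetical two-atom partition, using the elementary kernel \(P(A \cap B(\omega))/P(B(\omega))\); it illustrates the identity and is not a substitute for the general proof.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical finite setting: A = power set of Omega, A0 generated by PARTITION.
P = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(1, 4), 3: Fraction(1, 4)}
PARTITION = [frozenset({0, 1}), frozenset({2, 3})]


def prob(A):
    return sum((P[w] for w in A), Fraction(0))


def K(w, A):
    """Elementary regular conditional probability P(A & B) / P(B), B the atom of w."""
    B = next(atom for atom in PARTITION if w in atom)
    return prob(A & B) / prob(B)


def powerset(s):
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def integral_over(B, A):
    """The integral of K(., A) over B with respect to P, as a finite sum."""
    return sum((K(w, A) * P[w] for w in B), Fraction(0))


# A0 consists of all unions of atoms.
A0 = [frozenset().union(*(PARTITION[i] for i in idx))
      for idx in powerset(range(len(PARTITION)))]
defining_property_holds = all(integral_over(B, A) == prob(A & B)
                              for A in powerset(P) for B in A0)
```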

Remark

Remark. Some structural assumption of this kind cannot be dropped: on general measurable spaces, regular conditional probabilities need not exist.

4 The Uniqueness of Regular Conditional Probability

Regular conditional probabilities are unique up to \(P\)-null sets. That is, if \(K_{\mathcal{A}_0}\) and \(\tilde{K}_{\mathcal{A}_0}\) are both regular conditional probabilities with respect to \(\mathcal{A}_0\), then there exists a \(P\)-null set \(N\) such that for all \(\omega \notin N\) and all \(A \in \mathcal{A}\), \[ K_{\mathcal{A}_0}(\omega, A) = \tilde{K}_{\mathcal{A}_0}(\omega, A). \]

This means the regular conditional probability is essentially unique: any two versions agree outside a set of probability zero.

Proof (Proof of Uniqueness of Regular Conditional Probability). Let \(K_{\mathcal{A}_0}\) and \(\tilde{K}_{\mathcal{A}_0}\) be two regular conditional probabilities with respect to \(\mathcal{A}_0\). For each \(A \in \mathcal{A}\), define \[ N_A = \{\omega \in \Omega : K_{\mathcal{A}_0}(\omega, A) \neq \tilde{K}_{\mathcal{A}_0}(\omega, A)\}. \]

Step 1: Null Sets for Countable Generator

By the definition of regular conditional probability, \(K_{\mathcal{A}_0}(\cdot, A)\) and \(\tilde{K}_{\mathcal{A}_0}(\cdot, A)\) are both versions of \(\mathbb{E}[1_A \mid \mathcal{A}_0]\), hence they are equal \(P\)-almost surely. Thus, \(P(N_A) = 0\) for each \(A \in \mathcal{A}\).

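Since \((\Omega, \mathcal{A})\) is standard Borel, \(\mathcal{A}\) is countably generated (for example, by the open balls with rational radii centered at the points of a countable dense set), and the finite intersections of such a countable generating family form a countable \(\pi\)-system \(\mathcal{C}\) generating \(\mathcal{A}\). Define the union of null sets: \[ N = \bigcup_{A \in \mathcal{C}} N_A. \]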

As \(\mathcal{C}\) is countable, \(N\) is a countable union of \(P\)-null sets, so \(P(N) = 0\).

Step 2: Extension to the Entire \(\sigma\)-Algebra via \(\pi\)-\(\lambda\) Theorem

Fix \(\omega \notin N\). For this \(\omega\), define the collection of sets: \[ \mathcal{D}_\omega = \{ A \in \mathcal{A} : K_{\mathcal{A}_0}(\omega, A) = \tilde{K}_{\mathcal{A}_0}(\omega, A) \}. \] We show that \(\mathcal{D}_\omega\) is a \(\lambda\)-system containing \(\mathcal{C}\):

  • Contains \(\Omega\): \(K_{\mathcal{A}_0}(\omega, \Omega) = 1 = \tilde{K}_{\mathcal{A}_0}(\omega, \Omega)\), so \(\Omega \in \mathcal{D}_\omega\).

  • Closed under disjoint unions: If \(A_n \in \mathcal{D}_\omega\) are pairwise disjoint, then:

    \[ K_{\mathcal{A}_0}\left(\omega, \bigcup_{n=1}^\infty A_n\right) = \sum_{n=1}^\infty K_{\mathcal{A}_0}(\omega, A_n) = \sum_{n=1}^\infty \tilde{K}_{\mathcal{A}_0}(\omega, A_n) = \tilde{K}_{\mathcal{A}_0}\left(\omega, \bigcup_{n=1}^\infty A_n\right). \]

  • Closed under complements: If \(A \in \mathcal{D}_\omega\), then:

    \[ K_{\mathcal{A}_0}(\omega, A^c) = 1 - K_{\mathcal{A}_0}(\omega, A) = 1 - \tilde{K}_{\mathcal{A}_0}(\omega, A) = \tilde{K}_{\mathcal{A}_0}(\omega, A^c). \]

Since \(\mathcal{C} \subseteq \mathcal{D}_\omega\) (by \(\omega \notin N\)) and \(\mathcal{C}\) is a \(\pi\)-system, the \(\pi\)-\(\lambda\) theorem implies \(\sigma(\mathcal{C}) = \mathcal{A} \subseteq \mathcal{D}_\omega\). Thus, for all \(\omega \notin N\), \(K_{\mathcal{A}_0}(\omega, A) = \tilde{K}_{\mathcal{A}_0}(\omega, A)\) for every \(A \in \mathcal{A}\).

The set \(N\) is \(P\)-null, and for all \(\omega \notin N\), the kernels \(K_{\mathcal{A}_0}(\omega, \cdot)\) and \(\tilde{K}_{\mathcal{A}_0}(\omega, \cdot)\) agree on \(\mathcal{A}\). Hence, regular conditional probabilities are unique up to \(P\)-null sets.
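The mechanism behind Step 2, that agreement on a generating \(\pi\)-system forces agreement on the whole \(\sigma\)-algebra, can be seen on a finite example. In the Python sketch below (a hypothetical illustration), \(\Omega = \{1, \dots, n\}\) carries its power set, the \(\pi\)-system consists of the initial segments \(\{1, \dots, k\}\), and a probability measure is rebuilt from its values on that \(\pi\)-system using only \(\lambda\)-system operations.

```python
from fractions import Fraction
from itertools import chain, combinations


def initial_segments(n):
    """The pi-system {∅, {1}, {1,2}, ..., {1,...,n}} on Omega = {1, ..., n}."""
    return [frozenset(range(1, k + 1)) for k in range(n + 1)]


def measure(masses, A):
    """Value on the set A of the measure with point masses `masses`."""
    return sum((masses[k] for k in A), Fraction(0))


def recover_point_masses(values_on_segments, n):
    """Recover mu({k}) = mu({1..k}) - mu({1..k-1}).

    {k} is a proper difference of two nested segments, and lambda-systems are
    closed under proper differences, so the values on the pi-system determine mu.
    """
    segs = initial_segments(n)
    return {k: values_on_segments[segs[k]] - values_on_segments[segs[k - 1]]
            for k in range(1, n + 1)}


def agree_everywhere(mu, nu, n):
    """Compare two measures (given by point masses) on every subset of Omega."""
    omega = range(1, n + 1)
    all_sets = chain.from_iterable(combinations(omega, k) for k in range(n + 1))
    return all(measure(mu, A) == measure(nu, A) for A in all_sets)


mu = {1: Fraction(1, 2), 2: Fraction(1, 6), 3: Fraction(1, 3)}
on_segments = {s: measure(mu, s) for s in initial_segments(3)}
nu = recover_point_masses(on_segments, 3)  # built only from mu's values on the segments
```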

Remark

Remark. This uniqueness property ensures that, although the regular conditional probability is determined only up to a \(P\)-null set of \(\omega\), any two versions coincide almost surely, simultaneously for all \(A \in \mathcal{A}\).

5 Representation of Conditional Expectation as an Integral

By now, we have established that conditional probability admits a refined version (the regular conditional probability), which qualifies as a probability measure. This allows us to define an integral with respect to it.

In the following, we aim to demonstrate that this integral operation on the regular conditional probability precisely coincides with the conditional expectation.

Theorem 3 (Representation of Conditional Expectation as an Integral) Let \((\Omega, \mathcal{A}, P)\) be a probability space, \(\mathcal{A}_0 \subseteq \mathcal{A}\) a sub-\(\sigma\)-algebra, and \(K(\omega, A)\) a regular conditional probability kernel such that \(K(\omega, A) = \mathbb{E}[1_A \mid \mathcal{A}_0](\omega)\) a.e. for all \(A \in \mathcal{A}\). Then for any integrable random variable \(X\), the conditional expectation satisfies: \[ \mathbb{E}[X \mid \mathcal{A}_0](\omega) = \int_\Omega X(\omega') K(\omega, d\omega') \quad \text{a.e. } P. \]

Proof (Proof of Theorem 3).

Step 1: Indicator Functions

Let \(X = 1_A\) for \(A \in \mathcal{A}\). By definition of the regular conditional probability: \[ \mathbb{E}[1_A \mid \mathcal{A}_0](\omega) = K(\omega, A) = \int_\Omega 1_A(\omega') K(\omega, d\omega') \quad \text{a.e. } P. \]

Step 2: Simple Functions

Let \(X = \sum_{i=1}^n a_i 1_{A_i}\) with \(A_i \in \mathcal{A}\) and \(a_i \in \mathbb{R}\). By linearity of conditional expectation and integration: \[ \mathbb{E}[X \mid \mathcal{A}_0](\omega) = \sum_{i=1}^n a_i \mathbb{E}[1_{A_i} \mid \mathcal{A}_0](\omega) = \sum_{i=1}^n a_i \int_\Omega 1_{A_i}(\omega') K(\omega, d\omega') = \int_\Omega X(\omega') K(\omega, d\omega'). \]

Step 3: Non-Negative Measurable Functions

Let \(X \geq 0\) be measurable. Take an increasing sequence of simple functions \(X_n \uparrow X\). By the monotone convergence theorem (MCT):

  • \(\mathbb{E}[X_n \mid \mathcal{A}_0] \uparrow \mathbb{E}[X \mid \mathcal{A}_0]\) a.e. (conditional MCT);

  • \(\int X_n K(\omega, d\omega') \uparrow \int X K(\omega, d\omega')\) for every \(\omega\) (MCT for the measure \(K(\omega, \cdot)\)).

Step 2 applies to each of the countably many \(X_n\), so the corresponding exceptional null sets combine into a single \(P\)-null set. Thus, outside this null set: \[ \mathbb{E}[X \mid \mathcal{A}_0](\omega) = \lim_{n \to \infty} \mathbb{E}[X_n \mid \mathcal{A}_0](\omega) = \lim_{n \to \infty} \int X_n K(\omega, d\omega') = \int X K(\omega, d\omega'). \]

Step 4: General Integrable Functions

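For arbitrary integrable \(X\), decompose \(X = X^+ - X^-\) with \(X^\pm \geq 0\) integrable. By Step 3, \(\int X^\pm(\omega') K(\omega, d\omega') = \mathbb{E}[X^\pm \mid \mathcal{A}_0](\omega) < \infty\) for \(P\)-a.e. \(\omega\), so \(X\) is \(K(\omega, \cdot)\)-integrable for \(P\)-a.e. \(\omega\), and \[ \mathbb{E}[X \mid \mathcal{A}_0] = \mathbb{E}[X^+ \mid \mathcal{A}_0] - \mathbb{E}[X^- \mid \mathcal{A}_0] = \int X^+ K(\omega, d\omega') - \int X^- K(\omega, d\omega') = \int X K(\omega, d\omega') \quad \text{a.e. } P. \]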

Step 5: Measurability and Uniqueness

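  • Measurability: The map \(\omega \mapsto \int X(\omega') K(\omega, d\omega')\) is \(\mathcal{A}_0\)-measurable: for indicators this is the measurability property of the kernel \(K\), and it extends to simple functions by linearity and to nonnegative \(X\) by monotone limits, exactly as in Steps 1 to 3.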

  • Uniqueness: Regular conditional probability kernels agree a.e. \(P\), ensuring the integral representation is unique a.e.
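On a finite space, Theorem 3 can be checked directly: integrating \(X\) against \(K(\omega, \cdot)\) produces an \(\mathcal{A}_0\)-measurable function whose integral over every set in \(\mathcal{A}_0\) agrees with that of \(X\), which is the defining property of \(\mathbb{E}[X \mid \mathcal{A}_0]\). The Python sketch below uses hypothetical data and the elementary kernel \(K(\omega, \{\omega'\}) = P(\{\omega'\})/P(B)\) for \(\omega'\) in the atom \(B\) of \(\omega\).

```python
from fractions import Fraction

# Hypothetical finite setting: A0 generated by PARTITION, X a random variable.
P = {0: Fraction(1, 5), 1: Fraction(2, 5), 2: Fraction(1, 10), 3: Fraction(3, 10)}
PARTITION = [frozenset({0, 1}), frozenset({2, 3})]
X = {0: 3, 1: -1, 2: 10, 3: 0}


def atom_of(w):
    return next(atom for atom in PARTITION if w in atom)


def kernel_integral(Z, w):
    """Integral of Z with respect to K(w, .), where K(w, {v}) = P({v}) / P(B) for v in the atom B of w."""
    B = atom_of(w)
    pB = sum(P[v] for v in B)
    return sum(Z[v] * P[v] for v in B) / pB


def is_conditional_expectation(Y, Z):
    """Check that Y is constant on atoms (A0-measurable) and that its integral over
    each atom B equals that of Z; by additivity this extends to every B in A0."""
    constant = all(len({Y[w] for w in B}) == 1 for B in PARTITION)
    integrals = all(sum(Y[w] * P[w] for w in B) == sum(Z[w] * P[w] for w in B)
                    for B in PARTITION)
    return constant and integrals


Y = {w: kernel_integral(X, w) for w in P}
```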

6 Conclusion

Through this assignment, we have clarified the construction of regular conditional probability and rigorously proved its existence and uniqueness on standard Borel spaces. This result shows that, on measurable spaces with a well-behaved topological structure, conditional probability can be upgraded to a probability kernel \(K_{\mathcal{A}_0}(\omega,A)\) depending on the sample point \(\omega\), thereby resolving the “null set selection problem” inherent in the classical definition of conditional probability, where the exceptional null sets depend on the event \(A\). This conclusion provides a rigorous mathematical foundation for the integral representation of conditional expectation.

However, the construction of regular conditional probability relies heavily on the structural properties of the underlying space: on general measurable spaces, such kernels need not exist.

Acknowledgement

I sincerely thank Professor Zhu, Rongchan for her guidance during this semester’s course Selected Topics in Probability. I am deeply grateful to Liu, Chenhao and Song, Ke for their invaluable support during extracurricular discussions. Special thanks also go to Zhong, Xingyu (from the Class of 2022, Strong Foundation Program) for discussing with me whether “the definitions of conditional probability and conditional expectation are consistent across different textbook versions,” which motivated my deeper reflection on the necessity of regularizing conditional probability.

Additionally, I thank Zhong, Xingyu for developing the open-source typesetting tool SunQuarTex, which enabled efficient and polished formatting of this document.

References

[1]
M. Röckner, “Probability Theory I and II,” 2016.
[2]
Li Xianping (李贤平), Foundations of Probability Theory (概率论基础). Higher Education Press, 1997.
[3]
Wei Laisheng (韦来生), Mathematical Statistics (数理统计). Science Press, 2015.
[4]
O. Kallenberg, Foundations of Modern Probability, 2nd ed. Springer, 2002.
[5]
V. I. Bogachev and M. A. S. Ruas, Measure Theory, vol. 1. Springer, 2007.