Final Assignment of Selected Topics in Probability
The Existence and Uniqueness of Regular Conditional Probability
1 Foreword
During last semester’s Probability Theory (bilingual course), students at the Qiushi College used the textbook [2]. This textbook does not introduce probability theory through the language of measure theory, nor does it cover the concept of conditional expectation. However, during a proof in Mathematical Statistics in the second week of this semester, conditional expectation appeared in the textbook [3] as an integral with respect to conditional probability. This led me to conjecture that the conditional expectation mentioned in [3] must be the same as the one we studied last semester [1]. After discussions with classmates, we failed to reach a satisfactory conclusion. It was not until the fourth week of Selected Topics in Probability that the concept of regular conditional probability was briefly introduced. Unfortunately, [1] stated that this topic had little relevance to the subsequent course content and did not discuss it further. Consequently, I decided to explore this topic as the focus of my final assignment.
2 Regular Conditional Probability
In our in-class discussion, we observed that the conditional probability \[ P[A\mid \mathcal{A}_0]:=\mathbb{E}[\mathbb{I}_A\mid\mathcal{A}_0] \] satisfies, almost surely, \[ 0\leq P[A\mid\mathcal{A}_0]\leq 1, \]
\[ P[\emptyset\mid\mathcal{A}_0]=0,\quad P[\Omega\mid\mathcal{A}_0]=1, \]
and, for pairwise disjoint \(A_n,\ n\in\mathbb{N}\), \[ P\Big[\bigcup_{n=1}^{\infty}A_n\,\Big|\, \mathcal{A}_0\Big]=\sum_{n=1}^{\infty}P[A_n\mid \mathcal{A}_0]. \]
However, the exceptional null set depends on the measurable set \(A\) (or on the sequence \((A_n)\)) under consideration. Since there are in general uncountably many such sets and sequences, the union of these null sets need not be null, so there may be no single null set outside of which \(A\mapsto P[A\mid \mathcal{A}_0](\omega)\) is a probability measure. This motivates us to explore a more refined approach to characterize \(P[A\mid \mathcal{A}_0]\) in a stronger sense.
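To obtain the desired “conditional probability”, we need to work within a better-behaved space, where such constructions can be rigorously defined. Concretely, we look for a map \(K_{\mathcal{A}_0}:\Omega\times\mathcal{A}\to[0,1]\) such that \(K_{\mathcal{A}_0}(\omega,\cdot)\) is a probability measure on \((\Omega,\mathcal{A})\) for every \(\omega\), and \(K_{\mathcal{A}_0}(\cdot,A)\) is a version of \(P[A\mid\mathcal{A}_0]\) for every \(A\in\mathcal{A}\). Such a \(K_{\mathcal{A}_0}\) is called a regular conditional probability.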
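As a concrete illustration of a conditional probability that is a genuine probability measure for every sample point, the following Python sketch builds a kernel on a hypothetical four-point space (two fair coin tosses, with \(\mathcal{A}_0\) generated by the first toss). The names `OMEGA`, `ATOMS`, `prob`, `atom_of` and `kernel` are illustrative assumptions, not notation from the course. In this finite setting every atom of \(\mathcal{A}_0\) has positive mass, so the null-set difficulty described above does not arise; the sketch only shows which properties the general construction has to reproduce.

```python
from fractions import Fraction

# Hypothetical finite model: two fair coin tosses.
OMEGA = ("HH", "HT", "TH", "TT")
P = {w: Fraction(1, 4) for w in OMEGA}

# A_0 = sigma-algebra generated by the first toss, encoded by its atoms.
ATOMS = (frozenset({"HH", "HT"}), frozenset({"TH", "TT"}))


def prob(event):
    """P(event) for a subset of OMEGA."""
    return sum((P[w] for w in event), Fraction(0))


def atom_of(w):
    """The atom of A_0 containing the sample point w."""
    return next(B for B in ATOMS if w in B)


def kernel(w, event):
    """K(w, A) = P(A & B(w)) / P(B(w)); every atom has positive mass here."""
    B = atom_of(w)
    return prob(set(event) & B) / prob(B)


A = {"HH", "TH"}  # event: the second toss is heads

# For every w (no exceptional set), K(w, .) behaves like a probability measure.
for w in OMEGA:
    assert kernel(w, OMEGA) == 1
    assert kernel(w, set()) == 0
    assert kernel(w, {"HH"}) + kernel(w, {"HT", "TH"}) == kernel(w, {"HH", "HT", "TH"})

# Defining property of conditional probability, checked on each atom B of A_0:
# the integral of K(., A) over B equals P(A & B).
for B in ATOMS:
    assert sum(kernel(w, A) * P[w] for w in B) == prob(A & B)

print({w: kernel(w, A) for w in OMEGA})
```

Because the space is finite, checking the integral identity on the atoms is enough: every set in \(\mathcal{A}_0\) is a disjoint union of atoms.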
3 The Existence of Regular Conditional Probability
Before proceeding with the proof, we first state and prove some lemmas and theorems that will be essential for the following arguments.
Proof (Proof of Lemma 1). Let \(f: \Omega \to \mathbb{R}\) be \(\sigma(X)\)-measurable, where \(X: \Omega \to S\) is measurable.
Step 1: Structure of \(\sigma(X)\)
The \(\sigma\)-algebra \(\sigma(X)\) generated by \(X\) is defined as: \[ \sigma(X) = \{ X^{-1}(B) : B \in \mathcal{S} \}. \] Since \(f\) is \(\sigma(X)\)-measurable, for every Borel set \(B \subseteq \mathbb{R}\), we have: \[ f^{-1}(B) \in \sigma(X). \] Thus, there exists a set \(A_B \in \mathcal{S}\) such that: \[ f^{-1}(B) = X^{-1}(A_B). \] This defines a mapping \(B \mapsto A_B\) from Borel sets in \(\mathbb{R}\) to \(\mathcal{S}\). The set \(A_B\) need not be unique, since it is determined only on the range of \(X\), but any fixed choice suffices for the construction below.
Step 2: Constructing \(g\) via rational intervals
We construct \(g: S \to \mathbb{R}\) as follows. For each rational \(r \in \mathbb{Q}\), define: \[ A_r = A_{(-\infty, r]} \in \mathcal{S}, \] where \(A_{(-\infty, r]}\) corresponds to the set in \(\mathcal{S}\) such that \(f^{-1}((-\infty, r]) = X^{-1}(A_r)\).
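For \(s \in S\), define: \[ g(s) = \inf \{ r \in \mathbb{Q} : s \in A_r \} \in [-\infty, +\infty], \] with the convention \(\inf \emptyset = +\infty\). Regarding finiteness:
For \(s = X(\omega)\) in the range of \(X\), since \(f(\omega) \in \mathbb{R}\), there exists some \(r \in \mathbb{Q}\) such that \(f(\omega) \leq r\), hence \(X(\omega) \in A_r\) and the set \(\{ r \in \mathbb{Q} : s \in A_r \}\) is nonempty. It is also bounded below, because \(X(\omega) \in A_r\) forces \(f(\omega) \leq r\). So \(g(s)\) is finite.
For \(s\) outside the range of \(X\), the infimum may be \(\pm\infty\). This does not affect \(g \circ X\); if a real-valued \(g\) is required, one redefines \(g(s) = 0\) on the set \(\{g = \pm\infty\}\), which lies in \(\mathcal{S}\) once \(g\) is known to be measurable as an extended-real-valued function.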
Step 3: Verifying \(f = g \circ X\)
Let \(\omega \in \Omega\). We claim that \(f(\omega) = g(X(\omega))\) for every \(\omega\), with no exceptional set.
Fix \(r \in \mathbb{Q}\). Since \(f^{-1}((-\infty, r]) = X^{-1}(A_r)\), we have the equivalence \[ f(\omega) \leq r \iff \omega \in X^{-1}(A_r) \iff X(\omega) \in A_r. \]
Consequently, \(\{ r \in \mathbb{Q} : X(\omega) \in A_r \} = \{ r \in \mathbb{Q} : r \geq f(\omega) \}\).
Taking the infimum and using the density of \(\mathbb{Q}\) in \(\mathbb{R}\) gives \[ g(X(\omega)) = \inf \{ r \in \mathbb{Q} : r \geq f(\omega) \} = f(\omega). \]
Thus \(f = g \circ X\) holds at every point of \(\Omega\); no probability measure is needed for this identity.
Step 4: Measurability of \(g\)
To show \(g\) is \(\mathcal{S}\)-measurable, it suffices to verify that for any \(a \in \mathbb{R}\), the set \(\{ s \in S : g(s) < a \}\) belongs to \(\mathcal{S}\).
By the definition of \(g\) as an infimum over rationals, \(g(s) < a\) holds exactly when \(s \in A_r\) for some rational \(r < a\): \[ \{ s \in S : g(s) < a \} = \bigcup_{\substack{r \in \mathbb{Q} \\ r < a}} A_r. \] Since each \(A_r \in \mathcal{S}\), this countable union is also in \(\mathcal{S}\).
The sets \(\{g \leq a\} = \bigcap_{n=1}^\infty \{g < a + 1/n\}\), \(\{g = -\infty\} = \bigcap_{n=1}^\infty \{g < -n\}\) and \(\{g = +\infty\} = \bigcap_{n=1}^\infty \{g < n\}^c\) are then in \(\mathcal{S}\) as well, so replacing \(g\) by \(0\) where it is infinite preserves measurability.
Hence, \(g\) is \(\mathcal{S}\)-measurable.
Step 5: Uniqueness up to null sets
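Since \(f = g \circ X\) everywhere, the values of \(g\) on the range \(X(\Omega)\) are forced: \(g(X(\omega)) = f(\omega)\). Off the range, \(g\) is arbitrary. If \(P\) is a probability measure on \(\Omega\) and \(g'\) is another measurable function satisfying \(f = g' \circ X\) \(P\)-almost surely, then \(g(X(\omega)) = g'(X(\omega))\) for \(P\)-almost every \(\omega\). In terms of the pushforward measure \(P_X = P \circ X^{-1}\), this reads \(P_X(\{g \neq g'\}) = P(\{g \circ X \neq g' \circ X\}) = 0\), so \(g = g'\) almost surely with respect to \(P_X\).
We have constructed a measurable function \(g: S \to \mathbb{R}\) such that \(f = g \circ X\), completing the proof.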
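The construction can be imitated on a finite sample space, where the rationals are replaced by the finitely many values of \(f\). The sketch below is only an illustration under that assumption; the function name `doob_dynkin` and the example data are hypothetical. It builds the sets \(A_r\), checks \(f^{-1}((-\infty, r]) = X^{-1}(A_r)\), and then defines \(g\) as a minimum over thresholds.

```python
def doob_dynkin(omega, X, f):
    """Return g (a dict on the range of X) with f[w] == g[X[w]] for all w.

    Raises ValueError when f is not sigma(X)-measurable.
    """
    thresholds = sorted({f[w] for w in omega})  # finite stand-in for Q
    A = {}
    for r in thresholds:
        below = {w for w in omega if f[w] <= r}        # f^{-1}((-inf, r])
        A[r] = {X[w] for w in below}                   # candidate set A_r
        preimage = {w for w in omega if X[w] in A[r]}  # X^{-1}(A_r)
        if preimage != below:
            raise ValueError("f is not sigma(X)-measurable")
    # g(s) = inf{r : s in A_r}, here a minimum over finitely many thresholds.
    return {s: min(r for r in thresholds if s in A[r])
            for s in {X[w] for w in omega}}


omega = range(6)
X = {w: w % 3 for w in omega}
f = {w: (w % 3) ** 2 for w in omega}  # a function of X, hence sigma(X)-measurable

g = doob_dynkin(omega, X, f)
assert all(f[w] == g[X[w]] for w in omega)  # equality at every point
```

Passing a function that separates points with the same value of \(X\), for example `{w: w for w in omega}`, makes the measurability check fail, which mirrors the role of the hypothesis in the lemma.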
Remark. The Doob-Dynkin lemma establishes that any \(\sigma(X)\)-measurable function \(f\) can be expressed as \(f = g \circ X\) for some measurable \(g\). In the context of regular conditional probabilities, it ensures that the conditional expectation \(\mathbb{E}[1_A \mid \mathcal{A}_0]\), being \(\mathcal{A}_0\)-measurable, can be represented as a function of a generating random variable \(\eta\) (e.g., \(\eta(\omega) = \omega\)). This allows the construction of the kernel \(K_{\mathcal{A}_0}(\omega, A)\) as a measurable function of \(\omega\), satisfying the required properties.
Proof (Proof of Theorem 1). Let \(\mathcal{C}\) be an algebra of subsets of \(\Omega\), and \(\mu_0: \mathcal{C} \to [0, \infty]\) a countably additive pre-measure.
Step 1: Outer measure construction
Define the outer measure \(\mu^*\) on all subsets \(E \subseteq \Omega\) by \[ \mu^*(E) = \inf\left\{ \sum_{n=1}^\infty \mu_0(C_n) : E \subseteq \bigcup_{n=1}^\infty C_n,\, C_n \in \mathcal{C} \right\}. \] We verify that \(\mu^*\) is an outer measure:
Monotonicity: If \(A \subseteq B\), then any cover of \(B\) is also a cover of \(A\), so \(\mu^*(A) \leq \mu^*(B)\).
Countable subadditivity: For any sequence \(\{A_n\}\), we construct covers \(\{C_{n,k}\}_{k=1}^\infty\) of \(A_n\) with \(\sum_{k=1}^\infty \mu_0(C_{n,k}) \leq \mu^*(A_n) + \epsilon/2^n\). The union \(\bigcup_{n,k} C_{n,k}\) covers \(\bigcup_n A_n\), and \(\sum_{n,k} \mu_0(C_{n,k}) \leq \sum_n \mu^*(A_n) + \epsilon\). Letting \(\epsilon \to 0\) gives \(\mu^*(\bigcup_n A_n) \leq \sum_n \mu^*(A_n)\).
Empty set: \(\mu^*(\emptyset) = 0\) since \(\emptyset \subseteq \emptyset\) and \(\mu_0(\emptyset) = 0\).
Step 2: Carathéodory measurability
A set \(A \subseteq \Omega\) is called \(\mu^*\)-measurable if for all \(E \subseteq \Omega\), \[ \mu^*(E) = \mu^*(E \cap A) + \mu^*(E \setminus A). \] Let \(\mathcal{A}\) be the collection of all \(\mu^*\)-measurable sets. We show \(\mathcal{A}\) is a \(\sigma\)-algebra:
Closed under complements: If \(A \in \mathcal{A}\), then \(A^c\) satisfies the same condition by symmetry.
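Closed under countable unions: First, for \(A, B \in \mathcal{A}\), apply the Carathéodory condition for \(A\) to the test set \(E\) and then the condition for \(B\) to \(E \setminus A\); together with subadditivity this gives \(A \cup B \in \mathcal{A}\), and by induction all finite unions lie in \(\mathcal{A}\). For a countable union, replace \(A_n\) by the pairwise disjoint sets \(B_n = A_n \setminus \bigcup_{k<n} A_k \in \mathcal{A}\) and write \(B = \bigcup_{n=1}^\infty B_n = \bigcup_{n=1}^\infty A_n\). For every \(N\), \[ \mu^*(E) = \sum_{n=1}^N \mu^*(E \cap B_n) + \mu^*\Big(E \setminus \bigcup_{n=1}^N B_n\Big) \geq \sum_{n=1}^N \mu^*(E \cap B_n) + \mu^*(E \setminus B). \] Letting \(N \to \infty\) and using countable subadditivity yields \(\mu^*(E) \geq \mu^*(E \cap B) + \mu^*(E \setminus B)\); the reverse inequality is again subadditivity.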
Step 3: Restriction to \(\mathcal{A}\) is a measure
The restriction \(\mu = \mu^*|_{\mathcal{A}}\) is a measure. To verify countable additivity:
Let \(\{A_n\} \subseteq \mathcal{A}\) be pairwise disjoint. By countable subadditivity, \(\mu^*(\bigcup_n A_n) \leq \sum_n \mu^*(A_n)\).
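For the reverse inequality, fix \(N \in \mathbb{N}\) and apply the Carathéodory condition for \(A_{N+1}\) to the test set \(E = \bigcup_{n=1}^{N+1} A_n\). By disjointness, \(E \cap A_{N+1} = A_{N+1}\) and \(E \setminus A_{N+1} = \bigcup_{n=1}^{N} A_n\), so \(\mu^*(\bigcup_{n=1}^{N+1} A_n) = \mu^*(A_{N+1}) + \mu^*(\bigcup_{n=1}^{N} A_n)\). By induction, \(\mu^*(\bigcup_{n=1}^{N} A_n) = \sum_{n=1}^{N} \mu^*(A_n)\) for every \(N\). Monotonicity gives \(\mu^*(\bigcup_n A_n) \geq \sum_{n=1}^{N} \mu^*(A_n)\), and letting \(N \to \infty\) yields \(\mu^*(\bigcup_n A_n) \geq \sum_n \mu^*(A_n)\).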
Step 4: Extension property
For \(C \in \mathcal{C}\), we prove \(C\) is \(\mu^*\)-measurable and \(\mu^*(C) = \mu_0(C)\):
Measurability: For any \(E \subseteq \Omega\), let \(\{C_n\} \subseteq \mathcal{C}\) be a cover of \(E\). Then \(C_n \cap C\) and \(C_n \setminus C\) belong to \(\mathcal{C}\) (since \(\mathcal{C}\) is an algebra), and \(\mu_0(C_n) = \mu_0(C_n \cap C) + \mu_0(C_n \setminus C)\). Since \(\{C_n \cap C\}\) covers \(E \cap C\) and \(\{C_n \setminus C\}\) covers \(E \setminus C\), summing over \(n\) gives \(\sum_n \mu_0(C_n) \geq \mu^*(E \cap C) + \mu^*(E \setminus C)\). Taking infima over covers yields \(\mu^*(E) \geq \mu^*(E \cap C) + \mu^*(E \setminus C)\). The reverse inequality holds by subadditivity, so \(C\) is \(\mu^*\)-measurable.
Equality: By definition, \(\mu^*(C) \leq \mu_0(C)\). For the reverse inequality, suppose \(C \subseteq \bigcup_n C_n\). Then \(\mu_0(C) \leq \sum_n \mu_0(C_n \cap C)\) (by countable subadditivity of \(\mu_0\)) \(\leq \sum_n \mu_0(C_n)\). Taking infima gives \(\mu^*(C) \geq \mu_0(C)\).
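Steps 1 to 4 can be checked by brute force on a small example. The sketch below assumes a hypothetical four-point space whose algebra \(\mathcal{C}\) consists of unions of three atoms with prescribed masses; the names `ATOM_MASS`, `outer` and `is_measurable` are illustrative. In this finite setting the infimum in the definition of \(\mu^*\) is attained by the union of the atoms that meet \(E\), which is the shortcut used in `outer`.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset({1, 2, 3, 4})
# Hypothetical pre-measure: the algebra C consists of unions of these atoms.
ATOM_MASS = {
    frozenset({1, 2}): Fraction(1, 2),
    frozenset({3}): Fraction(1, 4),
    frozenset({4}): Fraction(1, 4),
}


def outer(E):
    """mu*(E): the cheapest cover of E by sets of C is the union of the
    atoms that meet E (valid for this finite example)."""
    return sum((m for B, m in ATOM_MASS.items() if B & E), Fraction(0))


def subsets(S):
    """All subsets of S as frozensets."""
    items = sorted(S)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]


ALL = subsets(OMEGA)


def is_measurable(A):
    """Caratheodory condition: A splits every test set E additively."""
    return all(outer(E) == outer(E & A) + outer(E - A) for E in ALL)


measurable = [A for A in ALL if is_measurable(A)]

# Extension property: mu* agrees with the pre-measure on the atoms of C.
assert all(outer(B) == m for B, m in ATOM_MASS.items())
# {1} splits the atom {1, 2}, so it fails the Caratheodory condition.
assert frozenset({1}) not in measurable
```

Here the measurable sets turn out to be exactly the unions of atoms, so the extension adds nothing new; on infinite spaces the class of \(\mu^*\)-measurable sets is typically much larger than \(\mathcal{C}\).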
Step 5: Uniqueness
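If \(\mu_0\) is \(\sigma\)-finite, the extension to \(\sigma(\mathcal{C})\) is unique. Let \(\nu\) be another measure on \(\sigma(\mathcal{C})\) with \(\nu|_{\mathcal{C}} = \mu_0\):
\(\sigma\)-finiteness: Write \(\Omega = \bigcup_n \Omega_n\) with \(\Omega_n \in \mathcal{C}\) and \(\mu_0(\Omega_n) < \infty\). Since \(\mathcal{C}\) is an algebra, we may replace \(\Omega_n\) by \(\Omega_1 \cup \dots \cup \Omega_n\) and assume \(\Omega_n \uparrow \Omega\).
Apply the \(\pi\)-\(\lambda\) theorem for each fixed \(n\):
\(\mathcal{C}\) is a \(\pi\)-system (closed under finite intersections).
The set \(\{A \in \sigma(\mathcal{C}) : \mu(A \cap \Omega_n) = \nu(A \cap \Omega_n)\}\) is a \(\lambda\)-system; the finiteness of \(\mu_0(\Omega_n)\) is what makes it closed under complements.
It contains \(\mathcal{C}\), because \(A \cap \Omega_n \in \mathcal{C}\) for \(A \in \mathcal{C}\), so it is all of \(\sigma(\mathcal{C})\). By continuity from below, for each \(A \in \sigma(\mathcal{C})\), \(\mu(A) = \lim_{n \to \infty} \mu(A \cap \Omega_n) = \lim_{n \to \infty} \nu(A \cap \Omega_n) = \nu(A)\).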
Remark. The Carathéodory extension theorem ensures that a pre-measure defined on an algebra can be extended to a measure on the generated \(\sigma\)-algebra, and that this extension is unique provided the pre-measure is \(\sigma\)-finite. In the proof of the existence of regular conditional probabilities, it is applied for fixed \(\omega\) to the map \(A \mapsto f_A(\omega) = \mathbb{E}[1_A \mid \mathcal{A}_0](\omega)\), defined on a countable generating algebra \(\mathcal{C}\). The theorem requires this map to be countably additive on \(\mathcal{C}\), not merely finitely additive; once this is known, it extends uniquely to a probability measure \(K_{\mathcal{A}_0}(\omega, \cdot)\) on \(\mathcal{A}\). This step is critical for constructing the regular conditional probability kernel rigorously.
Proof (Proof of Theorem 2). We may assume that \(S \in \mathcal{B}(\mathbb{R})\). For every \(r \in \mathbb{Q}\) we may choose some measurable function \(f_r = f(\cdot, r): T \to [0, 1]\) such that \[ f(\eta, r) = \mathbb{P}[\xi \leq r \mid \eta] \quad \text{a.e.}, \quad r \in \mathbb{Q}. \tag{1}\]
Let \(A\) be the set of all \(t\in T\) such that \(f(t, r)\) is nondecreasing in \(r \in \mathbb{Q}\) with limits 1 and 0 at \(\pm\infty\). Since \(A\) is specified by countably many measurable conditions, each of which holds a.e. at \(\eta\), we have \(A \in \mathcal{T}\) and \(\eta \in A\) a.e. Now define \[ F(t, x) = \mathbf{1}_A(t) \inf_{r > x} f(t, r) + \mathbf{1}_{A^c}(t) \mathbf{1}\{x \geq 0\}, \quad x \in \mathbb{R},\ t \in T, \] and note that \(F(t, \cdot)\) is a distribution function on \(\mathbb{R}\) for every \(t \in T\). Hence, by Proposition 2.14 (the correspondence between distribution functions and probability measures on \(\mathbb{R}\)), there exist some probability measures \(m(t, \cdot)\) on \(\mathbb{R}\) with \[ m(t, (-\infty, x]) = F(t, x), \quad x \in \mathbb{R},\ t \in T. \] The function \(F(t, x)\) is clearly measurable in \(t\) for each \(x\), and by a monotone class argument it follows that \(m\) is a kernel from \(T\) to \(\mathbb{R}\).
By Equation 1 and the monotone convergence property of \(\mathbb{E}^\eta\), we have \[ m(\eta, (-\infty, x]) = F(\eta, x) = \mathbb{P}[\xi \leq x \mid \eta] \quad \text{a.e.}, \quad x \in \mathbb{R}. \] Using a monotone class argument based on the a.e. monotone convergence property, we may extend the last relation to \[ m(\eta, B) = \mathbb{P}[\xi \in B \mid \eta] \quad \text{a.e.}, \quad B \in \mathcal{B}(\mathbb{R}). \tag{2}\]
In particular, we get \(m(\eta, S^c) = 0\) a.e., and so Equation 2 remains true on \(\mathcal{S} = \mathcal{B} \cap S\) with \(m\) replaced by the kernel \[ \mu(t, \cdot) = m(t, \cdot) \mathbf{1}\{m(t, S) = 1\} + \delta_s \mathbf{1}\{m(t, S) < 1\}, \quad t \in T, \] where \(s \in S\) is arbitrary. If \(\mu'\) is another kernel with the stated property, then \[ \mu(\eta, (-\infty, r]) = \mathbb{P}[\xi \leq r \mid \eta] = \mu'(\eta, (-\infty, r]) \quad \text{a.e.}, \quad r \in \mathbb{Q}, \] and a monotone class argument yields \(\mu(\eta, \cdot) = \mu'(\eta, \cdot)\) a.e.
Proof (Proof of Existence of Regular Conditional Probability). Let \((\Omega, \mathcal{A})\) be a standard Borel space and \(P\) a probability measure on \((\Omega, \mathcal{A})\). Let \(\mathcal{A}_0 \subseteq \mathcal{A}\) be a sub-\(\sigma\)-algebra.
Step 1: Countable Generator and Conditional Expectation
Since \((\Omega, \mathcal{A})\) is standard Borel, there exists a Polish topology on \(\Omega\) such that \(\mathcal{A}\) is the Borel \(\sigma\)-algebra. In particular, \(\mathcal{A}\) is countably generated, and \((\Omega, \mathcal{A})\) is Borel isomorphic to a Borel subset of \(\mathbb{R}\); these are the two structural properties used below.
Let \(\mathcal{C}\) be a countable algebra generating \(\mathcal{A}\) (the algebra generated by a countable family is again countable); in particular, \(\mathcal{C}\) is a \(\pi\)-system. For each \(A \in \mathcal{C}\), the conditional expectation \(\mathbb{E}[1_A \mid \mathcal{A}_0]\) exists as an \(\mathcal{A}_0\)-measurable function, unique up to \(P\)-null sets (by the definition of conditional expectation). By the Doob-Dynkin lemma (see Lemma 1), for each \(A \in \mathcal{C}\), there exists a measurable function \(f_A: \Omega \to [0,1]\) such that: \[ \mathbb{E}[1_A \mid \mathcal{A}_0] = f_A \circ \eta \quad \text{a.e. } P, \] where \(\eta\) is the identity map \(\eta(\omega) = \omega\), regarded as a measurable map from \((\Omega, \mathcal{A})\) to \((\Omega, \mathcal{A}_0)\), so that \(\sigma(\eta) = \mathcal{A}_0\).
Step 2: Construction of the Kernel via Extension
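Since \(\mathcal{C}\) is countable, there is a single \(P\)-null set \(N \in \mathcal{A}_0\) such that for every \(\omega \notin N\) the map \(A \mapsto f_A(\omega)\), \(A \in \mathcal{C}\), is:
Finitely additive: For disjoint \(A_1, A_2 \in \mathcal{C}\), \(f_{A_1 \cup A_2}(\omega) = f_{A_1}(\omega) + f_{A_2}(\omega)\).
Non-negative: \(f_A(\omega) \geq 0\).
Normalized: \(f_\Omega(\omega) = 1\).
To extend \(A \mapsto f_A(\omega)\) to a probability measure on \((\Omega, \mathcal{A})\), we apply the Carathéodory extension theorem (see Theorem 1), which requires countable additivity on \(\mathcal{C}\). For each fixed disjoint sequence in \(\mathcal{C}\) whose union lies in \(\mathcal{C}\), countable additivity holds almost surely by the conditional monotone convergence theorem. There may be uncountably many such sequences, however, so finite additivity and the countability of \(\mathcal{C}\) alone do not yield a common null set. This is where the standard Borel structure enters: after transporting the problem to a Borel subset of \(\mathbb{R}\), countable additivity is encoded in the countably many monotonicity and limit conditions on \(f(t, r)\), \(r \in \mathbb{Q}\), used in the proof of Theorem 2. Because \(\mathcal{C}\) is a \(\pi\)-system generating \(\mathcal{A}\), the resulting extension is unique.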
Thus, for \(P\)-almost every \(\omega\), there exists a unique probability measure \(K_{\mathcal{A}_0}(\omega, \cdot)\) on \((\Omega, \mathcal{A})\) such that: \[ K_{\mathcal{A}_0}(\omega, A) = f_A(\omega) \quad \text{for all } A \in \mathcal{C}. \]
Step 3: Measurability of the Kernel
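For each \(A \in \mathcal{A}\), the map \(\omega \mapsto K_{\mathcal{A}_0}(\omega, A)\) must be \(\mathcal{A}_0\)-measurable. On the exceptional null set, which can be chosen in \(\mathcal{A}_0\), we set \(K_{\mathcal{A}_0}(\omega, \cdot) = P\), so that \(K_{\mathcal{A}_0}(\omega, \cdot)\) is a probability measure for every \(\omega\). Since \(\mathcal{C}\) generates \(\mathcal{A}\), we use the \(\pi\)-\(\lambda\) theorem:
Let \(\mathcal{L} = \{ A \in \mathcal{A} : K_{\mathcal{A}_0}(\cdot, A) \text{ is } \mathcal{A}_0\text{-measurable} \}\).
\(\mathcal{L}\) is a \(\lambda\)-system: it contains \(\Omega\) and is closed under complements and countable disjoint unions, because measurability is preserved by differences and by countable sums. It contains the \(\pi\)-system \(\mathcal{C}\), hence \(\mathcal{L} = \mathcal{A}\).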
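Step 4: Identification with the Kernel of Theorem 2
Apply Theorem 2 with \(\xi(\omega) = \omega\) regarded as a random element of \((\Omega, \mathcal{A})\) and \(\eta(\omega) = \omega\) regarded as a random element of \((\Omega, \mathcal{A}_0)\). It yields a probability kernel \(\mu\) from \((\Omega, \mathcal{A}_0)\) to \((\Omega, \mathcal{A})\) such that: \[ \mathbb{P}[\xi \in \cdot \mid \eta] = \mu(\eta, \cdot) \quad \text{a.s.}, \] and \(\mu\) is unique up to a null set of the law \(P \circ \eta^{-1}\) of \(\eta\). Since \(\mu(\eta(\omega), A)\) and \(K_{\mathcal{A}_0}(\omega, A)\) are both versions of \(\mathbb{E}[1_A \mid \mathcal{A}_0]\) for each \(A\) in the countable generator \(\mathcal{C}\), we get \(\mu(\eta(\omega), \cdot) = K_{\mathcal{A}_0}(\omega, \cdot)\) for \(P\)-almost every \(\omega\). In particular, \(K_{\mathcal{A}_0}\) is a probability kernel: a probability measure in its second argument and \(\mathcal{A}_0\)-measurable in its first.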
Step 5: Verification of Conditional Probability
For all \(A \in \mathcal{A}\), \(K_{\mathcal{A}_0}(\omega, A)\) satisfies:
Measurability: \(K_{\mathcal{A}_0}(\cdot, A)\) is \(\mathcal{A}_0\)-measurable.
Integration: For any \(B \in \mathcal{A}_0\),
\[ \int_B K_{\mathcal{A}_0}(\omega, A) \, dP(\omega) = P(A \cap B). \] This holds for \(A \in \mathcal{C}\) by construction and extends to all \(A \in \mathcal{A}\) via the \(\pi\)-\(\lambda\) theorem.
Thus, \(K_{\mathcal{A}_0}\) is a regular conditional probability kernel, and a regular conditional probability exists for any sub-\(\sigma\)-algebra \(\mathcal{A}_0\) in a standard Borel space.
Remark. The standard Borel space assumption is essential. For general measurable spaces, regular conditional probabilities may not exist.
4 The Uniqueness of Regular Conditional Probability
Regular conditional probabilities are unique up to \(P\)-null sets. That is, if \(K_{\mathcal{A}_0}\) and \(\tilde{K}_{\mathcal{A}_0}\) are both regular conditional probabilities with respect to \(\mathcal{A}_0\), then there exists a \(P\)-null set \(N\) such that for all \(\omega \notin N\) and all \(A \in \mathcal{A}\), \[ K_{\mathcal{A}_0}(\omega, A) = \tilde{K}_{\mathcal{A}_0}(\omega, A). \]
This means the regular conditional probability is essentially unique: any two versions agree outside a set of probability zero.
Proof (Proof of Uniqueness of Regular Conditional Probability). Let \(K_{\mathcal{A}_0}\) and \(\tilde{K}_{\mathcal{A}_0}\) be two regular conditional probabilities with respect to \(\mathcal{A}_0\). For each \(A \in \mathcal{A}\), define \[ N_A = \{\omega \in \Omega : K_{\mathcal{A}_0}(\omega, A) \neq \tilde{K}_{\mathcal{A}_0}(\omega, A)\}. \]
Step 1: Null Sets for Countable Generator
By the definition of regular conditional probability, \(K_{\mathcal{A}_0}(\cdot, A)\) and \(\tilde{K}_{\mathcal{A}_0}(\cdot, A)\) are both versions of \(\mathbb{E}[1_A \mid \mathcal{A}_0]\), hence they are equal \(P\)-almost surely. Thus, \(P(N_A) = 0\) for each \(A \in \mathcal{A}\).
Since \((\Omega, \mathcal{A})\) is standard Borel, \(\mathcal{A}\) is countably generated; let \(\mathcal{C}\) be a countable \(\pi\)-system generating \(\mathcal{A}\). Define the union of null sets: \[ N = \bigcup_{A \in \mathcal{C}} N_A. \]
As \(\mathcal{C}\) is countable, \(N\) is a countable union of \(P\)-null sets, so \(P(N) = 0\).
Step 2: Extension to the Entire \(\sigma\)-Algebra via \(\pi\)-\(\lambda\) Theorem
Fix \(\omega \notin N\). For this \(\omega\), define the collection of sets: \[ \mathcal{D}_\omega = \{ A \in \mathcal{A} : K_{\mathcal{A}_0}(\omega, A) = \tilde{K}_{\mathcal{A}_0}(\omega, A) \}. \] We show that \(\mathcal{D}_\omega\) is a \(\lambda\)-system containing \(\mathcal{C}\):
Contains \(\Omega\): \(K_{\mathcal{A}_0}(\omega, \Omega) = 1 = \tilde{K}_{\mathcal{A}_0}(\omega, \Omega)\), so \(\Omega \in \mathcal{D}_\omega\).
Closed under disjoint unions: If \(A_n \in \mathcal{D}_\omega\) are pairwise disjoint, then:
\[ K_{\mathcal{A}_0}\left(\omega, \bigcup_{n=1}^\infty A_n\right) = \sum_{n=1}^\infty K_{\mathcal{A}_0}(\omega, A_n) = \sum_{n=1}^\infty \tilde{K}_{\mathcal{A}_0}(\omega, A_n) = \tilde{K}_{\mathcal{A}_0}\left(\omega, \bigcup_{n=1}^\infty A_n\right). \]
Closed under complements: If \(A \in \mathcal{D}_\omega\), then:
\[ K_{\mathcal{A}_0}(\omega, A^c) = 1 - K_{\mathcal{A}_0}(\omega, A) = 1 - \tilde{K}_{\mathcal{A}_0}(\omega, A) = \tilde{K}_{\mathcal{A}_0}(\omega, A^c). \]
Since \(\mathcal{C} \subseteq \mathcal{D}_\omega\) (by \(\omega \notin N\)) and \(\mathcal{C}\) is a \(\pi\)-system, the \(\pi\)-\(\lambda\) theorem implies \(\sigma(\mathcal{C}) = \mathcal{A} \subseteq \mathcal{D}_\omega\). Thus, for all \(\omega \notin N\), \(K_{\mathcal{A}_0}(\omega, A) = \tilde{K}_{\mathcal{A}_0}(\omega, A)\) for every \(A \in \mathcal{A}\).
The set \(N\) is \(P\)-null, and for all \(\omega \notin N\), the kernels \(K_{\mathcal{A}_0}(\omega, \cdot)\) and \(\tilde{K}_{\mathcal{A}_0}(\omega, \cdot)\) agree on \(\mathcal{A}\). Hence, regular conditional probabilities are unique up to \(P\)-null sets.
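The hypothesis that \(\mathcal{C}\) is a \(\pi\)-system is used in an essential way. The following Python sketch, on a hypothetical four-point space with two illustrative measures `mu` and `nu`, shows that the agreement class is always a \(\lambda\)-system, yet two probability measures can agree on a generating class that is not closed under intersection and still differ on the generated \(\sigma\)-algebra.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset({1, 2, 3, 4})
# Two hypothetical probability measures, given by their point masses.
nu = {w: Fraction(1, 4) for w in OMEGA}
mu = {1: Fraction(1, 2), 2: Fraction(0), 3: Fraction(1, 2), 4: Fraction(0)}


def measure(m, A):
    """Mass of the set A under the point-mass dictionary m."""
    return sum((m[w] for w in A), Fraction(0))


def agree(A):
    return measure(mu, A) == measure(nu, A)


# A generating class that is NOT closed under intersection.
C = [frozenset({1, 2}), frozenset({2, 3})]
assert all(agree(A) for A in C)

# The agreement class D is a lambda-system: it contains OMEGA and is
# closed under complements and disjoint unions ...
D = [frozenset(c) for k in range(len(OMEGA) + 1)
     for c in combinations(sorted(OMEGA), k) if agree(frozenset(c))]
assert OMEGA in D
assert all(OMEGA - A in D for A in D)
assert all(A | B in D for A in D for B in D if not A & B)

# ... but it misses {2} = {1, 2} & {2, 3}, which lies in sigma(C).
assert not agree(frozenset({2}))
```

In the uniqueness proof this failure cannot occur, because the countable generator is chosen closed under finite intersections before the \(\pi\)-\(\lambda\) theorem is invoked.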
Remark. This uniqueness property ensures that, although the regular conditional probability may not be defined uniquely everywhere, any two versions coincide almost surely.
5 Representation of Conditional Expectation as an Integral
By now, we have established that conditional probability admits a refined version (the regular conditional probability), which qualifies as a probability measure. This allows us to define an integral with respect to it.
In the following, we aim to demonstrate that integration with respect to the regular conditional probability coincides precisely with the conditional expectation.
Proof (Proof of Theorem 3).
Step 1: Indicator Functions
Let \(X = 1_A\) for \(A \in \mathcal{A}\). By definition of the regular conditional probability: \[ \mathbb{E}[1_A \mid \mathcal{A}_0](\omega) = K(\omega, A) = \int_\Omega 1_A(\omega') K(\omega, d\omega') \quad \text{a.e. } P. \]
Step 2: Simple Functions
Let \(X = \sum_{i=1}^n a_i 1_{A_i}\) with \(A_i \in \mathcal{A}\) and \(a_i \in \mathbb{R}\). By linearity of conditional expectation and integration: \[ \mathbb{E}[X \mid \mathcal{A}_0](\omega) = \sum_{i=1}^n a_i \mathbb{E}[1_{A_i} \mid \mathcal{A}_0](\omega) = \sum_{i=1}^n a_i \int_\Omega 1_{A_i}(\omega') K(\omega, d\omega') = \int_\Omega X(\omega') K(\omega, d\omega'). \]
Step 3: Non-Negative Measurable Functions
Let \(X \geq 0\) be measurable. Take an increasing sequence of simple functions \(X_n \uparrow X\). By the monotone convergence theorem (MCT):
\(\mathbb{E}[X_n \mid \mathcal{A}_0] \uparrow \mathbb{E}[X \mid \mathcal{A}_0]\) a.e.
\(\int X_n K(\omega, d\omega') \uparrow \int X K(\omega, d\omega')\).
Thus: \[ \mathbb{E}[X \mid \mathcal{A}_0](\omega) = \lim_{n \to \infty} \mathbb{E}[X_n \mid \mathcal{A}_0](\omega) = \lim_{n \to \infty} \int X_n K(\omega, d\omega') = \int X K(\omega, d\omega') \quad \text{a.e. } P. \]
Step 4: General Integrable Functions
For arbitrary integrable \(X\), decompose \(X = X^+ - X^-\) with \(X^\pm \geq 0\). By Step 3: \[ \mathbb{E}[X \mid \mathcal{A}_0] = \mathbb{E}[X^+ \mid \mathcal{A}_0] - \mathbb{E}[X^- \mid \mathcal{A}_0] = \int X^+ K(\omega, d\omega') - \int X^- K(\omega, d\omega') = \int X K(\omega, d\omega') \quad \text{a.e. } P. \]
Step 5: Measurability and Uniqueness
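Measurability: The integral \(\int X K(\omega, d\omega')\) is \(\mathcal{A}_0\)-measurable in \(\omega\). This holds for indicators because \(K(\cdot, A)\) is \(\mathcal{A}_0\)-measurable, and it passes to simple functions by linearity, to non-negative functions by monotone limits, and to integrable functions by taking differences.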
Uniqueness: Regular conditional probability kernels agree a.e. \(P\), ensuring the integral representation is unique a.e.
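On a finite space the integral representation can be verified directly. The sketch below assumes a hypothetical fair die with \(\mathcal{A}_0\) generated by the parity of the roll; the names `kernel_point` and `cond_exp` are illustrative. It computes \(\int X(\omega') K(\omega, d\omega')\) as a finite sum and checks the defining property of conditional expectation on each atom of \(\mathcal{A}_0\).

```python
from fractions import Fraction

OMEGA = range(1, 7)  # one roll of a fair die
P = {w: Fraction(1, 6) for w in OMEGA}
ATOMS = (frozenset({1, 3, 5}), frozenset({2, 4, 6}))  # A_0: parity of the roll


def kernel_point(w, v):
    """K(w, {v}) = P({v} & B(w)) / P(B(w)), with B(w) the atom containing w."""
    B = next(b for b in ATOMS if w in b)
    return (P[v] if v in B else Fraction(0)) / sum(P[u] for u in B)


def cond_exp(X, w):
    """E[X | A_0](w), computed as the integral of X against K(w, .)."""
    return sum(X(v) * kernel_point(w, v) for v in OMEGA)


def X(v):
    """The random variable under study: the face value of the roll."""
    return v


# Defining property of conditional expectation, checked on each atom B:
# the integral of E[X | A_0] over B equals the integral of X over B.
for B in ATOMS:
    assert sum(cond_exp(X, w) * P[w] for w in B) == sum(X(w) * P[w] for w in B)
```

The value of `cond_exp(X, w)` is constant on each atom, which is the finite counterpart of \(\mathcal{A}_0\)-measurability.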
6 Conclusion
Through this assignment, we have systematically clarified the construction of regular conditional probability and proved its existence and uniqueness in standard Borel spaces. This result shows that, in measurable spaces with well-behaved topological structure, conditional probability can be elevated to a probability kernel \(K_{\mathcal{A}_0}(\omega,A)\) depending on the sample point \(\omega\), thereby resolving the “null set selection problem” inherent in the classical definition of conditional probability (where the exceptional null set depends on the specific event \(A\)). This conclusion provides a rigorous mathematical foundation for the integral representation of conditional expectation.
However, the construction of regular conditional probability heavily relies on the structural properties of the underlying space. If extended to general measurable spaces, the existence of such kernels may fail.
Acknowledgement
I sincerely thank Professor Zhu, Rongchan for her guidance during this semester’s course Selected Topics in Probability. I am deeply grateful to Liu, Chenhao and Song, Ke for their invaluable support during extracurricular discussions. Special thanks also go to Zhong, Xingyu (from the Class of 2022, Strong Foundation Program), for discussing with me whether “the definitions of conditional probability and conditional expectation are consistent across different textbook versions,” which motivated my deeper reflection on the necessity of regularizing conditional probability.
Additionally, I thank Zhong, Xingyu for developing the open-source typesetting tool SunQuarTex, which enabled efficient and polished formatting of this document.