$$ \newcommand{\defeq}{\stackrel{\small\bullet}{=}} \newcommand{\ra}{\rangle} \newcommand{\la}{\langle} \newcommand{\norm}[1]{\left\|#1\right\|} \newcommand{\abs}[1]{\left\lvert#1\right\rvert} \newcommand{\Abs}[1]{\Bigl\lvert#1\Bigr\rvert} \newcommand{\pr}{{\mathbb P}} \newcommand{\qr}{{\mathbb Q}} \newcommand{\xv}{{\boldsymbol{x}}} \newcommand{\av}{{\boldsymbol{a}}} \newcommand{\bv}{{\boldsymbol{b}}} \newcommand{\cv}{{\boldsymbol{c}}} \newcommand{\dv}{{\boldsymbol{d}}} \newcommand{\ev}{{\boldsymbol{e}}} \newcommand{\fv}{{\boldsymbol{f}}} \newcommand{\gv}{{\boldsymbol{g}}} \newcommand{\hv}{{\boldsymbol{h}}} \newcommand{\nv}{{\boldsymbol{n}}} \newcommand{\sv}{{\boldsymbol{s}}} \newcommand{\tv}{{\boldsymbol{t}}} \newcommand{\uv}{{\boldsymbol{u}}} \newcommand{\vv}{{\boldsymbol{v}}} \newcommand{\wv}{{\boldsymbol{w}}} \newcommand{\zerov}{{\mathbf{0}}} \newcommand{\onev}{{\mathbf{0}}} \newcommand{\phiv}{{\boldsymbol{\phi}}} \newcommand{\cc}{{\check{C}}} \newcommand{\xv}{{\boldsymbol{x}}} \newcommand{\Xv}{{\boldsymbol{X}\!}} \newcommand{\yv}{{\boldsymbol{y}}} \newcommand{\Yv}{{\boldsymbol{Y}}} \newcommand{\zv}{{\boldsymbol{z}}} \newcommand{\Zv}{{\boldsymbol{Z}}} \newcommand{\Iv}{{\boldsymbol{I}}} \newcommand{\Jv}{{\boldsymbol{J}}} \newcommand{\Cv}{{\boldsymbol{C}}} \newcommand{\Ev}{{\boldsymbol{E}}} \newcommand{\Fv}{{\boldsymbol{F}}} \newcommand{\Gv}{{\boldsymbol{G}}} \newcommand{\Hv}{{\boldsymbol{H}}} \newcommand{\alphav}{{\boldsymbol{\alpha}}} \newcommand{\epsilonv}{{\boldsymbol{\epsilon}}} \newcommand{\betav}{{\boldsymbol{\beta}}} \newcommand{\deltav}{{\boldsymbol{\delta}}} \newcommand{\gammav}{{\boldsymbol{\gamma}}} \newcommand{\etav}{{\boldsymbol{\eta}}} \newcommand{\piv}{{\boldsymbol{\pi}}} \newcommand{\thetav}{{\boldsymbol{\theta}}} \newcommand{\tauv}{{\boldsymbol{\tau}}} \newcommand{\muv}{{\boldsymbol{\mu}}} \newcommand{\phiinv}{\Phi^{-1}} \newcommand{\Fiinv}{F^{-1}} \newcommand{\giinv}{g^{-1}} \newcommand{\fhat}{\hat{f}} \newcommand{\ghat}{\hat{g}} 
\newcommand{\ftheta}{f_\theta} \newcommand{\fthetav}{f_{\thetav}} \newcommand{\gtheta}{g_\theta} \newcommand{\gthetav}{g_{\thetav}} \newcommand{\ztheta}{Z_\theta} \newcommand{\xtheta}{\Xv_\theta} \newcommand{\ytheta}{\Yv_\theta} \newcommand{\p}{\partial} \newcommand{\f}{\frac} \newcommand{\cf}{\cfrac} \newcommand{\e}{\epsilon} \newcommand{\indep}{\perp\kern-5pt \perp} \newcommand{\inner}[1]{\langle#1\rangle} \newcommand{\pa}[1]{\left(#1\right)} \newcommand{\pb}[1]{\left\{#1\right\}} \newcommand{\pc}[1]{\left[#1\right]} \newcommand{\pA}[1]{\Big(#1\Big)} \newcommand{\pB}[1]{\Big\{#1\Big\}} \newcommand{\pC}[1]{\Big[#1\Big]} \newcommand{\ty}[1]{\texttt{#1}} \newcommand{\borel}[1]{\mathscr{B}\pa{#1}} \newcommand{\scr}{\mathcal} \newcommand{\scrb}{\mathscr} \newcommand{\argmin}{\mathop{\text{arg}\ \!\text{min}}} \newcommand{\arginf}{\mathop{\text{arg}\ \!\text{inf}}} \newcommand{\argmax}{\mathop{\text{arg}\ \!\text{max}}} \newcommand{\argsup}{\mathop{\text{arg}\ \!\text{sup}}} \newcommand{\bigo}[1]{\mathcal{O}_{p}\!\left(#1\right)} \newcommand{\f}{\frac} \newcommand{\e}{\epsilon} \newcommand{\inv}{^{-1}} \newcommand{\phiinv}{\Phi^{-1}} \newcommand{\Fiinv}{F^{-1}} \newcommand{\giinv}{g^{-1}} \newcommand{\fhat}{\hat{f}} \newcommand{\ghat}{\hat{g}} \newcommand{\ftheta}{f_\theta} \newcommand{\fthetav}{f_{\thetav}} \newcommand{\gtheta}{g_\theta} \newcommand{\gthetav}{g_{\thetav}} \newcommand{\ztheta}{Z_\theta} \newcommand{\xtheta}{\Xv_\theta} \newcommand{\ytheta}{\Yv_\theta} \newcommand{\absdet}[1]{\abs{\det\pa{#1}}} \newcommand{\jac}[1]{\Jv_{#1}} \newcommand{\absdetjx}[1]{\abs{\det\pa{\Jv_{#1}}}} \newcommand{\absdetj}[1]{\norm{\Jv_{#1}}} \newcommand{\sint}{sin(\theta)} \newcommand{\cost}{cos(\theta)} \newcommand{\sor}[1]{S\mathcal{O}(#1)} \newcommand{\ort}[1]{\mathcal{O}(#1)} \newcommand{\A}{{\mathcal A}} \newcommand{\C}{{\mathbb C}} \newcommand{\E}{{\mathbb E}} \newcommand{\F}{{\mathcal{F}}} \newcommand{\N}{{\mathbb N}} \newcommand{\R}{{\mathbb R}} \newcommand{\Q}{{\mathbb 
Q}} \newcommand{\Z}{{\mathbb Z}} \newcommand{\X}{{\mathbb{X}}} \newcommand{\Y}{{\mathbb{Y}}} \newcommand{\G}{{\mathcal{G}}} \newcommand{\M}{{\mathcal{M}}} \newcommand{\betaequivalent}{\beta\text{-equivalent}} \newcommand{\betaequivalence}{\beta\text{-equivalence}} \newcommand{\Mb}{{\boldsymbol{\mathsf{M}}}} \newcommand{\Br}{{\mathbf{\mathsf{Bar}}}} \newcommand{\dgm}{{\mathfrak{Dgm}}} \newcommand{\Db}{{\mathbf{\mathsf{D}}}} \newcommand{\Img}{{\mathbf{\mathsf{Img}}}} \newcommand{\mmd}{{\mathbf{\mathsf{MMD}}}} \newcommand{\Xn}{{\mathbb{X}_n}} \newcommand{\Xm}{{\mathbb{X}_m}} \newcommand{\Yn}{{\mathbb{Y}_n}} \newcommand{\Ym}{Y_1, Y_2, \cdots, Y_m} \newcommand{\Xb}{{\mathbb{X}}} \newcommand{\Yb}{{\mathbb{Y}}} \newcommand{\s}{{{\sigma}}} \newcommand{\fnsbar}{{\bar{f}^n_\s}} \newcommand{\fns}{{f^n_\s}} \newcommand{\fs}{{f_\s}} \newcommand{\fsbar}{{\bar{f}_\s}} \newcommand{\barfn}{{{f}^n_\sigma}} \newcommand{\barfnm}{{{f}^{n+m}_\sigma}} \newcommand{\barfo}{{{f}_\sigma}} \newcommand{\fn}{{f^n_{\rho,\sigma}}} \newcommand{\fnm}{{f^{n+m}_{\rho,\sigma}}} \newcommand{\fo}{{f_{\rho,\sigma}}} \newcommand{\K}{{{K_{\sigma}}}} \newcommand{\barpn}{{\bar{p}^n_\sigma}} \newcommand{\barpo}{{\bar{p}_\sigma}} \newcommand{\pn}{{p^n_\sigma}} \newcommand{\po}{{p_\sigma}} \newcommand{\J}{{\mathcal{J}}} \newcommand{\B}{{\mathcal{B}}} \newcommand{\pt}{{\tilde{\mathbb{P}}}} \newcommand{\Winf}{{W_{\infty}}} \newcommand{\winf}{{W_{\infty}}} \newcommand{\HH}{{{\scr{H}_{\sigma}}}} \newcommand{\D}{{{\scr{D}_{\sigma}}}} \newcommand{\Ts}{{T_{\sigma}}} \newcommand{\Phis}{{\Phi_{\sigma}}} \newcommand{\nus}{{\nu_{\sigma}}} \newcommand{\Qs}{{\mathcal{Q}_{\sigma}}} \newcommand{\ws}{{w_{\sigma}}} \newcommand{\vs}{{v_{\sigma}}} \newcommand{\ds}{{\delta_{\sigma}}} \newcommand{\fp}{{f_{\pr}}} \newcommand{\prs}{{\widetilde{\pr}_{\sigma}}} \newcommand{\qrs}{{\widetilde{\qr}_{\sigma}}} \newcommand{\Inner}[1]{\Bigl\langle#1\Bigr\rangle} \newcommand{\innerh}[1]{\langle#1\rangle_{\HH}} 
\newcommand{\Innerh}[1]{\Bigl\langle#1\Bigr\rangle_{\HH}} \newcommand{\normh}[1]{\norm{#1}_{\HH}} \newcommand{\norminf}[1]{\norm{#1}_{\infty}} \newcommand{\gdelta}{{\G_{\delta}}} \newcommand{\supgdelta}{{\sup\limits_{g\in\gdelta}\abs{\Delta_n(g)}}} \newcommand{\id}{\text{id}} \newcommand{\supp}{\text{supp}} \newcommand{\cech}{\v{C}ech} \newcommand{\Zz}{{\scr{Z}}} \newcommand{\psis}{\psi_\s} \newcommand{\phigox}{\Phis(\xv)-g} \newcommand{\phigoy}{\Phis(\yv)-g} \newcommand{\fox}{{f^{\epsilon,{\xv}}_{\rho,\sigma}}} \newcommand{\prx}{{\pr^{\epsilon}_{\xv}}} \newcommand{\pro}{{\pr_0}} \newcommand{\dotfo}{\dot{f}_{\!\!\rho,\s}} \newcommand{\phifo}{{\Phis(\yv)-\fo}} \newcommand{\phifox}{{\Phis(\xv)-\fo}} \newcommand{\kinf}{{\norm{\K}_{\infty}}} \newcommand{\half}{{{\f{1}{2}}}} \newcommand{\Jx}{\J_{\epsilon,{\xv}}} \newcommand{\dpy}{\text{differential privacy}} \newcommand{\edpy}{$\epsilon$--\text{differential privacy}} \newcommand{\eedpy}{$\epsilon$--edge \text{differential privacy}} \newcommand{\dpe}{\text{differentially private}} \newcommand{\edpe}{$\epsilon$--\text{differentially private}} \newcommand{\eedpe}{$\epsilon$--edge \text{differentially private}} \newcommand{\er}{Erdős-Rényi} \newcommand{\krein}{Kreĭn} % \newcommand{\grdpg}{\mathsf{gRDPG}} % \newcommand{\rdpg}{\mathsf{RDPG}} % \newcommand{\eflip}{{\textsf{edgeFlip}}} % \newcommand{\grdpg}{\text{gRDPG}} % \newcommand{\rdpg}{\text{RDPG}} \newcommand{\grdpg}{\mathsf{gRDPG}} \newcommand{\rdpg}{\mathsf{RDPG}} \newcommand{\eflip}{{\text{edgeFlip}}} \newcommand{\I}{{\mathbb I}} \renewcommand{\pa}[1]{\left(#1\right)} \renewcommand{\pb}[1]{\left\{#1\right\}} \renewcommand{\pc}[1]{\left[#1\right]} \renewcommand{\V}{\mathbb{V}} \renewcommand{\W}{\mathbb{W}} %%%%%%%%%%%%%%%%%%%%%%%%%%% \providecommand{\fd}{\frac 1d} % \renewcommand{\fpp}{{\frac 1p}} \providecommand{\pfac}{\f{p}{p-1}} \providecommand{\ipfac}{\f{p-1}{p}} \providecommand{\dbq}{\Delta b_{n,m,Q}\qty(\qty{\xvo})} \providecommand{\db}{\Delta 
b_{n,m}\qty(\qty{\xvo})} \providecommand{\bbv}{{{\mathbb{V}}}} \providecommand{\bbw}{{{\mathbb{W}}}} \providecommand{\md}{\textsf{MoM Dist}} \providecommand{\bF}{{\mathbf{F}}} \providecommand{\sub}{{\text{Sub}}} \providecommand{\samp}{\text{$\pa{\scr{S}}$}} \providecommand{\tp}{{2^{\f{p-1}{p}}}} %%%%%%%%%%%%%%%%%%%%%%%%%% \providecommand{\Xmn}{{\mathbb{X}_{n+m}}} \newcommand{\Dnmq}{\D[n+m, Q]} \newcommand{\Dnmh}{\D[n+m, \H]} \newcommand{\Dn}{\D[n]} \providecommand{\xvo}{\xv_0} \providecommand{\bn}[1][\null]{b^{#1}_{n}\pa{\pb{\xvo}}} \providecommand{\bnm}[1][\null]{b^{#1}_{n+m}\pa{\pb{\xvo}}} \providecommand{\bnq}[1][\null]{b^{#1}_{n,Q}\pa{\pb{\xvo}}} \providecommand{\bnmq}[1][\null]{b^{#1}_{n+m,Q}\pa{\pb{\xvo}}}\providecommand{\prq}{\pr_q} \providecommand{\dxvo}{{\delta_{\xvo}}} \providecommand{\sq}{S_q} \providecommand{\Sq}{\abs{S_q}} \providecommand{\no}{{n_o}} \providecommand{\mmdn}{\mmd\pa{\pr_n, \delta_{\xvo}}} \newcommand{\rqt}{\xi_{q}(t; n, Q)} \providecommand{\nq}{\f{n}{Q}} \providecommand{\Ot}{\Omega(t, n/Q)} \providecommand{\ut}[1]{U^{#1}} \providecommand{\vt}[1]{V^{#1}} \providecommand{\wt}[1]{W^{#1}} \providecommand{\but}[1]{\mathbb{U}^{#1}} \providecommand{\bvt}[1]{\mathbb{V}^{#1}} \providecommand{\bwt}[1]{\mathbb{W}^{#1}} \providecommand{\ball}[1]{B_{f\!, \rho}\pa{#1}} \newcommand*{\medcap}{\mathbin{\scalebox{0.75}{{\bigcap}}}}% \newcommand*{\medcup}{\mathbin{\scalebox{0.75}{{\bigcup}}}}% \providecommand{\dsf}{\mathsf{d}} \newcommand{\Dnh}{{\mathsf{D}_{n,\scr{H}}}} \newcommand{\Dph}{{\mathsf{D}_{\pr,\scr{H}}}} \newcommand{\D}[1][1={ },usedefault]{{\mathsf{D}_{#1}}} \newcommand{\Dnq}{{\mathsf{D}_{n, Q}}} \newcommand{\dnq}{{\mathsf{d}_{n, Q}}} \newcommand{\dn}{{\mathsf{d}_{n}}} \newcommand{\dnm}{{\mathsf{d}_{n-m}}} \newcommand{\dmn}{{\mathsf{d}_{n+m}}} \newcommand{\dx}{{\mathsf{d}_{\mathbb{X}}}} \providecommand{\med}{\text{median}} \providecommand{\median}{\text{median}} \providecommand{\Xnm}{{\mathbb{X}^*_{n-m}}} $$

Week-3

Math 183 • Statistical Methods • Spring 2026

Siddharth Vishwanath

Learning objectives


  • Conditional Probability
  • Random variables
  • Concept of discrete vs. continuous random variables

Random Variables

(Random) Variable

Random variable

A random variable is a variable whose value is determined by the outcome of a trial of a random phenomenon.

Tip

Think of a random variable as a placeholder for the different outcomes we can witness from a trial.

Notation

Random variables are usually denoted by capital letters (e.g., \(X, Y, Z, V, W\)) from the end of the alphabet.

Support of a random variable

Support

The support of a random variable is the universe of all possible values a random variable can assume.

Types of random variables

Since a random variable is simply a “placeholder” for the values a variable can take, random variables are also divided into the same groups, based on their support.

Recall the main types of variables from Week-1:

  • Qualitative
    • Nominal/Categorical
    • Ordinal
  • Quantitative
    • Discrete
    • Continuous

Coin toss

You flip a coin. The random variable, \(X\), is a placeholder for the outcome of the coin toss.

1) The support of the random variable is \(\text{supp}(X) = \left\{H, T\right\}\), i.e., \[\left\{X=H\right\} \quad\text{or}\quad \left\{X=T\right\}\]
2) This is an example of a nominal categorical random variable

Computer-friendly Convention

Usually, a discrete outcome is “encoded” as {0, 1}, i.e.,
\(X=1\) is understood to be that the outcome is \(H\), and
\(X=0\) is understood to be that the outcome is \(T\).

For categorical variables, this encoding is usually evident from the problem (but not always ⚠️)
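The \(\{0, 1\}\) encoding is easy to play with in code. A minimal sketch (not part of the lecture; the seed and sample size are arbitrary):

```python
import numpy as np

# Simulate 10 fair coin tosses, encoding H as 1 and T as 0
rng = np.random.default_rng(42)
tosses = rng.integers(0, 2, size=10)  # each entry is 0 (T) or 1 (H)

# Decode back to labels to make the convention explicit
labels = ["H" if t == 1 else "T" for t in tosses]
print(tosses)
print(labels)
```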

This class

You’re bored in your Math 183 class and you’ve also just had lunch. To keep yourself entertained you decide to count how many times you yawn. If \(X\) denotes this random variable:

1) The support of the random variable is \(\text{supp}(X) = \left\{0, 1, 2, \dots\right\}\), i.e., \[\left\{X=0\right\}, \quad\text{or}\quad \left\{X=1\right\},\quad\text{or}\dots\quad\text{or}\quad \left\{X=100\right\}, \quad \dots\]
2) This is an example of a discrete quantitative random variable

BMI

We pick a student at random and ask them what their BMI is (weight ÷ height\(^2\)). Let \(X\) denote this random variable:

1) The support of the random variable is \(\text{supp}(X) = (0, \infty)\), i.e., \[\left\{X=x\right\}, \quad\text{for any}\quad x > 0\]
2) This is an example of a continuous quantitative random variable

Anatomy of a random variable

Every random variable has:

  1. A mathematical symbol representing it
    • e.g., \(X, Y, Z\)
  2. A support, \(\text{supp}(X)\)

    • This determines the nature of the random variable
  3. A probability distribution \({\mathbb P}_X\).

    • This determines the probability of the random variable taking a specific set of values in its support
  4. Measures of central tendency and dispersion.

    • The measure of central tendency is called its expectation \({\mathbb E}(X)\)
    • The measure of dispersion is called its variance \({\text{Var}}(X)\)

Part-I

Discrete Random Variables

Probability Distribution

Probability Distribution

A probability distribution, \({\mathbb P}\), associated with a random variable \(X\) describes the probabilities with which \(X\) takes on the possible values in its support.

Tip

A probability distribution is the minimum amount of information you need in order to determine the probabilities of all possible events you can create from a random variable.

Example

Suppose the random variable \(X\) denotes a randomly chosen student’s favorite primary color.

\(\text{supp}(X)\) Red Blue Green
\({\mathbb P}(\left\{X=x\right\})\) 0.20 0.55 0.25

You can use this information to find, for example,

\[ {\mathbb P}\left(\left\{X = \text{Red or Green}\right\}\right) = {\mathbb P}\left(\left\{X = \text{Red}\right\} \cup \left\{X = \text{Green}\right\}\right) \]
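Since \(\left\{X=\text{Red}\right\}\) and \(\left\{X=\text{Green}\right\}\) are disjoint events, their probabilities add: \(0.20 + 0.25 = 0.45\). A minimal sketch in code (the dict name `pmf` is illustrative):

```python
# PMF from the table: favorite primary color
pmf = {"Red": 0.20, "Blue": 0.55, "Green": 0.25}

# {X = Red} and {X = Green} are disjoint, so their probabilities add
p_red_or_green = pmf["Red"] + pmf["Green"]
print(p_red_or_green)  # 0.45
```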

Probability Mass Function

For a discrete random variable \(X\), the assignment \[x \mapsto {\mathbb P}(X=x) =: \color{red}{p(x)}\] for every \(x \in \text{supp}(X)\) is called its probability mass function.

Visualizing PMFs

\(\text{supp}(X)\) Red Blue Green
\({\mathbb P}(\left\{X=x\right\})\) 0.20 0.55 0.25

Properties of PMFs

  1. The PMF is between \(0\) and \(1\), i.e., for any \(x \in \text{supp}(X)\) \[ 0 \le p(x) \le 1 \]

  2. The sum of the PMF over the support is \(1\), i.e., \[ \sum_{x \in \text{supp}(X)} p(x) = 1 \]

  3. The probability that \(X \neq x\) is the complement of the probability that \(X = x\), i.e., \[ {\mathbb P}(\{X \neq x\}) = 1 - p(x) \]

  4. For \(x, y \in \text{supp}(X)\), if \(x \neq y\), then \[ {\mathbb P}(\{X=x\} \cup \{X=y\}) = p(x) + p(y) \]
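These properties can be checked numerically for the color example; a small sketch (the dict `pmf` mirrors the table shown earlier):

```python
pmf = {"Red": 0.20, "Blue": 0.55, "Green": 0.25}

# Property 1: every probability lies in [0, 1]
assert all(0 <= p <= 1 for p in pmf.values())

# Property 2: the probabilities sum to 1 over the support
total = sum(pmf.values())
print(total)

# Property 3: P(X != Blue) is the complement of P(X = Blue)
p_not_blue = 1 - pmf["Blue"]
print(p_not_blue)

# Property 4: for distinct support points, probabilities of the union add
p_red_or_green = pmf["Red"] + pmf["Green"]
print(p_red_or_green)
```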

Conditional Probability Mass Function

  • Let \(X\) be a random variable with PMF \(p_{X}(x) = {\mathbb P}(X=x) \quad \text{for all } x \in \text{supp}(X)\)
  • Let \(A\) be some event

Conditional Probability Mass Function

The distribution of \(X\) conditional on the event \(A\), denoted \(X|A\), is given by the conditional probability mass function \[ p_{X}(x | A) = {\mathbb P}(X=x | A) = {\mathbb P}(\left\{X=x\right\} | A) \]

More than one Random Variable

  • Let \(X\) be a random variable with PMF \(p_{X}(x) = {\mathbb P}(X=x) \quad \text{for all } x \in \text{supp}(X)\)
  • Let \(Y\) be a random variable with PMF \(p_{Y}(y) = {\mathbb P}(Y=y) \quad \text{for all } y \in \text{supp}(Y)\)

Joint Probability Mass Function

The distribution of \(X\) and \(Y\) is given by the joint PMF \[ p_{X,Y}(x,y) = {\mathbb P}(X=x, Y=y) = {\mathbb P}(\left\{X=x\right\} \cap \left\{Y=y\right\}) \] and conditioned on the event \(\left\{Y = y\right\}\) the conditional PMF of \(X | \left\{Y=y\right\}\) is \[ p_{X|Y}(x | y) = {\mathbb P}(X=x | Y=y) = {\mathbb P}\left(\left\{X=x\right\} | \left\{Y=y\right\}\right) \]

  • \(X\) and \(Y\) are said to be independent, \(X \perp\kern-5pt \perp Y\), if \(p_{X|Y}(x|y) = p_{X}(x)\)

Joint Probability Mass Function

The joint distribution of \(X\) and \(Y\) is given by the joint probability mass function \(p_{X,Y}(x, y)\) \[ p_{X,Y}(x,y) = {\mathbb P}(X=x, Y=y) = {\mathbb P}(\left\{X=x\right\} \cap \left\{Y=y\right\}) \]

More than one random variable (cont’d)

  • If \(X \perp\kern-5pt \perp Y\): \[\begin{aligned} p_{X,Y}(x, y) &= {\mathbb P}(\left\{X=x\right\} \cap \left\{Y=y\right\})\\ &= {\mathbb P}(\left\{X=x\right\}) \times {\mathbb P}(\left\{Y=y\right\})\\ &= p_{X}(x) \times p_{Y}(y) \end{aligned}\]

  • If \(X \not\perp\kern-5pt \perp Y\): \[\begin{aligned} p_{X,Y}(x, y) &= {\mathbb P}(\left\{X=x\right\} \cap \left\{Y=y\right\})\\ &= {\mathbb P}(\left\{X=x\right\} | \left\{Y=y\right\}) {\mathbb P}(\left\{Y=y\right\}) \quad\quad\text{recall week-2}\\ &= p_{X|Y}(x | y) p_{Y}(y)\\ \end{aligned}\]


\[ p_{X,Y}(x, y) = \begin{cases} p_{X}(x) \times p_{Y}(y) & X \perp\kern-5pt \perp Y\\ p_{X|Y}(x | y) \times p_{Y}(y) & X \not\perp\kern-5pt \perp Y \end{cases} \]
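The two cases above can be illustrated numerically. A sketch assuming a biased coin for \(X\) and a fair die for \(Y\) (both marginals are made up for illustration):

```python
from itertools import product

# Marginal PMFs for two independent random variables (values chosen for illustration)
p_X = {0: 0.4, 1: 0.6}                                    # a biased coin
p_Y = {k: 1/6 for k in range(1, 7)}                       # a fair die

# Under independence, the joint PMF factorizes: p_{X,Y}(x, y) = p_X(x) * p_Y(y)
p_XY = {(x, y): p_X[x] * p_Y[y] for x, y in product(p_X, p_Y)}

# The joint PMF still sums to 1 over the joint support
print(sum(p_XY.values()))

# In general p_{X,Y}(x, y) = p_{X|Y}(x | y) * p_Y(y); under independence
# p_{X|Y}(x | y) = p_X(x), which recovers the product formula
p_X_given_Y = {(x, y): p_XY[(x, y)] / p_Y[y] for (x, y) in p_XY}
assert all(abs(p_X_given_Y[(x, y)] - p_X[x]) < 1e-12 for (x, y) in p_XY)
```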

Discrete & Quantitative Random Variables

Cumulative Distribution Function

  • Given a quantitative random variable \(X\) with PMF \(p_{X}(x)\)

  • The Cumulative Distribution Function (CDF) of a discrete random variable \(X\) is defined as:

\[F_X(x) = {\mathbb P}(X \le x)\]

  • Essentially, the CDF at a point \(x\) is the probability that the random variable \(X\) takes a value less than or equal to \(x\).

  • For discrete random variables, the CDF is a step function.

Properties of CDFs

  1. Normalized: \[\lim_{x \rightarrow -\infty}F_X(x) = 0 \quad \text{and}\quad \lim_{x \rightarrow \infty}F_X(x) = 1\]
  2. Non-decreasing function: The function \(F_X(x)\) is non-decreasing, i.e., if

\[a \le b \quad \text{then} \quad F_X(a) \le F_X(b)\]

  3. Relationship with PMFs: The CDF is the sum of the PMF over all support points at or below \(x\), i.e., \[ F_X(x) = \sum_{y \in \text{supp}(X) \text{ such that } y \le x} p_{X}(y) \]
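The relationship with the PMF gives a direct recipe for computing a discrete CDF; a minimal sketch with a made-up PMF:

```python
# Discrete PMF on a numeric support (values chosen for illustration)
pmf = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}

def cdf(x, pmf=pmf):
    """F_X(x) = sum of p(y) over support points y <= x (a step function)."""
    return sum(p for y, p in pmf.items() if y <= x)

print(cdf(1))    # P(X <= 1) = 0.1 + 0.3
print(cdf(2.5))  # the CDF only steps at support points: same as cdf(2)
print(cdf(10))   # to the right of the support, the CDF is 1
```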

Example

#| standalone: true
#| viewerHeight: 600
#| components: viewer
#| layout: vertical

import numpy as np
import matplotlib.pyplot as plt
from shiny import App, render, ui

# Generate a random sample
np.random.seed(0)
sample_data = np.random.randint(-5, 7, 1000)
sample_data = np.array([*np.random.randint(-3, 2, 1000), *sample_data])
sample_data = np.array([*np.random.randint(3, 5, 1000), *sample_data])

# Define the UI
app_ui = ui.page_fluid(
    ui.layout_sidebar(
        ui.sidebar(
            ui.input_slider(
                "x_val",
                "x",
                min = float(np.floor(sample_data.min())-0.5),
                max = float(np.ceil(sample_data.max())+0.5),
                value = 0.0,
                step = 0.5,
                ticks=True,
                animate=False
            )
        ),
        ui.output_plot("plots", height="500px")
    )
)

# Define the server logic
def server(input, output, session):
    @output
    @render.plot
    def plots():
        x = input.x_val()
        
        # Create figure and axes
        fig, axes = plt.subplots(1, 2, figsize=(12, 5))
        
        # Histogram Plot
        ax_hist = axes[0]
        n_bins = 11
        counts, bins, patches = ax_hist.hist(sample_data, bins=n_bins, color='lightgrey', edgecolor='black', density=True, align='left', width=0.5)
        
        # Color the bars <= x in blue
        for patch, bin_edge in zip(patches, bins):
            if bin_edge <= x:
                patch.set_facecolor('dodgerblue')
        
        # Add vertical line at x
        ax_hist.axvline(x, color='red', linewidth=1, alpha=0.5)
        ax_hist.set_xticks(np.arange(-5, 6, 1))
        ax_hist.set_title("Probability Mass Function (PMF)")
        ax_hist.set_xlabel("Support")
        ax_hist.set_ylabel("p(x)")
        ax_hist.set_ylim(0, 1.0)
        
        # CDF Plot
        ax_cdf = axes[1]
        Xrange = np.linspace(-5.5, 5.5, 200)
        ecdf = lambda x: np.sum(sample_data <= x) / len(sample_data)
        Y = [ecdf(x) for x in Xrange]
        ax_cdf.plot(Xrange, Y, color='dodgerblue')
        ax_cdf.axhline(ecdf(x), color='red', linewidth=1, linestyle='--', alpha=0.25)
        ax_cdf.axvline(x, ymax=ecdf(x), color='red', linewidth=0.25, alpha=0.25)
        # Add vertical line at x
        ax_cdf.axvline(x, color='red', linewidth=2)
        ax_cdf.set_title("Cumulative Distribution Function (CDF)")
        ax_cdf.set_xlabel("Support")
        ax_cdf.set_xticks(np.arange(-5, 6, 1))
        ax_cdf.set_ylabel("F(x)")
        ax_cdf.set_ylim(-0.05, 1.05)
        ax_cdf.set_xlim(-5.5, 5.5)
        plt.tight_layout()
        return fig

# Create the Shiny app
app = App(app_ui, server)
app

Central Tendency of a random variable

You’ve just made an amazing (in your opinion) TikTok short which has a potential for going viral.

  • With probability \(0.1\), the TikTok will really go viral, leading to \(10,000\) additional followers

  • With probability \(0.9\), your TikTok is not as good as you thought it was, and it leads to \(0\) additional followers.

How many additional followers do you expect to have after you post your TikTok?

Expectation

Expectation

The expectation of a (quantitative) random variable \(X\) is a weighted-average of all possible outcomes of \(X\), weighted by the probability of each outcome over its support.

\[\begin{aligned} {\mathbb E}(X) &= \sum_{x \in \text{supp}(X)} x \times {\mathbb P}(X=x)\\ &= \sum_{x \in \text{supp}(X)} x \times p_{X}(x) \end{aligned}\]

Example revisited

Let \(X\) be the random variable representing the number of additional followers you get on TikTok.

\(\text{supp}(X)\) \(10,000\) (viral) \(0\) (not viral)
\(p(x)\) 0.1 0.9

\[\begin{aligned} {\mathbb E}(X) &= \sum_{x \in \text{supp}(X)} x \times {\mathbb P}(X=x)\\ \\ &= (0.1 \times 10,000) + (0.9 \times 0)\\ \\ \therefore {\mathbb E}(X) &= 1,000. \end{aligned}\]
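The same computation, written as a small helper (the function name `expectation` is illustrative):

```python
def expectation(pmf):
    """E(X) = sum over the support of x * p(x)."""
    return sum(x * p for x, p in pmf.items())

# TikTok example: 10,000 followers with probability 0.1, else 0
pmf_tiktok = {10_000: 0.1, 0: 0.9}
print(expectation(pmf_tiktok))  # 1000.0
```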

Linearity

For two random variables \(X\) and \(Y\), and for two constants \(a, b \in {\mathbb R}\), \[ {\mathbb E}(aX + bY) = a {\mathbb E}(X) + b {\mathbb E}(Y) \]

Why?

\[ \begin{aligned} {\mathbb E}(aX + bY) &= \sum_{x \in \text{supp}(X)}\sum_{ y \in \text{supp}(Y)} (ax + by) \cdot p_{X,Y}(x, y)\\ &= a\sum_{x \in \text{supp}(X)}\sum_{ y \in \text{supp}(Y)} x \cdot p_{X,Y}(x,y) + b \sum_{x \in \text{supp}(X)}\sum_{ y \in \text{supp}(Y)} y \cdot p_{X,Y}(x,y)\\ &= a\sum_{x \in \text{supp}(X)} x \left(\sum_{ y \in \text{supp}(Y)} p_{X,Y}(x,y)\right) + b \sum_{y \in \text{supp}(Y)} y \left(\sum_{ x \in \text{supp}(X)} p_{X,Y}(x,y)\right)\\ &= a\sum_{x \in \text{supp}(X)} x \cdot p_{X}(x) + b \sum_{y \in \text{supp}(Y)} y p_{Y}(y)\\ &= a{\mathbb E}(X) + b {\mathbb E}(Y) \end{aligned} \]

Transformations

  • Let \(X\) be a random variable and \(f\) be some function on the support of \(X\)
  • Let \(Y = f(X)\)
  • The transformation \(Y = f(X)\) is also a random variable!
  • The support, \(\text{supp}(f(X))\), is simply the set of all values that \(f\) maps \(\text{supp}(X)\) to

Expected value of \(Y = f(X)\)

\[ {\mathbb E}(f(X)) = \sum_{x \in \text{supp}(X)} {\mathbb P}(X=x) \times f(x) \]
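A sketch of this rule (sometimes called the “law of the unconscious statistician”), with a made-up uniform PMF; note there is no need to work out the PMF of \(Y = f(X)\) itself:

```python
def expectation_of(f, pmf):
    """E(f(X)) = sum over the support of f(x) * p(x)."""
    return sum(f(x) * p for x, p in pmf.items())

# Example (values chosen for illustration): X uniform on {1, 2, 3}, f(x) = x^2
pmf = {1: 1/3, 2: 1/3, 3: 1/3}
e_x2 = expectation_of(lambda x: x**2, pmf)  # (1 + 4 + 9) / 3
print(e_x2)
```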

Measure of uncertainty

Consider the following two gambling scenarios:

Coin Bet


You flip a fair coin.

  • If outcome is \(H\), I pay you \(\$10\)
  • If outcome is \(T\), you pay me \(\$10\)

\[\begin{aligned}{\mathbb E}(X) &= \Big(\frac{1}{2} \times \$10\Big) + \Big(\frac 12 \times -\$10\Big)\\ \\ &= \$0\end{aligned}\]

Dice Bet


You roll a fair die:

  • If outcome is \(\left\{6\right\}\), I pay you \(\$60\)
  • Otherwise, you pay me \(\$12\)

\[\begin{aligned}{\mathbb E}(X) &= \Big(\frac{1}{6} \times \$60\Big) + \Big(\frac 56 \times -\$12\Big)\\ \\ &= \$0\end{aligned}\]

Which of these two scenarios do you prefer? Are they still the same?

Variance

Variance

The variance of a (quantitative) random variable \(X\) is a measure of the spread or dispersion of the random variable \(X\) around its expected value.

\[ {\text{Var}}(X) = \sum_{x \in \text{supp}(X)} {\mathbb P}(X=x) \times (x - {\mathbb E}(X))^2 \]

Equivalent formulation

\[ {\text{Var}}(X) = {\mathbb E}(X^2) - {\mathbb E}(X)^2 \]
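This identity follows by expanding the square inside the definition of variance and applying linearity of expectation (noting that \({\mathbb E}(X)\) is a constant):

\[ \begin{aligned} {\text{Var}}(X) &= {\mathbb E}\left[(X - {\mathbb E}(X))^2\right]\\ &= {\mathbb E}\left[X^2 - 2X\,{\mathbb E}(X) + {\mathbb E}(X)^2\right]\\ &= {\mathbb E}(X^2) - 2\,{\mathbb E}(X)\,{\mathbb E}(X) + {\mathbb E}(X)^2\\ &= {\mathbb E}(X^2) - {\mathbb E}(X)^2 \end{aligned} \]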

Example revisited

Let \(X\) be the random variable representing your net gain.

Coin Bet


\(\text{supp}(X)\) $10 -$10
\({\mathbb P}(X=x)\) \(\frac 12\) \(\frac 12\)

\({\mathbb E}(X) = 0\)


\[\begin{aligned}{\text{Var}}(X) &= \left(\frac 12 \times (10 - 0)^2\right) + \left(\frac 12 \times (-10 - 0)^2\right)\\ \\ &= 100\end{aligned}\]

Dice Bet


\(\text{supp}(X)\) $60 -$12
\({\mathbb P}(X=x)\) \(\frac 16\) \(\frac 56\)

\({\mathbb E}(X) = 0\)


\[\begin{aligned}{\text{Var}}(X) &= \left(\frac 16 \times (60 - 0)^2\right) + \left(\frac 56 \times (-12 - 0)^2\right)\\ \\ &= 720\end{aligned}\]
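Both variance calculations can be verified in a few lines of code; a sketch reusing the PMFs of the two bets (helper names are illustrative):

```python
def expectation(pmf):
    """E(X) = sum over the support of x * p(x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum over the support of p(x) * (x - E(X))^2."""
    mu = expectation(pmf)
    return sum(p * (x - mu) ** 2 for x, p in pmf.items())

coin_bet = {10: 1/2, -10: 1/2}   # net gain from the coin bet
dice_bet = {60: 1/6, -12: 5/6}   # net gain from the dice bet

print(variance(coin_bet))  # 100
print(variance(dice_bet))  # 720
```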

Question

Which of the two bets above has the larger variance?

Variance of a linear combination

Variance of a linear combination

For two independent random variables \(X\) and \(Y\) with \(X \perp\kern-5pt \perp Y\), and
for two constants \(a, b \in {\mathbb R}\), \[{\text{Var}}(aX + bY) = a^2 {\text{Var}}(X) + b^2 {\text{Var}}(Y).\]
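A Monte Carlo sanity check of this identity, with the distributions of \(X\) and \(Y\) chosen only for illustration:

```python
import numpy as np

# Check Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) for independent X, Y
rng = np.random.default_rng(0)
n = 1_000_000
X = rng.integers(1, 7, size=n)      # a fair die
Y = rng.binomial(1, 0.3, size=n)    # a Bernoulli(0.3) variable

a, b = 2.0, -3.0
lhs = np.var(a * X + b * Y)
rhs = a**2 * np.var(X) + b**2 * np.var(Y)
print(lhs, rhs)  # approximately equal
```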

Conditional Expectation and Variance

Given a random variable \(X\) and an event \(A\)

  • The conditional expectation of \(X\) given the event \(A\) is \[\begin{aligned} {\mathbb E}(X|A) &= \sum_{x \in \text{supp}(X)} x \times {\mathbb P}(X=x | A)\\ &= \sum_{x \in \text{supp}(X)} x \times p_{X}(x|A) \end{aligned}\]

  • The conditional variance of \(X\) given the event \(A\) is \[\begin{aligned} {\text{Var}}(X|A) &= \sum_{x \in \text{supp}(X)} (x - {\mathbb E}(X|A))^2 \times {\mathbb P}(X=x | A)\\ &= \sum_{x \in \text{supp}(X)} (x - {\mathbb E}(X|A))^2 \times p_{X}(x|A) \end{aligned}\]
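Both formulas can be illustrated with a fair die conditioned on the event \(A = \{X \text{ is even}\}\) (example chosen for illustration):

```python
# X: a fair die roll; A: the event {X is even}
pmf = {x: 1/6 for x in range(1, 7)}

A = {2, 4, 6}
p_A = sum(p for x, p in pmf.items() if x in A)

# Conditional PMF: p(x | A) = P({X = x} and A) / P(A), zero outside A
pmf_given_A = {x: (p / p_A if x in A else 0.0) for x, p in pmf.items()}

# E(X | A) and Var(X | A) from the conditional PMF
e_given_A = sum(x * p for x, p in pmf_given_A.items())
var_given_A = sum((x - e_given_A) ** 2 * p for x, p in pmf_given_A.items())
print(e_given_A)    # (2 + 4 + 6) / 3 = 4
print(var_given_A)  # 8/3
```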