Code
library(tidyverse)
library(VennDiagram)
library(grid)

a_color <- "indianred"
b_color <- "dodgerblue"

# Helper: draw a filled ellipse with grid graphics
draw_ellipse <- function(center = c(0.5, 0.5), a = 0.25, b = 0.15, n = 100, col = "white") {
  t <- seq(0, 2 * pi, length.out = n)
  x <- center[1] + a * cos(t)
  y <- center[2] + b * sin(t)
  grid.polygon(x = x, y = y, gp = gpar(fill = col, col = NA))
}

grid.newpage()
grid.rect(gp = gpar(fill = "white", col = NA))  # white background
draw_ellipse(center = c(0.5, 0.5), a = 0.25, b = 0.15, col = a_color)  # ellipse = event A
grid.text("A", x = 0.5, y = 0.5, gp = gpar(fontsize = 16, col = "white"))
Complement Rule
If \(A\) is an event, then the complement of \(A\), denoted \(A^c\), is the event that \(A\) does not occur.
Rule: \[
P(A^c) = 1 - P(A)
\]
Code
library(grid)

# Draw the complement Ac: fill the whole background, then cut out an elliptical A
grid.newpage()
grid.rect(gp = gpar(fill = a_color, col = NA))  # full background = Ac
draw_ellipse(center = c(0.5, 0.5), a = 0.25, b = 0.15, col = "white")  # elliptical A
# Optional: label
grid.text("Ac", x = 0.1, y = 0.9, gp = gpar(fontsize = 16, col = "white"))
Intersection
\(A \cap B\): the event that both \(A\) and \(B\) occur.
Code
grid.newpage()
# Two overlapping circles for A and B
invisible(draw.pairwise.venn(
  area1 = 10,
  area2 = 10,
  cross.area = 5,
  category = c("A", "B"),
  fill = c("lightgrey", "lightgrey"),
  alpha = 0.25,
  col = "darkgrey",
  print.mode = "none"
))
# Highlight the central region (approximating the intersection)
grid.circle(x = 0.5, y = 0.5, r = 0.30, gp = gpar(fill = "indianred", col = NA, alpha = 0.6))
# Optional label
grid.text("A ∩ B", x = 0.5, y = 0.5, gp = gpar(fontsize = 20))
Union
For mutually exclusive events: \[
P(A \cup B) = P(A) + P(B)
\]
In general: \[
P(A \cup B) = P(A) + P(B) - P(A \cap B)
\]
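A quick sanity check of the inclusion-exclusion rule by simulation (a hypothetical one-die example, not from the original slides):
Code
# A = "even number", B = "number >= 4"
set.seed(1)
rolls <- sample(1:6, 1e5, replace = TRUE)
A <- rolls %% 2 == 0
B <- rolls >= 4
mean(A | B)                      # estimated P(A ∪ B)
mean(A) + mean(B) - mean(A & B)  # P(A) + P(B) - P(A ∩ B)
# both approximate 4/6, since A ∪ B = {2, 4, 5, 6}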
Conditional Probability
Probability of \(A\) given that \(B\) occurred: \[
P(A|B) = \frac{P(A \cap B)}{P(B)}
\]
Interpretation: how probable is \(A\) when we know \(B\) has occurred.
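As an illustration, a hypothetical one-die sketch (assumed example, not part of the original material):
Code
# Estimate P(A|B) by simulation: B = "even number", A = "a six"
set.seed(1)
rolls <- sample(1:6, 1e5, replace = TRUE)
A <- rolls == 6
B <- rolls %% 2 == 0
mean(A & B) / mean(B)  # P(A ∩ B) / P(B), approximately 1/3
mean(A[B])             # equivalently: frequency of A among outcomes where B occurred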
Independence
Events \(A\) and \(B\) are independent if: \[
P(A \cap B) = P(A) \cdot P(B)
\]
Intuition: knowledge of \(B\) tells us nothing about \(A\), and vice versa.
Note: independence is not the same as mutual exclusivity. In fact, two independent events with positive probability are never mutually exclusive, since \(P(A \cap B) = P(A)\,P(B) > 0\).
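A minimal sketch with two fair coins (an assumed example) showing both properties at once:
Code
# Two coins tossed independently
set.seed(1)
coin1 <- sample(0:1, 1e5, replace = TRUE)
coin2 <- sample(0:1, 1e5, replace = TRUE)
A <- coin1 == 1   # first coin shows heads
B <- coin2 == 1   # second coin shows heads
mean(A & B)       # P(A ∩ B), approximately 0.25
mean(A) * mean(B) # P(A) * P(B), also approximately 0.25
mean(A & B) > 0   # independent, yet clearly NOT mutually exclusive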
Total Probability Theorem
If \(B_1, B_2, \ldots, B_n\) form a partition of the sample space: \[
P(A) = \sum_{i=1}^{n} P(A \cap B_i) = \sum_{i=1}^{n} P(A \mid B_i)\, P(B_i)
\] Recall that \(P(A \mid B_i) = \frac{P(A \cap B_i)}{P(B_i)}\), hence \(P(A \cap B_i) = P(A \mid B_i)\, P(B_i)\).
Code
library(grid)
grid.newpage()

# Draw three adjacent rectangles: the partition B1, B2, B3
grid.rect(x = 1/6, width = 1/3, height = 1, just = "center", gp = gpar(fill = "lightblue", col = "darkgrey"))
grid.rect(x = 0.5, width = 1/3, height = 1, just = "center", gp = gpar(fill = "lightgreen", col = "darkgrey"))
grid.rect(x = 5/6, width = 1/3, height = 1, just = "center", gp = gpar(fill = "lightpink", col = "darkgrey"))

# Label each B_i
grid.text("B1", x = 1/6, y = 0.95, gp = gpar(fontsize = 14))
grid.text("B2", x = 0.5, y = 0.95, gp = gpar(fontsize = 14))
grid.text("B3", x = 5/6, y = 0.95, gp = gpar(fontsize = 14))

# Draw event A as a horizontal ellipse overlapping all three rectangles
draw_ellipse <- function(center, a, b, col = "orchid", alpha = 0.4) {
  t <- seq(0, 2 * pi, length.out = 200)
  x <- center[1] + a * cos(t)
  y <- center[2] + b * sin(t)
  grid.polygon(x = x, y = y, gp = gpar(fill = col, col = NA, alpha = alpha))
}
draw_ellipse(center = c(0.5, 0.5), a = 0.5, b = 0.15)

# Label A and its intersections with the B_i
grid.text("A", x = 0.5, y = 0.7, gp = gpar(fontsize = 16, col = "black"))
grid.text("(A ∩ B1)", x = 0.2, y = 0.5, gp = gpar(fontsize = 16, col = "black"))
grid.text("(A ∩ B2)", x = 0.5, y = 0.5, gp = gpar(fontsize = 16, col = "black"))
grid.text("(A ∩ B3)", x = 0.8, y = 0.5, gp = gpar(fontsize = 16, col = "black"))
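A numeric check of the theorem with made-up probabilities for a three-set partition:
Code
# Total probability with a partition B1, B2, B3 (made-up numbers)
p_B <- c(0.2, 0.5, 0.3)          # P(B_i), summing to 1
p_A_given_B <- c(0.1, 0.4, 0.8)  # P(A | B_i)
sum(p_A_given_B * p_B)           # P(A) = Σ P(A|B_i) P(B_i) = 0.46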
Even after testing positive, there’s only a 16.7% chance that you actually have the disease, due to the low base rate of the disease in the population.
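The setup of this example does not appear above; the following sketch uses one set of assumed numbers (prevalence 1%, sensitivity 99%, false-positive rate 5%) that reproduces the 16.7% figure via Bayes' theorem:
Code
# Bayes' theorem for a diagnostic test (assumed parameters)
prev <- 0.01  # P(disease): base rate in the population
sens <- 0.99  # P(positive | disease)
fpr  <- 0.05  # P(positive | healthy)
p_pos <- sens * prev + fpr * (1 - prev)  # total probability of testing positive
sens * prev / p_pos                      # P(disease | positive) = 1/6 ≈ 16.7%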
intermission
The Fallibility of the Judge
The Robbery
In 1964, an elderly lady, returning from the supermarket, is pushed to the ground and robbed of her purse.
She manages to notice that the assailant is a blonde woman with a ponytail.
A passerby, who witnessed the scene, notices that the robber escapes in a yellow car driven by a man with a beard, mustache, and dark skin.
The Suspects
A few days later, Janet and Malcolm Collins are stopped—they match the descriptions.
During the trial, a mathematician calculates the probability that an innocent couple would match each of the following characteristics (the figures presented in the actual case):
Yellow car: 1/10
Man with mustache: 1/4
Woman with ponytail: 1/10
Woman with blonde hair: 1/3
Black man with beard: 1/10
Interracial couple in a car: 1/1000
Multiplying them (assuming independence) gives \(\frac{1}{12{,}000{,}000}\).
But first, a detour: recall the difference between the two conditional probabilities, with a toy example.
\(P(A \mid B)\): Given that the animal has 4 legs, what’s the probability it’s a dog?
\(P(B \mid A)\): Given that it’s a dog, what’s the probability it has 4 legs?
Of course, they are not the same!
Back to the Trial
Define:
\(A\): The couple is innocent
\(B\): The couple matches the witnesses’ description
The mathematician calculated:
\(P(B \mid A)\): Probability a couple matches the description if they are innocent
What should have been calculated:
\(P(A \mid B)\): Probability a couple is innocent given they match the description
Facts:
There were 10 couples in the city matching the description
Of those, 9 were innocent
Therefore:
\(P(B \mid A) = \frac{1}{12,000,000}\)
\(P(A \mid B) = \frac{9}{10}\)
True News
Dropout Rates
The dropout rate increased by 100%
The dropout rate increased from 0.001 to 0.002
Both statements are true, but the first has a greater emotional impact.
end of intermission
Random Variables
Often we’re not directly interested in the outcome of an experiment itself, but in a numerical value determined by that outcome (e.g., when playing dice, what matters is the sum of the faces).
Quantities determined by the outcome of an experiment are called random variables.
Each value of a random variable corresponds to one or more outcomes of an experiment, so each value of a random variable is associated with a probability.
In general, we don’t know what value the random variable will take, but by studying the probability distribution we can understand what to expect.
Probability and Random Variables
Consider the experiment of tossing four coins, and define the random variable \(X =\) number of heads (H).
The values that \(X\) can assume are \(\{0, 1, 2, 3, 4\}\).
Probability Distribution of \(X\) (also called probability mass function, pmf)
Each of the \(2^4 = 16\) equally likely outcomes contains exactly \(k\) heads in \(\binom{4}{k}\) ways, so \(P(X = k) = \binom{4}{k}/16\). We can summarize the probabilities in a table:
| \(x_i\) | \(p_i\) |
|---------|---------|
| 0 | 1/16 |
| 1 | 4/16 |
| 2 | 6/16 |
| 3 | 4/16 |
| 4 | 1/16 |
(they look like the relative frequencies associated with the values of \(X\), don’t they?)
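Indeed \(X \sim \text{Binomial}(4, 1/2)\), so the table can be double-checked in R:
Code
# The pmf above is Binomial(n = 4, p = 0.5)
dbinom(0:4, size = 4, prob = 0.5)
# 0.0625 0.2500 0.3750 0.2500 0.0625  (i.e., 1/16, 4/16, 6/16, 4/16, 1/16)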
Plot of the pmf
Code
df <- data.frame(
  x_i = c(0, 1, 2, 3, 4),
  p_i = c(1/16, 4/16, 6/16, 4/16, 1/16)
)

# Lollipop plot of the pmf
ggplot(df, aes(x = factor(x_i), y = p_i)) +
  geom_segment(aes(xend = factor(x_i), y = 0, yend = p_i), color = "dodgerblue", linewidth = 1.5) +
  geom_point(color = "indianred", size = 4) +
  labs(
    title = "Probability Distribution of Number of Heads (X)",
    x = "Number of Heads (xᵢ)",
    y = "Probability (pᵢ)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 14)
  )
Event Probabilities from Distribution
Knowing the probability distribution of \(X\), we can compute probabilities of events derived from the sample space \(\Omega\).
Event A: at least 3 heads:
\[
P(A) = P(X\geq 3)= P(X=3) + P(X=4) = \frac{4}{16} + \frac{1}{16} = \frac{5}{16}
\] You add up the probabilities associated with the values of \(X\) that satisfy the event.
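The same sum in R, using the binomial pmf:
Code
# P(X >= 3): add the pmf over the values that satisfy the event
sum(dbinom(3:4, size = 4, prob = 0.5))  # 5/16 = 0.3125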
For continuous random variables, the probability of observing exactly some value (the bus takes exactly 63.4 seconds…) is zero;
one refers instead to a small interval of values, and the probability density function is used.
The probability of observing a value in the interval \([x_1, x_2]\) is given by the area under the curve of the PDF between \(x_1\) and \(x_2\):
\[
\int_{x_{1}}^{x_{2}} f(x)dx
\] where \(f(x)\) is the probability density function (PDF) of the random variable \(X\).
Probability density function (PDF):
Consider a generic density \(f(x)\) that takes values between -6 and 6.
What is the probability of observing a value between -0.5 and 1?
Code
# Define a bimodal density as a mixture of two normals
# (weights 0.3 + 0.7 sum to 1, so the mixture is a proper density)
bimodal_density <- function(x) {
  0.3 * dnorm(x, mean = -1, sd = 1) +  # first peak
  0.7 * dnorm(x, mean = 2, sd = 0.8)   # second peak
}

# Generate data for plotting
x_vals <- seq(-6, 6, length.out = 1000)
density_vals <- bimodal_density(x_vals)
df <- tibble(x = x_vals, y = density_vals)
highlight_df <- df |> filter(x >= -0.5 & x <= 1)

# Plot the density and shade the area between -0.5 and 1
ggplot(df, aes(x = x, y = y)) +
  geom_line(color = "steelblue", linewidth = 1.2) +
  geom_area(data = highlight_df, fill = "indianred", alpha = 0.5) +
  labs(
    title = "generic pdf: probability between -0.5 and 1",
    x = "x", y = "Density"
  ) +
  theme_minimal()
Taking the integral is just like summing the probabilities of all possible outcomes in a pmf, but integrating over a continuous range of values instead of summing over discrete ones.
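The shaded area can be checked numerically with integrate(), reusing the bimodal_density() defined above:
Code
# Numeric value of the shaded area
integrate(bimodal_density, lower = -0.5, upper = 1)
# And the total area under the density should be (approximately) 1:
integrate(bimodal_density, lower = -Inf, upper = Inf)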
A super easy probability density function
Suppose that a random variable records whether an NBA basketball player is on the court or not.
The game consists of 4 quarters, each of 12 minutes.
Based on last season, here’s the probability distribution for the player being in play during the game.
E[X] for discrete variables
For discrete variables, the expected value is a weighted mean of the values, with the weights being the probabilities of each value instead of their relative frequencies: \[
E[X] = \sum_{i} x_i \, p_i
\]
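For the number-of-heads example above, a one-line check:
Code
# E[X]: values weighted by their probabilities
x_i <- 0:4
p_i <- c(1, 4, 6, 4, 1) / 16
sum(x_i * p_i)  # 2: on average, two heads in four tosses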
E[X] for continuous variables
For continuous variables, the rationale is the same, but integration replaces summation: \[
E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx
\]
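As a sketch, reusing the bimodal_density() defined earlier (mixture weights 0.3 and 0.7):
Code
# E[X] for a continuous variable: integrate x * f(x)
integrate(function(x) x * bimodal_density(x), lower = -Inf, upper = Inf)
# Analytic check for the mixture: 0.3 * (-1) + 0.7 * 2 = 1.1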