Disjoint Events are events that can’t happen at the same time.
Disjoint events A and B satisfy P(A and B) = 0
Complement Rule: P(not A) = 1 - P(A)
Additive Rule: P(A or B) = P(A) + P(B) - P(A and B)
Special: A and B disjoint \(\rightarrow\) P(A or B) = P(A) + P(B)
Multiplicative Rule: P(A and B) = P(A if B)\(\times\)P(B) = P(B if A)\(\times\)P(A)
Special: A and B are independent \(\rightarrow\) P(A and B) = P(A)\(\times\)P(B)
Conditional Probabilities: P(A if B) = \(\frac{\text{P(A and B)}}{\text{P(B)}}\)
Independent Events: A and B are independent if P(A if B) = P(A)
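The rules above can be checked numerically on a toy example. The sketch below is not part of the lab data: it assumes a single roll of a fair six-sided die, with the events A ("even") and B ("greater than 3") and the helper `p()` made up for illustration.

```r
# Toy example (an assumption, not lab data): one roll of a fair six-sided die.
outcomes <- 1:6
A <- outcomes %% 2 == 0   # event A: the roll is even
B <- outcomes > 3         # event B: the roll is greater than 3

# With equally likely outcomes, a probability is the share of outcomes in the event.
p <- function(event) mean(event)

p_A       <- p(A)
p_B       <- p(B)
p_A_and_B <- p(A & B)
p_A_or_B  <- p(A | B)
p_A_if_B  <- p_A_and_B / p_B   # conditional probability

# Complement rule: P(not A) = 1 - P(A)
all.equal(p(!A), 1 - p_A)
# Additive rule: P(A or B) = P(A) + P(B) - P(A and B)
all.equal(p_A_or_B, p_A + p_B - p_A_and_B)
# Multiplicative rule: P(A and B) = P(A if B) x P(B)
all.equal(p_A_and_B, p_A_if_B * p_B)
# Independence check: TRUE only when P(A if B) equals P(A)
isTRUE(all.equal(p_A_if_B, p_A))
```

In this toy case P(A and B) is not 0, so A and B are not disjoint, and P(A if B) differs from P(A), so they are not independent either.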
For this question we will return to the Titanic dataset built into R, which provides information on the fate of the passengers on the fatal maiden voyage of the ocean liner Titanic, summarized according to economic status (class), sex, age, and survival. As you will hopefully see in the next set of questions, having access to a table of data can make answering probability questions a lot easier.
library(ggplot2)
library(dplyr)
theme_set(theme_bw())
## Data for lab
data(Titanic)
titanic <- as.data.frame(Titanic)
titanic <- titanic[rep(1:nrow(titanic), times = titanic$Freq), ]
titanic$Freq <- NULL
A review of functions we’ve seen before:

- table(): returns counts of categorical data
- proportions(): takes a table as an argument and returns a table of proportions (optionally conditional)
- addmargins(): sums the margins of a table

Use the following table of Titanic data to answer the questions below. You can change the table to proportions using the proportions() function if it helps you (but be careful with overall/row/column proportions). Practice using the probability short notation as you write your answers. For example: P(Survived) in Part A.
with(titanic, table(Survived, Class)) %>% addmargins()
##         Class
## Survived 1st 2nd 3rd Crew  Sum
##      No  122 167 528  673 1490
##      Yes 203 118 178  212  711
##      Sum 325 285 706  885 2201
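The difference between overall, row, and column proportions can be sketched on a different pair of variables, so the questions below stay open. This example assumes Survived by Sex instead of Survived by Class; the object names `tab`, `overall`, `by_row`, and `by_col` are illustrative only.

```r
data(Titanic)
# Collapse the built-in 4-way Titanic table to Survived (rows) by Sex (columns).
tab <- margin.table(Titanic, c(4, 2))

overall <- proportions(tab)              # each cell / grand total: P(Survived and Sex)
by_row  <- proportions(tab, margin = 1)  # each row sums to 1:    P(Sex if Survived)
by_col  <- proportions(tab, margin = 2)  # each column sums to 1: P(Survived if Sex)

round(addmargins(overall), 3)
round(by_row, 3)
round(by_col, 3)
```

Which version is appropriate depends on the condition in the question: "given that a person survived" points to row proportions here, while "given that a person was female" points to column proportions.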
Part A: For a randomly selected passenger, what is the probability they survived?
Part B: For a randomly selected passenger, what is the probability they were in 1st class?
Part C: For a randomly selected passenger, what is the probability they were in 1st class and survived?
Part D: Given that a passenger was in 1st class, what is the probability they survived?
Part E: Given that a passenger survived, what is the probability they were in 1st class?
Part F: Are the events “Survived” and “1st class” disjoint? Explain.
Part G: Are the events “Survived” and “1st class” independent? Justify your answer by comparing appropriate probabilities from Parts A through E. (There are many such comparisons)
We can accomplish a lot with only a few given probabilities if we use the probability formulas; it’s just more difficult and time-consuming than using a table. For this problem we are going to start with data on heart attacks and medications similar to a study from the slides, but with different values.
Use the following information to answer the question parts below:
Part A: What is P(heart attack)?
Part B: What is P(heart attack if taking aspirin)?
Part C: What is P(heart attack and taking placebo)?
Part D: What is P(heart attack or aspirin)?
Part E: According to the data, is it more likely someone has a heart attack while taking aspirin or taking no medication (placebo)?
Part F: Are the events “Heart Attack” and “Taking aspirin” independent according to our data?
Part G: Suppose your friend uses the probabilities P(heart attack and aspirin) = .01 and P(heart attack and placebo) = .01 and concludes that the aspirin doesn’t have any effect on the probability of having a heart attack. Point out the flaw in their reasoning.
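The formula-based approach used in this problem can be sketched with made-up inputs. The three starting probabilities below are assumptions chosen for illustration and are not the lab's values; A and B stand for any two events, such as an outcome and a treatment group.

```r
# Made-up inputs for illustration only; these are NOT the lab's values.
p_B          <- 0.40   # P(B)
p_A_if_B     <- 0.10   # P(A if B)
p_A_if_not_B <- 0.25   # P(A if not B)

p_not_B       <- 1 - p_B                     # complement rule
p_A_and_B     <- p_A_if_B * p_B              # multiplicative rule
p_A_and_not_B <- p_A_if_not_B * p_not_B      # multiplicative rule
p_A           <- p_A_and_B + p_A_and_not_B   # additive rule for two disjoint pieces
p_A_or_B      <- p_A + p_B - p_A_and_B       # additive rule
p_B_if_A      <- p_A_and_B / p_A             # conditional probability

c(p_A = p_A, p_A_and_B = p_A_and_B, p_A_or_B = p_A_or_B, p_B_if_A = p_B_if_A)
```

Note that joint probabilities such as P(A and B) depend on how large each group is, so comparing groups calls for the conditional probabilities P(A if B) and P(A if not B).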
This is the table of data for heart attacks.
Treatment | Heart_Attack | No_Heart_Attack |
---|---|---|
Aspirin | 24 | 1533 |
Placebo | 25 | 851 |
Total | 49 | 2384 |
Part A: What are the odds for having a heart attack in the aspirin group? Simplify the result so it looks like 1:X.
Part B: What are the overall odds for having a heart attack? Simplify the result so it looks like 1:X.
Part C: What is the odds ratio for heart attacks, comparing the odds in the aspirin group to the overall odds?
Part D: According to the odds ratio, are aspirin use and heart attack occurrence associated?
Part E: Explain to someone who has never taken a statistics class what the odds ratio value we got tells us about aspirin’s effect on heart attack rates.
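As a reminder of the mechanics, here is a minimal sketch of computing odds, writing them in 1:X form, and forming an odds ratio. The counts are hypothetical and are not the lab's data, and the helper `as_one_to_x()` is made up for this example.

```r
# Hypothetical counts for illustration only; these are NOT the lab's data.
events_group <- 10; non_events_group <- 190   # one group of interest
events_all   <- 30; non_events_all   <- 270   # everyone in the study

# Odds = (number with the event) / (number without the event).
odds_group <- events_group / non_events_group
odds_all   <- events_all / non_events_all

# Express odds as "1:X" by dividing both sides by the number of events.
as_one_to_x <- function(events, non_events) {
  paste0("1:", round(non_events / events, 1))
}
as_one_to_x(events_group, non_events_group)
as_one_to_x(events_all, non_events_all)

# Odds ratio: a value near 1 suggests little association between group and event.
odds_ratio <- odds_group / odds_all
odds_ratio
```

An odds ratio below 1 means the event has lower odds in the group of interest than in the comparison; above 1 means higher odds.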
The following information comes from [this study].
Time-restricted eating, a type of intermittent fasting, involves limiting the hours for eating to a specific number of hours each day, which may range from a 4- to 12-hour time window in 24 hours. Many people who follow a time-restricted eating diet follow a 16:8 eating schedule, where they eat all their foods in an 8-hour window and fast for the remaining 16 hours each day, the researchers noted. Previous research has found that time-restricted eating improves several cardiometabolic health measures, such as blood pressure, blood glucose and cholesterol levels.
In this study, researchers investigated the potential long-term health impact of following an 8-hour time-restricted eating plan. They reviewed information about dietary patterns for participants in the annual 2003-2018 National Health and Nutrition Examination Surveys (NHANES) in comparison to data about people who died in the U.S., from 2003 through December 2019.
The study included approximately 20,000 adults in the U.S. with an average age of 49 years. They found that people who followed a pattern of eating all of their food across less than 8 hours per day had a 91% higher risk of death due to cardiovascular disease.
Part A: The study claims that those who follow the 8-hour time-restricted diet had a 91% increased risk of dying from cardiovascular disease. Explain what this means in terms of comparing the probability of dying from cardiovascular disease for those who follow the 8-hour diet and those who do not.
Part B: Just knowing about this increased risk is not enough for us to tell how likely someone is to die from cardiovascular disease if they follow this 8-hour diet. What else do we need to know to figure this out?
Part C: In 2022, 702,880 US adults died of cardiovascular disease, out of a total of 258.3 million US adults. Use these values to estimate the probability that a US adult dies from cardiovascular disease in a given year.
Part D: Using the study's result that those on the 8-hour time-restricted diet have a 91% increased risk of dying from cardiovascular disease, together with your answer to Part C, what is the probability of dying from cardiovascular disease for a US adult on the 8-hour diet? Is this probability large or small (subjective)?
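The arithmetic behind a "percent higher risk" statement can be sketched as follows. The baseline and counts below are made-up assumptions, not the values from Part C, and the variable names are illustrative only.

```r
# Hypothetical numbers for illustration only; NOT the values asked for above.
deaths     <- 500      # made-up number of deaths in one year
population <- 100000   # made-up population size

# Estimate a baseline probability from counts.
baseline_risk <- deaths / population

# "91% higher risk" (from the study summary) multiplies the baseline by 1.91;
# it does not add 91 percentage points.
pct_increase  <- 0.91
elevated_risk <- baseline_risk * (1 + pct_increase)

c(baseline = baseline_risk, elevated = elevated_risk)
```

With a small baseline, even a large relative increase can leave the resulting probability small in absolute terms, which is why the baseline matters when interpreting a relative risk.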