A CDC report on sleep deprivation rates shows that the proportion of California residents who reported insufficient rest or sleep during each of the preceding 30 days is 8.0%, while this proportion is 8.8% for Oregon residents. These data are based on simple random samples of 11,545 California and 4,691 Oregon residents.
Goal: Conduct a hypothesis test to determine if these data provide strong evidence that the rate of sleep deprivation is different for the two states.
Part A: What type of hypothesis test is this? (proportion / diff. in proportions / mean / diff. in means)
# diff in proportions
Part B: Compute \(\widehat{p}_{pool}\).
p_pool = (.08*11545 + .088*4691)/(11545 + 4691)
Part C: Compute the test-statistic Z.
Z = (.088 - .08) / sqrt(p_pool*(1-p_pool)*(1/11545 + 1/4691))
Z
## [1] 1.681135
sqrt(.0823 * (1-.0823)*(1/11545 + 1/4691))
## [1] 0.004758391
Part D: What is the p-value?
2*pnorm(1.68, lower.tail=F)
## [1] 0.09295732
Part E: Write a conclusion to this question using ‘strength of evidence’.
There is weak evidence that the two states have different rates of sleep deprivation. Looking at our sample statistics, it looks like Oregon has the higher rate.
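The steps in Parts B through D can be collected into one small function, which also avoids rounding Z before the p-value is computed. This is only a sketch: `two_prop_z` and its argument names are made up for illustration and do not come from the CDC report or any R package.

```r
# Hypothetical helper: pooled two-proportion z-test from summary statistics.
two_prop_z <- function(p1, n1, p2, n2) {
  # Pooled proportion: total successes over total sample size
  p_pool <- (p1 * n1 + p2 * n2) / (n1 + n2)
  # Standard error of the difference, assuming H0: p1 = p2
  se <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
  z <- (p2 - p1) / se
  # Two-sided p-value: area in both tails beyond |z|
  p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)
  list(p_pool = p_pool, se = se, z = z, p_value = p_value)
}

# California is group 1, Oregon is group 2
res <- two_prop_z(p1 = 0.08, n1 = 11545, p2 = 0.088, n2 = 4691)
res$z
res$p_value
```

Because `res$z` is not rounded to 1.68 first, `res$p_value` should differ slightly from the value in Part D, but not enough to change the conclusion.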
A random sample is selected from an approximately normal population with an unknown standard deviation. Get the p-value using the pt() function in R. You may need to use ?pt() in your console to see required inputs. It otherwise functions similarly to pnorm() and qt() used previously.
Part A: n = 26, T = 2.485, two-tail test
2*pt(2.485, df=25, lower.tail=F)
## [1] 0.0200048
Part B: n = 18, T = 0.5, right-tail test
pt(.5, df=17, lower.tail=F)
## [1] 0.3117426
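The choice of tail in Parts A and B can be made explicit with a small wrapper around pt(). The sketch below assumes a one-sample setting with df = n - 1; `t_p_value` and its `tail` argument are hypothetical names used only for illustration.

```r
# Hypothetical helper: p-value for a t statistic from a sample of size n.
t_p_value <- function(t_stat, n, tail = c("two", "left", "right")) {
  tail <- match.arg(tail)
  df <- n - 1  # one-sample t: degrees of freedom are n - 1
  switch(tail,
    left  = pt(t_stat, df = df),                              # area below t_stat
    right = pt(t_stat, df = df, lower.tail = FALSE),          # area above t_stat
    two   = 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)  # both tails
  )
}

t_p_value(2.485, n = 26, tail = "two")   # Part A
t_p_value(0.5, n = 18, tail = "right")   # Part B
```

The two calls should reproduce the p-values shown in Parts A and B.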
Georgianna claims that in a small city renowned for its music school, the average child takes less than 5 years of piano lessons. We have a random sample of 35 children from the city, with a sample mean of 4.6 years of piano lessons and a sample standard deviation of 2.2 years.
Evaluate Georgianna’s claim using a hypothesis test.
Solutions:
\(H_0\): \(\mu\)=5, \(H_A\): \(\mu <\) 5
The children were randomly sampled and the sample size of 35 is greater than 30, so the conditions are met to use a t-test.
# test-statistic
T = (4.6 - 5)/(2.2 / sqrt(35))
T
## [1] -1.075651
# p-value
pt(T, df=34)
## [1] 0.1448287
With a p-value of about 0.14, there is little to no evidence that the mean number of years of piano lessons taken by children in this city is below 5.
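One way to cross-check the hand calculation is with R's built-in t.test(), which needs raw data rather than summary statistics. The raw data are not given here, so the sketch below builds a synthetic sample that is rescaled to have exactly the reported mean and standard deviation. The values in `x` are an assumption for illustration only, not the real measurements.

```r
set.seed(42)  # arbitrary; the test result does not depend on the seed
z <- rnorm(35)
# Rescale so the synthetic sample has mean 4.6 and sd 2.2 exactly
x <- 4.6 + 2.2 * (z - mean(z)) / sd(z)

# Left-tailed one-sample t-test of H0: mu = 5 against HA: mu < 5
fit <- t.test(x, mu = 5, alternative = "less")
fit$statistic
fit$p.value
```

Since the t statistic depends on the data only through the sample mean, standard deviation, and size, `fit$statistic` and `fit$p.value` should agree with the values computed by hand.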
We have data on two random samples of diamonds: one with diamonds that weigh 0.99 carats and one with diamonds that weigh 1 carat. Each sample has 23 diamonds. Sample statistics for the price per carat of diamonds in each sample are provided below.
0.99 carats: Sample mean = $44.51, s.d. = $13.32, n=23
1 carat: Sample mean = $57.20, s.d. = $18.19, n=23
Assuming that the conditions for conducting inference using the t-distribution are satisfied, perform a hypothesis test to see if there is a difference in population prices per carat of diamonds that weigh 0.99 carats and 1 carat. (Wickham 2016)
Solutions:
\(H_0\): \(\mu_{0.99} = \mu_{1}\), \(H_A\): \(\mu_{0.99} \neq \mu_{1}\)
# test-statistic
T = (57.2 - 44.51) / sqrt(13.32^2 / 23 + 18.19^2 / 23)
T
## [1] 2.699393
# p-value (two-tailed, since the alternative is "different"); df = min(23, 23) - 1
2*pt(T, df=22, lower.tail=F)
## [1] 0.01309643
With a test statistic of 2.699 and a p-value of .013, there is strong evidence that the population mean prices per carat are different for diamonds that weigh 0.99 carats and diamonds that weigh 1 carat.
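The two-sample calculation can be sketched as a function of the summary statistics. `two_sample_t` is a hypothetical name, and the function assumes a two-sided alternative, so the upper-tail area is doubled. It uses the conservative df = min(n1, n2) - 1 from this worksheet. R's own t.test() uses the Welch approximation for the degrees of freedom by default, so its p-value on raw data would differ somewhat.

```r
# Hypothetical helper: two-sample t-test from summary statistics,
# two-sided alternative, conservative df = min(n1, n2) - 1.
two_sample_t <- function(xbar1, s1, n1, xbar2, s2, n2) {
  se <- sqrt(s1^2 / n1 + s2^2 / n2)   # standard error of the difference
  t_stat <- (xbar1 - xbar2) / se
  df <- min(n1, n2) - 1
  # Two-sided p-value: both tails beyond |t_stat|
  p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
  list(t = t_stat, df = df, p_value = p_value)
}

res_diamonds <- two_sample_t(57.2, 18.19, 23, 44.51, 13.32, 23)
res_diamonds$t
res_diamonds$p_value
```

Swapping the order of the two groups flips the sign of the statistic but leaves the two-sided p-value unchanged.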
Part A: Explain what a null distribution is and why we are using it.
Answer: It is a distribution that represents statistics from hypothetical samples where the null hypothesis is actually true. It gives us a point of comparison for the statistic we get from our study: we can see whether our statistic looks typical or unusual for that distribution. The formulas in Parts B-D are examples of null distributions.
For each of the following, explain what the symbols mean. Not just the names, but the deeper meaning. Start your explanation with “For repeated samples…”.
Part B: \(\widehat{p} \sim N(p_0, \sqrt{p_0(1-p_0)/n})\)
Answer: For repeated samples of size n and a true population proportion of \(p_0\), plotting the sample proportions will create an approximately Normal distribution with a mean of \(p_0\) and standard deviation of \(\sqrt{p_0(1-p_0)/n}\).
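The "repeated samples" idea in Part B can be checked by simulation. In this sketch, the values of `p0`, `n`, and `reps` are arbitrary choices for illustration and are not taken from any of the problems.

```r
set.seed(1)      # arbitrary seed, for reproducibility only
p0 <- 0.3        # hypothetical null proportion
n <- 200         # hypothetical sample size
reps <- 10000    # number of repeated samples

# Each draw is the number of successes in one sample of size n when H0 is true
p_hats <- rbinom(reps, size = n, prob = p0) / n

mean(p_hats)                # should be close to p0
sd(p_hats)                  # should be close to the formula on the next line
sqrt(p0 * (1 - p0) / n)     # standard deviation from the null distribution
```

Calling `hist(p_hats)` afterwards should show a roughly bell-shaped histogram centered near `p0`.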
Part C: \(\frac{\bar{x}-\mu_0}{s/\sqrt{n}} \sim t_{n-1}\)
Answer: For repeated samples of size n and a true population mean of \(\mu_0\), plotting the test-statistics (\(\frac{\bar{x}-\mu_0}{s/\sqrt{n}}\)) from these samples will create a t-distribution with \(n-1\) degrees of freedom.
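The same kind of simulation works for Part C. The sketch below assumes a Normal population with made-up values for `mu0`, `sigma`, and `n`. It checks how often the simulated test statistics land beyond the 2.5% and 97.5% cutoffs of the t-distribution with n - 1 degrees of freedom.

```r
set.seed(2)      # arbitrary seed, for reproducibility only
mu0 <- 5         # hypothetical null mean
sigma <- 2       # hypothetical population standard deviation
n <- 10          # hypothetical (small) sample size
reps <- 5000     # number of repeated samples

# One test statistic per simulated sample drawn with H0 true
t_stats <- replicate(reps, {
  x <- rnorm(n, mean = mu0, sd = sigma)
  (mean(x) - mu0) / (sd(x) / sqrt(n))
})

# Share of statistics beyond the t cutoffs: should be close to 0.05
prop_t <- mean(abs(t_stats) > qt(0.975, df = n - 1))
# Share beyond the Normal cutoffs: larger, because the t has heavier tails
prop_z <- mean(abs(t_stats) > qnorm(0.975))
prop_t
prop_z
```

The gap between `prop_t` and `prop_z` is the reason the t-distribution, not the Normal, is used when the population standard deviation is estimated by s.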
Part D: \(\frac{(\bar{x}_1 - \bar{x}_2)}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}} \sim t_{\min(n_1, n_2)-1}\)
Answer: For repeated samples of size \(n_1\) and \(n_2\) from populations with equal means, plotting the test statistics (the difference in sample means divided by its standard error) will create a t-distribution with \(n_1 -1\) or \(n_2 -1\) df (whichever is smaller).
Part A: Why do we calculate a test-statistic?
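Answer: A test statistic puts our sample result on a standard scale, so the associated null distribution (standard Normal or t) is much easier to work with than the distribution of the raw statistic.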
Part B: Why do we sometimes call tests left, right, or two-tailed tests?
Answer: It corresponds to which side of the distribution we use to compute a p-value.
Part C: Now that you have seen a few examples of p-values, in non-statistical jargon: What do they tell us about our data/sample?
Answer: They tell us how well the data match up with the null hypothesis: the smaller the p-value, the more surprising our data would be if the null hypothesis were true. They do not tell us how likely or unlikely the null hypothesis is.