On the Difference between Normal and t distributions

Now that you have seen confidence intervals built from different distributions, it can be hard to tell when to use each. The answer depends on whether we know the value of \(\sigma\) (the population standard deviation) or are estimating \(\sigma\) with the sample standard deviation s = \(\hat{\sigma}\).

If we know \(\sigma\), then we can just use the Normal distribution.
If we do not know \(\sigma\) (and must use s) then we will use the t-distribution.

Note: When working with a difference in means, we almost never know the two population standard deviations, so we will always use the t-distribution.

Question 1: Explain (in your own words) when I use the normal or t-distributions for CIs. Why was the t-distribution necessary, or equivalently what issue is it fixing?

Here’s my answer: We use the normal distribution for means only if we know the value of \(\sigma\) (the population standard deviation). Otherwise we use the t-distribution. When we do not know \(\sigma\), we use the sample standard deviation s in its place as an estimate. Because we are now estimating two things (the mean and the standard deviation), our confidence interval carries extra uncertainty that needs to be accounted for. The t-distribution fixes this by giving slightly larger critical values, and therefore slightly wider intervals.
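
To see how much the t-distribution widens things, here is a small sketch comparing the 95% critical values from the two distributions. The df values are chosen for illustration only (19 and 22 are the ones used later in this key; 5 and 100 are arbitrary).

```r
# Compare 95% critical values: Normal (z*) vs. t (t*) at several df
df <- c(5, 19, 22, 100)            # illustrative degrees of freedom
z_star <- qnorm(0.975)             # Normal critical value, same for every df
t_star <- qt(0.975, df = df)       # t critical values, one per df

# t* is always larger than z*, and shrinks toward z* as df grows
data.frame(df = df, t_star = t_star, z_star = z_star)
```

The gap between t* and z* is largest for small samples, which is exactly when s is the least reliable estimate of \(\sigma\).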

Question 2: Explain (in your own words) the benefit of using a confidence interval as opposed to a point estimate from a sample.

Here’s my answer: Almost certainly the point estimate will not be entirely correct. A CI tries to create a range of ‘plausible’ values for the parameter, factoring in uncertainty that comes from what may change when using different samples from the population. If we just give the point estimate, we lose out on this information and it also sounds too precise to be true with no margin of error.
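
The idea that different samples give different estimates can be checked with a quick simulation. This is only a sketch: the population mean and standard deviation below are made-up values (not taken from the fish data), and the population is assumed to be Normal.

```r
set.seed(1)                        # fixed seed so the sketch is reproducible
mu <- 0.4                          # hypothetical population mean (assumption)
sigma <- 0.12                      # hypothetical population std. dev. (assumption)
n <- 48                            # same sample size as the fish data

# Draw 2000 samples; for each, check whether the 95% t-interval contains mu
covered <- replicate(2000, {
  x <- rnorm(n, mean = mu, sd = sigma)
  half_width <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)
  abs(mean(x) - mu) <= half_width
})

mean(covered)                      # proportion of intervals that captured mu
```

Each sample mean misses mu by some amount, but the proportion of intervals that capture mu should land near 0.95.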

Mercury in Fish Pt. 2

Suppose the 48 fish were actually caught in random samples from 2 different locations: 1) the Gulf of Mexico and 2) the east coast of Japan. We could look at answering a different research question:

Research Question: What is the difference in pop. mean mercury levels for yellowfin tuna in the Gulf of Mexico vs. off the east coast of Japan?

Gulf of Mexico sample

There were 20 fish caught in the Gulf of Mexico with a sample mean of 0.413 and std. dev. of 0.15

East Coast of Japan sample

There were 28 fish caught off the east coast of Japan with a sample mean of 0.370 and std. dev. of 0.10

Question 3

Part A: Explain why the conditions to make a 95% CI for the difference in pop. means using a t-distribution are not met for this sample. (regardless, we will continue for practice)

Both sample sizes are below 30 (20 and 28), so neither group is large enough.

Part B: What is the df for the t-distribution we will use and how many SE’s will we add and subtract for our 95% CI?

df = 19               # smaller sample size minus 1: 20 - 1
t = qt(.975, df=df)   # critical value with 2.5% in each tail
t

## [1] 2.093024

Part C: What is the value of the SE?

SE = sqrt(.15^2 / 20 + .1^2 / 28)
SE

## [1] 0.03849861

Part D: Make a 95% CI for the difference in pop. means (keep track of subtraction order)

(.413 - .370) - t*SE

## [1] -0.03757851

(.413 - .370) + t*SE

## [1] 0.1235785
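
Parts B through D can be bundled into one function. This is a sketch, not part of the original lab: the name diff_means_ci is hypothetical, and it assumes the df rule used in this key is min(n1, n2) - 1, which matches df = 19 here.

```r
# Hypothetical helper: CI for (mu1 - mu2) from summary statistics,
# assuming df = min(n1, n2) - 1 as in Part B
diff_means_ci <- function(m1, s1, n1, m2, s2, n2, level = 0.95) {
  df <- min(n1, n2) - 1                        # conservative degrees of freedom
  t_star <- qt(1 - (1 - level) / 2, df = df)   # number of SEs to add and subtract
  se <- sqrt(s1^2 / n1 + s2^2 / n2)            # SE of the difference in sample means
  est <- m1 - m2                               # point estimate, group 1 minus group 2
  c(lower = est - t_star * se, upper = est + t_star * se)
}

# Gulf of Mexico (group 1) minus east coast of Japan (group 2)
fish_ci <- diff_means_ci(0.413, 0.15, 20, 0.370, 0.10, 28)
fish_ci
```

Swapping the two groups flips the signs of both endpoints, which is why the subtraction order matters for the interpretation.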

Part E: Interpret the confidence interval in context.

We are 95% confident that the population mean mercury level of yellowfin tuna from the Gulf of Mexico is between 0.038 ppm lower and 0.124 ppm higher than that of yellowfin tuna from the east coast of Japan.

Part F: According to the confidence interval, is it plausible there is actually no difference in pop. mean mercury levels?

Yes. Zero is within the interval, so our CI suggests there may actually be no difference or only a very small difference.

Question 4 (Diamonds)

Note: I am choosing group 1 to be the 1 carat group for convenience.

(57.2 - 44.51) - qt(.975, df = 22)*sqrt(13.32^2/23 + 18.19^2/23)

## [1] 2.940605

(57.2 - 44.51) + qt(.975, df = 22)*sqrt(13.32^2/23 + 18.19^2/23)

## [1] 22.4394