Normal and t-distribution quantiles (Review)

We saw in the previous lab that the pnorm() function could be used to find probabilities associated with specific values for the Normal distribution.

pnorm(-1.96); pnorm(1.96, lower.tail=FALSE)

## [1] 0.0249979

## [1] 0.0249979

That is, the probability below -1.96 and the probability above 1.96 are each about .025 on a standard normal distribution.

The R functions qnorm() and qt() (we’ll talk about this one later) do the opposite: they give the cutoff (quantile) that corresponds to a probability that we put in the function.

In order to use qnorm() we must give it

the probability
the mean (optional)
the standard deviation (optional)

Note: If you do not specify the mean and standard deviation qnorm() defaults to the standard normal distribution N(0,1).
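As a quick illustration of the optional arguments, the sketch below finds the 97.5th percentile of a Normal distribution with mean 100 and standard deviation 15. These two values are made up for illustration and are not part of any lab data.

```r
# Hypothetical example: N(mean = 100, sd = 15); values chosen only for illustration
cutoff <- qnorm(.975, mean = 100, sd = 15)
cutoff

# Same cutoff built by hand from the standard normal quantile
100 + qnorm(.975) * 15

# pnorm() undoes qnorm(): this returns the probability we started with
pnorm(cutoff, mean = 100, sd = 15)
```

The last line should give back .975, which is one way to check that qnorm() and pnorm() are inverses of each other.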

qnorm(.025)

## [1] -1.959964

qnorm(.975)

## [1] 1.959964

These correspond to the cutoffs we use in a 95% confidence interval when we know the value of \(\sigma\). They also mark off the middle 95% of the standard normal distribution, because .975-.025 = .95.

When we want a different confidence level for our CI, we need to use probabilities different from .025 and .975.

We will do this by using \(\alpha\) for a 100(1-\(\alpha\))% CI. The probabilities we put into the function will be \(\alpha / 2\) and \(1-(\alpha / 2)\).

For the 95% CI we have \(\alpha = .05\), which gave us the probabilities \(\alpha / 2 = .05 / 2 = .025\) and \(1-(\alpha / 2)= 1-(.05 / 2) = .975\). Because the distribution is symmetric around 0, the two cutoffs are the same value with opposite signs, so we will only care about the positive one from here on out.
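The same arithmetic can be done in R for several confidence levels at once. The sketch below is only an illustration; the variable names (conf_level, alpha, upper_prob, z_cutoff) are our own and not part of the lab.

```r
# Confidence levels to compare (illustrative choices)
conf_level <- c(.90, .95, .99)

# alpha is whatever is left over outside the interval
alpha <- 1 - conf_level

# Probability to hand to qnorm() for the positive cutoff
upper_prob <- 1 - alpha / 2

# Positive standard normal cutoff for each confidence level
z_cutoff <- qnorm(upper_prob)

data.frame(conf_level, alpha, upper_prob, z_cutoff)
```

The row for the 95% level should reproduce the 1.959964 found earlier, and higher confidence levels give larger cutoffs.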

The qt() function works very similarly. We input the probability that matches our confidence level (using \(\alpha\) as above), and it gives us the corresponding quantile of the t-distribution instead of the normal distribution. We also need to include the degrees of freedom for the t-distribution.

# 95% CI quantile cutoff
# for a sample with sample size = 34 (n=34), so df = n - 1 = 33
qt(.975, df=33)

## [1] 2.034515

This tells us we need to add and subtract 2.03 SE’s in our 95% CI for the mean using the t-distribution with df=33.
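To see how this cutoff is used, here is a minimal sketch of a 95% CI for one mean. The summary statistics (xbar and s) are hypothetical and are not from any lab data, and the sketch assumes the usual standard error for a sample mean, s divided by the square root of n.

```r
# Hypothetical summary statistics (NOT from the lab data)
n    <- 34   # sample size
xbar <- 10   # sample mean
s    <- 2    # sample standard deviation

# Standard error of the sample mean (assumed formula: s / sqrt(n))
se <- s / sqrt(n)

# t cutoff for a 95% CI with df = n - 1
t_star <- qt(.975, df = n - 1)

# Add and subtract t_star SEs around the sample mean
ci <- c(lower = xbar - t_star * se, upper = xbar + t_star * se)
ci
```

Changing .975 to another value of \(1-(\alpha / 2)\) gives an interval with a different confidence level.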

On the Difference between Normal and t distributions

Now that you are seeing confidence intervals made with different distributions, it can be hard to tell when to use each one. The answer depends on whether we know the value of \(\sigma\) (the population standard deviation) or are estimating \(\sigma\) using the sample standard deviation s = \(\hat{\sigma}\).

If we know \(\sigma\), then we can just use the Normal distribution.
If we do not know \(\sigma\) (and must use s) then we will use the t-distribution.
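One way to get a feel for how the two distributions differ is to compare their 95% cutoffs side by side. The sketch below does this for a few degrees of freedom; the df values are arbitrary choices for illustration.

```r
# Arbitrary degrees of freedom to compare (illustrative choices)
df <- c(5, 10, 33, 100, 1000)

# 95% CI cutoffs from the t-distribution, one per df
t_cutoff <- qt(.975, df = df)

# The matching Normal cutoff is the same for every row
data.frame(df, t_cutoff, z_cutoff = qnorm(.975))
```

Every t cutoff is larger than the Normal cutoff, and the gap shrinks as the degrees of freedom grow.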

Note: When working with differences in means, we almost never know the population variances of the two groups, so we will always use the t-distribution.

Question 1: Explain (in your own words) when to use the normal distribution and when to use the t-distribution for CIs. Why was the t-distribution necessary, or equivalently, what issue is it fixing?

Question 2: Explain (in your own words) the benefit of using a confidence interval as opposed to a point estimate from a sample.

Mercury in Fish Pt. 2

This is a continuation of the previous lab. The background information is given below:

“Scientists recorded the mercury levels in 48 randomly selected yellowfin tuna caught in different locations throughout the fish’s natural range. For the purpose of this lab, assume this is a representative sample.”

New: Suppose the 48 fish were actually caught as random samples from 2 different locations: 1) the Gulf of Mexico and 2) the east coast of Japan. We could then answer a different research question about how the two sites compare.

Research Question: What is the difference in pop. mean mercury levels for yellowfin tuna in the Gulf of Mexico vs. off the east coast of Japan?

Gulf of Mexico sample

There were 20 fish caught in the Gulf of Mexico with a sample mean of 0.413 and std. dev. of 0.15

East Coast of Japan sample

There were 28 fish caught off the east coast of Japan with a sample mean of 0.370 and std. dev. of 0.10

Question 3

Part A: Describe the parameter for this particular research question in context.

Part B: Describe the corresponding statistic and its value according to our data.

Part C: Explain why the conditions to make a 95% CI for the difference in pop. means using a t-distribution are not met for this sample. (Regardless, we will continue for practice.)

Part D: What is the df for the t-distribution we will use and how many SE’s will we add and subtract for our 95% CI?

Part E: What is the value of the SE?

Part F: Make a 95% CI for the difference in pop. means (keep track of subtraction order)

Part G: Interpret the confidence interval in context.

Part H: According to the confidence interval, is it plausible there is actually no difference in pop. mean mercury levels?

Note: We will be done with fish measurements for a while…