Due: Wednesday 5/06 at 10pm.

For the following questions you are allowed to perform hypothesis tests by hand, using R, or using the hypothesis test functions covered in the lab for 4/27 (if applicable).

This assignment will not be graded on accuracy, only meaningful completion. The purpose of this assignment is to prepare you for the final exam.

Question 1

Researchers collected data to examine the relationship between air pollutants and preterm births in Southern California. During the study, air pollution levels were measured by air quality monitoring stations. Length of gestation data were collected on 143,196 births between the years 1989 and 1993, and air pollution exposure during gestation was calculated for each birth. (Ritz et al. 2000)

  1. Identify the population of interest and the sample in this study.
  2. Comment on whether the results of the study can be generalized to the population, and whether the findings of the study can be used to establish causal relationships.

Question 2

An excerpt from an article titled “The School Bully Is Sleepy” is below. A friend of yours who read the article says, “The study shows that sleep disorders lead to bullying in school children.” Is this statement justified? If not, how can you best describe the conclusion that can be drawn from this study? (Parker-Pope 2011)

“The University of Michigan study collected survey data from parents on each child’s sleep habits and asked both parents and teachers to assess behavioral concerns. About a third of the students studied were identified by parents or teachers as having problems with disruptive behavior or bullying. The researchers found that children who had behavioral issues and those who were identified as bullies were twice as likely to have shown symptoms of sleep disorders.”


Question 3

Read pages 216-218 in the IMS textbook, then answer the following:

Sophia, who took the Graduate Record Examination (GRE), scored 160 on the Verbal Reasoning section and 157 on the Quantitative Reasoning section. The mean score on the Verbal Reasoning section for all test takers was 151 with a standard deviation of 7, and the mean score on the Quantitative Reasoning section was 153 with a standard deviation of 7.67. Suppose that both distributions are nearly normal. Use this information to answer each of the following.

  1. Write down the short-hand for each of the two normal distributions.
  2. What is Sophia’s Z score on the Verbal Reasoning section? On the Quantitative Reasoning section? Draw a standard normal distribution curve and mark the two Z scores.
  3. What do the Z scores tell you?
  4. Relative to others, which section did Sophia do better on?
  5. Find her percentile scores for each of the two exams.
  6. What percent of the test takers did better than her on the Verbal Reasoning section? On the Quantitative Reasoning section?
  7. Explain why simply comparing raw scores from the two sections could lead to an incorrect conclusion as to which section a student did better on.
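
As a rough check on the arithmetic for parts 2, 5, and 6, the Z scores and percentiles could be computed as sketched below. This is only an illustration in Python using the standard library's `statistics.NormalDist`; the variable names are made up for this sketch, and the same numbers can be obtained by hand or in R.

```python
from statistics import NormalDist

# Values from the problem statement: (mean, sd) for each section
verbal_mean, verbal_sd = 151, 7
quant_mean, quant_sd = 153, 7.67
verbal_score, quant_score = 160, 157

# Z score = (observation - mean) / standard deviation
z_verbal = (verbal_score - verbal_mean) / verbal_sd
z_quant = (quant_score - quant_mean) / quant_sd

# Percentile = area under the standard normal curve to the left of Z
std_normal = NormalDist(mu=0, sigma=1)
pct_verbal = std_normal.cdf(z_verbal)
pct_quant = std_normal.cdf(z_quant)

print(f"Verbal: Z = {z_verbal:.2f}, percentile = {pct_verbal:.3f}, did better = {1 - pct_verbal:.3f}")
print(f"Quant:  Z = {z_quant:.2f}, percentile = {pct_quant:.3f}, did better = {1 - pct_quant:.3f}")
```

Because both Z scores are measured in standard deviations from their own section's mean, they can be compared directly even though the raw scores cannot.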

Question 4

A study suggests that 25% of 25-year-olds have gotten married. You believe that this is incorrect and decide to collect your own sample for a hypothesis test. From a random sample of 776 25-year-olds, you find that 24% of them are married. A friend of yours (who previously took 209 but maybe got a dubious grade) offers to help you set up the hypothesis test and comes up with the following hypotheses. Indicate any errors you see.

Question 5

A Marist Poll report states that 66% of American adults think licensed drivers should be required to retake their road test once they reach 65 years of age, based on a random sample of 1,018 American adults. They also report that the margin of error was 3% at the 95% confidence level. (Poll 2011)

  1. Verify the margin of error reported by The Marist Poll using a mathematical model.
  2. Based on a 95% confidence interval, does the poll provide convincing evidence that more than two thirds of the population think that licensed drivers should be required to retake their road test once they turn 65?
  3. Verify your answer to part 2 by performing an appropriate hypothesis test.
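
One way to check the numbers in this question is sketched below, assuming the normal approximation for a sample proportion. It is an illustration in Python's standard library rather than the lab's functions, the variable names are invented for the sketch, and the one-sided alternative (p greater than 2/3) is an assumption based on the wording of part 2.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

# Values from the problem statement
n = 1018
p_hat = 0.66

# Margin of error: z* times the standard error based on p_hat
z_star = std_normal.inv_cdf(0.975)          # about 1.96 for 95% confidence
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z_star * se_hat
ci_low, ci_high = p_hat - margin, p_hat + margin

# One-sided test of H0: p = 2/3 against HA: p > 2/3 (assumed alternative).
# The standard error for a test uses the null value, not p_hat.
p_null = 2 / 3
se_null = math.sqrt(p_null * (1 - p_null) / n)
z_score = (p_hat - p_null) / se_null
p_value = 1 - std_normal.cdf(z_score)       # upper-tail area

print(f"margin of error = {margin:.4f}")
print(f"95% CI = ({ci_low:.3f}, {ci_high:.3f})")
print(f"Z = {z_score:.2f}, one-sided p-value = {p_value:.3f}")
```

Checking whether 2/3 falls inside the interval and whether the p-value is below 0.05 should lead to the same conclusion.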

Question 6

Nearsightedness (myopia) is a common vision condition in which you can see near objects clearly, but objects farther away appear blurry. It is believed that nearsightedness affects about 8% of all children. In a random sample of 194 children, 21 are nearsighted. Using a mathematical model, conduct a hypothesis test for the following question: do these data provide evidence that the 8% value is inaccurate?
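
The arithmetic for a one-proportion Z test of this kind could be sketched as follows. This is a minimal illustration in Python's standard library, assuming a two-sided alternative (because the question asks whether 8% is "inaccurate") and the usual success-failure threshold of 10; the variable names are made up for the sketch.

```python
import math
from statistics import NormalDist

# Values from the problem statement
n = 194
nearsighted = 21
p_null = 0.08

p_hat = nearsighted / n

# Success-failure check under the null (threshold of 10 is an assumed convention)
expected_success = n * p_null
expected_failure = n * (1 - p_null)
conditions_ok = expected_success >= 10 and expected_failure >= 10

# Test statistic uses the null proportion in the standard error
se_null = math.sqrt(p_null * (1 - p_null) / n)
z_score = (p_hat - p_null) / se_null

# Two-sided p-value: twice the tail area beyond |Z|
p_value = 2 * (1 - NormalDist().cdf(abs(z_score)))

print(f"p_hat = {p_hat:.4f}, conditions met: {conditions_ok}")
print(f"Z = {z_score:.2f}, two-sided p-value = {p_value:.3f}")
```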

Question 7

You are wondering whether the average amount of cereal in a 10 oz cereal box is greater than 10 oz. You collect 50 boxes of cereal, weigh them carefully, compute a T score, and obtain a p-value of 0.23. For each of the following statements, indicate whether it is a true or false interpretation of the p-value. If it is false, provide a reason or a correction to the misinterpretation.

  1. The probability that the average weight of all cereal boxes is 10 oz is 0.23.
  2. The probability that the average weight of all cereal boxes is greater than 10 oz is 0.23.
  3. Because the p-value is 0.23, the average weight of all cereal boxes is 10 oz.
  4. Because the p-value is small, the population average must be just barely above 10 oz.
  5. If \(H_0\) is true, the probability of observing another sample with an average as extreme as or more extreme than the data is 0.23.

Question 8

Gaming, distracted eating, and intake. A group of researchers who are interested in the possible effects of distracting stimuli during eating, such as an increase or decrease in the amount of food consumed, monitored food intake for a group of 44 patients who were randomized into two equal groups. The treatment group ate lunch while playing solitaire, and the control group ate lunch without any added distractions. Patients in the treatment group ate an average of 52.1 grams of biscuits, with a standard deviation of 45.1 grams, and patients in the control group ate an average of 27.1 grams of biscuits, with a standard deviation of 26.4 grams. Do these data provide convincing evidence that the average food intake (measured in amount of biscuits consumed) is different for the patients in the treatment group compared to the control group? Assume that conditions for conducting inference using mathematical models are satisfied. (Oldham-Cooper et al. 2011)
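
A two-sample T test from summary statistics could be sketched as below. This is an illustration in Python's standard library, not the lab's functions. Two choices here are assumptions: the degrees of freedom use the conservative rule min(n1 - 1, n2 - 1), whereas R's `t.test` uses the Welch approximation by default, and the tail area is found by numerically integrating the t density because the standard library has no t distribution. All names are invented for the sketch.

```python
import math

# Summary statistics from the problem statement (44 patients, two equal groups)
n_treat, n_ctrl = 22, 22
mean_treat, sd_treat = 52.1, 45.1
mean_ctrl, sd_ctrl = 27.1, 26.4

# Standard error of the difference in sample means
se = math.sqrt(sd_treat**2 / n_treat + sd_ctrl**2 / n_ctrl)
t_score = (mean_treat - mean_ctrl) / se

# Conservative degrees of freedom (assumed convention for this sketch)
df = min(n_treat, n_ctrl) - 1


def t_pdf(x, v):
    """Density of the t distribution with v degrees of freedom."""
    coef = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return coef * (1 + x * x / v) ** (-(v + 1) / 2)


def t_two_sided_p(t, v, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|]; steps must be even."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0, v) + t_pdf(b, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    area = total * h / 3            # area between 0 and |t|
    return 2 * (0.5 - area)         # both tails beyond |t|


p_value = t_two_sided_p(t_score, df)

print(f"SE = {se:.2f}, T = {t_score:.2f}, df = {df}")
print(f"two-sided p-value = {p_value:.3f}")
```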