When do we use descriptive statistics?
To literally describe the data we have. Means, medians, and standard deviations give us an idea of what the data look like.
What descriptive statistics do we use for categorical variables?
Proportions
What descriptive statistics do we use for quantitative variables?
Means, std. devs., medians, IQRs
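As a minimal sketch, the summaries above can be computed in Python with only the standard library. The data are invented for illustration. `statistics.quantiles` supports more than one quartile convention, and `method="inclusive"` is just one choice, so other software may report slightly different quartiles and IQRs.

```python
import statistics
from collections import Counter

# Hypothetical categorical data: one answer per respondent (invented for illustration)
majors = ["Science", "Humanities", "Science", "Social Science", "Science"]

# Categorical variable: summarize with the proportion in each category
counts = Counter(majors)
proportions = {category: k / len(majors) for category, k in counts.items()}

# Hypothetical quantitative data, e.g. hours of sleep (invented for illustration)
hours = [6, 7, 7, 8, 5, 9, 7, 6, 8, 10]

mean = statistics.mean(hours)
sd = statistics.stdev(hours)        # sample SD (divides by n - 1)
median = statistics.median(hours)
q1, _, q3 = statistics.quantiles(hours, n=4, method="inclusive")  # quartiles
iqr = q3 - q1

print(proportions)
print(mean, round(sd, 2), median, iqr)
```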
When do we use correlation?
When we want to see if there is a relationship between 2 quant. variables. Pearson’s measures the strength of a linear relationship. Spearman’s is Pearson’s applied to the ranks, so it works for any monotonic relationship, linear or not.
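The following Python sketch shows the difference between the two correlations. It uses made-up data where y = x^3, a relationship that is perfectly monotonic but curved. The helper functions `pearson`, `ranks`, and `spearman` are written out by hand for illustration. Spearman's correlation comes out as exactly 1, while Pearson's is only about 0.93.

```python
import math

def pearson(x, y):
    """Pearson correlation: strength of the *linear* relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

x = list(range(1, 11))
y = [v ** 3 for v in x]  # monotonic but clearly non-linear (invented data)

print(round(pearson(x, y), 3), spearman(x, y))
```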
When do we use regression?
When we want to predict values for one quant var. using another and there is a linear relationship
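For reference, the least-squares line for predicting \(y\) from \(x\) can be written in terms of four quantities: the correlation \(r\), the sample means \(\bar{x}\) and \(\bar{y}\), and the sample standard deviations \(s_x\) and \(s_y\):

\[
\hat{y} = b_0 + b_1 x, \qquad b_1 = r \, \frac{s_y}{s_x}, \qquad b_0 = \bar{y} - b_1 \bar{x}
\]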
When do we use probability?
When we are trying to quantify how likely/unlikely an event or outcome is
What types of likelihoods did we learn about?
probabilities, odds, odds ratios, relative risk, conditional probabilities
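To keep these measures straight, here is a Python sketch that computes each one from a hypothetical 2×2 table of counts. The numbers are invented for illustration.

```python
# Hypothetical 2x2 table of counts (invented, not real data):
#                 outcome   no outcome
# exposed            30          70
# not exposed        10          90
a, b = 30, 70   # exposed: with / without the outcome
c, d = 10, 90   # unexposed: with / without the outcome
total = a + b + c + d

p_disease = (a + c) / total                   # overall probability
p_disease_given_exposed = a / (a + b)         # conditional probability
p_disease_given_unexposed = c / (c + d)

odds_exposed = a / b                          # odds = P(event) / P(no event)
odds_unexposed = c / d
odds_ratio = odds_exposed / odds_unexposed    # ratio of the two odds
relative_risk = p_disease_given_exposed / p_disease_given_unexposed  # ratio of probabilities

print(p_disease, round(odds_ratio, 2), round(relative_risk, 2))
```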
When do we use confidence intervals?
When we are trying to estimate a parameter
What types of parameters did we estimate?
Means, difference in means, proportions, difference in proportions. We can use bootstrapping for other parameters (e.g., a median).
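Bootstrapping lets us build an interval for a parameter that has no simple formula, such as a median. Below is a minimal percentile-bootstrap sketch in Python. The sample, the 10,000 resamples, the fixed seed, and the helper name `bootstrap_ci` are all illustrative choices, not a prescribed method.

```python
import random
import statistics

def bootstrap_ci(data, stat, reps=10_000, level=0.95, seed=1):
    """Percentile bootstrap CI: resample with replacement, recompute the statistic."""
    rng = random.Random(seed)
    boot = sorted(stat(rng.choices(data, k=len(data))) for _ in range(reps))
    lo_idx = int((1 - level) / 2 * reps)       # e.g. the 2.5th percentile
    hi_idx = int((1 + level) / 2 * reps) - 1   # e.g. the 97.5th percentile
    return boot[lo_idx], boot[hi_idx]

# Hypothetical sample (invented for illustration)
sample = [12, 15, 9, 20, 14, 11, 30, 13, 16, 10, 18, 14]
lo, hi = bootstrap_ci(sample, statistics.median)
print(lo, hi)
```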
When do we use hypothesis tests?
When we are answering a Yes/No question about a parameter
What types of parameters did we answer questions about?
Means, difference in means, proportions, difference in proportions, slopes (regression ANOVA)
What 4 types of tests did we learn to compute p-values?
Z-tests, t-tests, \(\chi^2\)-tests, F-tests
For each of the following, decide which method of analysis fits best. Also describe what type of parameter we are working with (mean, multiple means, proportion, multiple proportions). Most of these are courtesy of Dr. Ziegler at ISU.
Anthropologists have found two burial mounds in the same region. They know that several different tribes lived in the region and that the tribes have been classified according to different lengths of skulls. They measure the lengths of the skulls found in each burial mound and wish to determine whether the two mounds were made by different tribes, treating all skulls at these two burial mounds as the population.
# t-test for a difference in means (skull length is quantitative)
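Skull length is a measured, quantitative variable, so comparing the two mounds comes down to comparing their mean skull lengths. Below is a Python sketch of the unpooled (Welch) two-sample t statistic. The measurements and the helper name `welch_t` are invented for illustration. Turning `t` and `df` into a p-value still requires a t table or statistical software.

```python
import math
import statistics

def welch_t(x, y):
    """Two-sample t statistic (unpooled / Welch) and Welch-Satterthwaite df."""
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x) / n1, statistics.variance(y) / n2
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical skull lengths in mm (invented, not real data)
mound_1 = [181, 176, 188, 179, 184, 190, 177, 183]
mound_2 = [171, 175, 169, 180, 173, 168, 176, 172]
t, df = welch_t(mound_1, mound_2)
print(round(t, 2), round(df, 1))
```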
Researchers were commissioned by the Violence In Children’s Television Investigative Monitors (VICTIM) to study the frequency of depictions of violent acts in Saturday morning TV fare. They selected a random sample of 40 shows that aired during this time slot over a 12-week period. Suppose that 28 of the 40 shows in the sample were judged to contain scenes depicting overtly violent acts. Do more than half of all Saturday morning TV shows depict overtly violent acts?
# Z-test for a single proportion
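Using the numbers from the problem (28 of 40 shows, \(H_0: p = 0.5\), \(H_a: p > 0.5\)), the test can be sketched in Python as follows. The standard error uses the null value \(p_0\). The success/failure condition holds because \(40 \times 0.5 = 20 \ge 10\).

```python
import math
from statistics import NormalDist

x, n, p0 = 28, 40, 0.5             # 28 of 40 sampled shows were violent; H0: p = 0.5
p_hat = x / n
se = math.sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
z = (p_hat - p0) / se
p_value = 1 - NormalDist().cdf(z)  # one-sided: Ha is p > 0.5
print(round(z, 2), round(p_value, 4))
```

This gives \(z \approx 2.53\) and a one-sided p-value of roughly 0.006.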
The Career Planning Office is interested in seniors’ plans and how they might relate to their majors. A large number of students are surveyed and classified according to their major (Natural Science, Social Science, Humanities) and future plans (Graduate School, Job, Undecided). Are the type of major and future plans significantly related?
# chi2 test for independence
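The problem gives no counts, so the table below is invented purely to show the mechanics. The sketch computes the expected counts under independence, the \(\chi^2\) statistic, and \(df = (r-1)(c-1)\). A 3×3 table has \(df = 4\). In that case the chi-square upper-tail probability has the closed form \(e^{-x/2}(1 + x/2)\), so no statistics library is needed. That shortcut is specific to \(df = 4\).

```python
import math

# Hypothetical counts (invented): rows = major, columns = future plans
#            Grad school  Job  Undecided
observed = [[30, 20, 10],   # Natural Science
            [15, 30, 15],   # Social Science
            [10, 25, 25]]   # Humanities

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand = sum(row_totals)

# Expected count under independence: (row total * column total) / grand total
chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / grand) ** 2
    / (row_totals[i] * col_totals[j] / grand)
    for i in range(len(observed))
    for j in range(len(col_totals))
)
df = (len(observed) - 1) * (len(col_totals) - 1)

# Closed-form chi-square upper tail, valid only when df = 4
assert df == 4, "the closed form below only applies to df = 4"
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)
print(round(chi2, 2), df, p_value)
```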
(Kind of a weird one, I’ll admit, but I saw a strange study that tried to answer this) How many times a day do all humans urinate, on average?
# confidence interval for a mean
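The problem has no data, so this Python sketch uses 10 invented counts to show the one-sample t interval \(\bar{x} \pm t^* \, s/\sqrt{n}\). The critical value \(t^* = 2.262\) is the 95% value for 9 degrees of freedom, read from a t table.

```python
import math
import statistics

# Hypothetical daily urination counts for 10 people (invented, not real data)
counts = [6, 7, 5, 8, 6, 7, 9, 6, 5, 7]

n = len(counts)
x_bar = statistics.mean(counts)
s = statistics.stdev(counts)   # sample standard deviation
t_star = 2.262                 # 95% t critical value for df = n - 1 = 9 (t table)
margin = t_star * s / math.sqrt(n)
ci = (x_bar - margin, x_bar + margin)
print(ci)
```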
In one of his adventures, Sherlock Holmes found footprints made by the criminal at the scene of a crime and measured the distance between them. After sampling many people, measuring their height and length of stride, he confidently announced that he could predict the height of the suspect. How?
# regression equation, predicting height
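Sherlock's trick amounts to a regression of height on stride length. You fit a least-squares line to the sampled people, then plug in the suspect's stride. The Python sketch below uses invented measurements. The data, the suspect's stride of 78 cm, and the helper `predict_height` are all hypothetical.

```python
import statistics

# Hypothetical (stride length, height) pairs in cm (invented, not real data)
stride = [60, 65, 70, 72, 75, 80, 85]
height = [155, 160, 168, 170, 173, 180, 188]

# Least-squares slope and intercept
x_bar, y_bar = statistics.mean(stride), statistics.mean(height)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(stride, height))
sxx = sum((x - x_bar) ** 2 for x in stride)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

def predict_height(stride_cm):
    """Predicted height from the fitted line y-hat = b0 + b1 * x."""
    return b0 + b1 * stride_cm

suspect_height = predict_height(78)  # hypothetical stride measured from the footprints
print(round(b0, 2), round(b1, 3), round(suspect_height, 1))
```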
A researcher plans to randomly sample 10 pigs to test if they have Disease A. The probability that any one pig has Disease A is 0.02. How likely is it that exactly 1 of the 10 pigs will have Disease A?
# probability / odds
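Assume the 10 pigs are independent and each has the disease with probability 0.02. Under that assumption, the number of diseased pigs is binomial. Here is a short Python sketch; the helper name `binom_pmf` is illustrative.

```python
from math import comb

n, p = 10, 0.02  # 10 pigs, P(Disease A) = 0.02 each, assumed independent

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p_exactly_one = binom_pmf(1, n, p)
p_at_least_one = 1 - binom_pmf(0, n, p)
print(round(p_exactly_one, 4), round(p_at_least_one, 4))
```

This gives about 0.167 for exactly one diseased pig and about 0.183 for at least one.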
What percentage of all Americans support same-sex marriage?
# confidence interval for a proportion
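The question gives no survey numbers. This Python sketch uses an invented result (600 of 1,000 respondents) to show the large-sample interval \(\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}\).

```python
import math
from statistics import NormalDist

# Hypothetical survey result (invented, not real data)
x, n = 600, 1000
p_hat = x / n
z_star = NormalDist().inv_cdf(0.975)      # about 1.96 for 95% confidence
se = math.sqrt(p_hat * (1 - p_hat) / n)   # SE uses p-hat: there is no null value to plug in
ci = (p_hat - z_star * se, p_hat + z_star * se)
print(ci)
```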
Is the percentage of the national budget spent on health care associated with life expectancy for countries?
# correlation (both quantitative)
People were recruited for a study on weight loss, in which participants were randomly assigned to one of two groups. Group 1 was given exercise instructions and group 2 was given no exercise instructions. The researchers are interested in estimating how much more weight was lost on average by people who were given exercise instructions, as opposed to those who weren’t.
# confidence interval for a difference in means
A professional figure skater was interested in which of two jumps she landed more consistently. She did 50 double loops (landed 47 successfully) and 50 double flips (landed 48 successfully), and wanted to determine whether this was enough evidence to conclude that she has a higher success rate with one jump than the other.
# Z-test for a difference in proportions
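Here is a Python sketch of the pooled two-proportion z-test using the skater's actual counts. It also checks the large-sample condition. With a pooled success rate of 0.95, each group expects only 2.5 misses, well below the usual threshold of 10. The normal approximation is therefore shaky here, and a randomization test or Fisher's exact test would be a safer choice.

```python
import math
from statistics import NormalDist

x1, n1 = 47, 50   # double loops landed
x2, n2 = 48, 50   # double flips landed

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)               # pooled proportion under H0: p1 = p2
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = 2 * NormalDist().cdf(-abs(z))      # two-sided: Ha is p1 != p2

# Large-sample check: expected failures in each group should be at least 10
expected_failures = [n1 * (1 - p_pool), n2 * (1 - p_pool)]
conditions_met = all(f >= 10 for f in expected_failures)
print(round(z, 2), round(p_value, 2), conditions_met)
```

The sketch gives \(z \approx -0.46\) and a two-sided p-value of about 0.65, so this test shows no evidence of a difference in success rates.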
Identify which value represents the sample mean and which value represents the claimed population mean.
Sample mean = $58. Pop. mean = $52
Sample mean = 3.59. Pop. mean = 3.37
Researchers studying the relationship between honesty, age and self-control conducted an experiment on 160 children between the ages of 5 and 15. The researchers asked each child to toss a fair coin in private and to record the outcome (white or black) on a paper sheet, and said they would only reward children who reported white. Half the children were explicitly told not to cheat and the others were not given any explicit instructions. Differences were observed in the cheating rates in the instruction and no-instruction groups, as well as some differences across children’s characteristics within each group. (Bucciol and Piovesan 2011)
Pop = all children aged 5 to 15
Sample = 160 children between ages 5 and 15
No generalizations, as this was not a random sample. Yes to causal claims: we are told it was an ‘experiment’, so the researchers assigned who received the instruction, and we assume that assignment was random.
Researchers investigating the effects of gamification (application of game-design elements and game principles in non-game contexts) on learning statistics randomly assigned 365 college students in a statistics course to one of four groups; one of these groups had no reading exercises and no gamification, one group had reading but no gamification, one group had gamification but no reading, and a final group had gamification and reading. Students in all groups also attended lectures. The study found that gamification had a positive impact on student learning compared to traditional teaching methods involving reading exercises. (Legaki et al. 2020)
Pop = all college students
Sample = 365 college students in one statistics course
No generalization: we don’t have a random sample, and students from a single statistics course are unlikely to be representative of all college students. Yes to causal claims, as there was randomization.
In a public health study on the effects of consumption of fruits and vegetables on psychological well-being in young adults, participants were randomly assigned to three groups: (1) diet-as-usual, (2) an ecological momentary intervention involving text message reminders to increase their fruit and vegetable consumption plus a voucher to purchase them, or (3) a fruit and vegetable intervention in which participants were given two additional daily servings of fresh fruits and vegetables to consume on top of their normal diet. Participants were asked to take a nightly survey on their smartphones. Participants were student volunteers at the University of Otago, New Zealand. At the end of the 14-day study, only participants in the third group showed improvements to their psychological well-being across the 14 days relative to the other groups. (Conner et al. 2017)
Experiment. There was randomization.
Response = psychological well-being score, explanatory = intervention group (fruit and vegetable consumption)
No. Students were volunteers, so this is not a random sample.
Yes. Students were randomly assigned to the treatment groups.
“The results of this study provide evidence (not PROOF) that giving NZ college students fresh fruit and vegetables improves psychological well-being.”