library(ggplot2)
library(dplyr)
theme_set(theme_bw())
## Better histograms
gh <- function(bins = 10) {
geom_histogram(color = 'black', fill = 'gray80', bins = bins)
}
This lab is oriented around the use of the aov() function in R to fit an analysis of variance model. The first ANOVA we will consider is that which compares equality of means across several groups. To begin, we'll consider the dogs dataset that was introduced in class.
dogs <- read.csv("https://collinn.github.io/data/dogs.csv")
From this dataset, we find that we have one quantitative variable (speed), along with three categorical variables for breed, size, and color. We can quickly get a summary of these categories with the table() function:
with(dogs, table(color))
## color
## black brown white yellow
## 200 50 50 100
Here, we see that the color variable has 4 categories and 400 total observations. To conduct an ANOVA, we will use the same formula syntax that was used for the two sample t-test, with our quantitative variable on the left hand side and the categories on the right. We'll do this using aov():
aov(speed ~ color, dogs)
## Call:
## aov(formula = speed ~ color, data = dogs)
##
## Terms:
## color Residuals
## Sum of Squares 1651.64 50746.34
## Deg. of Freedom 3 396
##
## Residual standard error: 11.32022
## Estimated effects may be unbalanced
The basic output of this function only gives us limited information related to the sums of squares and degrees of freedom. We can recreate the tables we saw in lecture by passing the output of aov() into the summary() function, either with a pipe (%>%) or by assigning the output to a new variable and calling summary() on it:
## Method 1
# model <- aov(speed ~ color, dogs)
# summary(model)
## Method 2
aov(speed ~ color, dogs) %>% summary()
## Df Sum Sq Mean Sq F value Pr(>F)
## color 3 1652 550.5 4.296 0.00533 **
## Residuals 396 50746 128.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here, we now get the mean sums of squares for our groups and residuals, along with an F statistic and an associated p-value. In particular, we know that this statistic follows an \(F(3, 396)\) distribution, and the \(p\)-value is the area under that curve to the right of the observed statistic.
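As a quick check, we can compute this area directly with pf(), using the statistic and degrees of freedom from the table above:

## Right-tail area under an F(3, 396) density beyond the observed F = 4.296
pf(4.296, df1 = 3, df2 = 396, lower.tail = FALSE)  # ~0.0053, matching summary()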
Question 1: Using the dogs dataset, recreate a boxplot showing the speeds of each dog, separated by color. Based on this plot, does it appear as if there is a statistically significant (\(\alpha = 0.05\)) difference in group means? Compare your conclusion with the ANOVA output done above. Explain what might be causing you to reject the null hypothesis in this case.
ggplot(dogs, aes(x=speed, y=color)) + geom_boxplot()
Not really, judging by the visuals alone. Only the yellow and white groups look like they may have different centers based on the sample data, and this one pairwise difference is likely what drives the rejection in the ANOVA output above.
Question 2: The data for this question involves sandwiches and ants. An experiment was conducted in which sandwiches were made with various combinations of filling and bread, both with and without butter added. A piece of the sandwich was ripped off and left near an ant hill for several minutes, at which point a jar was placed over the sandwich and the number of ants surrounding it was counted.
sandwich <- read.csv("https://collinn.github.io/data/sandwich.csv")
Suppose that you wanted to choose one variable (Butter or Filling) to predict how many ants will be attracted to a sandwich. Which do you think would best accomplish this goal? Justify your answer with a plot and a statistical test.
ggplot(sandwich, aes(x=Ants, y=Butter))+geom_boxplot()
ggplot(sandwich, aes(x=Ants, y=Filling))+geom_boxplot()
aov(Ants~Butter, sandwich) %>% summary()
## Df Sum Sq Mean Sq F value Pr(>F)
## Butter 1 2055 2055.1 7.872 0.0065 **
## Residuals 70 18275 261.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aov(Ants~Filling, sandwich) %>% summary()
## Df Sum Sq Mean Sq F value Pr(>F)
## Filling 2 7645 3823 20.79 8.56e-08 ***
## Residuals 69 12685 184
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value for using Filling as a predictor is much lower. We can also visually see a difference between means as judged by boxplots of the samples: ham and pickles attracts far more ants than the other fillings.
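As a quick numerical check of the boxplots, we can also compare the average ant count for each filling:

## Average ant count by filling
tapply(sandwich$Ants, sandwich$Filling, mean)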
As noted previously, ANOVA is specifically concerned with testing the null hypothesis of equality between means for multiple groups,
\[ H_0: \mu_1 = \mu_2 = \dots = \mu_k \]
Should we perform an ANOVA and reject our null hypothesis, we only know that at least two of our group means are different. Post-hoc (Latin for "after this" or "after the fact") pairwise testing can be done to determine which of our pairwise differences are likely responsible.
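With \(k = 4\) colors in the dogs data, for example, there are \(\binom{4}{2} = 6\) distinct pairs to compare, which we can count with choose():

## Number of pairwise comparisons among k = 4 groups
choose(4, 2)  # 6, one for each row of the Tukey output below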
Consider again our dogs dataset in which we wish to test for equality in average speed between different colored dogs. This is done simply with the aov() function:
## This will assign the results to a variable called model
model <- aov(speed ~ color, dogs)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## color 3 1652 550.5 4.296 0.00533 **
## Residuals 396 50746 128.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here, we see information on the sums of squares from the grouping and our residuals, along with an F-statistic and a p-value. If we were testing at the \(\alpha = 0.05\) level, we would reject the null hypothesis, as \(p = 0.0053 < 0.05\).
To determine which pairwise colors had a difference, we can use the TukeyHSD() function (Tukey's Honest Significant Difference) on the model object we created above:
## Pass in output from aov() function
comp <- TukeyHSD(model)
comp
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = speed ~ color, data = dogs)
##
## $color
## diff lwr upr p adj
## brown-black 1.961210 -2.6566404 6.5790606 0.6923676
## white-black 4.236027 -0.3818238 8.8538772 0.0852883
## yellow-black -2.396760 -5.9737312 1.1802120 0.3101186
## white-brown 2.274817 -3.5663536 8.1159868 0.7467192
## yellow-brown -4.357970 -9.4165715 0.7006321 0.1188852
## yellow-white -6.632786 -11.6913881 -1.5741845 0.0043711
There are a few things to note here: the intervals reported are family-wise 95% confidence intervals by default, and this level can be changed in the TukeyHSD() function by passing in an argument for conf.level. From this output, we see the only statistically significant difference is between yellow and white.
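For example, to request 99% family-wise intervals instead:

## Wider, 99% family-wise confidence intervals
TukeyHSD(model, conf.level = 0.99)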
Finally, we can plot the output from the TukeyHSD()
function with a call to the base R function plot()
## Pass in output from TukeyHSD() function
plot(comp)
Note here again, the only confidence interval that does not contain 0 (our null hypothesis for pairwise tests) is that between yellow and white, consistent with the output we observed above.
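If we only care about that one comparison, its row can be pulled directly from the result, since comp$color is a matrix with one row per pair:

## Extract just the yellow-white comparison
comp$color["yellow-white", ]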
Question 3: An individual's "critical flicker frequency" is the highest frequency at which a flickering light source can actually be detected to be flickering; at frequencies above this rate, the light source will appear continuous. EPILEPSY WARNING (An example of this can be seen here)
The data below come from a study titled "The effect of iris color on critical flicker frequency" published in the Journal of General Psychology, recording the critical flicker frequency and iris color for \(n = 19\) subjects
flicker <- read.delim('https://raw.githubusercontent.com/IowaBiostat/data-sets/main/flicker/flicker.txt')
Null: there is no difference between mean frequencies for the colors.
Alternative: at least one color has a different mean frequency than the others.
Use the aov() function to analyze this data and print a summary using summary():
model2 <- aov(Flicker ~ Color, flicker)
model2 %>% summary()
## Df Sum Sq Mean Sq F value Pr(>F)
## Color 2 23.00 11.499 4.802 0.0232 *
## Residuals 16 38.31 2.394
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The Residuals row has the larger sum of squares, while Color has the larger mean square. This is because the two rows have different degrees of freedom; Color has far fewer to divide by.
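We can verify this from the table directly, since each mean square is just the sum of squares divided by its degrees of freedom:

## Mean square = sum of squares / degrees of freedom
23.00 / 2   # Color mean square, ~11.5
38.31 / 16  # Residual mean square, ~2.39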
The following code simulates a few situations to help us better understand what some components of the ANOVA table represent. We will also see what changing the group variances does to the F-statistic and p-value for the ANOVA test.
Here we are simulating 50 observations from each of 3 groups. All have the same population standard deviation, \(\sigma = 40\). The corresponding population means are \(\mu_1 = 250\), \(\mu_2 = 245\), and \(\mu_3 = 255\). The group means are not equal, but the variances are relatively large, which we can see in the following plot.
## Simulate three groups with nearby means and a large common spread (sd = 40)
set.seed(423)
group1 <- rnorm(50, mean = 250, sd = 40)
group2 <- rnorm(50, mean = 245, sd = 40)
group3 <- rnorm(50, mean = 255, sd = 40)
groups <- c(rep("Group1", 50), rep("Group2", 50), rep("Group3", 50))
data <- data.frame(obs = c(group1, group2, group3), groups = groups)
ggplot(data, aes(x = obs, fill = groups)) + geom_boxplot()
# sample group means
mean(group1); mean(group2); mean(group3)
## [1] 250.4002
## [1] 242.5958
## [1] 249.6252
Question 4: Think about how much variability there is within each group, i.e., how much observations vary from each other within group 1, within group 2, and within group 3, respectively. Are the differences in group means large or small relative to how much variability there is within groups? Explain.
The differences in group means are small, but the variance within each group is huge!
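Checking the sample standard deviation within each group confirms this; each should be near the population value \(\sigma = 40\):

# within-group standard deviations
sd(group1); sd(group2); sd(group3)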
Question 5: Utilizing your answer from Question 4, do you think the F-statistic from an ANOVA test will give us a large or small p-value for testing if these means are different? Explain. Check your answer using the aov() and summary() functions with the data object.
I think the p-value will be large. I should get a small F-value because MSG should be small and MSE should be large.
aov(obs~groups, data) %>% summary()
## Df Sum Sq Mean Sq F value Pr(>F)
## groups 2 1849 924.3 0.705 0.496
## Residuals 147 192847 1311.9
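Note that the F statistic is just the ratio of the two mean squares from this table:

## F = MSG / MSE
924.3 / 1311.9  # ~0.705, the F value above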
We are going to do the same thing as above, but with smaller variances.
We are simulating 50 observations from each of 3 groups. All have the same population standard deviation, \(\sigma = 4\). The corresponding population means are \(\mu_1 = 250\), \(\mu_2 = 245\), and \(\mu_3 = 255\).
Note: The means haven’t changed from the last example, only the variances. Let’s see how that affects the ANOVA test.
## Same means as before, but a much smaller spread (sd = 4)
set.seed(423)
group1 <- rnorm(50, mean = 250, sd = 4)
group2 <- rnorm(50, mean = 245, sd = 4)
group3 <- rnorm(50, mean = 255, sd = 4)
groups <- c(rep("Group1", 50), rep("Group2", 50), rep("Group3", 50))
data2 <- data.frame(obs = c(group1, group2, group3), groups = groups)
ggplot(data2, aes(x = obs, fill = groups)) + geom_boxplot() +
  xlim(235, 260)
# sample group means
mean(group1); mean(group2); mean(group3)
## [1] 250.04
## [1] 244.7596
## [1] 254.4625
Question 6: Think about how much variability there is within each group for this example. Are the differences in group means large or small relative to how much variability there is within groups? Explain. Is it easier to see differences in the groups compared to the previous example?
The differences in means are large relative to how little variability there is within the groups. It is much easier to see the differences.
Question 7: Do you think the F-statistic from an ANOVA test will be larger or smaller than the previous example? Check your answer using the aov() and summary() functions with the data2 object.
Larger.
Question 8: In the ANOVA output for Question 7, which Sum of Squares corresponds to within group variability? How does this Sum of Squares compare for the 2 different scenarios we just looked at?
aov(obs~groups, data2) %>% summary()
## Df Sum Sq Mean Sq F value Pr(>F)
## groups 2 2360 1179.9 89.94 <2e-16 ***
## Residuals 147 1928 13.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
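The Residuals sum of squares measures within-group variability; it drops from about 192847 in the first scenario to about 1928 in this one. As a quick check, deviance() returns this residual sum of squares directly from an aov fit:

## Residual (within-group) sum of squares for each scenario
deviance(aov(obs ~ groups, data))   # ~192847
deviance(aov(obs ~ groups, data2))  # ~1928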
The following plot uses the data2 simulated data from above. The only difference is that I have changed the x-axis scale.
ggplot(data2, aes(x=obs, fill=groups)) + geom_boxplot() +
xlim(150, 350)
Question 9: Does this change your answer to whether or not you can visually tell the groups apart? Did it become more difficult?
The fact that changing the scale can affect how we perceive whether the groups are different means we should be wary of relying on visual inspection alone to determine differences. The F-test circumvents this by comparing how different the group means are relative to how 'wide' the distributions are (variability within groups), so that we are not led astray by our eyes.