library(ggplot2)
library(dplyr)
theme_set(theme_bw())
## Better histograms
gh <- function(bins = 10) {
geom_histogram(color = 'black', fill = 'gray80', bins = bins)
}
This lab is oriented around the use of the aov() function in R to fit an analysis of variance model. The first ANOVA we will consider is that which compares equality of means across several groups. To begin, we'll consider the dogs dataset that was introduced in class.
dogs <- read.csv("https://collinn.github.io/data/dogs.csv")
From this dataset, we find that we have one quantitative variable (speed), along with three categorical variables for breed, size, and color. We can quickly get a summary of these categories with the table() function:
with(dogs, table(color))
## color
## black brown white yellow
## 200 50 50 100
Here, we see that the color variable has 4 categories and 400 total observations. To conduct an ANOVA, we will use the same formula syntax that was used for the two sample t-test, with our quantitative variable on the left hand side and the categories on the right. We'll do this using aov():
aov(speed ~ color, dogs)
## Call:
## aov(formula = speed ~ color, data = dogs)
##
## Terms:
## color Residuals
## Sum of Squares 1651.64 50746.34
## Deg. of Freedom 3 396
##
## Residual standard error: 11.32022
## Estimated effects may be unbalanced
The basic output of this function only gives us limited information related to the sums of squares and degrees of freedom. We can recreate the tables we saw in lecture by passing the output of aov() into the summary() function, either with a pipe (%>%) or by assigning the output to a new variable and calling summary() on it:
## Method 1
# model <- aov(speed ~ color, dogs)
# summary(model)
## Method 2
aov(speed ~ color, dogs) %>% summary()
## Df Sum Sq Mean Sq F value Pr(>F)
## color 3 1652 550.5 4.296 0.00533 **
## Residuals 396 50746 128.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here, we now get the mean sums of squares for our groups and residuals, along with an F statistic and an associated p-value. In particular, we know that this statistic follows an \(F(3, 396)\) distribution, and the \(p\)-value is the area under that curve to the right of the observed statistic.
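As a quick check, we can compute this area directly with pf(), using the statistic and degrees of freedom from the table above:

## Right-tail area under an F(3, 396) density beyond the observed F = 4.296
pf(4.296, df1 = 3, df2 = 396, lower.tail = FALSE)  # ~0.0053, matching summary()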
Question 1: Using the dogs dataset, recreate a boxplot showing the speeds of each dog, separated by color. Based on this plot, does it appear as if there is a statistically significant (\(\alpha = 0.05\)) difference in group means? Compare your conclusion with the ANOVA output done above. Explain what might be causing you to reject the null hypothesis in this case.
ggplot(dogs, aes(x=speed, y=color)) + geom_boxplot()
Not really, judging by the visuals alone. Only the yellow and white groups look like they may have different centers based on the sample data, and this one pairwise difference is likely what drives the rejection in the ANOVA output above.
Question 2: The data for this question involves sandwiches and ants. An experiment was conducted in which sandwiches were made with various combinations of filling and bread, both with and without butter added. A piece of the sandwich was ripped off and left near an ant hill for several minutes, at which point a jar was placed over the sandwich and the number of ants surrounding it was counted.
sandwich <- read.csv("https://collinn.github.io/data/sandwich.csv")
Suppose that you wanted to choose one variable (Butter or Filling) to predict how many ants will be attracted to a sandwich. Which do you think would best accomplish this goal? Justify your answer with a plot and a statistical test.
ggplot(sandwich, aes(x=Ants, y=Butter))+geom_boxplot()
ggplot(sandwich, aes(x=Ants, y=Filling))+geom_boxplot()
aov(Ants~Butter, sandwich) %>% summary()
## Df Sum Sq Mean Sq F value Pr(>F)
## Butter 1 2055 2055.1 7.872 0.0065 **
## Residuals 70 18275 261.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aov(Ants~Filling, sandwich) %>% summary()
## Df Sum Sq Mean Sq F value Pr(>F)
## Filling 2 7645 3823 20.79 8.56e-08 ***
## Residuals 69 12685 184
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value for using Filling as a predictor is much lower. We can also visually see a difference between means as judged by boxplots of the samples: ham and pickles attracts far more ants than the other fillings.
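As a quick numerical check of the boxplots, we can also compare the average ant count for each filling:

## Average ant count by filling
tapply(sandwich$Ants, sandwich$Filling, mean)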
As noted previously, ANOVA is specifically concerned with testing the null hypothesis of equality between means for multiple groups,
\[ H_0: \mu_1 = \mu_2 = \dots = \mu_k \]
Should we perform an ANOVA and reject our null hypothesis, we only know that at least two of our group means are different. Post-hoc (Latin for "after this" or "after the fact") pairwise testing can be done to determine which of our pairwise differences are likely responsible.
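With \(k = 4\) colors in the dogs data, for example, there are \(\binom{4}{2} = 6\) distinct pairs to compare, which we can count with choose():

## Number of pairwise comparisons among k = 4 groups
choose(4, 2)  # 6, one for each row of the Tukey output below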
Consider again our dogs dataset in which we wish to test for equality in average speed between different colored dogs. This is done simply with the aov() function:
## This will assign the results to a variable called model
model <- aov(speed ~ color, dogs)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## color 3 1652 550.5 4.296 0.00533 **
## Residuals 396 50746 128.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here, we see information on the sums of squares from the grouping and our residuals, along with an F-statistic and a p-value. If we were testing at the \(\alpha = 0.05\) level, we would reject the null hypothesis, as \(p = 0.0053 < 0.05\).
To determine which pairwise colors had a difference, we can use the TukeyHSD() function (Tukey's Honest Significant Difference) on the model object we created above:
## Pass in output from aov() function
comp <- TukeyHSD(model)
comp
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = speed ~ color, data = dogs)
##
## $color
## diff lwr upr p adj
## brown-black 1.961210 -2.6566404 6.5790606 0.6923676
## white-black 4.236027 -0.3818238 8.8538772 0.0852883
## yellow-black -2.396760 -5.9737312 1.1802120 0.3101186
## white-brown 2.274817 -3.5663536 8.1159868 0.7467192
## yellow-brown -4.357970 -9.4165715 0.7006321 0.1188852
## yellow-white -6.632786 -11.6913881 -1.5741845 0.0043711
There are a few things to note here: the intervals reported are family-wise 95% confidence intervals by default, and this level can be changed in the TukeyHSD() function by passing in an argument for conf.level. From this output, we see the only statistically significant difference is between yellow and white.
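For example, to request 99% family-wise intervals instead:

## Wider, 99% family-wise confidence intervals
TukeyHSD(model, conf.level = 0.99)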
Finally, we can plot the output from the TukeyHSD()
function with a call to the base R function plot()
## Pass in output from TukeyHSD() function
plot(comp)
Note here again, the only confidence interval that does not contain 0 (our null hypothesis for pairwise tests) is that between yellow and white, consistent with the output we observed above.
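If we only care about that one comparison, its row can be pulled directly from the result, since comp$color is a matrix with one row per pair:

## Extract just the yellow-white comparison
comp$color["yellow-white", ]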
Question 3: An individual's "critical flicker frequency" is the highest frequency at which a flickering light source can actually be detected to be flickering; at frequencies above this rate, the light source will appear continuous. EPILEPSY WARNING (An example of this can be seen here)
The data below come from a study titled "The effect of iris color on critical flicker frequency" published in the Journal of General Psychology, recording the critical flicker frequency and iris color for \(n = 19\) subjects
flicker <- read.delim('https://raw.githubusercontent.com/IowaBiostat/data-sets/main/flicker/flicker.txt')
Null: there is no difference between mean frequencies for the colors.
Alternative: at least one color has a different mean frequency than the others.
Use the aov() function to analyze this data and print a summary using summary():
model2 <- aov(Flicker ~ Color, flicker)
model2 %>% summary()
## Df Sum Sq Mean Sq F value Pr(>F)
## Color 2 23.00 11.499 4.802 0.0232 *
## Residuals 16 38.31 2.394
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The Residuals row has the larger sum of squares, while Color has the larger mean square. This is because the two rows have different degrees of freedom; Color has far fewer to divide by.
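We can verify this from the table directly, since each mean square is just the sum of squares divided by its degrees of freedom:

## Mean square = sum of squares / degrees of freedom
23.00 / 2   # Color mean square, ~11.5
38.31 / 16  # Residual mean square, ~2.39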
The following code simulates a few situations to help us better understand what some components of the ANOVA table represent. We will also see what changing the group variances does to the F-statistic and p-value for the ANOVA test.
Here we are simulating 50 observations from each of 3 groups. All have the same population standard deviation, \(\sigma = 40\). The corresponding population means are \(\mu_1 = 250\), \(\mu_2 = 245\), and \(\mu_3 = 255\). The group means are not equal, but the variances are relatively large, which we can see in the following plot.
## Simulate three groups with nearby means and a large common spread (sd = 40)
set.seed(423)
group1 <- rnorm(50, mean = 250, sd = 40)
group2 <- rnorm(50, mean = 245, sd = 40)
group3 <- rnorm(50, mean = 255, sd = 40)
groups <- c(rep("Group1", 50), rep("Group2", 50), rep("Group3", 50))
data <- data.frame(obs = c(group1, group2, group3), groups = groups)
ggplot(data, aes(x = obs, fill = groups)) + geom_boxplot()
# sample group means
mean(group1); mean(group2); mean(group3)
## [1] 250.4002
## [1] 242.5958
## [1] 249.6252
Question 4: Think about how much variability there is within each group, i.e., how much observations vary from each other within group 1, within group 2, and within group 3, respectively. Are the differences in group means large or small relative to how much variability there is within groups? Explain.
The differences in group means are small, but the variance within each group is huge!
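Checking the sample standard deviation within each group confirms this; each should be near the population value \(\sigma = 40\):

# within-group standard deviations
sd(group1); sd(group2); sd(group3)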
Question 5: Utilizing your answer from Question 4, do you think the F-statistic from an ANOVA test will give us a large or small p-value for testing if these means are different? Explain. Check your answer using the aov() and summary() functions with the data object.
I think the p-value will be large. I should get a small F-value because MSG should be small and MSE should be large.
aov(obs~groups, data) %>% summary()
## Df Sum Sq Mean Sq F value Pr(>F)
## groups 2 1849 924.3 0.705 0.496
## Residuals 147 192847 1311.9
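Note that the F statistic is just the ratio of the two mean squares from this table:

## F = MSG / MSE
924.3 / 1311.9  # ~0.705, the F value above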
We are going to do the same thing as above, but with smaller variances.
We are simulating 50 observations from each of 3 groups. All have the same population standard deviation, \(\sigma = 4\). The corresponding population means are \(\mu_1 = 250\), \(\mu_2 = 245\), and \(\mu_3 = 255\).
Note: The means haven’t changed from the last example, only the variances. Let’s see how that affects the ANOVA test.
## Same means as before, but a much smaller spread (sd = 4)
set.seed(423)
group1 <- rnorm(50, mean = 250, sd = 4)
group2 <- rnorm(50, mean = 245, sd = 4)
group3 <- rnorm(50, mean = 255, sd = 4)
groups <- c(rep("Group1", 50), rep("Group2", 50), rep("Group3", 50))
data2 <- data.frame(obs = c(group1, group2, group3), groups = groups)
ggplot(data2, aes(x = obs, fill = groups)) + geom_boxplot() +
  xlim(235, 260)
# sample group means
mean(group1); mean(group2); mean(group3)
## [1] 250.04
## [1] 244.7596
## [1] 254.4625
Question 6: Think about how much variability there is within each group for this example. Are the differences in group means large or small relative to how much variability there is within groups? Explain. Is it easier to see differences in the groups compared to the previous example?
The differences in means are large relative to how little variability there is within the groups. It is much easier to see the differences.
Question 7: Do you think the F-statistic from an ANOVA test will be larger or smaller than the previous example? Check your answer using the aov() and summary() functions with the data2 object.
Larger.
Question 8: In the ANOVA output for Question 7, which Sum of Squares corresponds to within group variability? How does this Sum of Squares compare for the 2 different scenarios we just looked at?
aov(obs~groups, data2) %>% summary()
## Df Sum Sq Mean Sq F value Pr(>F)
## groups 2 2360 1179.9 89.94 <2e-16 ***
## Residuals 147 1928 13.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
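The Residuals sum of squares measures within-group variability; it drops from about 192847 in the first scenario to about 1928 in this one. As a quick check, deviance() returns this residual sum of squares directly from an aov fit:

## Residual (within-group) sum of squares for each scenario
deviance(aov(obs ~ groups, data))   # ~192847
deviance(aov(obs ~ groups, data2))  # ~1928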
The following plot uses the data2 simulated data from above. The only difference is that I have changed the x-axis scale.
ggplot(data2, aes(x=obs, fill=groups)) + geom_boxplot() +
xlim(150, 350)
Question 9: Does this change your answer to whether or not you can visually tell the groups apart? Did it become more difficult?
The fact that changing the scale can affect how we perceive whether the groups are different means we should be wary of relying on visual inspection alone to determine differences. The F-test circumvents this by comparing how different the group means are relative to how 'wide' the distributions are (variability within groups), so that we are not led astray by our eyes.