Part A: Explanatory: percent with a bachelor’s degree; Response: per capita income
Part B: There is a moderately strong linear relationship between counties’ percent with bachelor’s degree and per capita income. There are a few outliers with large values in both variables.
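Part C: No, for two reasons. First, these are observational data, so the association does not show that earning a bachelor’s degree causes higher income. Second, drawing that conclusion would be an ecological fallacy: relationships observed between counties do not necessarily hold for the individuals within them.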
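To illustrate the ecological fallacy, here is a small R simulation with made-up data, not the county data. The five hypothetical "counties" and the variables `x` and `y` are assumptions chosen so that county averages line up almost perfectly, while within each county the individual-level relationship runs the other way.

```r
set.seed(1)

# Hypothetical data: 5 "counties" with 50 individuals each
county <- rep(1:5, each = 50)
x <- county + rnorm(length(county), sd = 0.5)
# Between counties y rises with x; within a county y falls as x rises
y <- 2 * county - 3 * (x - county) + rnorm(length(county), sd = 0.5)

# County-level (aggregated) correlation: correlate the county means
x_bar <- as.numeric(tapply(x, county, mean))
y_bar <- as.numeric(tapply(y, county, mean))
county_r <- cor(x_bar, y_bar)

# Individual-level correlation computed separately inside each county
within_r <- sapply(split(data.frame(x, y), county), function(d) cor(d$x, d$y))

print(county_r)  # strongly positive
print(within_r)  # negative in every county
```

The county-level correlation describes counties, not people, which is why it cannot be used to predict what happens to an individual.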
Part A: There is a moderate, positive, non-linear relationship between a country’s percentage of internet users and its life expectancy. There are no obvious outliers.
Part B: This is an observational study.
Part C: There are many possible confounding variables. One is a country’s wealth, which could plausibly increase both internet use and life expectancy.
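A minimal R sketch of how a confounder such as wealth can create an association by itself, assuming simulated data in which the hypothetical variables `internet` and `life_exp` each depend only on `wealth` and not on each other. The raw correlation is clearly positive, but after removing the part of each variable explained by wealth (regressing on it and keeping the residuals), the remaining correlation is close to zero.

```r
set.seed(2)
n <- 1000

# Hypothetical confounder and two outcomes that depend only on it
wealth   <- rnorm(n)
internet <- 0.8 * wealth + rnorm(n, sd = 0.6)
life_exp <- 0.8 * wealth + rnorm(n, sd = 0.6)

# Raw association between internet use and life expectancy
raw_r <- cor(internet, life_exp)

# Partial correlation: correlate what is left after regressing each on wealth
partial_r <- cor(resid(lm(internet ~ wealth)), resid(lm(life_exp ~ wealth)))

print(raw_r)      # clearly positive
print(partial_r)  # near zero once wealth is accounted for
```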
Part A: Footprint is explanatory, happiness is response. We are trying to use footprint to predict happiness.
Part B: There is a moderately strong, positive, non-linear relationship between these variables. There are no extreme outliers.
Part C: No, Pearson’s correlation is not appropriate. It measures the strength of linear relationships, and this relationship is not linear.
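A quick R check of why a curved pattern can defeat Pearson’s correlation, using made-up values: below, `y` is completely determined by `x`, yet r is zero because the pattern is a symmetric curve rather than a line.

```r
# Hypothetical data: a perfect but curved (quadratic) relationship
x <- -5:5
y <- x^2

# Pearson's r only captures linear association, so it misses this pattern
r <- cor(x, y)
print(r)  # essentially 0, even though y is an exact function of x
```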
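library(tidyverse)  # provides %>%, filter(), and ggplot2
theme_set(theme_bw())
# Scatterplot of Happiness vs. Footprint for Region 1 countries only
Happy %>% filter(Region == "1") %>% ggplot(aes(x = Footprint, y = Happiness)) + geom_point()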
Region 1 corresponds to South American countries. For these countries, there appears to be no linear relationship between footprint and happiness.
Part A: Life Expectancy
Part B:
# Scatterplot of Happiness vs. LifeExpectancy for all countries
Happy %>% ggplot(aes(x = LifeExpectancy, y = Happiness)) + geom_point()
There is a moderately strong, positive, linear relationship between a country’s life expectancy and its happiness, with no obvious outliers. Pearson’s correlation is appropriate here because the relationship is linear.
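For reference, the standard sample Pearson correlation for paired observations $(x_i, y_i)$, here each country’s life expectancy and happiness, is below. The numerator sums products of deviations from the means, so r summarizes how closely the points follow a straight line, which is why it is only meaningful for linear relationships.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```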
Part C: A non-linear relationship can still produce a large correlation coefficient. Blindly applying the usual interpretation of correlation would then wrongly suggest that the relationship is linear.
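A small R example of this, using hypothetical data where `y` is an exact quadratic function of `x`. The correlation comes out very close to 1, yet the residuals from a straight-line fit show a clear U shape, which is the sign that the relationship is curved and that the scatterplot, not r alone, should be checked.

```r
# Hypothetical data: an exact quadratic (non-linear) relationship
x <- 1:10
y <- x^2

# Large correlation despite the curvature
r <- cor(x, y)
print(round(r, 3))  # roughly 0.97

# A straight-line fit leaves a systematic pattern in the residuals:
# positive at both ends, negative in the middle
fit <- lm(y ~ x)
print(round(resid(fit), 1))
```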