library(ggplot2)
library(dplyr)
data = read.table("https://nfriedrichsen.github.io/data/HappyPlanetIndex.txt", header=TRUE, sep=",")
Happy = data.frame(data)
Happy$Region = as.factor(Happy$Region)
Happiness with Footprint
# Scatterplot of Happiness vs. Footprint
theme_set(theme_bw())
ggplot(Happy, aes(x=Footprint, y=Happiness)) + geom_point()
Question 1: Which is the explanatory variable and which is the response variable? Explain.
Happiness is the response variable because it is the quantity we are trying to predict. Footprint is the explanatory variable because we use it to predict Happiness.
Question 2: Describe the relationship between the happiness of a country and its ecological footprint.
The relationship between happiness and ecological footprint is moderately strong, positive, and curved.
Question 3: Is it appropriate to use Pearson’s correlation to quantify the relationship between these two variables? Explain.
No. Pearson’s correlation measures the strength of a linear relationship, and this relationship is curved, so Pearson’s r would misrepresent it.
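To see why a curved pattern is a problem for Pearson’s r, here is a small sketch on made-up toy vectors (not the Happy Planet data). In both cases y is an exact function of x, yet r equals 1 only when that function is linear.

```r
# Toy illustration (hypothetical data, not the HappyPlanetIndex file):
# y is determined exactly by x in both cases.
x <- 1:20
y_linear <- 3 + 2 * x   # exact straight line
y_curved <- log(x)      # exact and increasing, but curved

r_linear <- cor(x, y_linear)
r_curved <- cor(x, y_curved)

r_linear  # 1 (up to rounding): r captures a perfect linear relationship
r_curved  # below 1, even though y is a perfect function of x
```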
Happy %>% select(Happiness, LifeExpectancy, GDPperCapita) %>%
  cor(use="complete.obs")
##                Happiness LifeExpectancy GDPperCapita
## Happiness      1.0000000      0.8334278    0.6976830
## LifeExpectancy 0.8334278      1.0000000    0.6662072
## GDPperCapita   0.6976830      0.6662072    1.0000000
Question 4: Of the two other variables in the correlation matrix, which has the strongest correlation with Happiness?
LifeExpectancy (r = 0.833, compared with r = 0.698 for GDPperCapita).
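The same answer can be read off programmatically. This sketch hard-codes the correlation matrix printed above (rather than recomputing it from the data file) and picks the variable whose correlation with Happiness is largest in magnitude, excluding Happiness itself.

```r
# Correlation matrix copied from the output above
vars <- c("Happiness", "LifeExpectancy", "GDPperCapita")
cor_mat <- matrix(
  c(1.0000000, 0.8334278, 0.6976830,
    0.8334278, 1.0000000, 0.6662072,
    0.6976830, 0.6662072, 1.0000000),
  nrow = 3, byrow = TRUE, dimnames = list(vars, vars)
)

# Correlations of each other variable with Happiness
with_happiness <- cor_mat["Happiness", setdiff(vars, "Happiness")]

# Strongest = largest absolute correlation
strongest <- names(which.max(abs(with_happiness)))
strongest  # "LifeExpectancy"
```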
Happiness with LifeExpectancy
# Scatterplot of Happiness vs. LifeExpectancy
ggplot(Happy, aes(x=LifeExpectancy, y=Happiness)) + geom_point()
Question 5: Which of these is the explanatory and which is the response variable? Explain.
Happiness is the response variable because it is what we are predicting. LifeExpectancy is the explanatory variable because we use it to make that prediction.
Question 6: Describe the general relationship between LifeExpectancy and Happiness.
There is a positive, moderately strong, linear relationship between LifeExpectancy and Happiness.
Question 7: Is it appropriate to use Pearson’s correlation to quantify the relationship between these two variables? Explain.
Yes. The relationship is linear, so Pearson’s correlation can be used.
Question 8: State the value of the correlation between LifeExpectancy and Happiness. Interpret the value of this correlation.
The correlation is r = 0.833. Because it is positive and close to 1, it indicates a strong, positive linear relationship: countries with higher life expectancy tend to have higher Happiness.
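For reference, Pearson’s r is the sum of the cross-products of deviations from the means, divided by the square root of the product of the sums of squared deviations. A minimal sketch on hypothetical toy vectors (not the Happy Planet data) checks that computing r from this definition matches `cor()`.

```r
# Hypothetical toy data (not from the HappyPlanetIndex file)
x <- c(55, 62, 68, 71, 75, 79, 82)
y <- c(4.1, 5.0, 5.6, 6.3, 6.4, 7.2, 7.6)

# Pearson's r from its definition
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in Pearson correlation for comparison
r_builtin <- cor(x, y)
all.equal(r_manual, r_builtin)  # TRUE
```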
# Linear regression of Happiness on LifeExpectancy
fit = lm(Happiness ~ LifeExpectancy, data = Happy)
fit
##
## Call:
## lm(formula = Happiness ~ LifeExpectancy, data = Happy)
##
## Coefficients:
## (Intercept) LifeExpectancy
##     -1.1037         0.1035
# Plot the regression line on the scatterplot
ggplot(Happy, aes(x = LifeExpectancy, y = Happiness)) + geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  # annotate() draws the equation label once, instead of once per data row
  annotate("label", x = 55, y = 8, label = "Predicted Happiness = -1.104 + 0.104*LifeExpectancy")
Question 9: State the regression equation using the variable names.
Predicted Happiness = -1.104 + 0.104 \(\times\) LifeExpectancy.
Question 10: What is the value of the slope? Interpret the value of the slope in context.
The slope value is 0.104. For every 1-year increase in life expectancy, the predicted Happiness increases by 0.104.
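The slope interpretation can be checked directly: with the rounded equation, two life expectancies exactly one year apart give predicted Happiness values that differ by the slope, wherever on the scale we start. `predict_happiness()` is a hypothetical helper written for this sketch, using the rounded coefficients from Question 9.

```r
# Hypothetical helper: rounded regression equation from Question 9
predict_happiness <- function(life_expectancy) {
  -1.104 + 0.104 * life_expectancy
}

# Predictions one year apart differ by the slope
diff_70 <- predict_happiness(71) - predict_happiness(70)
diff_80 <- predict_happiness(81) - predict_happiness(80)
c(diff_70, diff_80)  # both 0.104 (up to floating-point rounding)
```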
Question 11: What is the value of the intercept? Would it be appropriate to interpret the y-intercept? If yes, interpret the value of the y-intercept. If not, explain why.
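The intercept is -1.104. It is not appropriate to interpret, because it would describe a country with a LifeExpectancy of 0 years, which is unrealistic and far outside the range of the data, and the resulting predicted Happiness of -1.104 is impossible because Happiness cannot be negative.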
Question 12: What is the predicted happiness for a country that has a life expectancy of 77.9 years? Show your calculation.
-1.104 + 0.104*(77.9)
## [1] 6.9976
The predicted Happiness for a country with a life expectancy of 77.9 years is about 7.00.
Question 13: What is the value of the residual for the United States? Interpret the value of the residual. The values of the US’s LifeExpectancy and Happiness variables are:
Happy %>% filter(Country == "United States of America") %>% select(LifeExpectancy, Happiness)
## LifeExpectancy Happiness
## 1 77.9 7.9
# residual: e = y - y_hat = observed - predicted Happiness
7.9 - 6.9976
## [1] 0.9024
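The residual is 0.902. Because it is positive, the model under-predicts the US’s Happiness: the observed value (7.9) is about 0.902 higher than the predicted value (about 7.00).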
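The residual calculation can be wrapped in a small reusable sketch. `happiness_residual()` is a hypothetical helper that applies the rounded equation from Question 9 and returns observed minus predicted, so a positive result means the model under-predicts. The inputs below are the US values printed above.

```r
# Hypothetical helper: residual = observed - predicted,
# using the rounded regression equation from Question 9
happiness_residual <- function(life_expectancy, observed_happiness) {
  predicted <- -1.104 + 0.104 * life_expectancy
  observed_happiness - predicted
}

us_resid <- happiness_residual(77.9, 7.9)
us_resid  # 0.9024

# The sign gives the direction of the prediction error
if (us_resid > 0) "under-predicted" else "over-predicted (or exact)"
```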
Question 14: What is the value of the coefficient of determination (\(R^2\)) between the happiness of a country and its life expectancy? Interpret the value of \(R^2\) (do not use correlation interpretation).
cor(Happy$Happiness, Happy$LifeExpectancy)^2
## [1] 0.6928597
Squaring the correlation from Question 8 gives \(R^2 = 0.693\). About 69.3% of the variation in countries’ Happiness values is explained by the linear regression model with LifeExpectancy as the predictor.
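Squaring r works here because this is a simple linear regression with one predictor. \(R^2\) can also be computed as \(1 - SSE/SST\), the share of the total variation in the response that is not left in the residuals. A sketch on hypothetical toy data (not the Happy Planet data) confirms that both routes agree with the value reported by `summary()`.

```r
# Hypothetical toy data (not from the HappyPlanetIndex file)
x <- c(55, 62, 68, 71, 75, 79, 82)
y <- c(4.1, 5.0, 5.6, 6.3, 6.4, 7.2, 7.6)

fit_toy <- lm(y ~ x)

sse <- sum(residuals(fit_toy)^2)   # variation left unexplained
sst <- sum((y - mean(y))^2)        # total variation in y

r2_from_ss  <- 1 - sse / sst
r2_from_cor <- cor(x, y)^2
r2_summary  <- summary(fit_toy)$r.squared

c(r2_from_ss, r2_from_cor, r2_summary)  # all three agree
```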