This assignment has a total of 23 pts possible. Your score out of 20 will be noted and scaled to 5 points (maximum of 5).
Question 1 – Conceptual Questions: (2pts each)
Part A What does it mean to say two variables are associated with each other?
Part B What does it mean to say two variables are independent of each other?
Part C What does the distribution of a variable tell us?
Question 2 For this question, we will be using the iris dataset, which gives the measurements, in centimeters, of sepal and petal length and width. You can read more on the dataset here. An image of what these variables correspond to on a flower is provided below.
NOTE: You will need to edit out the image from your document in order for it to knit to a .pdf doc. If your doc does not knit, this may be the cause.
To load this data into R, copy and paste the following into an R code chunk in your Rmd file:
library(ggplot2)
data(iris)
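If you want to confirm that the data loaded before moving on, a quick check like the one below may help. This is only a minimal sketch and not a required part of the assignment; it assumes nothing beyond base R, where the iris dataset ships with the built-in datasets package.

```r
# Load the built-in iris data (same call as above; safe to repeat).
data(iris)

# iris should now be available as a data frame in your workspace.
class(iris)

# Print the first six rows to see what the measurements look like.
head(iris)
```

Running `?iris` in the console opens the help page for the dataset.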
Use this data to answer the following questions:
Part A How many observations and variables are in the iris dataset? In one sentence, briefly describe what constitutes an observation in this data. (2 pts)
Part B Use the code below to create the appropriate plot to visualize the relationship between the variables Sepal.Width and Sepal.Length. Do these two variables appear to be associated? If so, comment on the strength of this association. (2 pts)
ggplot(iris, aes(Sepal.Width, Sepal.Length)) + geom_point()
Part C Now use the code below to color the points by Species. Has anything changed in the association between Sepal.Width and Sepal.Length? Comment on the strength, form, and direction of any associations you see. (1 pt)
ggplot(iris, aes(Sepal.Width, Sepal.Length, color = Species)) + geom_point()
Question 3:
From the IMS Textbook, do the following exercises (you do not need to read anything from the textbook to answer these):
Write your answers to these exercises below:
IMS Ch 4.8, #5 (3pts)
IMS Ch 4.8, #6 (4pts)
IMS Ch 5.10, #1 (3pts)
IMS Ch 5.10, #2 (2pts)