Directions

For this assignment you should record your answers in an R Markdown file and submit the compiled output as a PDF.

Homework #3 is due Friday 9/26 by 10:00pm

Question #1

The data frame diamonds is contained within the ggplot2 package. These data record the attributes of nearly 54,000 diamonds sold by a wholesale online retailer. For this question, your goal is to recreate the graph shown below as closely as possible. A few hints:

library(ggplot2)  # Make sure you have this package installed
data("diamonds") # loads the built-in "diamonds" data set


Question #2 - Part A

The “babynames” package contains a data set documenting the number and frequency of all names that appear at least 5 times within a given year as recorded by the United States Social Security Administration.

The code below will load this dataset. You will likely need to install the package unless you’ve previously used it.

#install.packages("babynames")
library(babynames)
data("babynames")

Create a subset of babynames named my_subset that contains information on the names: "Ryan", "Jeff", "Shonda", "Jonathan", "Collin", "Nathan", "William"
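One possible way to build this subset is sketched below. It assumes base R's `%in%` operator for row filtering; a `dplyr::filter()` call would work equally well. The helper vector `target_names` is a name introduced here for illustration only.

```r
library(babynames)  # provides the babynames data frame

# Hypothetical helper vector holding the names listed above
target_names <- c("Ryan", "Jeff", "Shonda", "Jonathan", "Collin", "Nathan", "William")

# Keep only the rows whose name appears in target_names (all columns retained)
my_subset <- babynames[babynames$name %in% target_names, ]
```

Using `%in%` keeps the filter to a single comparison instead of chaining seven `==` tests joined with `|`.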

Next, run the ggplot code given below, which seems like it should create a line chart of each name’s frequency by year. What is happening that makes this graph look so horrible? Take a look at the data frame and explain the issue in 1-2 sentences.

ggplot(my_subset, aes(x = year, y = n, color = name)) + geom_line() 


Question #2 - Part B

Create a new graph that fixes the problem you identified in Part A and appropriately displays the frequency of each name over time.


Question #3

Consider the following ggplot code:

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

Part A: Why are the data-points in the scatter plot not colored blue? Briefly explain.

Part B: Modify the code so that the points are properly shown as blue.

## Put your code for 3-B here
