The goal is to replicate the visualization below as closely as possible. The visual can be made using techniques covered in our labs or homework. This is an example of how I think about recreating figures; there are plenty of other methods.
My general approach consists of 3 steps:
Data:
diamonds_data=diamonds
Plot:
GoalPlot
Based on the plot we can see the following:
Next I import the dataset and check the first few values:
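library(ggplot2)  # provides the diamonds dataset and the plotting functions
library(dplyr)    # provides %>% and filter(), used in later steps
diamonds_data=diamonds
head(diamonds_data,10)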
## # A tibble: 10 × 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
## 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
## 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
## 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
## 10 0.23 Very Good H VS1 59.4 61 338 4 4.05 2.39
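As a cross-check on reading the printed rows by eye, the factor levels can be queried directly. This is a minimal sketch, assuming "Fair", "D", and "VVS1" are among the labels visible on the target plot; `labels_seen` and `label_source` are names introduced here for illustration only.

```r
library(ggplot2)  # provides the diamonds dataset

# Labels assumed to be readable on the target plot (illustrative choice)
labels_seen <- c("Fair", "D", "VVS1")

# Names of the factor columns in diamonds
factor_cols <- names(diamonds)[sapply(diamonds, is.factor)]

# For each label, return the factor column(s) that have it as a level
label_source <- sapply(labels_seen, function(lbl) {
  has_level <- sapply(factor_cols, function(col) lbl %in% levels(diamonds[[col]]))
  factor_cols[has_level]
})
label_source
```

Each label should map to exactly one column, which tells us which variable to filter on.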
From this data, it looks like the word “Fair” refers to the cut, the letters D, H, I, and J refer to the color, and the codes VVS2, VVS1, and IF refer to the clarity. If we filter the dataset by those values, we get the following:
diamonds2 = diamonds_data%>%
filter(cut=='Fair')%>%
filter(clarity %in% c('VVS2','VVS1','IF'))%>%
filter(color %in% c('D','H','I','J'))
dim(diamonds2)
## [1] 38 10
A count of 38 diamonds is close to our estimate of 40, so this is probably the data used in the plot.
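Beyond the overall row count, it can help to see how the filtered rows split across the facet and fill variables, since those splits determine the bar heights. The following is a rough sketch that repeats the filter from above in a single call; `counts` is a name introduced here.

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)    # provides %>%, filter(), and count()

# Same filter as above, written as one filter() call
diamonds2 <- diamonds %>%
  filter(cut == "Fair",
         clarity %in% c("VVS2", "VVS1", "IF"),
         color %in% c("D", "H", "I", "J"))

# Number of diamonds per color (facet) and clarity (fill group)
counts <- diamonds2 %>% count(color, clarity)
counts
```

The `n` column can then be compared against the approximate bar heights read off the target plot.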
Using this we can generate the following graph:
ggplot(diamonds2,aes(x=price,fill=clarity))+
geom_histogram(position = "stack", alpha = 0.5, bins = 2) +
facet_wrap(~color)
This plot looks very similar to the original plot, but now we need to work on the fine details.
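To compare the two plots numerically rather than by eye, the values computed for the histogram can be pulled out with `layer_data()`. This is only a sketch of one way to check the bar heights; `p` and `bars` are names introduced here, and the filter is repeated so the block runs on its own.

```r
library(ggplot2)
library(dplyr)

diamonds2 <- diamonds %>%
  filter(cut == "Fair",
         clarity %in% c("VVS2", "VVS1", "IF"),
         color %in% c("D", "H", "I", "J"))

# Store the plot so its computed data can be inspected
p <- ggplot(diamonds2, aes(x = price, fill = clarity)) +
  geom_histogram(position = "stack", alpha = 0.5, bins = 2) +
  facet_wrap(~color)

# layer_data() returns the data frame ggplot2 computed for the first layer
bars <- layer_data(p, 1)
bars[, c("PANEL", "fill", "xmin", "xmax", "count")]
```

Each row is one bar segment: `PANEL` identifies the facet, and `count` is the segment height.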
The differences between the graphs seem to be related to the following:
- Axes
  - Both axis titles are different
  - Both axis values are different
- Colors/Fill
  - Colors are different, look like red, blue, green
- Theme
  - Different theme
- Title
  - Our plot does not have a title
Start by fixing the colors:
ggplot(diamonds2,aes(x=price,fill=clarity))+
geom_histogram(position = "stack", alpha = 0.5, bins = 2) +
facet_wrap(~color)+
scale_fill_manual(values=c("red","blue","green"))
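The call above relies on the order of the clarity levels that remain in the data. A slightly more explicit sketch names each color by level, assuming the intended pairing is the one that order implies (VVS2 red, VVS1 blue, IF green); `p_named` is a name introduced here.

```r
library(ggplot2)
library(dplyr)

diamonds2 <- diamonds %>%
  filter(cut == "Fair",
         clarity %in% c("VVS2", "VVS1", "IF"),
         color %in% c("D", "H", "I", "J"))

# Named values tie each color to a clarity level instead of relying on order
p_named <- ggplot(diamonds2, aes(x = price, fill = clarity)) +
  geom_histogram(position = "stack", alpha = 0.5, bins = 2) +
  facet_wrap(~color) +
  scale_fill_manual(values = c(VVS2 = "red", VVS1 = "blue", IF = "green"))
p_named
```

If the legend in the target plot pairs the colors differently, only the names in `values` need to change.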
Since that looks right, let’s work on the axes and title. Both price and count are continuous variables, so their axes are set with scale_x_continuous() and scale_y_continuous().
Making these changes gives us:
ggplot(diamonds2,aes(x=price,fill=clarity))+
geom_histogram(position = "stack", alpha = 0.5, bins = 2) +
facet_wrap(~color)+
scale_fill_manual(values=c("red","blue","green"))+
scale_y_continuous(name="Count of Diamonds",breaks=c(0,1,5,10,15))+
scale_x_continuous(name="Price",breaks=c(0,5000,15000))+
ggtitle("Prices of Fair Diamonds")
From here we see that:
Making these changes:
ggplot(diamonds2,aes(x=price,fill=clarity))+
geom_histogram(position = "stack", alpha = 0.5, bins = 2) +
facet_wrap(~color)+
scale_fill_manual(values=c("red","blue","green"))+
scale_y_continuous(name="Count of Diamonds",breaks=c(0,1,5,10,15))+
scale_x_continuous(name="Price",breaks=c(0,5000,15000))+
ggtitle("Prices of Fair Diamonds")+
theme_minimal()+
theme(plot.title = element_text(hjust = 0.5))
This looks almost right; however, we still have extra gridlines, which make the plot harder to read (where is “1”, for example?). Since these lines don’t all line up with the axis values, but the ones in the actual plot do, we probably need to remove the minor gridlines.
ggplot(diamonds2,aes(x=price,fill=clarity))+
geom_histogram(position = "stack", alpha = 0.5, bins = 2) +
facet_wrap(~color)+
scale_fill_manual(values=c("red","blue","green"))+
scale_y_continuous(name="Count of Diamonds",breaks=c(0,1,5,10,15))+
scale_x_continuous(name="Price",breaks=c(0,5000,15000))+
ggtitle("Prices of Fair Diamonds")+
theme_minimal()+
theme(plot.title = element_text(hjust = 0.5))+
theme(panel.grid.minor = element_blank())
And we’ve recreated the original plot!
GoalPlot
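The steps above repeat the full call each time so that every stage can be read on its own. As a sketch of an alternative workflow, the plot can be stored in a variable and extended one piece at a time; the names `p_base`, `p_scaled`, and `p_final` are introduced here for illustration.

```r
library(ggplot2)
library(dplyr)

diamonds2 <- diamonds %>%
  filter(cut == "Fair",
         clarity %in% c("VVS2", "VVS1", "IF"),
         color %in% c("D", "H", "I", "J"))

# Step 1: the basic faceted histogram
p_base <- ggplot(diamonds2, aes(x = price, fill = clarity)) +
  geom_histogram(position = "stack", alpha = 0.5, bins = 2) +
  facet_wrap(~color)

# Step 2: colors, axes, and title
p_scaled <- p_base +
  scale_fill_manual(values = c("red", "blue", "green")) +
  scale_y_continuous(name = "Count of Diamonds", breaks = c(0, 1, 5, 10, 15)) +
  scale_x_continuous(name = "Price", breaks = c(0, 5000, 15000)) +
  ggtitle("Prices of Fair Diamonds")

# Step 3: theme, centered title, and no minor gridlines
p_final <- p_scaled +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5),
        panel.grid.minor = element_blank())
p_final
```

Printing `p_base`, `p_scaled`, and `p_final` in turn reproduces the intermediate plots without retyping the earlier layers.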