Question 1

Part A: Create a frequency table using the titanic data set to find how many children and adults were on board the Titanic.

with(titanic, table(Age))
## Age
## Child Adult 
##   109  2092

Part B: Determine what percentage of the passengers on board the Titanic were adults.

with(titanic, table(Age)) %>% proportions()
## Age
##      Child      Adult 
## 0.04952294 0.95047706

95.0% of the passengers were adults. The same figure follows from the counts in Part A: \(\frac{2092}{2092 + 109} \approx 0.950\).
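The arithmetic behind that check can be reproduced from the Part A counts alone. This is a minimal sketch that types the two counts in by hand rather than reading the titanic data frame; the names age_counts, age_props and age_pct are made up for illustration.

```r
# Counts copied by hand from the Part A frequency table
age_counts <- c(Child = 109, Adult = 2092)

# Divide each count by the overall total to get proportions
age_props <- age_counts / sum(age_counts)

# Express as percentages rounded to one decimal place
age_pct <- round(100 * age_props, 1)
print(age_pct)
```

Dividing by sum(age_counts) is what proportions() does for a one-way table, so both routes should agree.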

Part C: Determine what percentage of the passengers on board the Titanic were members of the crew.

with(titanic, table(Class)) %>% proportions()
## Class
##       1st       2nd       3rd      Crew 
## 0.1476602 0.1294866 0.3207633 0.4020900

40.2% of the people on board were members of the crew.


Question 2

Part A: How many children were included in second class?

with(titanic, table(Age, Class))
##        Class
## Age     1st 2nd 3rd Crew
##   Child   6  24  79    0
##   Adult 319 261 627  885

24 children.

Part B: What percentage of the crew survived? What percentage of children survived?

with(titanic, table(Survived, Class)) %>% proportions(margin = 2)
##         Class
## Survived       1st       2nd       3rd      Crew
##      No  0.3753846 0.5859649 0.7478754 0.7604520
##      Yes 0.6246154 0.4140351 0.2521246 0.2395480
with(titanic, table(Survived, Age)) %>% proportions(margin=2)
##         Age
## Survived     Child     Adult
##      No  0.4770642 0.6873805
##      Yes 0.5229358 0.3126195

24% of the crew survived, and 52% of children survived. These are conditional proportions, since each one narrows the focus to a single group (crew in the first table, children in the second). Survived and not survived must add up to 100% within each group. Because Survived is on the rows of my tables and the groups are on the columns, I needed the margin = 2 argument in the proportions() function so that each column sums to 1.
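The effect of the margin argument can be checked by summing the rows and columns of the result. The sketch below assumes that R's built-in Titanic array (from the datasets package) holds the same counts as the course's titanic data frame; the names surv_by_class, col_props and row_props are illustrative. Note that proportions() is the newer name for prop.table() and needs R 4.0.1 or later.

```r
# Assumption: the built-in `Titanic` array matches the course's `titanic` data.
# Dimensions of `Titanic`: 1 = Class, 2 = Sex, 3 = Age, 4 = Survived.

# Collapse to a Survived (rows) x Class (columns) matrix of counts
surv_by_class <- apply(Titanic, c(4, 1), sum)

# margin = 2 divides each cell by its column total, so every column sums to 1
col_props <- proportions(surv_by_class, margin = 2)
print(round(col_props, 3))
print(colSums(col_props))

# margin = 1 divides by row totals instead, so every row sums to 1
row_props <- proportions(surv_by_class, margin = 1)
print(rowSums(row_props))
```

With Survived on the rows, margin = 2 answers "what share of each class survived", while margin = 1 answers "what share of each survival group came from each class".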

Part C: What proportion of individuals who survived were members of the crew? Construct the plot associated with the table you create.

with(titanic, table(Survived, Class)) %>% proportions(margin=1) %>% addmargins(2)
##         Class
## Survived        1st        2nd        3rd       Crew        Sum
##      No  0.08187919 0.11208054 0.35436242 0.45167785 1.00000000
##      Yes 0.28551336 0.16596343 0.25035162 0.29817159 1.00000000

The proportion of survivors who were members of the crew is 0.298. This is a conditional proportion, since we are restricting ourselves to the Survived = Yes row. We need each row to add to 100%, so I used proportions(margin = 1).
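One plot that matches a row-proportion table is a filled bar chart with one bar per Survived level. The following is a sketch only, not necessarily the plot the course intended: it assumes ggplot2 is installed and uses R's built-in Titanic array (assumed to match the course data) so that it runs on its own. The names titanic_counts and p are illustrative.

```r
library(ggplot2)

# Assumption: the built-in `Titanic` counts match the course's `titanic` data.
# as.data.frame() gives one row per Class/Sex/Age/Survived combination,
# with the number of people stored in a `Freq` column.
titanic_counts <- as.data.frame(Titanic)

# One bar per Survived level; position = "fill" rescales each bar to height 1,
# so the colored segments show the Class proportions within that level.
p <- ggplot(titanic_counts, aes(x = Survived, fill = Class, weight = Freq)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion")
print(p)
```

With a data frame that has one row per person, the weight = Freq mapping would be dropped, since geom_bar() counts rows by default.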


Question 3

ggplot(titanic, aes(Class)) + 
  geom_bar() + 
  facet_grid(Survived ~ Sex)

Part A: Amongst female passengers, which class had the most who did not survive? How many female passengers in this class did not survive?

3rd class. 106 female passengers in 3rd class did not survive.

Part B: Amongst male passengers, which class had the fewest people survive? How many male passengers in this class survived?

2nd class. 25 male passengers in 2nd class survived.
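The counts read off the faceted bar chart can be cross-checked with a three-way table. This sketch again assumes that R's built-in Titanic array matches the course's titanic data frame; counts, female_no and male_yes are illustrative names.

```r
# Assumption: the built-in `Titanic` array matches the course's `titanic` data.
# Dimensions of `Titanic`: 1 = Class, 2 = Sex, 3 = Age, 4 = Survived.

# Sum over Age to get a Class x Sex x Survived array of counts
counts <- apply(Titanic, c(1, 2, 4), sum)

# Part A: female passengers who did not survive, by class
female_no <- counts[, "Female", "No"]
print(female_no)
print(names(which.max(female_no)))

# Part B: male passengers who survived, by class
male_yes <- counts[, "Male", "Yes"]
print(male_yes)
print(names(which.min(male_yes)))
```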


Question 4

Part A: Write the code to produce tables displaying information for the two plots of college data (Region and Type)

# Plot 1: proportion of each Type within each Region (rows sum to 1)
with(college, table(Region, Type)) %>% proportions(margin = 1) %>% addmargins(2)
##                  Type
## Region              Private    Public       Sum
##   Far West        0.5673077 0.4326923 1.0000000
##   Great Lakes     0.6613757 0.3386243 1.0000000
##   Mid East        0.6363636 0.3636364 1.0000000
##   New England     0.6197183 0.3802817 1.0000000
##   Plains          0.6666667 0.3333333 1.0000000
##   Rocky Mountains 0.2666667 0.7333333 1.0000000
##   South East      0.5563140 0.4436860 1.0000000
##   South West      0.4523810 0.5476190 1.0000000
# Plot 2: proportion of each Region within each Type (columns sum to 1)
with(college, table(Region, Type)) %>% proportions(margin = 2) %>% addmargins(1)
##                  Type
## Region               Private     Public
##   Far West        0.09119011 0.10044643
##   Great Lakes     0.19319938 0.14285714
##   Mid East        0.19474498 0.16071429
##   New England     0.06800618 0.06026786
##   Plains          0.12982998 0.09375000
##   Rocky Mountains 0.01236476 0.04910714
##   South East      0.25193199 0.29017857
##   South West      0.05873261 0.10267857
##   Sum             1.00000000 1.00000000

Part B: Using the appropriate graph and table, do any regions have public schools as a majority and if so what percent of schools in that region are public?

# Rocky Mountains (73%) and South West (55%)

Part C: Using the appropriate graph and table, amongst public schools, which region has the largest percentage, and what is that percentage?

# South East (29%)
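The answers to Parts B and C can also be extracted programmatically. Since the college data frame is not reproduced here, this sketch copies the Public column of each printed table into a plain named vector; the vector names are made up for illustration.

```r
# Proportions copied by hand from the printed tables above

# Share of public schools within each region (table with margin = 1)
public_within_region <- c(
  "Far West" = 0.4326923, "Great Lakes" = 0.3386243,
  "Mid East" = 0.3636364, "New England" = 0.3802817,
  "Plains" = 0.3333333, "Rocky Mountains" = 0.7333333,
  "South East" = 0.4436860, "South West" = 0.5476190
)

# Share of each region among public schools (table with margin = 2)
region_within_public <- c(
  "Far West" = 0.10044643, "Great Lakes" = 0.14285714,
  "Mid East" = 0.16071429, "New England" = 0.06026786,
  "Plains" = 0.09375000, "Rocky Mountains" = 0.04910714,
  "South East" = 0.29017857, "South West" = 0.10267857
)

# Part B: regions where more than half of the schools are public
majority_public <- public_within_region[public_within_region > 0.5]
print(round(100 * majority_public))

# Part C: region holding the largest share of all public schools
top_region <- which.max(region_within_public)
print(round(100 * region_within_public[top_region]))
```

Part B uses the row-proportion table because the question conditions on region, while Part C uses the column-proportion table because it conditions on school type.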