Proportions


Political Affiliation and Immigration

The dataset below contains the results of a poll of a random sample of voters. It has two variables: response, each respondent's answer to the poll question, and political, the respondent's self-reported political ideology.

Randomly sampled registered voters from Tampa, FL (910 in all, per the table below) were asked whether they thought workers who have illegally entered the US should be (i) allowed to keep their jobs and apply for US citizenship, (ii) allowed to keep their jobs as temporary guest workers but not allowed to apply for US citizenship, or (iii) lose their jobs and have to leave the country.

# Copy and run this code to create the table
library(ggplot2)
library(dplyr)
immigration <- read.csv("https://collinn.github.io/data/immigrationpoll.csv")

with(immigration, table(response, political)) %>% addmargins(1)
##                        political
## response                conservative liberal moderate
##   Apply for citizenship           57     101      120
##   Guest worker                   121      28      113
##   Leave the country              179      45      126
##   Not sure                        15       1        4
##   Sum                            372     175      363
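The three groups differ in size, so comparing raw counts across columns can mislead. The questions below all work with proportions within a group. The following sketch types in the counts from the table above, so it runs without downloading the CSV. Base R's prop.table() with margin = 2 then divides each count by its column total. The object names counts and col_props are our own.

```r
# Counts copied from the table above (matrix() fills column by column)
counts <- matrix(
  c(57, 121, 179, 15,    # conservative
    101, 28, 45, 1,      # liberal
    120, 113, 126, 4),   # moderate
  nrow = 4,
  dimnames = list(
    response = c("Apply for citizenship", "Guest worker",
                 "Leave the country", "Not sure"),
    political = c("conservative", "liberal", "moderate")
  )
)

# margin = 2 divides each count by its column total,
# giving the proportion of each response within each political group
col_props <- prop.table(counts, margin = 2)
round(col_props, 3)
```

With the CSV loaded as above, prop.table(with(immigration, table(response, political)), margin = 2) should give the same proportions.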

Question 1

We will make a confidence interval to answer the question “What proportion of conservative Tampa voters support workers being ‘allowed to keep their jobs and apply for US citizenship’?”

Part A: Describe the parameter of interest, including which symbol we use for it.

Answer: p, the true (population) proportion of all conservative Tampa voters who support workers being ‘allowed to keep their jobs and apply for US citizenship.’

Part B: What is the corresponding value of the statistic and what symbol do we use for it?

# p-hat (the sample proportion): 57 of the 372 conservatives chose "Apply for citizenship"
57 / 372
## [1] 0.1532258

Part C: What is the sample size for the group of conservatives?

# n: number of conservatives in the sample (the column total)
372

Part D: Check the conditions for making a confidence interval.

# Random sample: met
# Success condition: 57 successes, at least 10 (met)
# Failure condition: 315 failures, at least 10 (met)
372 - 57
## [1] 315
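A small helper makes the success-failure check easy to repeat for other groups. This is a sketch. The name check_success_failure is hypothetical. The cutoff of 10 is the usual textbook threshold and is assumed here, because the worksheet does not state one.

```r
# Hypothetical helper: success-failure condition for a one-sample proportion.
# Assumes the common cutoff of at least 10 successes and at least 10 failures.
check_success_failure <- function(successes, n, min_count = 10) {
  failures <- n - successes
  list(successes = successes,
       failures  = failures,
       met       = successes >= min_count && failures >= min_count)
}

conservative_check <- check_success_failure(57, 372)
conservative_check$failures  # should equal 315, matching the hand calculation
conservative_check$met
```

The same call works for the liberal group in Question 2, e.g. check_success_failure(101, 175).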

Part E: Create a 95% confidence interval for the parameter.

p_hat = (57/372)   # sample proportion: 57 of 372 conservatives
n = 372
p_hat - 1.96 * sqrt(p_hat * (1-p_hat) / n)
## [1] 0.1166213
p_hat + 1.96 * sqrt(p_hat * (1-p_hat) / n)
## [1] 0.1898303

Part F: Interpret the confidence interval.

Answer: We are 95% confident that the true proportion of conservative Tampa voters who favor a citizenship option for workers who have illegally entered the US is between 0.117 and 0.190. That is somewhere between roughly 1 in 9 and 1 in 5 conservative voters.

(Alternative) We are 95% confident that the true percentage of conservative Tampa voters who favor a citizenship option for workers who have illegally entered the US is between 11.7% and 19.0%. That is somewhere between roughly 1 in 9 and 1 in 5 conservative voters.
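The Part E arithmetic generalizes to any confidence level. In the minimal sketch below, prop_ci is a hypothetical helper that computes the same normal-approximation (Wald) interval. It takes the critical value from qnorm() instead of the rounded 1.96, so its bounds differ from Part E by less than 0.00001. Base R's prop.test() reports a different (Wilson score) interval, so its output will not match these numbers exactly.

```r
# Hypothetical helper: normal-approximation (Wald) CI for one proportion
prop_ci <- function(successes, n, conf = 0.95) {
  p_hat <- successes / n
  z <- qnorm(1 - (1 - conf) / 2)          # critical value, e.g. 1.959964 for 95%
  se <- sqrt(p_hat * (1 - p_hat) / n)     # standard error of p-hat
  c(lower = p_hat - z * se, upper = p_hat + z * se)
}

conservative_ci <- prop_ci(57, 372)       # 95% CI for conservatives
round(conservative_ci, 3)
```

Setting conf = 0.90 gives a narrower interval, as expected.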

Question 2

Let’s see whether conservatives and liberals differ in the proportion who support workers being ‘allowed to keep their jobs and apply for US citizenship.’

Part A: What is the value of the statistic of interest? (Make ‘Liberal’ the first group)

# p-hat_L - p-hat_C (liberal minus conservative)
(101/175) - (57/372)
## [1] 0.4239171

Part B: Check the conditions to make a confidence interval.

# Random samples: met
# Independent groups: met 
# Success condition (Liberal): 101 successes, at least 10 (met)
# Failure condition (Liberal): 175-101 = 74 failures, at least 10 (met)
# Success condition (Conservative): 57 successes, at least 10 (met)
# Failure condition (Conservative): 372-57 = 315 failures, at least 10 (met)

Part C: Make a 90% confidence interval.

p1 = (101/175)
p2 = (57/372)
n1 = 175
n2 = 372
qnorm(.95)   # z* for a 90% CI (5% in each tail); rounded to 1.645 below
## [1] 1.644854
(p1 - p2) - 1.645 * sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)
## [1] 0.3552326
(p1 - p2) + 1.645 * sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)
## [1] 0.4926015
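The Part C calculation can be packaged the same way. This is a sketch. The name diff_prop_ci is hypothetical. It uses the unrounded qnorm(0.95) = 1.644854 rather than 1.645, so its bounds differ from Part C by well under 0.0001.

```r
# Hypothetical helper: normal-approximation CI for p1 - p2 (independent groups)
diff_prop_ci <- function(x1, n1, x2, n2, conf = 0.95) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  z <- qnorm(1 - (1 - conf) / 2)                       # 1.644854 when conf = 0.90
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # SE of the difference
  est <- p1 - p2
  c(estimate = est, lower = est - z * se, upper = est + z * se)
}

# Liberal (group 1) minus conservative (group 2), 90% confidence
lib_minus_con <- diff_prop_ci(101, 175, 57, 372, conf = 0.90)
round(lib_minus_con, 3)
```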

Part D: Interpret the confidence interval. Include a sentence summary for someone who has no stats background.

Answer: We are 90% confident that the true percentage of liberal Tampa voters who support a citizenship option is between 35.5 and 49.3 percentage points higher than the corresponding percentage of conservative Tampa voters. In plain terms: according to this survey, liberal Tampa voters are much more likely than conservative Tampa voters to support letting these workers apply for citizenship.

Part E: According to the CI, is it plausible there is no difference between the groups?

Answer: No. The entire confidence interval (0.355 to 0.493) is positive. Zero, which would mean no difference, is not in the interval and is far below its lower bound.


Supercommuters

The fraction of workers who are considered “supercommuters”, because they commute more than 90 minutes to get to work, varies by state. Suppose that 1% of Nebraska residents and 6% of New York residents are supercommuters. Now suppose that we plan a study to survey 1000 people from each state, and we will compute the sample proportions \(\hat{p}_{NE}\) for Nebraska and \(\hat{p}_{NY}\) for New York.

  (a) What is the associated mean and standard deviation of \(\hat{p}_{NE}\) in repeated samples of size 1000?

Answer: These questions describe a sampling distribution in different words. The associated mean will be the true proportion, 0.01, and the standard deviation, \(\sqrt{p(1-p)/n}\), will be

sqrt(.01*.99 / 1000)
## [1] 0.003146427

in repeated samples of size 1000.
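A quick simulation can confirm the formula. This sketch assumes the Nebraska proportion really is 0.01, as the problem supposes. It uses rbinom() to draw 10,000 hypothetical surveys of 1,000 residents each. The seed and the number of repetitions are arbitrary choices.

```r
set.seed(42)                     # arbitrary seed for reproducibility
n_reps <- 10000                  # number of simulated surveys
n <- 1000                        # residents surveyed per survey
p_ne <- 0.01                     # assumed true proportion in Nebraska

# Each draw is the number of supercommuters in one sample; divide by n for p-hat
p_hat_ne <- rbinom(n_reps, size = n, prob = p_ne) / n

mean(p_hat_ne)                   # should be close to 0.01
sd(p_hat_ne)                     # should be close to sqrt(.01*.99/1000), about 0.00315
```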

  (b) What is the associated mean and standard deviation of \(\hat{p}_{NY}\) in repeated samples of size 1000?

The associated mean will be 0.06, and the standard deviation will be

sqrt(.06*.94 / 1000)
## [1] 0.007509993

in repeated samples of size 1000.

  (c) Calculate and interpret the mean and standard deviation associated with the difference in sample proportions for the two groups, \(\hat{p}_{NY} - \hat{p}_{NE}\), in repeated samples of 1000 in each group.

The associated mean will be 0.05 (because 0.06 - 0.01 = 0.05), and the standard deviation, which combines the variances of the two groups, will be

sqrt(.06*.94 / 1000 + .01*.99/1000)
## [1] 0.008142481

in repeated samples of size 1000.

The mean of this sampling distribution is the true difference, 0.05. The standard deviation tells us that, across repeated samples, a typical value of \(\hat{p}_{NY} - \hat{p}_{NE}\) falls about 0.008 away from this mean.
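The same kind of simulation illustrates parts (b) and (c). It assumes the true proportions are 0.01 and 0.06, as stated. It draws the two states' samples independently, and that independence is what lets the variances add.

```r
set.seed(42)                     # arbitrary seed for reproducibility
n_reps <- 10000                  # number of simulated studies
n <- 1000                        # sample size in each state
p_ne <- 0.01                     # assumed true proportions from the problem
p_ny <- 0.06

# Independent samples from each state, simulated separately
p_hat_ne <- rbinom(n_reps, size = n, prob = p_ne) / n
p_hat_ny <- rbinom(n_reps, size = n, prob = p_ny) / n
diffs <- p_hat_ny - p_hat_ne

mean(diffs)    # close to 0.06 - 0.01 = 0.05
sd(p_hat_ny)   # close to sqrt(.06*.94/1000), about 0.0075
sd(diffs)      # close to about 0.0081
```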

  (d) How are the standard deviations from parts (a), (b), and (c) related?

\(SD_{\hat{p}_{NY} - \hat{p}_{NE}}^2 = SD_{\hat{p}_{NY}}^2 + SD_{\hat{p}_{NE}}^2\)

\(Var(\hat{p}_{NY} - \hat{p}_{NE}) = Var(\hat{p}_{NY}) + Var(\hat{p}_{NE})\)

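The variance of the difference is the sum of the variances of the two sample proportions, which holds because the two samples are independent. The standard deviations themselves do not add: the standard deviation of the difference is the square root of the sum of the squared standard deviations.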
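The relationship can be checked numerically with the values from parts (a) to (c). This is only arithmetic on the worksheet's own numbers. It also shows why adding the two standard deviations directly would overstate the spread.

```r
sd_ne <- sqrt(.01 * .99 / 1000)                         # part (a)
sd_ny <- sqrt(.06 * .94 / 1000)                         # part (b)
sd_diff <- sqrt(.06 * .94 / 1000 + .01 * .99 / 1000)    # part (c)

# Variances add for independent samples...
all.equal(sd_diff^2, sd_ny^2 + sd_ne^2)   # TRUE
# ...but standard deviations do not
sd_ny + sd_ne                             # about 0.0107, larger than sd_diff
```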