This project is an opportunity for you to set up and conduct your own statistical analysis. This will involve generating a research question, collecting data, and then performing a written analysis along side an in-class presentation (optional, extra credit).
During our projects, we should aim to demonstrate the following competencies:
Wednesday, April 16 at 10pm – Email me members of group (2-3).
Friday, April 25 at 10pm – Submit a 1-2 paragraph proposal that briefly describes the following:
Friday, May 2 at 10pm – A PDF documenting an exploratory analysis due. This should include
Wednesday + Friday, May 5-7 – Presentations during scheduled class time
Friday, May 16 at 5pm – Final report is due
The first part of this project involves establishing a research question (framed as a hypothesis) and a plan on how to answer it. This will involve clearly defining a population of interest, articulating an unambiguous hypothesis about this population, and then establishing a plan to go about investigating it. Study design will be an important aspect of this project, and you will want to be sure to collect as much relevant information about your research question as possible, especially if there are any confounding variables that might influence your results.
Our priority for this project is to find a research question and collected data pertaining to Grinnell. That is, to the extent possible, I will encourage everyone to go about collecting data yourself rather than finding data from an external website (such as Kaggle). Your data should contain a minimum of three variables, though it is preferable that you find more if you can. Additionally, you should aim to have at least 30 observations in your dataset.
The motivation for hands-on data collection is two-fold. First, this provides an opportunity to consider different aspects and limitations related to study design. Second, and perhaps more compelling, is that it is instructive to see just how broadly the tools we have learned in class can be used to quantify information in the world.
Examples of the type of studies and data you might collect, consider:
Collecting leaves, where you measure
Determine if people prefer to buy prepacked or behind-the-counter meat
Blind taste test (i.e., differentiate taste of Coke and Pepsi)
How much salt/sugar/spice could be added to a beverage before it is detected (see \(\text{LD}_{50}\))
Do more cars go east/west or north/south at intersection of 6th and 146th
I will not strictly require that you collect your own data. If you find a dataset that you wish to use and have an interesting research question, I will be happy to consider it with you. That being said, my expectations for analysis on a pre-curated dataset will necessarily be higher than for those who have gone through the study design process themselves. I will also not strictly disallow the use of surveys, but will discourage them. Survey data cleaning and manipulation in R is not easy – and as such it will require a lot of work on your end to get the data in a workable format to use R graphics and analysis.
Finally, there are often situations in which it may not be feasible to collect all of the data listed here (3+ vars, 30 observations). If you have an idea and are not sure on the practicality of it, feel free to talk to me and we can brainstorm solutions together.
To help encourage consistent progress in the project, intermediate documents will be submitted at various points between now and the rest of the semester. Together, these will make up a portion of your final project grade and they must be submitted on time for full credit. Late documents will still be accepted, however (and in fact are required for the successful completion of the project).
You group’s final report should be no more than three (3) pages including embedded figures and tables, generated in R Markdown. The report should include the following information:
This report will be assessed on both content and form: messages, warnings, and the code used to generate plots and test statistics should not be displayed, sections should be clearly labeled with headers, and the overall presentation of the document should look professional.
Recognize that keeping this report within the 3-page limit means you will have to plan and be deliberate in deciding what is important to include and what is not. Indeed, most statistical analyses do not report everything that was explored in the final write up.
Presentations will be limited to 10 minutes will be delivered in the last week of the course (not counting exam week). The presentation should clearly outline
Groups with multiple members should do their best to maintain an equitable distribution of speaking/presentation time