Instructor:
Class Meetings:
Office Hours:
This time is purposefully scheduled for you to drop in and ask any questions about the course. Please feel free to stop by. If the time doesn’t work for you, message me and we can try to arrange something.
Mentor Information:
This course uses a mentor to help you navigate the course. Our course mentor will assist us in class and host 1 to 2 mentor sessions each week. Mentor sessions may review course content, provide practice problems, practice interview skills, or provide homework help.
More mentor info will be posted.
Gradescope Course numbers
Course Description:
Welcome to the Fall 2025 sections of Grinnell College’s STA-230. This course introduces core topics in data science using R programming. This includes introductions to getting and cleaning data, data management, exploratory data analysis, reproducible research, and data visualization. This course incorporates case studies from multiple disciplines and emphasizes the importance of properly communicating statistical ideas. Prerequisite: MAT-209 or STA-209. Suggested: CSC-151 or other computer programming experience.
Texts:
There are no required texts; some free texts may be recommended throughout the semester.
There may also be readings or links from other sources which will be provided as necessary.
After completing this course, students will be able to do the following:
The current plan for the course is as follows:
You will have the freedom to choose your partner for the first lab; afterward, lab partners may be assigned. During labs it is essential that you and your partner work together, making certain that each of you understands the work equally well.
Most labs will begin with a brief “preamble” section that we will go through together as a class. The purpose of this section is to introduce the topic of the lab and ensure a smooth start to each class meeting.
Course structuring is subject to change. I am basing our labs on previous semesters that used a Tuesday/Thursday schedule.
This class will employ a mastery-based grading system. All components will be graded on a Satisfactory (S) or Not Satisfactory (NS) scale. The course grading scheme is primarily based on Prof. William Rebelsky’s previous version of STA-230.
We will have in-class labs almost every class. In general, these will be due at 10pm on the Sunday immediately following the class period in which they were released. In order to receive a Satisfactory:
We will have approximately weekly homework assignments due on Fridays at 10pm. In order to receive a Satisfactory:
There will be 4 midterms:
There will be no final exam held for the class.
Your attendance and participation in class are an integral part of your learning. You are expected to attend every class and work respectfully and effectively with your assigned partner.
You may be excused from a class under certain situations. Excusable reasons to miss class include college-sponsored sports absences, religious holidays, family emergencies, and illness. Please email me at least a week in advance in the event of a planned absence. In the event of an unplanned absence (e.g. illness), please let me know as soon as possible if you will miss class, ideally at least 30 minutes before the start of class. Excused absences will not count against tokens (see below) and will count as an S for the purposes of letter grades below.
This course will rely on the ideas of specifications grading and mastery grading. These systems, inspired by adult learning theory, are designed to create a “low-threat” learning environment where:
Note: I reserve the right to update requirements for grades as circumstances dictate over the course of the semester (e.g. if the number of assignments or labs changes).
Letter grades for the entire course will be assigned according to the bundles in the table below. You will receive the grade corresponding to the bundle for which you meet all the requirements. All bundles list minimum amounts; you may exceed the requirements for a bundle and still qualify for it. All numbers in the table are the minimum number of satisfactory grades achieved.
Grade | Attendance (41 Possible) | Labs (11 Possible) | Homework (8 Possible) | Midterms (4 Possible) |
---|---|---|---|---|
C | 32 | 7 | 5 | 2 |
B | 35 | 9 | 6 | 3 |
A | 38 | 10 | 7 | 4 |
D: 3 requirements of a C are met.
F: 0-2 requirements of a C are met.
Half letter grades (C+, B+): all of the lower tier (C/B) requirements met, and two of the higher tier (B/A) non-essay requirements met.
Half letter grades (B-, A-): all of the lower tier (C/B) requirements met, and three of the higher tier (B/A) non-essay requirements met.
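To make the bundle rules concrete, here is a minimal Python sketch of one reading of the table and rules above. It is an illustration, not the official grade calculation: the names `BUNDLES`, `count_met`, and `letter_grade` are hypothetical, the sketch treats all four columns as the requirements being counted, and it does not model tokens, excused absences, or any later changes to the requirements.

```python
# Minimum numbers of Satisfactory marks per bundle, copied from the table above.
BUNDLES = {
    "C": {"attendance": 32, "labs": 7, "homework": 5, "midterms": 2},
    "B": {"attendance": 35, "labs": 9, "homework": 6, "midterms": 3},
    "A": {"attendance": 38, "labs": 10, "homework": 7, "midterms": 4},
}


def count_met(earned, bundle):
    """Count how many of the four requirements of a bundle are met."""
    return sum(earned[name] >= need for name, need in BUNDLES[bundle].items())


def letter_grade(earned):
    """Map counts of Satisfactory marks to a letter grade (assumed reading of the rules)."""
    met = {bundle: count_met(earned, bundle) for bundle in BUNDLES}
    if met["C"] < 4:
        # D: 3 requirements of a C are met; F: 0-2 are met.
        return "D" if met["C"] == 3 else "F"
    if met["B"] < 4:
        # All C requirements met: two B requirements give C+, three give B-.
        return {2: "C+", 3: "B-"}.get(met["B"], "C")
    if met["A"] < 4:
        # All B requirements met: two A requirements give B+, three give A-.
        return {2: "B+", 3: "A-"}.get(met["A"], "B")
    return "A"


# Example: all B requirements met, plus the A thresholds for attendance and labs.
print(letter_grade({"attendance": 38, "labs": 10, "homework": 6, "midterms": 3}))  # B+
```

Changing the counts in the example dictionary is a quick way to see how one additional Satisfactory mark would move you between bundles under this reading.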
Later on, I will link a spreadsheet that you can use to test various combinations to see what the grade will be by the midsemester date.
One of the fundamental principles behind this grading scheme is that you will have opportunities to re-try assignments if your first attempt does not earn a satisfactory grade. My goal in using this scheme is to reduce the stress that accompanies typical grading rubrics and give you permission to make mistakes and learn as much as possible. Ultimately, my goal is for each student to learn as much as possible, and I would be very happy to have every student earn an A.
Tokens reflect that life inevitably rears its ugly head in some fashion and ruins our best-laid plans. You begin the course with 3 tokens. Tokens may be used for:
There may be opportunities to earn more tokens as the semester progresses by reading select research papers and answering a short quiz.
Software is increasingly an essential component of statistics and will play a role in this course. We will primarily use R, an open-source statistical software program.
You are welcome to use your own personal laptop, or a Grinnell College laptop, during the course. R is freely available, and you can download it and its UI companion, R Studio, here (note: R must be downloaded and installed before R Studio):
R from http://www.r-project.org/
R Studio from http://www.rstudio.com/
You may also work on a classroom computer, all of which will have R and R Studio pre-installed.
Finally, Grinnell hosts an online version of R Studio that you may use while on campus internet: https://rstudio.grinnell.edu/
R for Data Science? (From Prof. Will Rebelsky)
If you’ve spent any time reading about data science online you’ll undoubtedly have noticed the prominence of the Python programming language. Indeed, research from Cal State University found Python was the most popular data science language in private industry, being mentioned in 42% of data scientist job postings. However, R, which was mentioned in 20% of job postings, is not far behind and offers a few advantages when approaching data science from a statistical perspective (hence this course having the STA prefix).
Both R and Python provide plenty of functions for data manipulation. However, because R was created by academic statisticians, it offers very strong data visualization and statistical modeling packages. On the other hand, Python is a general-purpose programming language that excels in production, deployment, and machine learning. Regardless of each language’s strengths and weaknesses, as an introductory course our focus is on the fundamental skills and thought processes used in data science – which is something that can be accomplished regardless of the tools used (which will change over time anyways).
You can expect to spend 12 hours per week on this course, including all in-class and out-of-class time. This number is based on the Grinnell Guidelines for credit-hours. If you find that you are spending significantly more than 9 hours working on material for this course outside of class each week, please let me know.
Grinnell College’s Academic Honesty Policy is located in the online Student Handbook. It is the College’s expectation that students be aware of and meet the expectations expressed in this policy. In addition, in this course, it is my expectation that students may collaborate on the Homework Assignments and must collaborate on the Labs; however, your collaboration must be attributed and all answers must be written up separately. It is my expectation that the Midterms will be completed independently.
In this course, you are not allowed to use solutions you find on the internet, and further, you are not allowed to search for problem solutions on the internet (this includes resources such as ChatGPT). I know that there is great temptation to look for solutions online when things get difficult. It is my hope that the format of this course eases some of the pressure that you might feel. Additionally, we will work to build our growth mindset in this course, which makes it less uncomfortable to sit with a challenging problem. For more information on the way I approach academic honesty, it may be helpful to check out Professor Samuel Rebelsky’s extended statement on academic honesty and integrity.
This syllabus is based on material taken from a variety of professors at Grinnell including, but not limited to, Professors (William) Rebelsky, Miller, and Nolte. Course content and organization are heavily based on previous courses by Profs. Miller and Rebelsky.