Final Project
- The final project for this class is to create a truly “reproducible report” on a topic of your choosing related to higher education
- The topic can really be almost anything of interest related to higher ed, so long as you can find public data to use
- Your report should be 3-5 pages including multiple graphs and visual elements (i.e., not too much text)
- Your goal is something like what you might hand a senior administrator at your university to summarize a trend/issue/topic
- You will likely only have a handful of citations
- You should devote around half your page space to data visualizations and tables
- The primary focus of this report is reproducibility
- Your data must be publicly available with no IRB restrictions. You will not submit the data itself; I will download it by following your instructions (unless your project code downloads it directly)
This assignment should be submitted as a text entry directly on Canvas consisting of:
- A paragraph describing your project:
- What will you be investigating/exploring/predicting?
- Why is it interesting?
- A description of where you will find this data.
- A few lines describing your main outcome variable in detail
- How is it coded/what scale is it on?
- How can you interpret it?
This assignment is worth 5 points. Full points will be awarded once it is satisfactorily completed; multiple re-submissions may be required.
This should be submitted to Canvas by the due date listed.
NOTE: For your initial analyses, the most important thing is that you submit code that sources/renders in full. This assignment will not be successfully completed until that happens, so you may have to resubmit multiple times.
Submit the following in either a cleanly formatted R (`.R`) script or Quarto (`.qmd`) file:
- Comments (if an R script) or text (if a Quarto file) that describe:
- Where your data is from (a link is preferable)
- How to download it
- Where to save it in order for your code to run
- E.g., For this project I used two .csv data files from IPEDS survey year 2019: institutional characteristics (HD2019) and public finance (F1819_F1A). These can be downloaded from here by clicking on the file names under “Data Files”. To run this code, save these files in a sub-folder called “data” that sits in the same folder as this .qmd file.
- Code that:
- Reads in the data set that contains (at least) your dependent variable (must read in EXACTLY what downloads from following your instructions above)
- If appropriate:
- Converts missing values to `NA`
- Reshapes the data wider or longer
- Joins in additional data files
- Code that creates at least three of the following:
A plot that shows the overall distribution of your dependent/outcome variable
- Hint: A histogram or density plot might be a good option here
A plot that shows the distribution of your dependent/outcome variable grouped by a variable in your data
- Hint: A histogram or density plot with `fill` might be a good option here
A plot that shows the median, interquartile range, and potential outliers of your dependent/outcome variable grouped by a variable in your data
- Hint: A box plot with `x` and/or `fill` might be a good option here
A plot that shows how your dependent/outcome variable changes by another continuous variable in your data
- Hint: A scatter plot might be a good option here
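The wrangling steps and the four plot types listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data frames, the column names (`grad_rate`, `sector`, `tuition`), and the `-2` missing-value code are all hypothetical stand-ins, and a real submission would read in the exact downloaded files instead of building data by hand.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for a downloaded file (all values invented);
# a real project would read the untouched download from the "data" folder
raw <- tibble(
  unitid    = 1:8,
  sector    = rep(c("Public", "Private"), times = 4),
  grad_2018 = c(55, 62, -2, 71, 48, 80, 66, 59),  # -2 is a made-up missing code
  grad_2019 = c(57, 64, 61, -2, 50, 82, 65, 60)
)

# Hypothetical second file to join in (tuition in $1,000s, invented)
finance <- tibble(
  unitid  = 1:8,
  tuition = c(9, 35, 11, 42, 8, 51, 12, 38)
)

clean <- raw |>
  # Convert the missing-value code to NA
  mutate(across(starts_with("grad_"), ~ na_if(.x, -2))) |>
  # Reshape longer: one row per institution-year
  pivot_longer(starts_with("grad_"),
               names_to = "year", names_prefix = "grad_",
               values_to = "grad_rate") |>
  # Join in the additional data file
  left_join(finance, by = "unitid")

# Overall distribution of the outcome variable
p_hist <- ggplot(clean, aes(x = grad_rate)) +
  geom_histogram(bins = 10, na.rm = TRUE)

# Distribution of the outcome grouped by another variable, using fill
p_dens <- ggplot(clean, aes(x = grad_rate, fill = sector)) +
  geom_density(alpha = 0.5, na.rm = TRUE)

# Median, interquartile range, and potential outliers by group
p_box <- ggplot(clean, aes(x = sector, y = grad_rate, fill = sector)) +
  geom_boxplot(na.rm = TRUE)

# Outcome against another continuous variable
p_scatter <- ggplot(clean, aes(x = tuition, y = grad_rate)) +
  geom_point(na.rm = TRUE)
```

Printing any of the plot objects (for example `p_box`) draws it. Only three of the four plots are needed for this assignment.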
This assignment is worth 10 points. Full points will be awarded once it is satisfactorily completed; multiple re-submissions may be required.
This should be submitted to Canvas by the due date listed.
- You will present the results of your report in class during the penultimate week of the semester (see date in Canvas)
- The presentation format is up to you; previous students have:
- Presented an image of one figure they created
- Created a short PowerPoint presentation
- Created presentations using Quarto
- The primary rule for this presentation is that it is to be 3-5 minutes long (read: minimum 3 minutes, maximum 5 minutes, ideally 4 minutes)
- As this is meant to replicate you presenting your report to senior administrators, this is a hard time limit; you will be stopped if you go over 5 minutes
This assignment is worth 5 points. Your grade will be determined by:
3 points: Did you present a plot/table/finding from your report that tells a story about your topic?
1 point: Did you present the information in a professional and engaging manner?
1 point: Did you finish within the allotted time limit of 3-5 mins?
Your final report is a 3-5 page (single-spaced, not including citations) document that summarizes the analysis you have done, using plenty of figures and summary tables along the way
This is NOT a traditional academic paper; it is meant to be a concise report intended for a university administrator or policymaker
5 pages is a hard limit; anything beyond the 5th page won’t be graded
NOTE: You can submit a draft (for feedback only) by the due date on Canvas
You will submit the report as a Quarto (`.qmd`) file
- The output `format:` should be either `docx`, `pdf` (traditional way using `LaTeX`), or `typst` (brand new way to create a `.pdf`)
- I would strongly recommend `docx` for most students
- If you’re feeling more adventurous, have a go with `typst`
- You’ll need Quarto 1.4, which came out after the start of the class; see me if you need help installing it
- If you want to cause yourself unnecessary misery and frustration by doing it the old fashioned way, use `pdf`
- Optional: If your analysis code becomes long, you might want to submit accompanying `.R` scripts that are `source()`-ed as discussed in the Quarto Lesson
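As a rough illustration of how `source()` works, the sketch below writes a tiny helper script and then runs it. The file name `clean-data.R` and its contents are hypothetical; the script is written to a temporary folder only so the example is self-contained. In a real project the `.R` file would already sit alongside the `.qmd` file, and only the `source()` call would appear in a code chunk.

```r
# Hypothetical helper script, created here only so the example runs on its own;
# in a real project this file would be saved next to the .qmd
script_path <- file.path(tempdir(), "clean-data.R")
writeLines(c(
  "# Contents of the hypothetical clean-data.R",
  "scores <- c(70, 85, 90)",
  "mean_score <- mean(scores)"
), script_path)

# source() runs the script, so the objects it creates become available
# to the code that follows
source(script_path)

mean_score
```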
Required Report Content
Before the introduction of your document, a section called “Instructions to Run” that states:
- Where your data is from (a link is preferable)
- How to download it
- Where to save it in order for the code to run
- E.g., For this project I used two `.csv` data files from IPEDS survey year 2019: institutional characteristics (HD2019) and public finance (F1819_F1A). These can be downloaded from here by clicking on the file names under “Data Files”. To run this code, save these files in a sub-folder called “data” that sits in the same folder as this `.qmd` file.
Well commented code (either in the `.qmd` file or a `source()`-ed `.R` script) that:
- Reads in all your raw data (must read in EXACTLY what downloads from following your instructions above)
- Performs all data wrangling tasks to clean, join, and reshape your data as necessary for your project
- Creates:
- Required: 3 or more plots with `ggplot2`
- Required: 1 or more overall descriptive statistics table(s) made with `summarize`
- Required: 1 or more other summary table(s) made with `summarize`
- Optional: Basic statistics like `t.test()` or `lm()`
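The table and statistics requirements above might look something like the following sketch. The data frame and its columns (`sector`, `grad_rate`, `tuition`) are invented for illustration, and the particular statistics chosen are just one reasonable option, not a required set.

```r
library(dplyr)

# Hypothetical cleaned data (all values invented for illustration)
df <- tibble(
  sector    = rep(c("Public", "Private"), each = 5),
  grad_rate = c(52, 58, 61, 49, 55, 70, 75, 68, 80, 72),
  tuition   = c(9, 11, 12, 8, 10, 35, 42, 38, 51, 40)
)

# Overall descriptive statistics table made with summarize()
overall_table <- df |>
  summarize(
    n           = n(),
    mean_grad   = mean(grad_rate),
    median_grad = median(grad_rate),
    sd_grad     = sd(grad_rate)
  )

# Another summary table: the same idea, broken out by group
by_sector_table <- df |>
  group_by(sector) |>
  summarize(
    n         = n(),
    mean_grad = mean(grad_rate),
    .groups   = "drop"
  )

# Optional basic statistics
t_result  <- t.test(grad_rate ~ sector, data = df)  # compare the two groups
lm_result <- lm(grad_rate ~ tuition, data = df)     # simple linear regression
```

In the report itself, each table and result would then be described in plain language for a reader who does not use R.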
Written text that should be clearly structured with subheadings and describe:
- Why this is interesting and/or important
- This should be a single concise but convincing paragraph, think an “elevator pitch” argument as to why this matters
- There should NOT be any lengthy literature review in this assignment, this is it
- Why you chose your data source and what the data represents
- What analysis you did and why, in layman’s terms (assume the reader is not an R or stats expert)
- What each individual plot and table shows
- What you found overall
- Any limitations or future research
Rubric
Report Element | Criteria | Points |
---|---|---|
Does it run? |  | 4 |
Data Wrangling: Reading & Cleaning |  | 4 |
Data Wrangling: Analysis |  | 4 |
Data Visualization |  | 4 |
Written Content |  | 4 |