Final Project
- The final project for this class is to create a truly “reproducible report” on a topic of your choosing related to higher education
- The topic can really be almost anything of interest related to higher ed, so long as you can find public data to use
- Your report should be 3-5 pages including multiple graphs and visual elements (i.e., not too much text)
- Your goal is something like what you might hand a senior administrator at your university to summarize a trend/issue/topic
- You will likely only have a handful of citations
- You should devote around half your page space to data visualizations and tables
- Your goal is something like what you might hand a senior administrator at your university to summarize a trend/issue/topic
- The primary focus of this report is reproducibility
- Your data must be publicly available with no IRB restrictions, as you will not submit it, I will go and collect it (unless you download it as part of the project code)
This assignment should be submitted as a text entry directly on Canvas consisting of;
- A paragraph describing your project:
- What will you be investigating/exploring/predicting?
- Why is it interesting?
- A description of where you will find this data.
- A few lines describing your main outcome variable in detail
- How is it coded/what scale is it on?
- How can you interpret it?
This assignment is worth 5 points, full points will be awarded once satisfactorily completed, multiple re-submissions may be required.
This should be submitted to Canvas by the due date listed.
NOTE: For your initial analyses, the most important thing is that you submit code that sources/renders in full, this assignment will not be successfully completed until that happens, you may have to resubmit multiple times.
Submit the following in a Quarto (.qmd
) file and its rendered PDF:
- Text (in a Quarto script) that describe
- Where your data is from (a link is preferable)
- How to download it
- Where to save it in order for your code to run
- E.g., For this project I used two .csv data files from IPEDS survey year 2019, institutional characteristics HD2019 and public finance F1819_F1A. These can be downloaded from here by clicking on the named files name under “Data Files”. To run this code, save these files in a sub-folder called “data” that sits in the same folder as this .qmd file.
- Code that:
- Reads in the data set that contains (at least) your dependent variable (must read in EXACTLY what downloads from following your instructions above)
- If appropriate
- Converts missing values to
NA
- Reshapes the data wider or longer
- Joins in additional data files
- Converts missing values to
- Code that creates at least three of the following:
A plot that shows the overall distribution of your dependent/outcome variable
- Hint: A histogram or density plot might be a good option here
A plot that shows the distribution of your dependent/outcome variable grouped by a variable in your data
- Hint: A histogram or density plot with
fill
might be a good option here
- Hint: A histogram or density plot with
A plot that shows the median, interquartile range, and potential outliers of your dependent/outcome variable grouped by a variable in your data
- Hint: A box plot with
x
and/orfill
might be a good option here
- Hint: A box plot with
A plot that shows how your dependent/outcome variable changes by another continuous variable in your data
- Hint: A scatter plot might be a good option here
This assignment is worth 10 points, full points will be awarded once satisfactorily completed, multiple re-submissions may be required.
This should be submitted to Canvas by the due date listed.
- You will present the results of your report in class during the penultimate week of the semester (see date in Canvas)
- The presentation format is up to you, previous students have
- Presented an image of one figure they created
- Created a short PowerPoint presentation
- Created presentations using Quarto
- The primary rule for this presentation is that it is to be 3-5 mins long (read min 3 mins, max 5 mins, ideal 4 mins)
- As this is meant to replicate the you presenting your report to senior administrators, this is a hard time-limit, you will be stopped if you go over 5 mins
This assignment is worth 5 points, your grade will be determined by:
3 points: Did you present a plot/table/finding from your report that tells a story about your topic?
1 point: Did you present the information in a professional and engaging manner?
1 point: Did you finish within the allotted time limit of 3-5 mins?
Your final report is a 3-5 page (single-spaced, not including citations) document that summarizes the analysis you have done using plenty of figures and summary tables along the way
This is NOT a traditional academic paper, it is meant to be concise report intended for a university administrator or policymaker
5 pages is a hard limit, anything beyond the 5th page won’t be graded
NOTE: You can submit a draft (for feedback only) by the due date on Canvas
You will submit the report as Quarto (
.qmd
) fileThe output
format:
should be eitherdocx
,pdf
(traditional way usingLaTeX
), ortypst
(brand new way to create a.pdf
)Documentations for different outputs here:
Optional: If your analysis code becomes long, you might want to submit accompanying
.R
scripts that aresource()
-ed as discussed in the Quarto Lesson
Required Report Content
Before the introduction of your document, a section called “Instructions to Run” that states
- Where your data is from (a link if preferable)
- How to download it (and if you used any automatic data retrieval)
- Where to save it in order for the code to run
- E.g., For this project I used two
.csv
data files from IPEDS survey year 2019, institutional characteristics HD2019 and public finance F1819_F1A. These can be downloaded from here by clicking on the named files name under “Data Files”. To run this code, save these files in a sub-folder called “data” that sits in the same folder as this.qmd
file.
Well commented code (either in the
.qmd
file or asource()
-ed.R
script) that:- Reads in all your raw data (must read in EXACTLY what downloads from following your instructions above)
- Performs all data wrangling tasks to clean, join, and reshape your data as necessary for your project
- Creates:
- Required: 3 or more plots with
ggplot2
- Required: 1 or more overall descriptive statistics table(s) made with
summarize
- Required: 1 or more other summary table(s) made with
summarize
- Optional: Basic statistics like
t.test()
orlm()
- Required: 3 or more plots with
Written text that should be clearly structured with subheadings and describe:
- Why this is interesting and/or important
- This should be a single concise but convincing paragraph, think an “elevator pitch” argument as to why this matters
- There should NOT be any lengthy literature review in this assignment, this is it
- Why your chose your data source and what the data represents
- What analysis you did and why, in layman’s terms (not an R or stats expert)
- What each individual plot and table shows
- What you found overall
- Any limitations or future research
- Why this is interesting and/or important
Rubric
Report Element | Criteria | Points |
---|---|---|
Does it run? |
|
4 |
Data Wrangling: Reading & Cleaning |
|
4 |
Data Wrangling: Analysis |
|
4 |
Data Visualization |
|
4 |
Written Content |
|
4 |