Final Project

The final project for this class is to create a truly “reproducible report” on a topic of your choosing related to higher education
- The topic can really be almost anything of interest related to higher ed, so long as you can find public data to use
Your report should be 3-5 pages including multiple graphs and visual elements (i.e., not too much text)
- Your goal is something like what you might hand a senior administrator at your university to summarize a trend/issue/topic
  - You will likely only have a handful of citations
  - You should devote around half your page space to data visualizations and tables
The primary focus of this report is reproducibility
- Your data must be publicly available with no IRB restrictions. You will not submit the data itself; I will collect it by following your instructions (unless your project code downloads it directly)

This assignment should be submitted as a text entry directly on Canvas consisting of:

A paragraph describing your project:
- What will you be investigating/exploring/predicting?
- Why is it interesting?
A description of where you will find this data.
A few lines describing your main outcome variable in detail
- How is it coded/what scale is it on?
- How can you interpret it?

This assignment is worth 5 points. Full points will be awarded once it is satisfactorily completed; multiple re-submissions may be required.

This should be submitted to Canvas by the due date listed.

NOTE: For your initial analyses, the most important thing is that you submit code that sources/renders in full. This assignment will not be successfully completed until that happens, so you may have to resubmit multiple times.

Submit the following in a Quarto (.qmd) file and its rendered PDF:

Text (in a Quarto script) that describes:
- Where your data is from (a link is preferable)
- How to download it
- Where to save it in order for your code to run
- E.g., For this project I used two .csv data files from IPEDS survey year 2019, institutional characteristics HD2019 and public finance F1819_F1A. These can be downloaded from here by clicking on the named files under “Data Files”. To run this code, save these files in a sub-folder called “data” that sits in the same folder as this .qmd file.
Code that:
- Reads in the data set that contains (at least) your dependent variable (must read in EXACTLY the files that download when someone follows your instructions above)
- If appropriate
  - Converts missing values to NA
  - Reshapes the data wider or longer
  - Joins in additional data files
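As a rough illustration of the wrangling steps listed above, here is a minimal sketch using the tidyverse. Everything in it is made up for demonstration: the inline CSV stands in for a downloaded file, the column names (`unitid`, `tuition_2018`, `sector`) are hypothetical, and `-2` is an assumed missing-value code. Check your own data dictionary for the real codes, and in a real project read the file from your “data” sub-folder instead of an inline string.

```r
library(tidyverse)

# Made-up stand-in for a downloaded file. In a real project this would be
# something like read_csv(file.path("data", "your-file.csv"))
raw_csv <- "unitid,name,tuition_2018,tuition_2019
1,College A,10000,10500
2,College B,-2,12000
3,College C,8000,-2"

# Read the raw data exactly as it arrives
df <- read_csv(I(raw_csv), show_col_types = FALSE)

# Convert the (assumed) missing-value code -2 to NA
df <- df |>
  mutate(across(starts_with("tuition_"), ~ na_if(.x, -2)))

# Reshape longer: one row per institution per year
df_long <- df |>
  pivot_longer(
    cols = starts_with("tuition_"),
    names_to = "year",
    names_prefix = "tuition_",
    values_to = "tuition"
  ) |>
  mutate(year = as.integer(year))

# Join in a second (also made-up) data set by the shared ID column
sector <- tibble(
  unitid = c(1, 2, 3),
  sector = c("Public", "Private", "Public")
)
df_joined <- left_join(df_long, sector, by = "unitid")
```

Not every project needs all three steps; only include the ones your data actually requires.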
Code that creates at least three of the following:
- A plot that shows the overall distribution of your dependent/outcome variable
  - Hint: A histogram or density plot might be a good option here
- A plot that shows the distribution of your dependent/outcome variable grouped by a variable in your data
  - Hint: A histogram or density plot with fill might be a good option here
- A plot that shows the median, interquartile range, and potential outliers of your dependent/outcome variable grouped by a variable in your data
  - Hint: A box plot with x and/or fill might be a good option here
- A plot that shows how your dependent/outcome variable changes by another continuous variable in your data
  - Hint: A scatter plot might be a good option here
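The four plot types hinted at above could be sketched with ggplot2 roughly as follows. The data here is simulated purely for illustration (it does not describe real institutions), and the variable names `grad_rate`, `sector`, and `enrollment` are hypothetical placeholders for your own outcome, grouping, and continuous variables.

```r
library(ggplot2)

set.seed(42)
# Simulated, illustrative data only
df <- data.frame(
  sector = rep(c("Public", "Private"), each = 100),
  enrollment = runif(200, min = 500, max = 30000)
)
df$grad_rate <- ifelse(df$sector == "Public", 55, 65) +
  df$enrollment / 2000 + rnorm(200, sd = 10)
# Keep the simulated rates inside a 0-100 percent scale
df$grad_rate <- pmin(pmax(df$grad_rate, 0), 100)

# Overall distribution of the outcome
p_hist <- ggplot(df, aes(x = grad_rate)) +
  geom_histogram(bins = 30) +
  labs(x = "Graduation rate (%)", y = "Number of institutions")

# Distribution of the outcome grouped by a variable (fill)
p_density <- ggplot(df, aes(x = grad_rate, fill = sector)) +
  geom_density(alpha = 0.5) +
  labs(x = "Graduation rate (%)", y = "Density", fill = "Sector")

# Median, interquartile range, and potential outliers by group
p_box <- ggplot(df, aes(x = sector, y = grad_rate, fill = sector)) +
  geom_boxplot() +
  labs(x = "Sector", y = "Graduation rate (%)", fill = "Sector")

# Outcome against another continuous variable
p_scatter <- ggplot(df, aes(x = enrollment, y = grad_rate)) +
  geom_point(alpha = 0.6) +
  labs(x = "Enrollment", y = "Graduation rate (%)")
```

In a .qmd code chunk, writing the name of a plot object (e.g., `p_hist`) on its own line displays it in the rendered document.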

This assignment is worth 10 points. Full points will be awarded once it is satisfactorily completed; multiple re-submissions may be required.

This should be submitted to Canvas by the due date listed.

You will present the results of your report in class during the penultimate week of the semester (see date in Canvas)
The presentation format is up to you; previous students have:
- Presented an image of one figure they created
- Created a short PowerPoint presentation
- Created presentations using Quarto
The primary rule for this presentation is that it is to be 3-5 mins long (read: min 3 mins, max 5 mins, ideal 4 mins)
- As this is meant to replicate you presenting your report to senior administrators, this is a hard time limit; you will be stopped if you go over 5 mins

This assignment is worth 5 points; your grade will be determined by:

3 points: Did you present a plot/table/finding from your report that tells a story about your topic?
1 point: Did you present the information in a professional and engaging manner?
1 point: Did you finish within the allotted time limit of 3-5 mins?

Your final report is a 3-5 page (single-spaced, not including citations) document that summarizes the analysis you have done using plenty of figures and summary tables along the way
- This is NOT a traditional academic paper; it is meant to be a concise report intended for a university administrator or policymaker
- 5 pages is a hard limit; anything beyond the 5th page won’t be graded
- NOTE: You can submit a draft (for feedback only) by the due date on Canvas
You will submit the report as a Quarto (.qmd) file
- The output format should be either docx, pdf (traditional way using LaTeX), or typst (brand new way to create a .pdf)
  - Documentation for the different outputs here:
- Optional: If your analysis code becomes long, you might want to submit accompanying .R scripts that are source()-ed as discussed in the Quarto Lesson

Required Report Content

Before the introduction of your document, a section called “Instructions to Run” that states
- Where your data is from (a link is preferable)
- How to download it (and if you used any automatic data retrieval)
- Where to save it in order for the code to run
- E.g., For this project I used two .csv data files from IPEDS survey year 2019, institutional characteristics HD2019 and public finance F1819_F1A. These can be downloaded from here by clicking on the named files under “Data Files”. To run this code, save these files in a sub-folder called “data” that sits in the same folder as this .qmd file.
Well commented code (either in the .qmd file or a source()-ed .R script) that:
- Reads in all your raw data (must read in EXACTLY what downloads from following your instructions above)
- Performs all data wrangling tasks to clean, join, and reshape your data as necessary for your project
- Creates:
  - Required: 3 or more plots with ggplot2
  - Required: 1 or more overall descriptive statistics table(s) made with summarize
  - Required: 1 or more other summary table(s) made with summarize
  - Optional: Basic statistics like t.test() or lm()
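To make the two required table types concrete, here is a minimal sketch with dplyr's `summarize()`, followed by the optional `t.test()`. The data is simulated for illustration only, and `grad_rate` and `sector` are hypothetical variable names; substitute your own outcome and grouping variables.

```r
library(dplyr)

set.seed(42)
# Simulated, illustrative data only
df <- tibble(
  sector = rep(c("Public", "Private"), each = 50),
  grad_rate = c(rnorm(50, mean = 55, sd = 10), rnorm(50, mean = 65, sd = 10))
)

# Overall descriptive statistics table: one row for the whole data set
overall_tbl <- df |>
  summarize(
    n = n(),
    mean_grad = mean(grad_rate, na.rm = TRUE),
    sd_grad = sd(grad_rate, na.rm = TRUE),
    median_grad = median(grad_rate, na.rm = TRUE)
  )

# Additional summary table: the same idea, broken out by group
by_sector_tbl <- df |>
  group_by(sector) |>
  summarize(
    n = n(),
    mean_grad = mean(grad_rate, na.rm = TRUE),
    .groups = "drop"
  )

# Optional: a basic test of whether the group means differ
t_res <- t.test(grad_rate ~ sector, data = df)
```

One common way to display these tables in the rendered report is to pass them to `knitr::kable()`, though any approach that renders cleanly is fine.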
Written text that should be clearly structured with subheadings and describe:
- Why this is interesting and/or important
  - This should be a single concise but convincing paragraph, think an “elevator pitch” argument as to why this matters
  - There should NOT be any lengthy literature review in this assignment; this paragraph is it
- Why you chose your data source and what the data represents
- What analysis you did and why, in layman’s terms (i.e., written for someone who is not an R or stats expert)
- What each individual plot and table shows
- What you found overall
- Any limitations or future research

Rubric

Report Element	Criteria	Points
Does it run?	If I hit render, does the code run and produce the output report without errors?	4
Data Wrangling: Reading & Cleaning	Are the “instructions to run” provided and correct? Is the data read in correctly? Is the data joined and/or reshaped correctly where necessary? Is the data cleaned where necessary with reasonable justification provided for subjective decisions?	4
Data Wrangling: Analysis	Is there at least 1 overall descriptive statistics `summarize` table in the report? Is there at least 1 additional `summarize` table in the report? Do the summary tables show interesting and relevant information to the report topic? Is the code to produce the summary tables free of mistakes?	4
Data Visualization	Are there at least 3 `ggplot2` plots in the report? Do the plots show interesting and relevant information to the report topic? Are the plots aesthetically pleasing with good plot design, color choices, and labels? Is the code to produce the plots free of mistakes?	4
Written Content	Does the written content address the points outlined above? Does the report make a convincing argument? Is the text of the report generally well written and in layman’s terms? Is the text of the report well structured with subheadings?	4