library(tidyverse)
library(haven)
data <- read_dta("data/hsls-small.dta")
ggplot(data) +
  geom_histogram(aes(x = x1txmtscor))
Reproducible Reports with Quarto
- You have been using Quarto for your assignments since the start of class
- So, hopefully you’re getting somewhat comfortable with the way they mix code and text together
- In this lesson, we are going to learn a bit more about how to use Quarto including some of its powerful features
What Is Quarto and What Can It Do?
Quarto vs RMarkdown
- Some of you may have heard of or used RMarkdown documents before
- In short, Quarto is the new version of RMarkdown
- See this document for more details
- The reason for the change is a little technical
- RMarkdown is (as you might guess) tied very closely to R ‘under the hood’
- Quarto is free to work with a variety of languages
- What this means is Quarto can do everything RMarkdown can do plus A LOT more
- For example, Quarto has a much larger variety of outputs
- 99% of anything that works in RMarkdown will work exactly the same in Quarto, but the same isn’t true in reverse
- One very nice difference between the two is that Quarto has an extremely comprehensive documentation site
- Fun fact, the Quarto references website is itself a Quarto project, which is kind of meta!
Quarto vs Microsoft Word
Up until now, you’ve likely written most of your assignments using Microsoft Word or something similar
There are a few major reasons Quarto has mostly replaced Word for 95% of my work
- Automatically created tables/results from code
- This is the big one: as you’ll see later in the lesson, Quarto allows us to include tables and figures that are automatically created from code
- No hours spent copying results into a table by hand (and possibly making a mistake)
- You can even have your text update automatically if you want
Quick Discussion
In addition to not having to type out the table in the first place, a big pro of that is that any changes you make in your analysis (e.g., cleaning a variable differently, including a new variable in the analysis) carry through to the tables and figures automatically
What kind of real-world situations do you think this would help in?
- Reproducibility & transparency
- By writing your document in Quarto, you see where everything in your paper comes from
- This makes it much easier for someone looking to reproduce your work and/or see where you got a result from
- You should always share your analysis code, even for stuff written in Word, but writing in Quarto makes everything even more transparent
- Because we can see the code/source of any table or figure in the document, it’s much harder to fake/hide something
- Integration with Zotero for citations
- We will look at this towards the end of the lesson, but it’s a great feature!
- Version control with git
- One of your (more challenging) extra-credit opportunities is to use git, which is a way to track changes in your projects
- Working with git basically allows you to keep a record of all the changes you save
- This means there’s no longer a need for “dissertation.docx”, “dissertation_revised.docx”, “dissertation_revised3.docx”, “dissertation_final.docx”, “dissertation_FINAL_revised4.docx”
- This works much more effectively with Quarto files as they’re just plain text, which means you can see the changes, something you can’t do with Word
- One document, many formats!
- Again, we will look at some of these in just a little bit!
Basics of Writing in Quarto
To get started, let’s all open a totally blank Quarto file that we can build up over the lesson
In RStudio go to “File” -> “New File” -> “Quarto Document” -> “Create Empty Document” (bottom left corner)
Writing Markdown
- Markdown is a way of writing text with simple formatting options
- For normal text, just write text
- For new paragraphs, leave a blank line between text
- For bold text, **use two asterisks around the text**
- For italic text, _use underscores around the text_ or *use one asterisk around the text*
- For headings, use # before the text for each level, e.g. # level 1 headings to ##### level 5 headings
- For links, [Type the text you want to display in square brackets](Type the link in parentheses)
- For images, ![Type the caption in square brackets](Type the path to the image file in parentheses)
- For lists, start a new line then use 1. for numbered lists or - for bulleted lists
- For tables, well… There are better ways than typing out a markdown table by hand
Visual Editor
- Now, luckily for you, Quarto & RStudio have a visual editor, which makes it feel like writing in Word
- If you look at the top of the document you will see “Source” and “Visual”
- If you toggle into “Visual” you will see options appear at the top, like B and I, that make it easy to change the text without writing the ** or _
- It’s not that one way is better than the other, I use both depending on the situation/task
- Typically, if I’m writing more code chunks, I will be in the source mode
- If I’m writing longer sections of text, I will be in the visual mode (particularly for entering citations, which we will talk about later)
Quick Exercise
Using either source or visual mode add the following to your blank Quarto file
- Add a level-one heading “Data”
- Write a sentence or two about the data you’re using for your final project
- Make the source of your data (e.g., IPEDS) italic
Header Options
- At the top of your Quarto document you will see two sets of --- which is what we call the header
- This is where we can control a lot of options that affect the document
- There are so, so, so many options, many of which change with the format you are using
- Some basic ones to be aware of for now are
- title: kind of self-explanatory
- author: again, kind of self-explanatory
- date: this can take a specific date, or “today” will use the current date when you render the document
- format: this is the power of Quarto, there are so many options, which will be discussed next
- execute: these are the default ways we want code chunks to output
- echo: FALSE means don’t print out the code (e.g., this website and your assignments use echo: TRUE but your final report should use echo: FALSE)
- There’s a bunch more of these options; setting them in the header makes that the default behavior for the document, or you can set them at the chunk level to apply only to that chunk
- If you want to learn more, see the execution options documentation
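As a minimal sketch of the options above, a header might look something like this. The title and author values are placeholders, and the format is just one possible choice.

```yaml
---
title: "My Final Report"   # placeholder title
author: "Your Name"        # placeholder author
date: today                # uses the current date at render time
format: pdf
execute:
  echo: false              # document-wide default: don't print the code
---
```

To override the default for a single chunk, a line such as `#| echo: true` at the top of that chunk should apply to that chunk only.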
Quick Exercise
- Change the title to something appropriate for your final report
- Change the author to your name
- Change the date so that it uses the current date whenever it is rendered
Output Formats
- This is where Quarto really shines!
- No matter what format you want your work to be in, using Quarto you always write it using the same basic input
- You can even publish something in multiple formats simultaneously, such as a webpage with a formatted pdf copy
- This can be really useful for quickly reducing a paper into a presentation for a conference, for example
- Again, these will all have the advantage of automatically updating/correcting tables and figures
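A rough sketch of what asking for more than one format can look like in the header. The title is a placeholder, and `default` simply means "use that format’s default settings".

```yaml
---
title: "My Final Report"   # placeholder title
format:
  html: default            # a web page...
  pdf: default             # ...and a pdf from the same source
---
```

In RStudio the dropdown next to the Render button lets you pick which of the listed formats to render. From the terminal, `quarto render report.qmd` (with `report.qmd` standing in for your file name) should render all of them.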
format: pdf
- So far in the class, we have had you use format: pdf as our output format
- This renders your document using something called LaTeX, a traditional academic pdf creation tool
- LaTeX is widely used in academia, but it’s not the friendliest thing to work with
- For instance, if you want a figure/table in a certain place and LaTeX doesn’t think it should go there, you have a fight on your hands
- LaTeX is a tool you can use directly to write academic papers too; the UF dissertation office and many journals accept LaTeX submissions
- Quarto (and Rmd) write the LaTeX code for you behind the scenes, making it much easier
- If you’re interested in seeing what this looks like, add keep-tex: true to your YAML header
- You can see details about LaTeX options on the Quarto documentation for LaTeX output
- As an aside, I am writing my dissertation in Quarto, having adapted UF’s LaTeX template to work with Quarto input
- I have created a public template for that which I will continue to update as I go; it should be ready by the time you are at that stage if you decide you like Quarto as much as I do!
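For reference, a small sketch of where the keep-tex option sits in the header: format-specific options are nested under the format they belong to.

```yaml
---
format:
  pdf:
    keep-tex: true   # keep the intermediate .tex file alongside the rendered pdf
---
```

After rendering, you should find a .tex file next to your .qmd file that you can open to see the LaTeX Quarto wrote for you.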
Quick Exercise
- Add what’s required to make your blank document render to a pdf (using LaTeX)
format: typst
- Another way of creating a pdf that came out last year is typst
- Similarly to LaTeX this is something you could use directly, but, Quarto can convert our markdown and tables to it automatically, which is much easier
- There are pros and cons of typst vs LaTeX for creating pdfs
- Pros
- It’s MUCH faster: once a LaTeX document has multiple figures in it, it can take minutes to render on an older/slower computer, while typst is closer to instant
- When you want to do something (particularly advanced formatting) beyond what Quarto and Markdown do, adding in raw typst code is MUCH easier than adding raw LaTeX
- Cons
- New and not widely accepted; if you’re submitting the pdf itself, you’re fine, but academic journals, for instance, want either a docx or raw LaTeX code so they can edit the format, and only a small number of journals (and none in higher ed) accept raw Typst code (yet)
- While it’s easier to use, and very versatile, it is not quite as customizable as LaTeX, particularly for meeting extra-strict formatting guidelines (like a dissertation)
- You can see details about typst options on the Quarto documentation for typst output
- As another aside, I’ve used typst to create a wide range of pretty looking documents from my resume to a website of recipe cards
Quick Exercise
- Add what’s required to make your blank document render to a pdf (using typst)
format: html
- html is the file format that web browsers read for almost every website you visit
- You can customize html format to almost anything you want and add fancy content like interactive graphics
- Unfortunately, the majority of work in academia (for now at least) still relies on traditional paper-based document formats, so we can’t justify spending much time on this today
- If you want to learn more about html output, start with the Quarto documentation on Quarto Websites and Reveal.js Presentations
- You can see details about HTML options on the Quarto documentation for .html output
- As yet another aside, this entire website is built with Quarto as html output
- Click on the little GitHub icon in the lower right corner of the website to see the .qmd files
Quick Exercise
- Add what’s required to make your blank document render to an HTML web page
format: docx & format: pptx
- The most flexible and compatible option is using Microsoft Office formats (.docx for Word and .pptx for PowerPoint)
- When you render your Quarto document, the result is a file that you can then open in Microsoft Office or send to a supervisor/advisor
- For example, if I’m running the data analysis for a project, I can create all the plots and tables, write up the methods section, then pass to my advisor to fill in the literature sections
- Sure, straight to .pdf is nice if everyone is working in Quarto, but often (sadly) they won’t be
- .docx is just a more practical option a lot of the time
- You still get a document rendered with all the tables and plots which will update if the data changes upstream
- Plus, in the real world, once you’ve run your analysis in R/Quarto and created the tables/plots, it can be nice to have the option to run Grammarly on it in Word, or tweak one little layout feature you can’t figure out how to get right in Quarto
- There are unfortunately some things that just don’t work that well
- Many formatting things are hard to get right, but the upside is they’re usually fixable in Word after you render the document
- One thing that can be extremely frustrating is that to actually overwrite the Word document, you need to
- Not have the document open in Word, as Microsoft won’t let Quarto save over an open document
- Go to background jobs and hit the stop sign icon before rendering
- You can see details about Word options on the Quarto documentation for .docx output
- As a final aside (for now), you can customize the output format using the reference-doc: option
- These take some learning to know how to write, but it basically involves setting up styles that correspond to the styles Quarto outputs your text as
- In case they’re useful, I have attached a couple of styles I have created and used in the past
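A minimal sketch of how the reference-doc option is set. The file name here is hypothetical; it would be a Word file containing your customized styles, saved relative to the .qmd file.

```yaml
---
format:
  docx:
    reference-doc: custom-reference.docx   # hypothetical file name
---
```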
Quick Exercise
- Add what’s required to make your blank document render to a Word document
- Since you’ll be submitting a .pdf for your final report, change it back to make a pdf
- If your computer takes more than a few seconds to render using LaTeX, I’d recommend typst; it will save you hours when editing your final report
Using R to Create Figures & Tables in Quarto
- The text is all well and good, but, the real power of Quarto comes from the ability to integrate the output of code into the document
- The way to do this is simply adding R code that creates a table or figure into the report
- Just like you’ve been doing in your homework assignments so far
- Code chunks are like an .R script cut up into little pieces and put in between text
- They still run like one continuous .R script, from top to bottom
- Anything you <- assign in an earlier chunk will be available in later chunks
- The results/output of code in a chunk will print where the chunk is in the text in the rendered document
- You can add a code chunk by clicking on the insert code button along the top, or by keyboard shortcut
- command + option + I on Mac
- ctrl + alt + I on PC
- Important
- Quarto documents always start from a fresh R environment every time you render them
- Quarto will not be able to access any data you have read in or objects you have created outside the document
- Any data you want to use in your code chunks needs to be read in an earlier code chunk
- You will always need to use a relative path from where the .qmd file is saved
- Same goes for any packages you want to use
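To make the points above concrete, here is a rough sketch of how text and chunks might sit together in a .qmd file. The heading, surrounding text, and chunk labels are made up for illustration; the data path is the one used in this lesson.

````markdown
## Data

Some text introducing the data.

```{r}
#| label: setup
# Packages must be loaded inside the document:
# every render starts from a fresh R environment
library(tidyverse)
library(haven)

# Relative path, starting from the folder where the .qmd file is saved
data <- read_dta("data/hsls-small.dta")
```

More text. The object `data` created above is available in every later chunk.

```{r}
#| label: histogram
ggplot(data) +
  geom_histogram(aes(x = x1txmtscor))
```
````

If the first chunk were deleted, the second would fail on render even if `data` exists in your RStudio environment, because rendering never looks there.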
Including Plots
You already have practice with this from the Data Visualization I & II assignments, so it should be pretty familiar
For practice, let’s read in our familiar hsls data and create a plot
Let’s start out by reading in data, then making a simple ggplot of math test scores
- Since we want to see the results, we will just call it rather than assigning it to an object
- Note: We need to load packages in our Quarto document every time
- Hint: If your output is appearing in the Quarto document but you’d like to see it in the plots box like you would in an R script
- Click on the cog/gear icon to the right of the Render button
- Select “Chunk Output in Console”
Quick Exercise
As a review of data visualization lessons
- Make it so we can see the difference in math score by gender
- Hint: Since we are using labeled .dta data, as_factor(x1sex) will create the labels automatically
- Add appropriate x-axis and legend labels
- Change the theme to your favorite
- Render the document to see your pdf including the plot
Including Plots as an Image
- Now, you may notice the plot looks a little different than in RStudio
- Sometimes they don’t look quite as nice, particularly the layout can get cramped up
- Also sometimes you might create an image, plot, or figure somewhere else that you want to include
- To get around these issues, we can include images from a picture file
- First, let’s create an image
- ggsave() is a helpful function that simply saves the last plot you made
- It only needs one argument, what you want to call the plot
- You can also specify
- height or width (good for complicated plots you want to save as a large image)
- bg for background; by default a .png will have no background, but adding bg = "white" will put it on a white square
- For now, let’s just save the plot we just made as “math-scores.png”
- As we don’t specify any folders, it will just save it in our working directory
ggsave("math-scores.png")
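A small self-contained sketch of the optional arguments mentioned above. It uses R’s built-in mtcars data as a stand-in for the lesson data and saves to a temporary folder; the object and file names are made up for illustration.

```r
library(ggplot2)

# Stand-in plot built from the built-in mtcars data (not the lesson's hsls data)
example_plot <- ggplot(mtcars) +
  geom_histogram(aes(x = mpg), bins = 10)

# Save into a temporary folder so nothing in your project is overwritten
out_file <- file.path(tempdir(), "example-plot.png")

ggsave(out_file,
       plot = example_plot, # if omitted, ggsave() saves the last plot displayed
       width = 8,           # inches by default
       height = 5,
       bg = "white")        # white background instead of a transparent one

# Check that the image file was written
file.exists(out_file)
```

Naming the plot explicitly with `plot =` is optional, but it avoids surprises if you have made another plot since the one you meant to save.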
Including an image (created in R or elsewhere) always works the same way
In source mode it’s
![Caption](path/to/image.png)
In visual mode just click the image icon and select the file you want to add
Quick Exercise
- Save the math score plot you made in the previous exercise
- Add the plot to your Quarto document as an image
- Render to get the pdf
- See if there are any differences between the plot you saved as an image and the one you included just through code
Including Tables
- There are multiple ways to create and include tables in your Quarto documents, but we are going to focus on 3 (really 2)
Hand-Written Markdown Tables
- Long story short, markdown tables are a pain to write by hand
- The visual editor table tool makes this MUCH easier
- Even then, you lose some of the advantages of using Quarto
- If it’s your results table and you hand write all the numbers, they won’t autocorrect if something changes upstream
- There are limited times I would recommend using these (I believe the class syllabus is one, but I’d probably even change that if I wrote it today)
- All you need to know is that it’s an option and the easiest way is to create it using the visual editor
kable()-Written Markdown Tables
- An infinitely better solution for many reasons is using the kable() function from the knitr package
- You most likely all have this, but just in case, let’s reinstall knitr
install.packages("knitr")
- So far in this class, there have been plenty of times we have used tables to answer questions, when we have had them print out to the console, e.g.
data <- data |>
  drop_na(x1txmtscor)

data |>
  summarize(mean(x1txmtscor))
# A tibble: 1 × 1
`mean(x1txmtscor)`
<dbl>
1 51.1
- This is kind of a table, but, when we render our Quarto document it doesn’t look great
- Luckily, the fix is really easy
- First, we load the knitr package
- Second, we just pipe our output into kable()
library(knitr)
data |>
  summarize(mean(x1txmtscor)) |>
  kable()
| mean(x1txmtscor) |
|---|
| 51.10957 |
- If you just run this you will see a bunch of |s and -s (assuming you are using the “Chunk Output in Console” option)
- That is a markdown table
- Imagine writing that by hand for a medium to large size table…
- kable() is relatively basic in terms of customization, but it can do most things you’re going to need
- FYI: The package kableExtra has a lot more advanced options for kable customization
- For now, we will address the two most obvious issues with this table, the column names and the rounding (or lack thereof)
- We just need to add two simple arguments to our kable()
data |>
  summarize(mean(x1txmtscor)) |>
  kable(col.names = c("Mean of Math Score"),
        digits = 2)
| Mean of Math Score |
|---|
| 51.11 |
- kable() will turn any data you pass to it into a table, so let’s make a slightly more interesting summary table
- Notice, we will use as_factor() to get the labels to show up
- factor() allows us to make a factor and apply our own labels
- as_factor() works with haven labeled data and gets the labels out
data |>
  group_by(as_factor(x1region)) |>
  summarize(mean = mean(x1txmtscor),
            median = median(x1txmtscor),
            min = min(x1txmtscor),
            max = max(x1txmtscor)) |>
  kable(col.names = c("Region", "Mean", "Median", "Min", "Max"),
        digits = 2)
| Region | Mean | Median | Min | Max |
|---|---|---|---|---|
| Northeast | 52.17 | 51.99 | 24.9468 | 82.1876 |
| Midwest | 51.23 | 51.07 | 24.0999 | 82.1876 |
| South | 51.04 | 50.95 | 24.0180 | 82.1876 |
| West | 50.15 | 49.95 | 24.0744 | 82.1876 |
- Much better!
Quick Exercise
- Create a similar table, but break it down by both region and sex
- Render the Quarto document
- Think: is there anything else you’d want to have on this if it’s a research report?
Figure & Table Numbering & Captioning
- First, I want to note this is a little confusing at first, and you don’t necessarily need to do this in your final reports (but it’s a great habit to get into)
- In some of the data viz lessons we added something like title = "Math Test Scores by Sex" to our ggplots
- You might expect to just change that to title = "Figure 1: Math Test Scores by Sex"
- That’s fine and works great if you’re creating the figure to use later in Word, but there’s a much better way to do it if you’re using Quarto
- If you’re writing a long complicated document (cough dissertation cough) you might have 10, 20, 30 tables and/or figures
- Wouldn’t it be really nice if when you add a table or figure, all the numbers updated automatically?
- Quarto’s way ahead of you!
- You may have noticed in your assignment templates the code chunks always start with something like #| label: 1a
- This is basically giving the code chunk a name, and it’s the basis of creating automatic table/figure numbering
- All you need to do is specify two things about the chunk
- You need to tell Quarto that this chunk is a table or figure
- You use the #| label: option to do this, starting the label with fig- for figures and tbl- for tables
- For example #| label: fig-math for our plot
- You need to tell Quarto what to use as the caption
- This is #| fig-cap: for figures or #| tbl-cap: for tables
- For example #| fig-cap: Math Test Scores by Sex
- If you do those things correctly, Quarto will know that chunk creates a plot and number it for you like below
[Figure 1: Math Test Scores by Sex]
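Putting the two options together, a labeled and captioned chunk might look roughly like the sketch below. It assumes tidyverse, haven, and knitr were loaded and `data` was read in by an earlier chunk. The plot and table code, the tbl-region label, and the table caption are illustrative only.

````markdown
```{r}
#| label: fig-math
#| fig-cap: "Math Test Scores by Sex"

# Illustrative plot only; swap in your own plot code
ggplot(data) +
  geom_histogram(aes(x = x1txmtscor, fill = as_factor(x1sex)))
```

```{r}
#| label: tbl-region
#| tbl-cap: "Math Test Scores by Region"

# Illustrative table only; swap in your own table code
data |>
  group_by(as_factor(x1region)) |>
  summarize(mean = mean(x1txmtscor)) |>
  kable(col.names = c("Region", "Mean"), digits = 2)
```
````

Figures and tables are numbered separately, so these would be expected to come out as the first figure and the first table in the rendered document.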
Quick Exercise
- Copy the code chunk where you made a plot of math scores by sex from above
- Add the #| label and #| fig-cap as described above
- Render the report and see if your figure was automatically numbered
- This is tricky, so let’s do it one more time with a table (the process is very similar)
Quick Exercise
- Copy the code chunk where you made a table of math score descriptive statistics by region and sex
- Add an appropriate #| label and #| tbl-cap
- Render the report and see if your table was automatically numbered
- If you added a figure as an image file it’s even simpler as you already have a caption
- You just turn this ![Caption](path/to/image.png) into ![Caption](path/to/image.png){#fig-math}
- This will seamlessly integrate with the numbering of your chunk-created figures too
- Last thing on this
- Now, I mentioned how annoying it can be to update the figure and table numbers if something changes and you’ve written them all by hand
- Now think about every time you say something like “In Figure 1 I show that…”
- How frustrating would it be to find and edit all of those?
- Luckily, if you’ve used Quarto’s automatic numbering system, you can leverage that here too
- Instead of writing in the text “In Figure 1” write In @fig-math (the label of the chunk where the figure is made)
- This will not only update automatically if the numbering changes but also add a convenient hyperlink to the table or figure
- It may seem like overkill for a simple report like you will be writing at the end of this class
- As you get into writing more complicated documents like research papers, especially something as long as your dissertation, you will be glad you got into the habit early
- Beyond the scope of this class, it also helps automate the process of creating lists of tables and figures, such as those required in dissertation formatting
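As a short sketch, cross-references in the text of a .qmd file might look like this. The labels tbl-region and fig-saved, the sentences, and the image path are made up for illustration; fig-math is the label used above.

```markdown
In @fig-math I show the distribution of math test scores, and @tbl-region
breaks the averages down by region.

![Caption](path/to/image.png){#fig-saved}

An image added from a file can be referenced the same way: see @fig-saved.
```

When rendered, each @label is replaced by the matching “Figure” or “Table” number, so reordering or adding figures does not require editing the text.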
Automatic Citations & Zotero
- What’s one of the most annoying things about academic writing?…
- That’s right, citation-styling!
- We just saw how Quarto can automatically handle numbering our figures and tables, it can also handle this for us!
- The easiest way to do this is using Zotero, so, let’s first get that set up
About Zotero
- Zotero is a free and open-source (just like R) citation management tool
- Basically, it’s a library where you can save, organize, annotate, and categorize articles, books, webpages, and anything else you might want to cite
- The earlier you start using a citation manager like Zotero, the more effective it will be
- As you’re writing class papers you should (hopefully) be starting to write about stuff that aligns with your eventual research interests
- If you save these all in Zotero as you go, by the time you get to your qualifying exams, guess who already has a library of 100+ sources ready to cite? You!
Installing Zotero
- Ideally you all installed Zotero before class, but in case not, download it here
- You’ll see there are two parts to Zotero, the program that runs on your computer, and the browser add on that makes saving citations so easy
- First, install the desktop program
- Second, get the browser connector set up
- For Chrome and Edge users you’ll see “Install Chrome/Edge Connector”; just click that and follow the instructions
- For Safari the connector is bundled with the desktop program, but we have to turn it on
- Open Zotero for the first time
- In the top left hit “Safari” -> “Settings” -> “Extensions”
- Enable the Safari Zotero Connector
Saving an Article into Zotero
- Zotero is basically a library for you to store all your citations
- Items saved in Zotero consist of
- Information about the article (title, author, date, url, doi)
- Attachments, most often a pdf of the article
- When you save an item with the Zotero connector it will always pull in as much information as it can
- This is most effective for academic journals, but it will try for webpages
- You can always add/edit this later
- It will also try to automatically save a pdf of the article for you, but this only works if you have access (are on the UF network or VPN), and even then it sometimes fails
- Again, you can simply drag and drop the pdf in place later if need be
- So, let’s save our first article to Zotero
- Since it’s tax season, let’s go to a classic article that I read in my first semester of the Ph.D. and that is probably the subconscious reason I use H&R Block to file taxes…
- The Role of Application Assistance and Information in College Decisions: Results from the H&R Block Fafsa Experiment
Quick Exercise
- Click on the link to the article
- Assuming you set up the Zotero web connector, use the button in your browser to save the article to Zotero
- Open Zotero to check it’s there
- If that worked, you should see something like this
- Side note: If you notice on the left of my screen, I have a lot of folders and sub-folders set up
- If you use Zotero, I’d strongly encourage you to keep related articles in sub-folders; it makes it much easier to find them
- However, this is not a Zotero class, so now that we have an article in Zotero, it’s time to go back to Quarto
Citing Articles in Zotero Library in Quarto
- Not only is Zotero a nice way of storing articles, its real power is automatic citation creation
- If an article is in your Zotero main library (the default location) Quarto should automatically be able to find it
- To make the most of automatic citations, you need to be in the visual editor
- In the visual editor, if you simply type @ you should see a search menu pop up through which you should be able to find any article you have saved
- For example, type @bet and you should see the article we just saved
- Click on it, and it will bring up another menu, which adds it to the project
- By default this automatically
- Creates a file references.bib in your working directory
- Adds a line to your header saying bibliography: references.bib
- Adds the article information to that file
- This is all behind the scenes; there are things you can and may want to change later, but leave it as default for now
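For those curious what this looks like in source mode, here is a rough sketch. The citation key bettinger2012 is hypothetical (the key created for you may differ), and the sentences are placeholders.

```markdown
---
bibliography: references.bib
---

The H&R Block FAFSA experiment is a classic study in this area [@bettinger2012].

@bettinger2012 describe the experiment in detail.
```

Wrapping the key in square brackets gives a parenthetical citation, while a bare @key gives an in-text (narrative) citation. Either way the full reference is added to the bibliography automatically.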
Quick Exercise
- In your Quarto document, go to the visual editor
- Add a citation for the Bettinger article
- Render the document and see what you think
Using a Specific Citation Style
- Did it work?
- It should have added a citation, but, it’s not in APA format
- To change that, we simply need to tell Quarto what style we want our citations in
- If you’re using format: typst this is super easy
- Simply add bibliographystyle: apa and it will handle it from there
- If you’re using any other format (pdf, docx, html, etc.) we have to give it a little more info
- Zotero has an online repository of hundreds of citation styles
- For our purposes, we just want the standard APA 7th Edition
- Click on “American Psychological Association 7th edition”, download it, and save it in your class folder
- Now that we have it downloaded to our class folder, we just need to add the option csl: apa.csl to our header
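As a sketch, the two header variants described above might look like this. Both assume the references.bib file created earlier, and the second assumes the downloaded style file is saved as apa.csl next to the .qmd file.

```yaml
---
format: typst
bibliography: references.bib
bibliographystyle: apa     # typst handles the style itself
---
```

```yaml
---
format: pdf
bibliography: references.bib
csl: apa.csl               # style file downloaded from Zotero's repository
---
```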
Quick Exercise
Either
Using typst
1a. Add bibliographystyle: apa to your header
Or
Using any other format
1a. Make sure the apa.csl file you downloaded from Zotero’s collection is in your class folder
1b. Add csl: apa.csl to your header
Then
- Render your Quarto document
Better, right? What’s the last thing it’s missing?
- By default, the citations come at the end of the document
- You can change this pretty easily, but it’s not important for now
- Normally, we want the bibliography to appear on a new page with a header “References”
- This is pretty easy to do
- You can add a pagebreak with {{< pagebreak >}}
- Then simply add a level one header # References
Quick Exercise
- Add the pagebreak and References header to get the bibliography on a new page
- As a final note, most of you won’t use Quarto all the time after this class, and that’s okay
- Zotero has similar integration with Word and Google Docs too!
Customizing Format & Layout
- You might be wondering, “how can I make the References header centered”
- This is beyond the scope of what we can cover today
- For your final project, we do not expect you to customize the formatting of the report
- That said, as I showed with examples at the start, you can make some really pretty documents using Quarto
- The Quarto guide website has a bunch of pages on more advanced features for each format
- Under “Documents” select whatever format you’re wanting to use
- For example, typst has “Typst Basics” and “Custom Formats” pages to dive into
- As you’ll see in those documents, there are ways to use header options, preset templates, and custom-made templates to change how your document looks
- If you’re asking “Can I make it look like x?”, the answer is almost certainly “yes”, it’s just a question of how easy it is…
- Getting a document set up in APA paper format, pretty easy (there are a bunch of templates already written)
- Creating a customized colorful report for your department, probably a little harder, but do-able!
Wrap-Up
- We covered a lot today, but, hopefully it has shown you the potential power of Quarto
- We do not expect you to use all of the more advanced things we covered towards the end of the lesson in your final report, but at minimum
- It should be a pdf document (created either way, LaTeX or typst, is fine)
- Your report should have a title, author, and date all set using the header options
- You should have clear headings/sub-headings to structure your report
- All your tables should be actual tables using kable() (or a similar tool, just not console-style printout)
- All your tables and figures should be included where they are mentioned in the text
- Ideally, they’re numbered using Quarto’s automatic numbering
- Where appropriate you have citations
- Ideally, they’re created using Quarto’s citation system and Zotero
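- As a minimal sketch of the difference between a console-style printout and an actual table (the region_counts data frame and its caption are made up for illustration, they are not from the lesson data):

```r
library(knitr)

# Hypothetical toy summary table, invented for this example
region_counts <- data.frame(
  region = c("Northeast", "Midwest", "South", "West"),
  n = c(12, 18, 25, 15)
)

# Console-style printout: fine while exploring, not for a final report
print(region_counts)

# An actual table: kable() turns the data frame into a formatted table
region_table <- kable(region_counts,
                      col.names = c("Region", "Count"),
                      caption = "Number of students by region")
region_table
```

- In a rendered Quarto document, the kable() version is what shows up as a properly formatted table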
Appendix I: Sourcing scripts
- Sometimes, you can write all your code in the Quarto document
- For your final report, this will likely be fine
- However, sometimes it’s easier to write longer chunks of code in an R script and then simply run that R script to use the results
- This is called source()-ing a script
- Remember, whenever Quarto renders a document, it starts from an empty environment
- So, if you want to read in, join, reshape, and then clean a bunch of data, you might end up with a lot of code at the start of your document
- A better option might be to write all that code in a separate .R file and then source it
- When you do this, any objects created in those scripts will then be in the environment and able to be added seamlessly to your Quarto document
- To demonstrate, let’s source() our R script from last week’s lesson
- There are two things that need to be right for this to work
- The script has to be able to be run top to bottom with no errors
- Anything that creates an error needs to be ## commented out
- This might be something you need to do if you added notes or changed stuff that now doesn’t run
- You need to know what the file is called and where it is (i.e., the file path)
- Then we simply say source("<file path>.R")
source("lesson-06-viz-ii.R")
Quick Exercise
- Add a line to your Quarto document that sources last week’s lesson
- Render the document and see if you notice anything different (you shouldn’t)
- Now, with that script sourced, everything we made in the script is available in the Quarto document
- So, we can use objects we created in that script by simply using their name
- Remember right at the end of last week we saved our final plot to patch and I said why was a surprise, this is why!
- We simply call patch and our fancy patchwork will print out beneath the chunk
- Note, since we loaded the patchwork library in the script we sourced, we don’t need to load it again here
patch
- This logic is really useful to keep long streams of code out of your Quarto document
- If you have 100s of lines of code cleaning a large dataset, it’s probably best to create data_clean in data-cleaning.R and then source("data-cleaning.R") to be able to use data_clean
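- As a rough, self-contained sketch of this workflow (the two-line cleaning script and the toy scores below are made up for illustration, and the script is written to a temporary folder so the example does not touch your project files):

```r
# Hypothetical stand-in for data-cleaning.R, written to a temporary folder
script_path <- file.path(tempdir(), "data-cleaning.R")

writeLines(c(
  "raw <- data.frame(score = c(52, -8, 61))",          # toy data with one invalid score
  "data_clean <- raw[raw$score >= 0, , drop = FALSE]"  # keep only valid (non-negative) scores
), script_path)

# Run the script top to bottom; the objects it creates land in our environment
source(script_path)

# data_clean now exists even though it was created in the other file
nrow(data_clean)
```

- In a real project you would skip the writeLines() step, since data-cleaning.R would already exist as a file next to your .qmd document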
Appendix I.V: Conditional Sourcing and Saving R Objects
- Ideally, if you have a lot of code that you don’t want to have in your .qmd document, you source() it from a separate script as shown above
- But, what if that code takes minutes (or hours, or days) to run?
- Then it will make it impossible to work with your .qmd document if it has to run every time
- In this case, you can use some skills we will actually learn later in the class to help
- At the end of the script you’re trying to source() you can save the output as .Rdata
- This is basically saving the objects in your environment
save(list = c("my_object"),
file = "results.Rdata")
- Then, you can tell R to try and load that results file, but, if it’s not there, source the script
if (file.exists("results.Rdata")) {
  load("results.Rdata")
} else {
  source("a-really-long-script.R")
}
- I don’t expect this will make sense until after our programming lesson, but, I wanted it here for reference
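- For reference, here is a self-contained sketch of the same load-or-source idea that you can run anywhere (the load_or_source() helper, the temporary file names, and the tiny stand-in script are all hypothetical, made up for this example):

```r
# Temporary paths so the example does not touch your project files
results_file <- tempfile(fileext = ".Rdata")
script_file <- tempfile(fileext = ".R")

# Hypothetical stand-in for a slow script: it makes my_object, then saves it
writeLines(c(
  "my_object <- sum(1:10)",
  paste0("save(list = c(\"my_object\"), file = ", deparse(results_file), ")")
), script_file)

# Hypothetical helper: load saved results if they exist, otherwise run the script
load_or_source <- function(results_file, script_file) {
  if (file.exists(results_file)) {
    load(results_file, envir = globalenv())
    "loaded"
  } else {
    source(script_file)
    "sourced"
  }
}

first_run <- load_or_source(results_file, script_file)   # no results yet, so the script runs
rm(my_object)                                            # pretend we started a fresh render
second_run <- load_or_source(results_file, script_file)  # results exist, so they are loaded
```

- The first call has to run the script, but every call after that just loads the saved objects, which is what keeps rendering fast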
Appendix II: Advanced Descriptive Statistics Tables with gtsummary
- Often, a simple summarize() and kable() will produce the descriptive statistics output we want
- However, the gtsummary package has a tbl_summary() function, which I really like for creating more complicated/customized tables
- In particular, I LOVE how it can handle continuous and categorical variables differently within the same table
- First, let’s install and load the gtsummary package
install.packages("gtsummary")
library(gtsummary)
- Let’s take a look at a smaller descriptive statistics table looking at math scores and region
- At its simplest, we just select the columns we want to summarize and pipe |> into tbl_summary()
data |>
  select(x1txmtscor, x1region) |>
  tbl_summary()
Characteristic | N = 16,429¹ |
---|---|
x1txmtscor | 52 (46, 59) |
x1region | |
1 | 2,596 (16%) |
2 | 4,385 (27%) |
3 | 6,660 (41%) |
4 | 2,788 (17%) |
¹ Median (Q1, Q3); n (%)
- By default, you can see it provides median and interquartile range for continuous variables, and counts with percentages for categorical variables
- This alone would be hard to produce by hand
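- To get a feel for what those defaults involve, here is a rough base R sketch of computing the same kinds of numbers by hand (the toy data frame is made up for illustration, and this is not how gtsummary works internally):

```r
# Hypothetical toy data: one continuous variable and one categorical variable
toy <- data.frame(
  score = c(40, 45, 50, 55, 60, 65, 70, 75, 80, 85),
  region = c(1, 1, 2, 2, 2, 3, 3, 3, 3, 4)
)

# Continuous variable: median with first and third quartiles
score_quartiles <- quantile(toy$score, probs = c(0.25, 0.5, 0.75))

# Categorical variable: counts and percentages in each category
region_counts <- table(toy$region)
region_percent <- round(100 * prop.table(region_counts))

score_quartiles
region_counts
region_percent
```

- Even for two variables that is three separate calculations to line up in a table, which tbl_summary() handled in one line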
- Now, let’s be more specific about what kind of statistics we want to see
- This is my personal preference for a descriptive stats table
data |>
  select(x1txmtscor, x1region) |>
  tbl_summary(type = all_continuous() ~ "continuous2",
              statistic = c(all_continuous() ~ c("{mean}",
                                                 "{sd}",
                                                 "{min} to {max}")))
Characteristic | N = 16,429¹ |
---|---|
x1txmtscor | |
Mean | 52 |
SD | 10 |
Min to Max | 24 to 82 |
x1region | |
1 | 2,596 (16%) |
2 | 4,385 (27%) |
3 | 6,660 (41%) |
4 | 2,788 (17%) |
¹ n (%)
type = all_continuous() ~ "continuous2"
- This just means I want continuous variables to be given more than one line in the table
statistic = c(all_continuous() ~ c("{mean}",
                                   "{sd}",
                                   "{min} to {max}"))
- This spells out exactly what statistics I want and how to lay them out
- all_continuous() ~ means for continuous variables, do this
- Similarly, if you want to change how categorical variables are described you would use all_categorical() ~
- You’ll then see that anything in {} is a statistic I want, and line breaks, words, and punctuation outside the {} are included as typed
- {mean} gives the mean and {sd} gives the standard deviation
- {min} to {max} gives the minimum value, the word “to”, then the maximum value
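- As a hedged sketch of the categorical side (the toy data frame and the "{n} / {N} ({p}%)" layout are my own choices for illustration, not something we used on the lesson data):

```r
library(gtsummary)

# Hypothetical toy data, invented for this example
toy <- data.frame(
  score = c(41, 44, 47, 50, 52, 55, 58, 60, 63, 66, 69, 72),
  region = c("North", "North", "South", "South", "South", "South",
             "East", "East", "East", "West", "West", "West")
)

# Set the type for score by name so it is treated as continuous even in
# a small toy dataset, then give each variable type its own layout
toy_table <- toy |>
  tbl_summary(type = list(score ~ "continuous2"),
              statistic = list(all_continuous() ~ c("{mean}",
                                                    "{sd}",
                                                    "{min} to {max}"),
                               all_categorical() ~ "{n} / {N} ({p}%)"))

toy_table
```

- Here {n} is the count in a category, {N} is the total number of rows, and {p} is the percentage, with the slash, brackets, and % sign included as typed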
- Lastly, by default gtsummary produces pretty html based tables, these play nicely with some formats, but not others
- If you’re having trouble getting the table to appear in the format you’re rendering to, gtsummary has a collection of functions to convert the table to different formats
- Here, we will use as_kable() to print the same table as above, but as a kable (just like we would make)
- This will sacrifice some of the details and styling, but it’s a good option if you need simplicity/compatibility, or just want it to match other kable()s you already made
data |>
  select(x1txmtscor, x1region) |>
  tbl_summary(type = all_continuous() ~ "continuous2",
              statistic = c(all_continuous() ~ c("{mean}",
                                                 "{sd}",
                                                 "{min} to {max}"))) |>
  as_kable()
Characteristic | N = 16,429 |
---|---|
x1txmtscor | |
Mean | 52 |
SD | 10 |
Min to Max | 24 to 82 |
x1region | |
1 | 2,596 (16%) |
2 | 4,385 (27%) |
3 | 6,660 (41%) |
4 | 2,788 (17%) |
- This is very much beyond the expectation for summary tables in your final reports, but I wanted to show how you might want to think about descriptive tables for future publishable work
- There is so much more to gtsummary than this, we will see this package again in our Bringing It All Together (Feat. Basic Models) Lesson
- If you’re interested in learning more check out gtsummary’s reference website
Appendix III: Automated Batch Reporting with Quarto
- This is well beyond the scope of the class, but, it shows the power of Quarto
- Imagine you work for a university system office and you need to write the same report with the same information but customized for each campus
- Or, imagine you’re working on a grant-funded research project and each of your participant schools wants a customized report about their own outcomes
- Quarto has the power to produce this kind of document with a single source file, like I will show in class
Good news! By the time you have completed the quick exercises in this class, you should have some of this assignment already written
Question One (copy over from class)
a) Change the header title to something appropriate for your final report
b) Change the header author to your name
c) Change the header date so that it uses the current date whenever it is rendered
d) Add a level-one heading “Data”
e) Write a sentence or two about the data you’re using for your final project
f) Make the source of your data (e.g., IPEDS) italic
g) Set the output format to create a pdf (either pdf or typst)
Question Two (to be done after class)
a) Read in at least one data file you intend to use for your final project
b) Include a plot showing the distribution of your primary outcome/variable of interest
Bonus: Caption and number it using Quarto’s automatic system
c) Choose a categorical variable in your dataset, get the counts and percentages in each category (a Data Viz I throwback), and turn that into a nicely formatted kable()
Bonus: Caption and number it using Quarto’s automatic system
d) Find an article that you think might be relevant to your final project, save it using Zotero, and then cite it using Quarto’s Zotero integration
Bonus: Use APA 7 citation style and place the bibliography on a new page
e) Save the file as reproducible-report.qmd
Congratulations, you’ve officially started the Quarto document for your final project!
Submission
Once complete turn in the .qmd file (it must render/run) and .pdf output to Canvas by the due date (usually Tuesday 12:00pm following the lesson). Assignments will be graded before next lesson on Wednesday in line with the grading policy outlined in the syllabus.