I: Git & GitHub
- This extra-credit lesson serves as a gentle introduction to git
If you think of OneDrive, Google Drive, etc. as a nice automatic sedan (easy to drive, but you can’t control everything), git is like a stick-shift sports car: it is way more capable, but requires some getting used to! - B.T. Skinner (paraphrased)
- Before we get started, a quick terminology definition: “git repo” or “repo” just means a folder that uses git
- “Your repo” == “Your folder for the git project”
- There are so many things git can do beyond other cloud storage systems, but for this lesson we are going to focus on one of the most fundamental benefits, version control
- Version control is one of git’s best features and helps avoid situations like…
- In short, instead of overwriting your file when you save, like most cloud systems do, git keeps track of what changed since the last time you saved that file
- Don’t worry about the technicalities now, just know that not only is work saved with git and GitHub backed up, but if we want to go back in time to an old version, we can!
- The goal of this lesson is to have you set up your final project as a git project so you can keep track of your code changes as you go
Installing git
- The first thing you are going to need to do is install git
From the git website http://git-scm.com/
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
- If you’re using a Mac, there’s a decent chance git is already installed. On Windows, you may already have git if you’ve used something that installed it in the past
- Once installed, you could keep using the terminal for git, but RStudio has a much more beginner-friendly point-and-click system we will use instead
- To check if you have git, copy this command into the terminal (note: not the console; the terminal is the tab next to the console in RStudio)
which git
If it provides a file path to something called git, you have git, move on!
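If `which` is not available in your terminal, a slightly more portable check can be sketched as below. This is only an illustration, not part of the RStudio workflow: `command -v` is the POSIX way to look up a program, and `git_path` is a variable name made up for this example.

```shell
# Check whether git is on the PATH and, if so, report where it lives and its version.
if command -v git >/dev/null 2>&1; then
  git_path="$(command -v git)"
  echo "git found at: $git_path"
  git --version
else
  git_path=""
  echo "git not found; follow the installation instructions"
fi
```

Either branch is fine to see; the second just means git still needs installing.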
If you need to install git
- There’s a great resource, “Happy git with R” by Jenny Bryan, which covers a whole range of git topics beyond what we need today
- “Happy git with R”’s installation page has pretty clear instructions for installing git
- Note: You should always choose “Option 1” unless you
- Hopefully with those instructions you managed to get git installed if it wasn’t already, which is honestly the hardest part of this lesson!
Creating a GitHub Account
- git is the tool that handles the version control; if you just wanted to use version control and store it all locally, git is all you need
- However, the real advantage of git is that you keep the version control both on your computer and in a “remote” repository (similar to OneDrive etc.)
- There are plenty of git hosting services that offer this, but by far the biggest is GitHub, which is what we will use
- So you need to create an account, which is just like signing up for any other online account
- Go to https://github.com
- Click “sign up” and follow the prompts on screen…
- Done!
FYI: If you plan to use GitHub regularly, students are eligible for free GitHub Pro, which I have; find out more here. This will allow you to create private GitHub repos. By default, all GitHub repos are public (which is how all open source stuff like R works)
Creating a New GitHub Repo
- There are two ways you can create a new GitHub repo
- Create a repo on your computer, then start tracking it with git, then link it to GitHub
- Create a repo on GitHub and clone it to your computer
- IMHO, this is far easier, so it’s what we will do!
Step One: Create Repo on GitHub
- When on https://github.com
- Navigate to “Your Repositories”
- Click the green “new” button
- Choose a name for the new repo
- Under “Add .gitignore” select the R template
- Everything else is optional, so don’t worry about it for now
- Note: If you signed up for GitHub Pro, you can make the repo private
- If you do, make sure to add me @ttalVlatt so I can see it to give you credit
- You should be taken to your shiny new repo, yay!
Step Two: Set Up GitHub SSH Access
- This is how your computer will have access to edit your repo
- It sounds scary, but luckily RStudio makes it easy-peasy!
- In RStudio go to “Tools” and then “Global Options”
- Select “Git/SVN” from the left hand menu
- Under “SSH Key” select “Create SSH Key”
- Leave the optional pass phrase boxes blank and click “create”
- Close the pop-up box that appears
- Back on the “Git/SVN” page select “View Public Key”
- Copy that to your clipboard
- Go to GitHub.com
- Go to “Settings” on the menu under your profile icon
- Select “SSH and GPG Keys” from the left-hand menu
- Select the green “New SSH Key”
- Give the key a name
- If you’re using RStudio on your computer, this will be set for a while, so just call it “MacBook Pro” or something similar
- If you’re using RStudio Cloud, you need a new key for each project, so name it accordingly
- Paste the SSH Key you copied from RStudio into the “key” box
- Leave it set as “authentication key”
- Click “Add New SSH Key” and you’re done!
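For the curious, the RStudio buttons above are doing something like the terminal commands sketched here. Treat this as a rough illustration under assumptions: the `ed25519` key type, the “MacBook Pro” label, and the temporary folder are choices made for this example (RStudio may pick a different key type), and the key is written to a throwaway location so nothing in your real `~/.ssh` folder is touched.

```shell
# Generate a throwaway key pair in a temporary folder (demo only).
key_dir="$(mktemp -d)"
key_file="$key_dir/id_ed25519"

# -t: key type, -C: a label to recognise the key by, -N "": empty passphrase,
# -f: where to write the private key (the public key gets a .pub suffix), -q: quiet
ssh-keygen -q -t ed25519 -C "MacBook Pro" -N "" -f "$key_file"

# The public key is the text you would paste into GitHub's "key" box
cat "$key_file.pub"

# Once a real key has been added to GitHub, this checks the connection (needs internet):
# ssh -T git@github.com
```

Only the public half (the `.pub` file) ever goes to GitHub; the private half stays on your computer.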
Cloning a Repo Down to Your Computer
- This step is a little different depending on if you’re using RStudio on your computer or the cloud, so I will outline each separately
RStudio on Your Computer
- Go to https://github.com/
- Go to “Your Repositories” and select the repo you just created
- Select the green “Code” button
- On there, under “Clone” select “SSH”
- Copy the address that should look like
git@github.com:<Username>/<Repo>.git
- In RStudio, click on the blue cube in the top right (where we set up projects before)
- Click on “New Project”, then “Version Control”, then “Git”
- Paste what you copied from GitHub as the URL
- Choose a file name and location that make sense (this is where the repo will be kept)
- Done!
posit.cloud
- Go to https://github.com/
- Go to “Your Repositories” and select the repo you just created
- Select the green “Code” button
- On there, under “Clone”, keep it on “HTTPS”
- Copy the address that should look like
https://github.com/<Username>/<Repo>.git
- On posit.cloud select “New Project” and then “From GitHub Repository”
- Paste the URL you copied in the URL box and select a name for the project
- Go back to your repo on GitHub and reselect the green “Code” button
- This time select “SSH” and copy the address that should look like
git@github.com:<Username>/<Repo>.git
- Once the project is opened, go to the terminal (next to the console)
- Type
git remote set-url origin git@github.com:<Username>/<Repo>.git
replacing <Username> and <Repo> with the correct names
- Done!
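To see what `git remote set-url` actually changes, here is a small sketch that runs in a throwaway folder rather than your real project. The names `your-username` and `your-repo` are placeholders invented for this example; nothing is sent to GitHub.

```shell
# Work in a throwaway repo so no real project is affected
repo_dir="$(mktemp -d)"
cd "$repo_dir"
git init -q

# Placeholder names; swap in your own GitHub username and repo name
user="your-username"
repo="your-repo"

# Start with the HTTPS address, as a project created from the HTTPS URL would have
git remote add origin "https://github.com/$user/$repo.git"

# Switch the same remote over to the SSH address
git remote set-url origin "git@github.com:$user/$repo.git"

# Confirm which address "origin" now points to
remote_url="$(git remote get-url origin)"
echo "$remote_url"
```

In your real project, running `git remote get-url origin` in the terminal is a quick way to confirm the switch to SSH worked.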
- Okay, with git and GitHub set up, the hard part is over! Now we will just go over how to use what we set up
- Keep in mind, we are just going to cover one purpose of git; this is just the beginning
Using git for Version Control and Backup
- Getting a change from your computer to GitHub has three steps
- “Stage” the change, which tells git to pay attention to the change
- “Commit” the change, which saves it to your local (on computer) version of git
- “Push” the change, which saves it to your remote (GitHub) version of git
Let’s see what that looks like in RStudio
- In the top right corner panel of RStudio (same area as the “Environment”) there’s a “Git” tab, select it
- You’ll see a few things here
- Along the top are some buttons for the core git commands of “Commit”, “Pull”, and “Push”
- Right now, there is probably nothing in the main area of the panel
- Go ahead and make a new .R script (there doesn’t need to be anything in it) and save it in the project folder
- Now you’ll see it in the main area of the “Git” panel
- Any changes you make to the repo will appear here, new files, changed files, deleted files, etc.
To back up these changes to GitHub, follow these steps
1. Click the white square box left of the file in the “Git” main panel
- This “stages” the change, i.e., tells git to pay attention to it
2. Click “Commit”
- This will open a new box/window
- In the top right hand box you can (and should) add an informative message about the change you made
- E.g. “Created a test script”
- Then hit the “Commit” button right underneath that
3. Finally, hit “Push”
- You can do this in the same window, or at the top of the Git panel, it doesn’t matter
- This “pushes” the changes you just “committed” up to GitHub
- The very first time you do this, you may get a warning that the key isn’t known
- Type “yes” as your response, you won’t see this again unless you make a new key
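The three clicks above correspond to three terminal commands: `git add`, `git commit`, and `git push`. The sketch below is an offline illustration only: a local folder named `github.git` stands in for GitHub, and the name, email, and file name are placeholders invented for the example.

```shell
set -e

# A throwaway sandbox: "github.git" stands in for the GitHub copy of the repo
sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/github.git"
git clone -q "$sandbox/github.git" "$sandbox/my-project"
cd "$sandbox/my-project"

# Identify yourself for this repo only (placeholder values)
git config user.name "Your Name"
git config user.email "you@example.com"

# Make a change: a new, empty R script
touch test-script.R

git add test-script.R                       # 1. stage
git commit -q -m "Created a test script"    # 2. commit
git push -q origin HEAD                     # 3. push

# Read the latest commit message from the stand-in remote
last_message="$(git --git-dir="$sandbox/github.git" log -1 --format=%s)"
echo "$last_message"
```

In a real project the only difference is that `origin` points at GitHub, so the push travels over the internet instead of to a neighbouring folder.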
- You can stage, commit, and push lots of changes at once, or one by one
- The big difference is that the less each individual commit and push does, the less you have to reverse
- For that reason, always push up things you’re sure about first, then things you’re not, in separate commits
- This process may seem like a lot, but it will become second nature once you start using it
- The ability to version control your code and easily track back to specific points is alone more than worth it
- That’s not to mention this is only an intro, git can do so much more as you get familiar with it
- Plus, if you can use git you will stand out from the crowd in serious data management jobs
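As a taste of what “tracking back to specific points” can look like, here is a minimal sketch run in a throwaway folder. It goes slightly beyond the point-and-click workflow in this lesson, and the file name, its contents, and the commit messages are all made up for illustration.

```shell
set -e

repo_dir="$(mktemp -d)"
cd "$repo_dir"
git init -q
git config user.name "Your Name"
git config user.email "you@example.com"

# Version 1 of a script
echo 'x <- 1' > analysis.R
git add analysis.R
git commit -q -m "Set x to 1"

# Version 2 overwrites it
echo 'x <- 2' > analysis.R
git add analysis.R
git commit -q -m "Change x to 2"

# The history, newest first, one line per commit
git log --oneline

# Look at the file as it was one commit ago, without changing anything on disk
old_version="$(git show HEAD~1:analysis.R)"
echo "$old_version"
```

The file on disk still holds the newest version; `git show` only reads from the history, which is why clear commit messages make old versions easy to find.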
The Need to .gitignore
- When something appears in the RStudio “Git” panel that you don’t want to push you can right-click and hit the “ignore” option
- This will add that file to a .gitignore file in your repo, and means git will never try and track that file again
- You can also add file names and/or patterns directly to the .gitignore file
- This is useful for anything you don’t want to share (as GitHub repos are public by default), or anything too large for git (big data sets etc.)
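To make the idea of file names and patterns concrete, here is a small sketch in a throwaway folder. The entries `secret-notes.txt` and `data/*.csv` are hypothetical examples, not a recommendation for what your own `.gitignore` should contain.

```shell
set -e

repo_dir="$(mktemp -d)"
cd "$repo_dir"
git init -q

# Hypothetical entries: one specific file name and one pattern
printf '%s\n' 'secret-notes.txt' 'data/*.csv' > .gitignore

mkdir data
touch secret-notes.txt data/big-file.csv analysis.R

# List the new files git can see; ignored files are left out
visible="$(git status --porcelain --untracked-files=all)"
echo "$visible"

# Ask git directly whether a given file is ignored (exit status 0 means yes)
git check-ignore -q data/big-file.csv && echo "data/big-file.csv is ignored"
```

Files matched by `.gitignore` simply never show up in the RStudio “Git” panel, so they cannot be staged by accident.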
Sidenote: The Need to pull
- Another git command that is super common is pull, this will just check for any changes in the GitHub copy of your repo and pull them down
- If you keep things simple and only push changes to this project from one computer, there should never be anything to pull down
- You can always hit the button if you’re curious, it will just say “already up to date”
- If you set this up on more than one computer, or start collaborating with someone else, you’ll need to commit and push when you’re finished working, then pull before you start work
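The two-computer situation can be sketched offline as below. This is only an illustration: the folder `github.git` stands in for GitHub, `laptop` and `desktop` stand in for two computers, and the file contents are made up.

```shell
set -e

# "github.git" stands in for GitHub; "laptop" and "desktop" are two computers
sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/github.git"
git clone -q "$sandbox/github.git" "$sandbox/laptop"

# On the laptop: finish some work, then commit and push
cd "$sandbox/laptop"
git config user.name "Your Name"
git config user.email "you@example.com"
echo 'x <- 1' > analysis.R
git add analysis.R
git commit -q -m "Add analysis script"
git push -q origin HEAD

# On the desktop: clone once, which brings down everything pushed so far
git clone -q "$sandbox/github.git" "$sandbox/desktop"

# Back on the laptop: more work, committed and pushed
echo 'y <- 2' >> analysis.R
git commit -q -a -m "Add y"
git push -q origin HEAD

# On the desktop: pull before starting work to pick up the laptop's change
cd "$sandbox/desktop"
git pull -q
desktop_lines="$(wc -l < analysis.R | tr -d ' ')"
echo "$desktop_lines"
```

Without the final `git pull`, the desktop copy would still hold only the first line, which is exactly the kind of mismatch that pulling at the start of a session avoids.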
- Now we’ve covered some of the basics, I just want to suggest a few rules to stick by with git
git Ground Rules
- Generally, git is best suited for plain text based files, like .R scripts and .qmd files
- git can and will track other files, but it’s primarily meant for code, that is where version control is most powerful
- Particularly if a non-code file is large, it is best to ignore it with .gitignore (see “The Need to .gitignore”)
- push regularly and often
- Whenever you finish something, it’s generally a good idea to push those changes up to GitHub
- This makes each version git stores more granular, so you can undo one thing without undoing a bunch of things. That will make more sense over time, but for now, just push
- pull at the start of each work session
- This isn’t important if you’re using git in the simple way we are, but the second you start collaborating with git or even using it on multiple computers, always pull first
- This will add any changes that have been push-ed to your files before you edit them, avoiding conflicts and making everyone’s lives easier
- Write useful commit messages
- Every time you commit then push you have to write a message. If we have to go back in time, this is how you will find the point to go back to, so make them descriptive
- Don’t panic
- Sometimes, git can get messed up, particularly when collaborating with others
- The beauty is that with version control, we can always go back and fix things
- If you run into git issues, I am happy to help, and if I can’t I know plenty of people who can!
- Never, ever, ever, put private or restricted information in git or GitHub
- By default GitHub repos are public, and even if they’re private, they are not approved places for private or restricted data
- Even if you’re just backing up code that uses restricted data, you should check-in with your data security/IT team to make sure you’re following institutional rules
Summary
- If you’ve made it to here, congratulations, you’re now officially a git user
- I encourage you to keep at it, the more you use it, the easier it becomes
- Using git as a version control and backup for a single computer is the simplest way to use git and more than enough for a lot of people
- If you get comfortable with git, it can do so much more
- Working across multiple computers
- Collaborating with other researchers
- Creating branches of work to try out new approaches
- fork-ing existing repos to make a new version of something someone else did
- pull request-ing something you fork-ed and improved/fixed to get your change added to the main project
- Hosting websites with gh-pages (like this one!)
- A whole lot more!
To earn these extra credit points, ultimately, you need to show you have used GitHub as a version control for your final project code.
To do this, there are four main steps, all of which are covered in the lesson section.
- Set up a GitHub account
- Set up the SSH authorization so you can push/pull GitHub from RStudio
- Set up a repo on GitHub and clone it down to your computer
- Utilize the git functionality to version control that R project
- To use git effectively, you should be pushing up every time you complete a significant chunk of work
Your submission should be a URL link on Canvas to a GitHub repo containing your report code
- If you made the repo private with GitHub Pro, you will need to share it with me
Once complete, turn in the URL link of your GitHub repo to Canvas by the due date (Sunday 11:59pm following the final lesson). Good faith efforts (as determined by the instructor) at extra credit assignments will earn full credit if submitted on time.