Summary and Schedule
Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. The lessons below were designed for those interested in working with social sciences data in R.
This is an introduction to R designed for participants with no programming experience. These lessons can be taught in a day (~6 hours). They start with some basic information about R syntax and the RStudio interface, then move through importing CSV files, the structure of data frames, working with factors, adding and removing rows and columns, and calculating summary statistics from a data frame, and finish with a brief introduction to plotting.
Getting Started
Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow.
These lessons assume no prior knowledge of the skills or tools.
To get started, follow the directions in the “Setup” tab to download data to your computer and follow any installation instructions.
If you are teaching this lesson in a workshop, please see the Instructor notes (https://datacarpentry.org/r-socialsci/instructor/instructor-notes.html).
| Elapsed time | Episode | Questions |
|---|---|---|
| | Setup Instructions | Download files required for the lesson |
| 00h 00m | 1. Before we Start | How to find your way around RStudio? How to interact with R? How to manage your environment? How to install packages? |
| 00h 40m | 2. Introduction to R | What data types are available in R? What is an object? How can values be initially assigned to variables of different data types? What arithmetic and logical operators can be used? How can subsets be extracted from vectors? How does R treat missing values? How can we deal with missing values in R? |
| 02h 00m | 3. Starting with Data | What is a data.frame? How can I read a complete CSV file into R? How can I get basic summary information about my dataset? How can I change the way R treats strings in my dataset? Why would I want strings to be treated differently? How are dates represented in R and how can I change the format? |
| 03h 20m | 4. Data Wrangling with dplyr | How can I select specific rows and/or columns from a data frame? How can I combine multiple commands into a single command? How can I create new columns or remove existing columns from a data frame? |
| 04h 00m | 5. Data Wrangling with tidyr | How can I reformat a data frame to meet my needs? |
| 04h 40m | 6. Data Visualisation with ggplot2 | What are the components of a ggplot? How do I create scatterplots, boxplots, and barplots? How can I change the aesthetics (e.g. colour, transparency) of my plot? How can I create multiple plots at once? |
| 06h 35m | 7. Getting started with R Markdown (Optional) | What is R Markdown? How can I integrate my R code with text and plots? How can I convert .Rmd files to .html? |
| 07h 20m | 8. Processing JSON data (Optional) | What is JSON format? How can I convert JSON to an R data frame? How can I convert an array of JSON records into a table? |
| 08h 05m | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Setup instructions
R and RStudio are separate downloads and installations. R is the underlying statistical computing environment, but using R alone is no fun. RStudio is a graphical integrated development environment (IDE) that makes using R much easier and more interactive. You need to install R before you install RStudio. Once both are installed, RStudio runs R in the background, so you do not need to run R separately.
After installing both programs, you will need to install the tidyverse package from within RStudio. The tidyverse package is a powerful collection of data science tools within R; see the tidyverse website for more details. Follow the instructions below for your operating system, and then follow the instructions to install tidyverse.
Windows
If you already have R and RStudio installed
- Open RStudio, and click on “Help” > “Check for updates”. If a new version is available, quit RStudio, and download the latest version of RStudio.
- To check which version of R you are using, start RStudio: the first thing that appears in the console indicates the version of R you are running. Alternatively, you can type sessionInfo(), which will also display which version of R you are running. Go to the CRAN website and check whether a more recent version is available. If so, you can update R using the installr package by running:
R
# Install installr only if it is not already available, then update R
if (!requireNamespace("installr", quietly = TRUE)) install.packages("installr")
installr::updateR(TRUE)
If you don’t have R and RStudio installed
- Download R from the CRAN website.
- Run the .exe file that was just downloaded.
- Go to the RStudio download page.
- Under Installers select RStudio x.yy.zzz - Windows Vista/7/8/10 (where x, y, and z represent version numbers).
- Double click the file to install it.
- Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.
macOS
If you already have R and RStudio installed
- Open RStudio, and click on “Help” > “Check for updates”. If a new version is available, quit RStudio, and download the latest version of RStudio.
- To check the version of R you are using, start RStudio: the first thing that appears in the console indicates the version of R you are running. Alternatively, you can type sessionInfo(), which will also display which version of R you are running. Go to the CRAN website and check whether a more recent version is available. If so, please download and install it. In any case, make sure you have at least R 3.2.
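If you prefer to check the version from code rather than reading the console banner, the minimal sketch below uses base R's getRversion(), which returns a version object that compares correctly against a string such as "3.2.0". The helper name r_is_recent_enough() and its min argument are hypothetical, chosen only for this example; the 3.2 threshold is the minimum stated above.

```r
# Hypothetical helper: is the running R at least version `min`?
# getRversion() returns a numeric_version object, so the comparison
# is done component by component (3.10 > 3.2), not as plain text.
r_is_recent_enough <- function(min = "3.2.0") {
  getRversion() >= min
}

# Show the full version banner, e.g. "R version 4.x.y (date)"
print(R.version.string)

if (r_is_recent_enough("3.2.0")) {
  message("Your R version meets the minimum for this lesson.")
} else {
  message("Your R version is older than 3.2; please update it from CRAN.")
}
```

Comparing version objects rather than strings avoids the trap where "3.10" would sort before "3.2" alphabetically.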
If you don’t have R and RStudio installed
- Download R from the CRAN website.
- Select the .pkg file for the latest R version.
- Double click on the downloaded file to install R.
- It is also a good idea to install XQuartz (needed by some packages).
- Go to the RStudio download page.
- Under Installers select RStudio x.yy.zzz - Mac OS X 10.6+ (64-bit) (where x, y, and z represent version numbers).
- Double click the file to install RStudio.
- Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.
Linux
- Follow the instructions for your distribution from CRAN; they provide information on getting the most recent version of R for common distributions. For most distributions, you could use your package manager (e.g., for Debian/Ubuntu run sudo apt-get install r-base, and for Fedora sudo yum install R), but we don’t recommend this approach because the versions it provides are usually out of date. In any case, make sure you have at least R 3.2.
- Go to the RStudio download page.
- Under Installers select the version that matches your distribution, and install it with your preferred method (e.g., on Debian/Ubuntu run sudo dpkg -i rstudio-x.yy.zzz-amd64.deb at the terminal).
- Once it’s installed, open RStudio to make sure it works and you don’t get any error messages.
- Before installing the tidyverse package, Ubuntu (and related) users may need to install the following dependencies: libcurl4-openssl-dev libssl-dev libxml2-dev (e.g. sudo apt install libcurl4-openssl-dev libssl-dev libxml2-dev).
For everyone
After installing R and RStudio, you need to install the tidyverse and here packages.
After starting RStudio, at the console type:
install.packages("tidyverse")
followed by the enter key. Once this has installed, type
install.packages("here")
followed by the enter key. Both packages should now be installed.
For reference, the lesson uses SAFI_clean.csv. The direct download link for this file is: https://github.com/datacarpentry/r-socialsci/blob/main/episodes/data/SAFI_clean.csv. This data is a slightly cleaned up version of the SAFI Survey Results available on figshare. Instructions for downloading the data with R are provided in the Before we start episode.
The JSON episode uses SAFI.json, which is available on GitHub.
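Before the workshop, you may want to confirm that the tidyverse and here packages installed correctly. The sketch below is one way to do that using only base R: requireNamespace() returns TRUE or FALSE instead of raising an error when a package is missing. The function name check_lesson_packages() is hypothetical and not part of the lesson materials.

```r
# Hypothetical helper: report which of the given packages can be loaded.
# requireNamespace() returns FALSE (rather than an error) for a missing
# package; quietly = TRUE suppresses its messages.
check_lesson_packages <- function(pkgs = c("tidyverse", "here")) {
  # vapply() keeps the package names as names of the logical result
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

status <- check_lesson_packages()
print(status)

# Any package reported as FALSE still needs install.packages()
to_install <- names(status)[!status]
if (length(to_install) > 0) {
  message("Still to install: ", paste(to_install, collapse = ", "))
}
```

The check only reports; it deliberately leaves the actual installation to the install.packages() commands described above.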