Introduction to statistics using R and RStudio

Mark Andrews

Date: 17-18 March, 2021

Location: Online course

In this two-day course, we provide a comprehensive introduction to R and how it can be used for data science and statistics. We begin with a thorough introduction to RStudio, which is the most popular and powerful interface for using R. We then introduce all the fundamentals of the R language and R environment: variables and assignment, data structures, operators, functions, scripts, packages, projects, etc. We then provide an introduction to data processing and formatting (aka data wrangling), an introduction to data visualization, an introduction to RMarkdown, and show how to perform some of the most widely used statistical methods such as linear regression, Anova, etc. From this course, you will gain a comprehensive introduction to R, which will serve as a foundation for progressing further with R to any kind of data analysis, data science, or statistics.

Intended Audience

This course is aimed at anyone who is interested in using R for data science or statistics. R is widely used in all areas of academic scientific research, and also widely throughout the public and private sectors.

Teaching Format

This course will be hands-on and workshop based. Throughout each day, there will be a minimal amount of lecture-style presentation, i.e., using slides, introducing and explaining key concepts. However, even in these cases, the topics being covered will include practical worked examples that we will work through together.

Teaching will be done online via video link using Zoom. Although not strictly required, using a large monitor or preferably even a second monitor will make the learning experience better, as you will be able to see my RStudio and your own RStudio simultaneously. All the sessions will be recorded, and made available immediately on a private video hosting website. All materials will be shared via Git, which will allow for instantaneous sharing of code etc.

On each day, the live video broadcasts will take place at the following times (UK local time; Greenwich Mean Time; UTC+0):

12pm-2pm
3pm-5pm
6pm-8pm

Assumed quantitative knowledge

We will assume only a minimal amount of familiarity with some general statistical and mathematical concepts. These concepts will arise when we discuss statistics and data analysis. Anyone who has taken any undergraduate (Bachelor’s) level course on (applied) statistics can be assumed to have sufficient familiarity with these concepts.

Assumed computer background

No prior experience with R or any other programming language is required. Of course, familiarity with another programming language will be helpful, but it is not required.

Equipment and software requirements

Attendees of the course will need to use a computer on which RStudio can be installed. This includes Mac, Windows, and Linux, but not tablets or other mobile devices. An alternative to using a local installation of RStudio is to use RStudio Cloud (https://rstudio.cloud/). This is a free-to-use, full-featured, web-based version of RStudio. It is not suitable for computationally intensive work, and only a limited number of hours (15hrs) per month are available on the free plan.

Course programme

Day 1

Topic 1: The What and Why of R. We’ll start by briefly explaining what R is, what it is used for, and why it has become so popular.
Topic 2: Guided tour of RStudio. RStudio is the most widely used interface to R. We will provide a tour of all its parts and features and how to use it effectively.
Topic 3: First steps in R. Now, we cover all the fundamentals of R and the R environment. These include variables and assignment, data structures such as vectors, data frames, lists, etc., operations on data structures, functions, scripts, installing and loading packages, using RStudio projects, reading in data, etc. This topic will be detailed so that everyone obtains a solid grasp of these fundamentals, which makes all subsequent learning much easier.
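
To give a flavour of the fundamentals listed under Topic 3, here is a minimal sketch in base R. It is not taken from the course materials; the variable names, the made-up scores, and the helper function standardise() are illustrative assumptions only.

```r
# Variables and assignment (values are made up for illustration)
course_name <- "R course"
scores <- c(12, 15, 9, 20)        # a numeric vector

# Vectorised operations on a data structure
mean_score <- mean(scores)        # arithmetic mean of the vector
doubled <- scores * 2             # element-wise multiplication

# A list can hold elements of different types
info <- list(title = course_name, n = length(scores))

# A data frame: a table whose columns are equal-length vectors
df <- data.frame(id = 1:4, score = scores)

# A small user-defined function (hypothetical helper)
standardise <- function(v) {
  (v - mean(v)) / sd(v)
}
df$z <- standardise(df$score)     # add a new column to the data frame

# Keep only the rows whose score exceeds the mean
above_average <- df[df$score > mean_score, ]
print(above_average)
```

Code like this would typically be saved in a script inside an RStudio project, so that it can be re-run and shared.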

Day 2

Topic 4: Introducing wrangling. Data wrangling, which is the art of cleaning and restructuring data, is a big topic. Here, we just provide an introduction (subsequent courses in this series will cover wrangling in depth). We will primarily focus on filtering, slicing, selecting, renaming, and mutating data frames.
Topic 5: Data visualization. Data visualization is another big and important topic. Here, we just provide an introduction, specifically an introduction to ggplot (subsequent courses in this series will cover visualization in depth). We’ll cover scatterplots, boxplots, histograms, and their variants.
Topic 6: RMarkdown. RMarkdown is a powerful tool for creating reproducible research reports, as well as slides, scientific websites, posters, etc. In an RMarkdown document, we mix R code and the narrative text of the report, and the outputs of the R code, including figures, are included in the final document.
Topic 7: Introduction to Statistics using R. There are many thousands of statistical methods built into R. Here, we will simply provide an introduction to some of the most widely used methods. In particular, we will cover linear regression, Anova, and some other simple tests. The aim of this section is to get a sense of how statistical analysis is done in R, and how to perform some of the most widely used methods.
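
As an illustration of the kind of analysis covered in Topic 7, the following is a minimal sketch using only base R and the mtcars data set that ships with R. The choice of data set and variables is an assumption made for this example and is not taken from the course materials.

```r
# mtcars is a built-in example data set (fuel economy of 32 cars)
data(mtcars)

# Linear regression: miles per gallon as a function of weight
model <- lm(mpg ~ wt, data = mtcars)
print(summary(model))

# One-way Anova: does mean mpg differ by number of cylinders?
mtcars$cyl_f <- factor(mtcars$cyl)          # treat cylinders as categorical
anova_model <- aov(mpg ~ cyl_f, data = mtcars)
print(summary(anova_model))

# A simple test: Welch two-sample t-test of mpg by transmission type
t_result <- t.test(mpg ~ am, data = mtcars)
print(t_result$p.value)
```

In each case the model is specified with a formula (outcome ~ predictor) and the fitted object is then inspected with summary() or by extracting components, which is the general pattern for statistical modelling in R.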

GitHub resources

Further resources for this training course can be found on GitHub at mark-andrews/irrs03.