These blog posts are mostly on topics such as statistics, data analysis, computational modelling, R, Python, etc.

How to Make an R package

December 3, 2021

This post provides a guide to making R packages. The example R package that we make is small and simple, but it has all, or almost all, of the main general features of an R package. The guide is broken down into a series of steps, beginning with creating a bare-bones R package skeleton and ending with pushing the completed package to GitHub. It covers how to add code and data to a package, how to write code tests, how to create documentation, including vignettes, how to make a pkgdown website, and other key features of R packages.

Reshaping data with pivot_longer and pivot_wider

July 25, 2021

In this post, we describe how to use tidyr’s pivot_longer and pivot_wider functions. These are used to reshape data frames from wide to long format, and from long to wide format, respectively. We will discuss the basic versions of both functions, and then also discuss some of their more complex variants.

Read Multiple Files into a Single Data Frame

July 23, 2021

In R, data frames are usually created by reading data from a single file. Sometimes, however, we may wish to read multiple files into a single data frame. In this post, we will look at some of the main tidyverse ways of reading multiple files into a single data frame.

Statistical analysis of subgroups with nested data frames

July 17, 2021

Here, we will look at how to perform separate statistical analyses of each subgroup of a data set, the results of which can then be combined into a new data frame. For this, we will primarily use nest and related functions in the tidyr package.

German Tank Problem: A Bayesian Analysis

July 16, 2021

Here, we present a Bayesian analysis of the German tank problem, which is the problem of estimating the size of a set on the basis of observing the rank orders of a sample of its elements.

A brief introduction to probability theory

July 13, 2021

In this post, we provide a brief introduction to the basics of probability theory, covering the fundamental definitions and axioms, random variables, joint probability distributions, conditional probability distributions, statistical independence, Bayes' theorem, and other concepts.

Parallel computing in R with the parallel package

July 7, 2021

This post describes how to do parallel processing in R using the parallel package. First, we will provide a brief general introduction to the parallel programming tools in R’s parallel package. Then, we will explore these tools by way of two applications: bootstrapping, and the parallel execution of Stan-based Bayesian models.

Bootstrap confidence intervals

July 6, 2021

This post provides a brief introduction to bootstrapping and, in particular, how it can be used to calculate confidence intervals. It focuses on the simplest and most familiar case of bootstrapping, which is a Monte Carlo-based non-parametric method.

Iterations in R with lapply (and variants) and purrr

July 6, 2021

This post describes how to perform iterations using functionals in base R and in the purrr package. The base R functionals include, in particular, lapply and its variants, such as sapply, vapply, and mapply. In the purrr package, the principal functional is map, of which there are numerous variants.

Superseding dplyr's suffixed variants

October 27, 2020

This post explains how we can use across, where, rename_with, etc., to perform actions that were previously accomplished with the _if, _at, and _all variants of the dplyr verbs. It also briefly covers a related function, c_across, which is used for rowwise operations in data frames that were previously very awkward to accomplish in other ways.

Permutations and combinations

January 30, 2019

Permutations and combinations frequently arise in probability calculations. Here, we provide a brief introduction to them and also show how to calculate them in R.

Web scraping Wikipedia data tables into R data frames

January 28, 2019

Wikipedia provides a lot of very useful tables of data. The data, however, are in the form of HTML tables, rather than some easy-to-import format like CSV. Getting one of these tables into, for example, an R data frame requires some web scraping followed by data wrangling.

Logarithms

January 8, 2019

An introduction to logarithms and to exponentiation, which is the inverse of the logarithm, covering their algebraic properties and how to calculate them using R.