index – Mark Andrews

This is a collection of notes on topics such as statistics, data analysis, data science, machine learning, deep learning, computational modelling, R, Python, Linux, etc.

Measuring Approximate Number Sense Acuity

Humans and many other animals have the ability to quickly estimate the number of items in a set. This ability is known as the approximate number system (ANS). The term ANS acuity refers to the precision or sensitivity of an individual’s ANS ability. Two widely used and related ways of defining and measuring ANS acuity are based on the concept of a mental number line. Here, we describe how ANS acuity is defined and measured in these two models.

18th April, 2025

Qualitative text analysis with local LLMs

A note on how to analyse text for themes and topics using local LLMs like Llama. This note is divided into three parts. I cover how to install Ollama and use it in R, provide a worked example of qualitative text analysis using Llama, and then discuss the computational power requirements for using local LLMs for real-world analyses.

28th January, 2025

Sending individualized Outlook emails using R

This note describes how to send emails through Outlook directly from R using the Microsoft365R package. In particular, it shows how you can easily send formatted personalized emails, including with attachments, using this package and a few other tools in R.

19th December, 2024

How to Make an R package

This post provides a guide to making R packages. The example R package that we make is small and simple, but it has all, or almost all, of the main general features of an R package. The guide is broken down into a series of steps, beginning with creating a bare-bones R package skeleton and ending with pushing the completed package to GitHub. It covers how to add code and data to a package, how to write code tests, how to create documentation, including vignettes, how to make a pkgdown website, and other key features of R packages.

3rd December, 2021

Reshaping data with `pivot_longer` and `pivot_wider`

In this post, we describe how to use tidyr’s pivot_longer and pivot_wider functions. These are used to reshape data frames from wide to long formats, and long to wide formats, respectively. We cover the basic usage of both functions and then discuss some of their more complex variants.
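
As a small taste of what the post covers, here is a minimal sketch of a wide-to-long-to-wide round trip. It assumes the tidyr package is installed; the data frame `wide` and its column names are hypothetical, made up for illustration.

```r
library(tidyr)

# Hypothetical wide data: one row per person, one column per time point.
wide <- data.frame(
  id = c(1, 2),
  t1 = c(10, 20),
  t2 = c(11, 21)
)

# Wide to long: gather the t1 and t2 columns into name/value pairs.
long <- pivot_longer(wide, cols = c(t1, t2), names_to = "time", values_to = "score")

# Long to wide: spread the name/value pairs back into separate columns.
wide_again <- pivot_wider(long, names_from = time, values_from = score)
```

Here `long` has one row per person per time point, and `wide_again` recovers the original layout.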

25th July, 2021

Read Multiple Files into a Single Data Frame

In R, data frames are usually created by reading data from a single file. Sometimes, however, we may wish to read multiple files into a single data frame. In this post, we will look at some of the main tidyverse ways of reading multiple files into a single data frame.

23rd July, 2021

Statistical analysis of subgroups with nested data frames

Here, we will look at how to perform separate statistical analyses of each subgroup of a data set, the results of which can then be combined into a new data frame. For this, we will primarily use nest and related functions in the tidyr package.

17th July, 2021

German Tank Problem: A Bayesian Analysis

The German tank problem is the name given to the problem of estimating the size of a set of elements having observed the rank orders of a sample of elements from that set.…
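
To make the problem concrete, the following is a rough sketch of one common Bayesian treatment, not necessarily the one used in the post. It assumes the sample is drawn without replacement from serial numbers 1 to N and that N has a uniform prior up to an arbitrary limit `n_max`. The serial numbers and all variable names are hypothetical.

```r
# Hypothetical observed serial numbers (illustrative values, not real data).
serials <- c(19, 40, 42, 60)
k <- length(serials)  # sample size
m <- max(serials)     # largest serial number observed

# Candidate population sizes. The upper limit of the uniform prior is an assumption.
n_max <- 10000
n <- m:n_max

# With a uniform prior, the posterior is proportional to the likelihood 1 / choose(n, k).
posterior <- 1 / choose(n, k)
posterior <- posterior / sum(posterior)

# Posterior mean of the population size.
posterior_mean <- sum(n * posterior)
```

The posterior is zero below the largest observed serial number and decays quickly above it, so the result is not very sensitive to the choice of `n_max` once it is large.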

16th July, 2021

Probability theory

A brief introduction to the basics of probability theory, covering the fundamental definitions and axioms, the concepts of random variables, joint probability distributions, conditional probability distributions, statistical independence, Bayes’ theorem, and other concepts.

13th July, 2021

Parallel computing in R with the parallel package

This post describes how to do parallel processing in R using the parallel package. First, we will provide a brief general introduction to the parallel programming tools in R’s parallel package. Then, we will explore these tools by way of two applications: bootstrapping, and the parallel execution of Stan based Bayesian models.
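
For orientation, here is a minimal sketch of the cluster-based interface in the parallel package. The task (squaring numbers) and the choice of two workers are arbitrary and purely illustrative; the post's own applications are bootstrapping and Stan models.

```r
library(parallel)

# A deliberately simple task: square a number.
square <- function(x) x^2

# Start a small cluster of worker processes. Two workers is an arbitrary choice.
cl <- makeCluster(2)

# parLapply is the cluster-based counterpart of lapply.
results <- parLapply(cl, 1:10, square)

# Shut the workers down when finished.
stopCluster(cl)

unlist(results)
```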

Bootstrap Confidence Intervals

This post provides a brief introduction to bootstrapping and, in particular, how it can be used to calculate confidence intervals. It focuses on the simplest and most familiar case of bootstrapping, which is a Monte Carlo based non-parametric method.
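
The basic idea can be sketched in a few lines of base R. This is an illustrative example that assumes the percentile method for the interval; the simulated data, the seed, and the number of resamples are all arbitrary choices.

```r
set.seed(101)  # arbitrary seed, for reproducibility

# Hypothetical sample of 50 observations.
x <- rnorm(50, mean = 10, sd = 2)

# Resample with replacement many times, computing the mean each time.
n_boot <- 5000
boot_means <- replicate(n_boot, mean(sample(x, replace = TRUE)))

# Percentile 95% confidence interval for the mean.
ci <- quantile(boot_means, probs = c(0.025, 0.975))
```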

Iterations in R with lapply (etc) and purrr functionals

This post describes how to perform iterations using functionals in base R and in the purrr package. The base R functionals particularly include lapply and its variants like sapply, vapply, and mapply. In the purrr package, the principal functional is map, of which there are numerous variants.
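
As a quick illustration, here is a sketch of the base R functionals applied to a hypothetical list `x`. The purrr equivalent of the first call would be `purrr::map(x, mean)`, which is not run here so that the example needs no extra packages.

```r
# A hypothetical list of numeric vectors.
x <- list(a = 1:3, b = 4:6, c = 7:9)

# lapply always returns a list.
means_list <- lapply(x, mean)

# sapply simplifies the result to a vector where possible.
means <- sapply(x, mean)

# vapply does the same, but checks each result against a declared template.
means_checked <- vapply(x, mean, FUN.VALUE = numeric(1))

# mapply iterates over several vectors in parallel.
sums <- mapply(function(a, b) a + b, 1:3, 4:6)
```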

Superseding dplyr’s suffixed variants

This post explains how we can use across(), where(), rename_with(), etc., to perform actions that were previously accomplished with the _if, _at, _all variants of the dplyr verbs.
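
For example, the following sketch shows two of the replacements side by side with the older forms in comments. It assumes dplyr version 1.0.0 or later; the data frame `df` is hypothetical.

```r
library(dplyr)

# Hypothetical data frame with one character and two numeric columns.
df <- data.frame(name = c("a", "b"), x = c(1.234, 5.678), y = c(9.876, 5.432))

# Previously: mutate_if(df, is.numeric, round)
rounded <- mutate(df, across(where(is.numeric), round))

# Previously: rename_all(df, toupper)
renamed <- rename_with(df, toupper)
```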

27th October, 2020

Permutations and combinations

Permutations and combinations frequently arise in probability calculations. Here, we provide a brief introduction and show how to calculate them in R.
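
A brief sketch using base R functions only. The values of `n` and `k` are arbitrary.

```r
n <- 5
k <- 3

# Number of orderings of n distinct items: n!
n_orderings <- factorial(n)

# Number of ordered selections of k items from n: n! / (n - k)!
n_perm <- factorial(n) / factorial(n - k)

# Number of unordered selections (combinations) of k items from n.
n_comb <- choose(n, k)

# Enumerate the combinations. Each column is one combination.
all_combs <- combn(n, k)
```

With these values, there are 60 ordered selections and 10 combinations.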

30th January, 2019

Web scraping Wikipedia data tables into R data frames

Wikipedia provides a lot of very useful tables of data. The data, however, are in the form of HTML tables, rather than some easy-to-import format like CSV. Getting one of these tables into, for example, an R data frame requires some web scraping followed by data wrangling.

28th January, 2019

Logarithms

An introduction to logarithms and to exponentiation, which is the inverse of taking a logarithm, covering their algebraic properties and how to calculate them using R.
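
A short sketch of the relevant base R functions. The value of `x` is arbitrary.

```r
x <- 8

log_e   <- log(x)            # natural logarithm (base e)
log_2   <- log2(x)           # base 2
log_10  <- log10(1000)       # base 10
log_any <- log(x, base = 2)  # arbitrary base, here the same as log2(x)

# Exponentiation undoes the logarithm.
back_e <- exp(log(x))
back_2 <- 2^log2(x)

# The log of a product is the sum of the logs.
product_rule <- all.equal(log(2 * 4), log(2) + log(4))
```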

8th January, 2019