# Doing Data Science in R: An Introduction for Social Scientists

This book was published by SAGE in March 2021. It is available for purchase on Amazon and elsewhere. Below, you will find links to the RMarkdown source code of the book, copies of the data files used in the book, and preprints of all the chapters.

*Doing Data Science in R* is about statistical data analysis
of real-world data using modern tools. It is aimed at those who are
currently engaged in, or planning to be engaged in, the analysis of
statistical data of the kind that arises in PhD-level scientific
research and beyond. It is ostensibly aimed at researchers in the
social sciences, but it is equally applicable to many other
scientific fields, particularly the biological, medical, and life
sciences. The data in these fields is complex: there are many
variables with complex relationships between them. Analyzing this
data almost always requires data wrangling, exploration, and
visualization. Above all, it involves statistical modelling of the
data using flexible probabilistic models. These models are then used
to reason about, and make predictions about, the scientific phenomena
being studied. This book aims to address all of these topics.

# Source code

This book was written entirely in RMarkdown and compiled to PDF using GNU Makefiles that are run inside a Docker container. The RMarkdown source code, any requisite R or Stan scripts and data files, and the shell scripts, GNU Makefiles, and Dockerfiles used to build the PDFs of the book are all available in this GitHub repository.

# Data files

The CSV data files used in each chapter are available in this zip file, which contains one subdirectory per chapter holding that chapter's data files.

# Chapter preprints

The book has 17 chapters, listed below, and a preprint PDF of each one can be downloaded. These preprints are the versions of the chapters as they stood before the book manuscript was sent to the publisher, so they differ in places from the chapters in the published book.

- Chapter 1: Data Analysis and Data Science
- Chapter 2: Introduction to R
- Chapter 3: Data Wrangling
- Chapter 4: Data Visualization
- Chapter 5: Exploratory Data Analysis
- Chapter 6: Programming in R
- Chapter 7: Reproducible Data Analysis
- Chapter 8: Statistical Models and Statistical Inference
- Chapter 9: Normal Linear Models
- Chapter 10: Logistic Regression
- Chapter 11: Generalized Linear Models for Count Data
- Chapter 12: Multilevel Models
- Chapter 13: Nonlinear Regression
- Chapter 14: Structural Equation Modelling
- Chapter 15: High-Performance Computing with R
- Chapter 16: Interactive Web Apps with Shiny
- Chapter 17: Probabilistic Modelling with Stan