A Brief Introduction to Statistical Data Analysis with Python
May 7, 2021
Online course
In this two-hour course, we will provide an introduction to data analysis and statistics in Python. In particular, we will cover data processing using pandas, statistical analysis using statsmodels, and data visualization using matplotlib and seaborn.
In more detail, the pandas library provides the means to represent and manipulate data frames. We will introduce how to read data into Python using pandas and how to perform some general data wrangling, including selecting rows and columns by name and by other criteria, applying functions to the selected data, and aggregating the data. We will also look at general data visualization. The matplotlib library is a low-level plotting library that allows for considerable control of the plot, albeit at the price of a considerable amount of low-level code. The seaborn library is based on matplotlib and provides a much higher-level interface to the plot, which allows us to produce complex data visualizations with a minimal amount of code. Finally, we will introduce how to perform widely used statistical analyses in Python. Here we will focus on statsmodels, which provides many of the most widely used statistical methods.
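To give a flavour of the data wrangling steps mentioned above, here is a minimal sketch using pandas. The data frame, its column names (group, x, y), and the derived column y_squared are hypothetical and made up for illustration; they are not taken from the course materials, where data would more typically be read from a file with pd.read_csv.

```python
import pandas as pd

# Hypothetical toy data, invented for this sketch. In practice the data
# would usually be read from a file, e.g. with pd.read_csv(...).
df = pd.DataFrame(
    {
        "group": ["a", "a", "b", "b"],
        "x": [1.0, 2.0, 3.0, 4.0],
        "y": [2.0, 4.0, 6.0, 8.0],
    }
)

# Select columns by name.
xy = df[["x", "y"]]

# Select rows by a criterion (here: rows where x exceeds 1.5).
large_x = df[df["x"] > 1.5]

# Apply a function to a selected column and store the result.
df["y_squared"] = df["y"].apply(lambda v: v ** 2)

# Aggregate: the mean of x and y within each group.
group_means = df.groupby("group")[["x", "y"]].mean()
print(group_means)
```

The same pattern of selecting, transforming, and then aggregating carries over to larger data frames; only the column names and the criteria change.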
Jupyter notebook
The Jupyter notebook for this course is here. Jupyter notebooks usually render as a readable web page on GitHub, although this sometimes takes a few reloads. You can also download the notebook and run it in your own Jupyter installation, or upload it and use it on Colab.
Use the notebook online with mybinder
The mybinder service allows you to share Jupyter notebooks so that others can use them (i.e. run them) online directly. Click the button below to access this repo in mybinder:
GitHub resources
Further resources for this training course can be found on GitHub at mark-andrews/intro2pystats.