Posted on 2017-09-01

While preparing for class I sometimes think up animations that explain the concept I am teaching. I sometimes share the resulting animations on social media via @rafalab.

John Storey recently asked if the source code is publicly available. Because I am not that organized, and these ideas come about during last minute preparations, the code was spread across several unrelated files. John’s request motivated me to include the code in one post.

All these gifs are paginated R plots. You will see in the code that I used different approaches to convert individual plots into animated gifs. The first (not recommended) was to save files and then make a system call to the ImageMagick convert tool. Through a simplystats comment from Yihui Xie, I learned about the saveGIF function from the animation package, which is what I now use when the plots are made in base R. When using ggplot I use David Robinson’s gganimate package. Finally, if I want to add special effects, like phasing, I use the online Animated GIF maker.

Below is the code for each of the gifs I have shared, roughly ordered by popularity. Remember this code was written at the last minute, so please don’t judge me. Actually, you can critique all you want; that’s how we learn.

## Simpson’s Paradox

This gif illustrates Simpson’s paradox. We see that $X$ and $Y$ have strong negative correlation. However, once we stratify by a confounder $Z$, encoded with color, the correlations flip to positive within each stratum. The data is simulated, but we could see data like this if, for example, we looked at tutoring $X$ and 9th grade test score $Y$ data and then stratified students by their 8th grade test scores $Z$.
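To make the sign flip concrete, here is a minimal simulation sketch. It is not the code behind the gif (that code is in R); it uses Python's standard library, and the stratum centers, slope, noise levels, and the helper names `pearson` and `simulate_strata` are all assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_strata(n_strata=5, n_per_stratum=200, seed=1):
    """Return (x, y, z) rows; every number here is a made-up illustration."""
    rng = random.Random(seed)
    rows = []
    for z in range(n_strata):
        # Higher z (say, a better earlier score): less x and more y on average.
        center_x = 10.0 - 2.0 * z
        center_y = 2.0 * z
        for _ in range(n_per_stratum):
            x = center_x + rng.gauss(0, 1)
            # Within a stratum, y increases with x (slope 0.8).
            y = center_y + 0.8 * (x - center_x) + rng.gauss(0, 0.6)
            rows.append((x, y, z))
    return rows


rows = simulate_strata()
overall = pearson([r[0] for r in rows], [r[1] for r in rows])
within = {
    z: pearson([r[0] for r in rows if r[2] == z],
               [r[1] for r in rows if r[2] == z])
    for z in range(5)
}

print(f"overall correlation: {overall:.2f}")
for z, r in within.items():
    print(f"stratum z={z}: correlation {r:.2f}")
```

The overall correlation comes out negative because the stratum centers move in opposite directions along $X$ and $Y$, while each within-stratum correlation is positive.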

This gif is made up of just three plots. I saved them using RStudio’s Export tool, then used Animated GIF maker to create the gif. Here is the code for the three plots:

## Loess

The first educational animation I shared explains how local regression (loess) works. Basically, for each predictor value, say $x_0$, assign positive weights to points close to that value, fit a line with weighted regression, keep the fitted value for $x_0$, and move to the next point.
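The steps above can be sketched in a few lines. This is a simplified illustration in Python's standard library, not the R code used for the gif: the tricube weight function, the `span` parameter, and the function names are assumptions, and R's `loess` has more options than this sketch covers.

```python
import math
import random


def tricube(u):
    """Tricube weight: positive for |u| < 1, zero otherwise."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def local_linear_fit(xs, ys, x0, span=0.5):
    """Fitted value at x0 from a weighted line through nearby points."""
    n = len(xs)
    k = max(2, math.ceil(span * n))           # number of neighbours used
    dists = sorted(abs(x - x0) for x in xs)
    h = max(dists[k - 1], 1e-12)              # window half-width
    w = [tricube((x - x0) / h) for x in xs]   # weights shrink with distance

    # Weighted least squares for a straight line.
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)             # keep only the value at x0


# Made-up noisy data: repeat the local fit at every predictor value.
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]
smooth = [local_linear_fit(xs, ys, x0, span=0.3) for x0 in xs]
print([round(v, 2) for v in smooth[:5]])
```

A smaller `span` uses fewer neighbours per fit and follows the data more closely; a larger one gives a smoother curve.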

The data here comes from a microarray experiment. The figure shows an MA-plot (log ratio versus average of logs). I use the animation package to save the gif.

## Life Expectancy versus Fertility Rates

This gif recreates an animation shown by Hans Rosling in his talk New Insights on Poverty. The point of the animation is to show the power of data visualization for combating misconceptions. In this particular instance Hans Rosling shows that the world was more dichotomous 40 years ago than it is today. Dividing the world into western rich countries with small families/long life spans and a developing world with large families/short life spans is no longer accurate.

The code for this plot is quite simple, thanks to the gganimate package.

## United Nations Voting Patterns

Here we used UN voting data provided by Erik Voeten and Anton Strezhnev to illustrate the concept of distance.

Below is the code. The wrangling code was provided by David Robinson. You will see that we smooth the distances across time to avoid having the points jump around too much.
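As a rough illustration of the two ideas mentioned here, distance between voting records and smoothing over time, consider the sketch below. It is written in Python rather than the post's R, and the vote coding (1 = yes, 0 = abstain, -1 = no), the root-mean-square distance, the moving-average smoother, and the sample numbers are all assumptions; the post does not say which distance or smoother was used.

```python
def vote_distance(a, b):
    """Root-mean-square difference over roll calls where both voted.

    Votes are coded 1 (yes), 0 (abstain), -1 (no); None means absent.
    """
    pairs = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    if not pairs:
        raise ValueError("no roll calls in common")
    return (sum((u - v) ** 2 for u, v in pairs) / len(pairs)) ** 0.5


def moving_average(values, window=3):
    """Centered moving average; the window is truncated at the ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


# Hypothetical yearly distances between two countries.
yearly = [0.4, 1.6, 0.5, 0.6, 1.5, 0.7]
print(moving_average(yearly))
```

Smoothing the yearly distances before plotting keeps a single unusual session from making a point jump across the figure.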

## Random Forests

I used to find it hard to understand how Random Forests can produce smooth estimates given that they are based on trees. The gif helps illustrate how this can happen. I use 2008 presidential election data because I assume it is mostly driven by a smooth trend but with a couple of sharp edges that loess, for example, won’t catch. Note that, because we only have one predictor, the gif does not illustrate another important feature of Random Forests: how the random feature selection reduces correlation between trees.
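The averaging effect can be reproduced with a small sketch. This is not the code behind the gif: it is plain Python, the data are simulated rather than election data, and `fit_tree`, `fit_forest`, `min_leaf`, and the number of trees are hypothetical choices. With a single predictor there is no feature selection, so this is simply bagging of regression trees.

```python
import math
import random


def fit_tree(points, min_leaf=5):
    """Regression tree on (x, y) pairs, returned as nested tuples."""
    points = sorted(points)
    ys = [y for _, y in points]
    n = len(points)
    if n < 2 * min_leaf:
        return ("leaf", sum(ys) / n)
    total, left_sum, best = sum(ys), 0.0, None
    for i in range(1, n):
        left_sum += ys[i - 1]                 # sum of the first i responses
        if i < min_leaf or n - i < min_leaf:
            continue
        if points[i - 1][0] == points[i][0]:
            continue                          # cannot split between ties
        # Maximizing this score is equivalent to minimizing the squared error.
        score = left_sum ** 2 / i + (total - left_sum) ** 2 / (n - i)
        if best is None or score > best[0]:
            best = (score, i)
    if best is None:
        return ("leaf", sum(ys) / n)
    i = best[1]
    cut = (points[i - 1][0] + points[i][0]) / 2
    return ("split", cut,
            fit_tree(points[:i], min_leaf), fit_tree(points[i:], min_leaf))


def predict_tree(tree, x):
    while tree[0] == "split":
        tree = tree[2] if x < tree[1] else tree[3]
    return tree[1]


def fit_forest(points, n_trees=50, seed=0):
    """Fit each tree to a bootstrap resample of the data."""
    rng = random.Random(seed)
    return [fit_tree([rng.choice(points) for _ in points])
            for _ in range(n_trees)]


def predict_forest(forest, x):
    return sum(predict_tree(t, x) for t in forest) / len(forest)


# Simulated data: a smooth trend plus noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(0, 6)
    data.append((x, math.sin(x) + rng.gauss(0, 0.3)))

grid = [i * 6 / 199 for i in range(200)]
single_tree = fit_tree(data)
forest = fit_forest(data)
tree_preds = [predict_tree(single_tree, x) for x in grid]
forest_preds = [predict_forest(forest, x) for x in grid]

print("distinct values, one tree:", len(set(tree_preds)))
print("distinct values, forest:  ", len(set(forest_preds)))
```

A single tree can only return one value per leaf, so its fit is a step function. Each bootstrap tree places its steps in slightly different spots, and the average of many such step functions takes many more distinct values, which is why it looks smooth.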

In the code you will see that I am using the old, not recommended way of saving files and making a system call to convert.

## Ecological Fallacy

After sharing the Simpson’s Paradox gif, a couple of people asked me if this was the same as the ecological fallacy. These two are different. The ecological fallacy is when we extrapolate a high correlation seen for the averages of strata to individuals. To illustrate this I used data from gapminder included in the dslabs package. It shows logistic transformed infant survival rates versus log daily income. I start by showing a very high correlation at the region level and a lower correlation at the individual country level. This is because there is country-to-country variability within each region.
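The gap between the two correlations is easy to reproduce with simulated data. The sketch below does not use the gapminder data or the post's R code; it is plain Python, and the number of groups, group sizes, and noise levels are made up for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(3)
groups = []
for g in range(8):
    # Group centers lie close to a line (hypothetical "regions") ...
    center_x = float(g)
    center_y = g + rng.gauss(0, 0.2)
    # ... but members vary a lot around their center ("countries").
    members = [(center_x + rng.gauss(0, 1.5), center_y + rng.gauss(0, 1.5))
               for _ in range(25)]
    groups.append(members)

# Correlation computed on the group averages.
avg_x = [sum(x for x, _ in m) / len(m) for m in groups]
avg_y = [sum(y for _, y in m) / len(m) for m in groups]
group_corr = pearson(avg_x, avg_y)

# Correlation computed on all individual points.
all_points = [p for m in groups for p in m]
individual_corr = pearson([x for x, _ in all_points],
                          [y for _, y in all_points])

print(f"correlation of group averages: {group_corr:.2f}")
print(f"correlation of individuals:    {individual_corr:.2f}")
```

Averaging removes most of the within-group variability, so the group-level correlation overstates how well $X$ predicts $Y$ for any single member.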

The gif is just three plots. I saved them using RStudio’s Export tool, then used Animated GIF maker to create the gif.

The first shows the averages, the second shows the individual values for Sub-Saharan Africa so you can see how one average breaks into more variable data, and the third shows all the individual data. I highlighted a few countries that show the variability. Note that I used a colorblind friendly palette. The code is a bit complex because I have to wrangle the Gapminder data.

## Bayes Rule

This simple animation shows, case by case, the results of applying a highly accurate diagnostic test to a population with low prevalence of disease. It helps illustrate how the posterior probability of having the disease given a positive test is lower than the accuracy of the test. You can use Bayes rule to determine the actual conditional probabilities. More details are here.
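As a worked example of the calculation, the sketch below applies Bayes rule to a test with 99% sensitivity and 99% specificity in a population with 1% prevalence. These numbers are hypothetical and are not taken from the animation; the function name `posterior_given_positive` is also just an illustration, written in Python rather than the post's R.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) from Bayes rule."""
    true_positive = sensitivity * prevalence
    false_positive = (1 - specificity) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)


# Hypothetical numbers: an accurate test and a rare disease.
posterior = posterior_given_positive(prevalence=0.01,
                                     sensitivity=0.99,
                                     specificity=0.99)
print(f"P(disease | positive test) = {posterior:.2f}")
```

With these inputs the true positives and false positives are equally common, so only about half of the positive tests belong to people who have the disease, even though the test is right 99% of the time.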

Because we are not plotting data but drawing a cartoon, the code is a bit complex and hard to read.

## Pacman

Finally, I made this plot to show the only instance in which pie charts are useful.

The code fits in a tweet.


This article originally appeared on Simply Statistics - A statistics blog by Rafa Irizarry, Roger Peng, and Jeff Leek. Original link: https://simplystatistics.org/2017/08/08/code-for-my-educational-gifs/
