Exploring P-values with Simulations in R

The recent flare-up in discussions on p-values inspired me to conduct a brief simulation study.

In particularly, I wanted to illustrate just how p-values vary with different effect and sample sizes.
Here are the details of the simulation. I simulated $n$ draws of my independent variable $X$:

$X_n \sim N(100, 400)$

where

$n \in \{5,6,...,25\}$

For each $X_n$, I define a $Y_n$ as

$Y_n := 10+\beta X_n +\epsilon$

where

$\epsilon \sim N(0,1)$
$\beta \in \{.05, .06,..., .25 \}$

In other words, for each effect size, $\beta$, the simulation draws $X$ and $Y$ with some error $\epsilon$. The following regression model is estimated and the p-value of $\beta$ is observed.

$Y_n = \beta_0 + \beta X_n$

The drawing and the regression is done 1,000 times so that for each effect size – sample size combination, the simulation yields 1,000 p-values. The average of these 1,000 p-values for each effect size and sample size combination is plotted below.

Note, these results are for a fixed $var(\epsilon)=1$. Higher sampling error would typically shift these curves upward, meaning that for each effect size, the same sample would yield a lower signal.

There are many take-aways from this plot.

First, for a given sample size, larger effect sizes are “detected” more easily. By detected, I mean found to be statistically significant using the .05 threshold. It’s possible to detect larger effect sizes (e.g. .25) with relatively low sample sizes (in this case <10). By contrast, if the effect size is small (e.g. .05), then a larger sample is needed to detect the effect (>10).

Second, this figure illustrates an oft-heard warning about p-values: always interpret them within the context of sample size. Lack of statistical significance does not imply lack of an effect. An effect may exist, but the sample size may be insufficient to detect it (or the variability in the data set is too high). On the other hand, just because a p-value signals statistical significance does not mean that the effect is actually meaningful. Consider an effect size of .00000001 (effectively 0). According to the chart, even the p-value of this effect size tends to 0 as the sample size increases, eventually crossing the statistical significance threshold.

Code is available on GitHub.

Stop and Frisk: Spatial Analysis of Racial Differences

In my last post, I compiled and cleaned publicly available data on over 4.5 million stops over the past 11 years.

I also presented preliminary summary statistics showing that blacks had been consistently stopped 3-6 times more than whites over the last decade in NYC.

Since the last post, I managed to clean and reformat the coordinates marking the location of the stops. While I compiled data from 2003-2014, coordinates were available for year 2004 and years 2007-2014. All the code can be found in my GitHub repository.

My goals were to:

• See if blacks and whites were being stopped at the same locations
• Identify areas with especially high amounts of stops and see how these areas changed over time.

Killing two birds with one stone, I made density plots to identify areas with high and low stop densities. Snapshots were taken in 2 year intervals from 2007-2013. Stops of whites are indicated in red contour lines and stops of blacks are indicated in blue shades.

Stop and Frisk: Blacks stopped 3-6 times more than Whites over 10 years

The NYPD provides publicly available data on stop and frisks with data dictionaries, located here. The data, ranging from 2003 to 2014, contains information on over 4.5 million stops. Several variables such as the age, sex, and race of the person stopped are included.

I wrote some R code to clean and compile the data into a single .RData file. The code and clean data set are available in my Github repository. The purpose of this post is simply to make this clean, compiled dataset available to others to combine with their own datasets and reach interesting/meaningful conclusions.

Here are some preliminary (unadjusted) descriptive statistics: