
False Normality

A lot of statistical tests assume that raw data, or the residuals from a fitted model, are normally distributed. One way to check for normality is visual, using histograms or Q-Q plots; another way is statistical.
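
For example, a quick visual check in base R might look something like this (a minimal sketch; x is a placeholder for your own data):

## Visual normality checks (sketch)
x <- rnorm(100)  # placeholder data; substitute your own vector
hist(x)          # histogram: roughly bell-shaped?
qqnorm(x)        # Q-Q plot: points should lie near the reference line
qqline(x)        # adds the reference line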

Of the statistical tests of normality, the Shapiro-Wilk test is generally the most powerful. See Razali and Wah (2011) for a detailed power comparison.
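
For reference, the test ships with base R as shapiro.test(). Its null hypothesis is that the sample comes from a normal distribution, so a small p-value is evidence of non-normality. A minimal sketch:

## Shapiro-Wilk test (sketch): p >= 0.05 is conventionally read as
## "no evidence against normality"
x <- rnorm(100)          # placeholder data drawn from a normal distribution
shapiro.test(x)$p.value  # usually large here, so normality is not rejected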

I’m more interested in the Shapiro-Wilk test’s false normality rate than in its power. Since the test’s null hypothesis is normality, power is the rate at which it correctly flags non-normal data, and the false normality rate is its complement: the Type II error rate. If the test falsely identifies data from a non-normal population as normal, it could undermine further analyses and the conclusions of a study. If, on the other hand, the test finds data from a normal population to be non-normal, an alternative analysis more robust to non-normal distributions would be chosen, which would probably not damage the study even though the test was incorrect.

But how often might data from a non-normal population be identified as normal? To provide a tentative answer to this question, I simulated continuous, univariate data drawn from a uniform (and therefore non-normal) distribution at various sample sizes and measured the false normality rate of the Shapiro-Wilk test.

Running the code appended to this post produced the following plot:

From around a sample size of 100, the Shapiro-Wilk test was essentially perfect, never identifying the uniform data as normal across 10,000 iterations. At a sample size of around 50, there was roughly a 25% probability of the uniform data being identified as normal, and the smaller the sample, the higher the false normality rate. (Look at the variable falseNormData defined in the code below to see the exact values.)
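
For instance, once the appended script has run, the exact rates at a few sample sizes can be pulled out like this (assuming the objects defined in the code below):

## Inspect the exact false normality rates at selected sample sizes
dplyr::filter(falseNormData, sample.size %in% c(3, 10, 50, 100))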

From this exploration I think it is safe to say that, with small sample sizes, the Shapiro-Wilk test has little ability to detect non-normality, so visual methods may be more appropriate for assessing it; with so few observations, the population distribution is hard to estimate by any method.

## Load required packages (tidyverse attaches purrr, dplyr and ggplot2)
library(tidyverse)

## Set Seed
set.seed(1)

## Function to find the "false normality rate" at a given sample size:
## the proportion of 10,000 uniform (non-normal) samples that the
## Shapiro-Wilk test fails to flag as non-normal (p >= 0.05)
falseNorm <- function(samp.size) {
  n.tests <- 10000
  tests <- vector(length = n.tests)

  for (i in 1:n.tests) {
    u <- runif(samp.size, min = 0, max = 10)
    tests[i] <- shapiro.test(u)$p.value
  }

  ## Return a one-row tibble so results can be row-bound across sample sizes
  tibble("false.normality" = mean(tests >= 0.05), "sample.size" = samp.size)
}


## Vector of sample sizes to pass to falseNorm() as the samp.size argument
## (shapiro.test() requires a sample size between 3 and 5000)
samp.sizes <- c(3, seq(10, 200, by = 10))

## Iterate falseNorm() over the sample sizes and row-bind the resulting tibbles
falseNormData <- purrr::map(samp.sizes, falseNorm) %>% bind_rows()

## Plotting the false normality rate against sample size
g <- ggplot(falseNormData, aes(x = sample.size, y = false.normality)) +
  geom_area(alpha = 0.5, fill = "grey") +
  ylab("False Normality") +
  xlab("Sample Size")

## Uncomment next line to save the plot as an image
#ggsave("falsenormality.png", g)