Examining Power Calculations
Introduction & Problem Statement
Calculating sample sizes while accounting for power is relatively straightforward if you have all the required inputs, and many universities offer online calculators that help you determine the appropriate sample size. The variables needed to calculate the sample size while accounting for power are listed below, and this article uses Cohen’s d in the examples presented. If you are unfamiliar with any of the terms, they are defined in the components list later in the article.
In this type of problem, the significance level and the power should be pre-determined; they are traditionally set at 0.05 and 0.80, respectively.
The effect size is calculated by estimating or assuming the two groups’ mean difference and pooled standard deviation. Estimating the effect size is often the most challenging part of the analysis because the data and, thus, the distribution may not exist.
Roger D. Peng and Hilary Parker discuss this problem at length in their book Conversations on Data Science.
In the book, they agree that sample sizes are often pre-determined based on the budget in biostatistics. They also point to the fact that much time is typically spent estimating an appropriate effect size. Therefore, in practice, most biostatisticians actually back into the power calculation once they have determined the right effect size and know the sample size.
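For example, when the sample size is fixed by the budget, power.t.test() can be asked to solve for power instead of n by supplying n and leaving power unset. The numbers below are illustrative assumptions only, not values from any real study.

# Illustrative sketch: back into the achieved power when n is fixed
# (assumed values: 40 subjects per group, mean difference 0.5, sd 1.2)
power.t.test(n = 40, delta = 0.5, sd = 1.2, sig.level = 0.05)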
Solution Statement
This article does not go into great detail about dealing with the challenges of estimating effect sizes. However, it offers three data sets and shows how different effect sizes can dramatically influence the sample sizes required to maintain the significance and power levels.
The article will also walk you through how to calculate the appropriate sample size step-by-step in R.
Lastly, this article lists excellent references that detail the math and some of the pitfalls biostatisticians run into when calculating power, effect size, and sample size.
Load Libraries
To get started, let’s load the libraries listed below.
library(tidyverse)
library(tidyquant)
library(grid)
Sample Size Calculation Components
Next, we need to assume or calculate the following metrics for each of our two groups (Group A and Group B). The items and their definitions are listed below.
- α = significance level (traditionally set at 0.05)
- (1 - β) = power (traditionally set at 0.80)
- μa = population mean Group A
- μb = population mean Group B
- μd = population mean difference μa - μb
- σa = standard deviation Group A
- σb = standard deviation Group B
- σd = pooled standard deviation of the two groups
- μd / σd = effect size
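Putting these pieces together, the effect size used throughout this article is Cohen’s d, computed here with the pooled standard deviation of the two groups (assuming equal group sizes):

d = μd / σd = |μa - μb| / sqrt((σa² + σb²) / 2)

This matches the pooled_sd and eff_size calculations that appear later in the R code.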
Significance Level and Power Inputs
These inputs can be determined before the study, and the values listed below are traditionally used in practice. However, note that any one of these values can also be calculated if all of the other items are known. We will assume a significance level of 0.05 and a power of 0.80 for all three dummy data sets in this article.
sign_level <- 0.05
pwr <- 0.80
Data Distributions of the Two Groups
The other item that affects each group’s appropriate sample sizes is the distribution of the data. In other words, we need to know the means and the standard deviations of the two groups before determining the appropriate sample size.
As noted before, this is typically the most challenging part of the calculation, because often the distributions of the two sets are unknown. However, there are ways to estimate the distributions of the two groups. Some standard methods include:
- Use estimates from previous studies or prior research. These may not be available.
- Run a pilot study and estimate the inputs from the pilot data (a minimal sketch follows this list). This may be expensive.
- Ask subject matter experts to estimate the distributions based on their best guess. This will have the most uncertainty.
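If pilot data are available, the distribution inputs can be estimated directly from them. The sketch below uses a small, made-up pilot_data tibble; the values and column names are hypothetical placeholders for your own measurements.

# Hypothetical pilot data -- replace with real measurements
pilot_data <- tibble(
  group_A = c(0.8, 1.3, 0.9, 1.6, 1.1),
  group_B = c(1.4, 2.1, 1.7, 1.2, 1.9))

# Plug-in estimates of the means and standard deviations for each group
pilot_estimates <- pilot_data %>%
  summarise(mean_A = mean(group_A), sd_A = sd(group_A),
            mean_B = mean(group_B), sd_B = sd(group_B))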
Create Multiple Sample Data Sets
We do not have access to actual data in this article, so we will use dummy data instead. We will create three normally distributed sample data sets that share the same significance level, power, and standard deviations but have different mean differences.
set.seed(1234)

# Three dummy data sets with the same standard deviation but
# increasingly large mean differences between the two groups
data_set_01 <- tibble(
  group_A = rnorm(100, mean = 1.00, sd = 1.20),
  group_B = rnorm(100, mean = 1.10, sd = 1.20))

data_set_02 <- tibble(
  group_A = rnorm(100, mean = 1.00, sd = 1.20),
  group_B = rnorm(100, mean = 2.50, sd = 1.20))

data_set_03 <- tibble(
  group_A = rnorm(100, mean = 1.00, sd = 1.20),
  group_B = rnorm(100, mean = 4.50, sd = 1.20))

# Tag each data set with an identifier before stacking them
data_set_01 <- data_set_01 %>% mutate(data_group = "data_set_01")
data_set_02 <- data_set_02 %>% mutate(data_group = "data_set_02")
data_set_03 <- data_set_03 %>% mutate(data_group = "data_set_03")
Union Data Sets
Let’s stack the data sets on top of one another so that the calculated fields can be added to all of them at once.
mult_data_set <- rbind(data_set_01, data_set_02, data_set_03) %>%
  select(data_group, everything())
Let’s take a look at the sample data set.
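If you are following along in R, head() is a quick way to inspect the first few rows of the stacked tibble:

mult_data_set %>% head()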
Additional Calculations in R
Before calculating the sample size for each group, we need to calculate the mean difference and the pooled standard deviation. We will also calculate the effect size.
mult_data_set_calcs <- mult_data_set %>%
  group_by(data_group) %>%
  mutate(
    # group means and the absolute mean difference
    mean_A = mean(group_A),
    mean_B = mean(group_B),
    mean_diff = abs(mean_A - mean_B),
    # group standard deviations and the pooled standard deviation
    sd_A = sd(group_A),
    sd_B = sd(group_B),
    pooled_sd = sqrt((sd_A^2 + sd_B^2) / 2),
    # effect size (Cohen's d)
    eff_size = mean_diff / pooled_sd
  ) %>%
  add_tally() %>%   # adds n, the number of observations per data set
  ungroup() %>%
  select(data_group, everything())
Let’s take a look at what the data set looks like now.
Mean Difference & Pooled Standard Deviation
The mean difference and the pooled standard deviation values are extracted from the tibbles. These individual values will be plugged into the power.t.test() function in the next step. We will also extract the effect size so that it can be included in the graphs later on.
data_set_01 <- mult_data_set_calcs %>% filter(data_group == "data_set_01")
data_set_02 <- mult_data_set_calcs %>% filter(data_group == "data_set_02")
data_set_03 <- mult_data_set_calcs %>% filter(data_group == "data_set_03")

mean_diff_01 <- data_set_01$mean_diff[[1]]
pooled_sd_01 <- data_set_01$pooled_sd[[1]]
eff_size_01  <- data_set_01$eff_size[[1]]

mean_diff_02 <- data_set_02$mean_diff[[1]]
pooled_sd_02 <- data_set_02$pooled_sd[[1]]
eff_size_02  <- data_set_02$eff_size[[1]]

mean_diff_03 <- data_set_03$mean_diff[[1]]
pooled_sd_03 <- data_set_03$pooled_sd[[1]]
eff_size_03  <- data_set_03$eff_size[[1]]
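Because group_by() followed by mutate() repeats each group-level result on every row of that group, all rows within a data_group carry the same mean_diff, pooled_sd, and eff_size values, so taking the first element with [[1]] is sufficient.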
Calculate the Appropriate Sample Size
Now that we have all the individual pieces for each data set, we can estimate the sample size for the three distributions using the power.t.test() function.
power_test_01 <- power.t.test(sig.level = sign_level,
                              sd = pooled_sd_01,
                              delta = mean_diff_01,
                              power = pwr)

power_test_02 <- power.t.test(sig.level = sign_level,
                              sd = pooled_sd_02,
                              delta = mean_diff_02,
                              power = pwr)

power_test_03 <- power.t.test(sig.level = sign_level,
                              sd = pooled_sd_03,
                              delta = mean_diff_03,
                              power = pwr)
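Note that power.t.test() defaults to a two-sample, two-sided test (type = "two.sample", alternative = "two.sided"), which matches the two-group comparison here. Because sig.level, sd, delta, and power are all supplied, the function solves for the remaining argument, n, the sample size per group.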
To properly preserve the power and significance levels, we round the sample size up.
number_each_group_01 <- ceiling(power_test_01$n)
number_each_group_02 <- ceiling(power_test_02$n)
number_each_group_03 <- ceiling(power_test_03$n)
Additionally, we need to store those values in a tibble so they can be used as labels in the distribution charts later on.
n_01 <- tibble(data_group = c("data_set_01"), n_per_group = number_each_group_01)
n_02 <- tibble(data_group = c("data_set_02"), n_per_group = number_each_group_02)
n_03 <- tibble(data_group = c("data_set_03"), n_per_group = number_each_group_03)

n_all_sets <- rbind(n_01, n_02, n_03)
Tidy Up the Data
Before we graph the distributions, let’s make the data tidy.
metrics_by_data_set <- mult_data_set_calcs %>%
  select(data_group, mean_diff, pooled_sd, eff_size, n) %>%
  distinct()

mult_data_set_tidy_tbl <- mult_data_set %>%
  pivot_longer(
    cols = -data_group,
    names_to = "group",
    values_to = "value"
  ) %>%
  left_join(metrics_by_data_set) %>%
  left_join(n_all_sets)

## Joining, by = "data_group"
## Joining, by = "data_group"
A sample of the tidy tibble is listed below.
Graphically Compare the Inputs and Results
Now that the data is tidy, we can graphically compare and contrast the results of three different data sets.
ggplot(mult_data_set_tidy_tbl, aes(x = value, fill = group)) +
  geom_density(alpha = .95) +
  scale_color_grey() +
  scale_fill_grey(start = 0.5, end = 0.80) +
  theme_classic() +
  theme(legend.position = "bottom") +
  guides(fill = guide_legend("")) +
  labs(title = "Distribution Comparison",
       subtitle = "Significance Level = 0.05, Power = 0.80") +
  facet_wrap(~ data_group) +
  # annotate each facet with its mean difference, pooled SD, effect size, and n
  geom_text(
    data = mult_data_set_tidy_tbl,
    mapping = aes(
      x = -2.00, y = 0.07, hjust = 0,
      label = str_c("Mean Diff = ", format(mean_diff, digits = 2), "\n",
                    "Pooled StDev = ", format(pooled_sd, digits = 2), "\n",
                    "Effect Size = ", format(eff_size, digits = 2), "\n",
                    "N Size = ", format(n_per_group, digits = 0))))
Final Conclusions, Analysis, and Thoughts
As the graph above shows, the effect size can dramatically impact the sample size required to maintain the power and significance levels. Moving from left to right across the charts, as the overlap between the two groups shrinks, the required n decreases. In other words, smaller effect sizes require larger numbers of observations. Large sample sizes are relatively cheap for tech companies testing website traffic patterns, so this may not be an issue there. However, a large required sample size can be a real problem in a field like biostatistics, where experimental data is often very costly.
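As a rough illustration of that relationship, the normal-approximation formula for a two-sample test gives n per group ≈ 2 × (z(1-α/2) + z(1-β))² / d², so the required sample size grows with the inverse square of the effect size: halving d roughly quadruples n. The effect sizes below are illustrative only and are not tied to the data sets above.

# Illustrative only: required n per group for a small vs. a large effect size
power.t.test(delta = 0.2, sd = 1, sig.level = 0.05, power = 0.80)$n
power.t.test(delta = 0.8, sd = 1, sig.level = 0.05, power = 0.80)$n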
As mentioned earlier, calculating the sample size can be a much more in-depth analysis if some of the data is unknown. Estimating the mean differences and standard deviations of the two groups is often a collective effort that involves statisticians and subject matter experts.
If you are interested in learning more about power calculations and their parts, please check out the fantastic references below.
References
Title: Power Analysis, Clearly Explained
Author: Josh Starmer
Link: https://www.youtube.com/watch?v=VX_M3tIyiYk&feature=youtu.be
Title: Statistical Power, Clearly Explained
Author: Josh Starmer
Link: https://www.youtube.com/watch?v=Rsc5znwR5FA&feature=youtu.be
Title: BMI 541/699 Lecture 13
Link: https://www.biostat.wisc.edu/~lindstro/13.sample.size.10.20.pdf
Title: Conversations on Data Science
Author: Roger D. Peng & Hilary Parker
Link: https://leanpub.com/conversationsondatascience