Read more about what statistical power is and how to conduct a power analysis.

What is statistical power? How is it calculated?

In short, high statistical power means that you are likely to find an effect that is actually there. In more formal terms, power is the probability that you correctly reject a false null hypothesis when a specific alternative hypothesis is true. Consequently, an experiment with more power has a better chance of finding a true effect.

Before we get to the question of how power is calculated, it is important to distinguish between the two types of error you can make when deciding whether or not to reject a null hypothesis. The first type of error (Type I, or false positive) means that you reject the null hypothesis although it is actually true: you find an effect that is not actually there. Its probability is called alpha and is conventionally set at 1% or 5%. The second type of error (Type II, or false negative) means that you fail to reject the null hypothesis although it is false: you fail to find an effect that is actually there. Its probability is called beta, and a value of around 20% is typically considered acceptable.

So how is statistical power calculated then? It is simply 1 – beta.
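As a minimal sketch of this relationship, the snippet below computes beta and power for a one-sided, one-sample z-test. The test, the function name z_test_power, and the example values (an effect of 0.5 standard deviations, 25 participants) are assumptions chosen for illustration; other tests need other formulas.

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test with known standard deviation.

    effect_size is the assumed true mean shift in standard-deviation units.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection threshold under the null hypothesis
    shift = effect_size * n ** 0.5           # centre of the test statistic under the alternative
    beta = std_normal.cdf(z_crit - shift)    # probability of failing to reject when the effect exists
    return 1 - beta

power = z_test_power(effect_size=0.5, n=25, alpha=0.05)
print(f"beta = {1 - power:.3f}, power = {power:.3f}")
```

With these illustrative inputs the power comes out at roughly .80, that is, a beta of about 20%.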

What affects the power of a test? When looking at how statistical power is calculated, you will see that it is a function of several factors, namely (1) alpha, that is the probability of a Type I error, (2) the true alternative hypothesis, (3) the sample size, and (4) the particular test that you apply. We will disregard the fourth aspect here, as it would go beyond the scope of this guide.

However, please do check out the references provided at the end of this section for more information on statistical concepts and methods.

By accepting a higher probability of falsely rejecting the null hypothesis (a larger alpha, or Type I error rate), you automatically lower the probability of failing to reject it when it is false (beta, or Type II error rate), all else being equal. Simply put, increasing alpha means decreasing beta, which means more statistical power (1 – beta). The price is a higher rate of false positives.
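The trade-off between alpha and beta can be sketched numerically. The example assumes a one-sided, one-sample z-test with a hypothetical true effect of 0.5 standard deviations and 20 participants; only alpha varies.

```python
from statistics import NormalDist

def power_for_alpha(alpha, effect_size=0.5, n=20):
    """Power of a one-sided, one-sample z-test (illustrative defaults)."""
    z = NormalDist()
    beta = z.cdf(z.inv_cdf(1 - alpha) - effect_size * n ** 0.5)
    return 1 - beta

alphas = [0.01, 0.05, 0.10]
powers = [power_for_alpha(a) for a in alphas]
for a, p in zip(alphas, powers):
    print(f"alpha = {a:.2f} -> beta = {1 - p:.3f}, power = {p:.3f}")
```

Each step up in alpha lowers beta and raises power, while the effect size and sample size stay the same.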
Power also depends on the true alternative hypothesis and the effect size. The greater the true difference between the two values you want to compare, the more power your statistical test will have to detect the effect. This simply means that the chances of finding a difference are greater when the difference itself is larger. The way in which the difference between the values is quantified is called the effect size. There are different types of effect sizes (standardized, unstandardized), and they depend on the type of data you have as well as on the statistical test you use.
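To illustrate the difference between an unstandardized and a standardized effect size, the sketch below computes the raw mean difference and Cohen's d (mean difference divided by the pooled standard deviation) for two groups. Cohen's d is one common standardized effect size, picked here as an example; the data are invented for illustration.

```python
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

# Hypothetical scores, not real data
treatment = [5.1, 6.0, 5.8, 6.4, 5.5, 6.2]
control = [4.9, 5.2, 5.6, 5.0, 5.4, 5.3]

raw_difference = mean(treatment) - mean(control)   # unstandardized, in the original units
d = cohens_d(treatment, control)                   # standardized, in standard-deviation units
print(f"raw difference = {raw_difference:.2f}, Cohen's d = {d:.2f}")
```

The raw difference is easy to interpret when the measurement unit is meaningful, whereas the standardized value is what most power analysis tools expect as input.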
An increase in sample size also increases the power of a statistical test. This increases your chances of finding true effects, that is, it reduces the chances of making a Type II error; the Type I error rate stays at the alpha level you chose. The reason for this is that, with a larger sample size, each sampling distribution (the one under the alternative hypothesis as well as the one under the null hypothesis) gets narrower around its respective mean. The two distributions then overlap less, so any difference between them, however small, is more likely to become apparent. The sample size is usually the parameter that you can influence most easily in order to get an appropriate level of power. It is useful to note that in studies with a very large sample size even very small effects can be statistically significant. When encountering studies like these, look at the effect sizes rather than the levels of statistical significance to get a good idea of the practical relevance of the results.
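The effect of sample size can be sketched in the same way. The example assumes a one-sided, one-sample z-test, a fixed alpha of .05, and a small hypothetical true effect of 0.1 standard deviations; the standard error shows how much the sampling distribution narrows as n grows.

```python
from statistics import NormalDist

ALPHA = 0.05        # chosen by the researcher; it does not change with n
EFFECT_SIZE = 0.1   # assumed small true effect, in standard-deviation units

z = NormalDist()
z_crit = z.inv_cdf(1 - ALPHA)

sample_sizes = [25, 100, 400, 1600]
powers = []
for n in sample_sizes:
    standard_error = 1 / n ** 0.5   # spread of the sample mean shrinks with sqrt(n)
    power = 1 - z.cdf(z_crit - EFFECT_SIZE / standard_error)
    powers.append(power)
    print(f"n = {n:5d}  standard error = {standard_error:.3f}  power = {power:.3f}")
```

With 1,600 participants even this small effect is detected almost every time, which is why very large studies should be judged by their effect sizes and not only by significance.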

The Technical Side: How to conduct a Power Analysis

You can conduct a power analysis either in advance of your data collection, to determine the appropriate sample size for your test, or in retrospect, to check the power of the test you have applied given the sample and effect sizes you found. To conduct a power analysis, you can use, for example, G*Power, which can be downloaded for free.

Note that you will have to enter an effect size in order to get the sample size required for the power you chose. There are many ways to find this effect size. You could, for example, use the effect sizes found in meta-analyses. However, it is important to consider that these effect sizes are often overestimated because of publication bias (i.e., the tendency of journals to publish mainly significant findings).
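To show how the chosen effect size drives the required sample size, here is a rough sketch using the standard normal approximation for a two-sided comparison of two independent group means: n per group = 2 * ((z for 1 – alpha/2) + (z for the desired power))² / d². This is an approximation and not G*Power's procedure; tools based on the t distribution typically return slightly larger numbers. The function name and the effect sizes tried are illustrative.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, power=0.80, alpha=0.05):
    """Approximate sample size per group for a two-sided, two-group comparison.

    effect_size is the assumed standardized mean difference (Cohen's d).
    Uses the normal approximation, so it slightly underestimates t-test requirements.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the two-sided test
    z_power = z.inv_cdf(power)           # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Halving the assumed effect size roughly quadruples the required sample, which is why an overestimated effect size from the literature can leave a study badly underpowered.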

If your study is exploratory and no prior estimate is available, you have to decide yourself which effect size you aim to detect. A sensible choice is the smallest effect size that would still be practically relevant. Keep the trade-off in mind: the smaller the effect you want to be able to detect, the larger the sample you will need.

Lastly, it is important to note that studies can be underpowered. Low power usually results from a small sample size, a small true effect size, a strict (small) alpha level, or a combination of these. By conducting an underpowered study, you run the risk of not finding a difference that is actually there (high beta-error probability).
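What an underpowered study looks like in practice can be sketched with a small simulation. It assumes two groups of 20 participants, normally distributed scores with a known standard deviation of 1, a true effect of 0.3 standard deviations, and a two-sided z-test at alpha = .05; all of these values are hypothetical.

```python
import random
from statistics import NormalDist

def simulate_power(effect_size, n_per_group, alpha=0.05, n_studies=5000, seed=1):
    """Share of simulated studies that reach significance (two-sided z-test, sd = 1)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = (2 / n_per_group) ** 0.5   # SE of the difference between two means
    significant = 0
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treatment = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        diff = sum(treatment) / n_per_group - sum(control) / n_per_group
        if abs(diff / standard_error) > z_crit:
            significant += 1
    return significant / n_studies

estimated_power = simulate_power(effect_size=0.3, n_per_group=20)
print(f"estimated power = {estimated_power:.2f}")
```

Although the effect is real in every simulated study, only a small minority of them detect it, so a single non-significant result from such a design says very little.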

So what is the optimal level of power then?

Ideally, you should aim for a power between .80 and .95. Keep in mind that the power, like the effect size it is based on, is only an estimate and may vary across studies, and that you need to find a balance between your resources (the money and time you can spend on recruiting participants) and the statistical power you are trying to achieve.

Sources:

Howell, D. C. (2012). Statistical methods for psychology. Cengage Learning.

Aron, A., Aron, E. N., & Coups, E. J. (2013). Statistics for psychology (6th ed.).

http://daniellakens.blogspot.se/search?q=effect+size

https://academic.oup.com/jpepsy/article/34/9/917/939415
