What is statistical power?
Sometimes called the sensitivity of a hypothesis test, statistical power describes the ability of a statistical test to detect an effect when that effect really exists. If you’re wondering how a test designed to measure a statistical effect might be unable to measure that very effect, consider the following analogy.
I don’t know if the speaker on my phone is working properly. It may be playing music at too low a volume (partially broken), or not at all (completely broken), or it may be working normally. So I play a random .mp3 file I have saved on my phone, and when I do this, I can’t hear anything. Can I conclude that my speaker is broken? I might be able to, but this conclusion is certainly open to questioning. Maybe the file I tried to play was corrupt and had no sound, or maybe the volume on my phone was just turned off. Or maybe I tried the experiment in a loud room at a party. In the last case, it would be especially hard to distinguish between a partially broken and completely broken speaker.
This last example may seem far-fetched – why would I try to test my speaker in a loud room at a party? But this is exactly the point: the conditions of the test have an effect on whether, and how easily, the test is able to determine the thing it’s trying to test for. In a statistical context, this is more than a question about what room I’m in when I press play on my phone; it often becomes a question about how much data to collect. How much data is enough data to conclude that the effect I’m measuring exists, or doesn’t exist?
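To make the "how much data" question concrete, here is a minimal simulation sketch. It assumes a made-up scenario that is not from any real data set: one treatment succeeds 10% of the time, another succeeds 15% of the time, and we compare them with a standard pooled two-proportion z-test. The function names and the rates are purely illustrative. The simulation repeats the experiment many times at several group sizes and counts how often the test notices the difference.

```python
import math
import random
from statistics import NormalDist


def two_proportion_p_value(successes_a, successes_b, n):
    """Two-sided p-value of a pooled two-proportion z-test, n subjects per group."""
    pooled = (successes_a + successes_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (successes_b - successes_a) / n / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def estimated_power(rate_a, rate_b, n, alpha=0.05, trials=1000, seed=0):
    """Fraction of simulated experiments in which the test detects the difference."""
    rng = random.Random(seed)
    detections = 0
    for _ in range(trials):
        # Simulate n independent outcomes per group at the assumed true rates.
        a = sum(rng.random() < rate_a for _ in range(n))
        b = sum(rng.random() < rate_b for _ in range(n))
        if two_proportion_p_value(a, b, n) < alpha:
            detections += 1
    return detections / trials


# Hypothetical success rates: 10% versus 15%.
for n in (50, 200, 800):
    print(n, estimated_power(0.10, 0.15, n))
```

The effect is identical in every run; only the group size changes. With small groups the test usually misses the effect (the loud room at the party), and with large groups it usually finds it.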
There are lots of great examples of this, and they are easy to find all over the internet. I think the Wikipedia article on Statistical Power gives a pretty good treatment of the topic, and I particularly like the entry from statisticsdonewrong.com. One thing I haven’t seen an explanation of, however, is how statistical power might factor into a company’s attempt to optimize its collections techniques.
Say you run a collections center, and you’ve spent the last few years collecting data on your process. You have a number of different channels available to contact your debtors, and a number of different strategies you can apply to try to effectively collect on those debts. However, most of the strategies you use are based on simple business rules written long ago, and you would like to try to improve your collected dollars by leveraging the data you have to make more informed decisions about how to contact your debtors. This sounds like a data science problem, so you’re all set to have your data scientists get to work.
However, there is a problem hiding behind the scenes. The data scientists could build models that use the existing data, which should be a record of which techniques did and didn’t work in different scenarios, to predict which technique will be most likely to work in each scenario going forward. Except that the data aren’t a record of which techniques worked in different scenarios. The data are a record of which techniques worked in the same scenario over and over. Your collections center (probably) hasn’t been randomly assigning treatments to understand which ones work and when (though if it has, shoot me an email, I’m eager to chat and perhaps work with you!). Instead, your collections center has (quite reasonably) been applying techniques that are known to work well enough for the time being.
If you want to build proper models and a good optimization scheme to really squeeze maximum value out of your collections center, you’re going to need to get more data on which treatments really work in which contexts. This means assigning random treatments to different test groups of debtors to see how they will respond. The problem is that in the short term, this will cost you money. You’re going to be applying a number of treatments that you believe are sub-optimal in order to find out if you’re right. This cost isn’t for nothing: you’re paying for information that will help you make better decisions in the long run. But that information isn’t free.
If you’re data-focused, you probably believe this short-term cost is worth the long-term gain, but that doesn’t mean you want to over-spend on it. So, you arrive at a question: how big do you need to make your test group(s) in order to really find out which collections strategies are effective in which situations? If you make the groups too small, you might not be able to determine that a certain strategy is effective enough to warrant using it. This could cost you money, because you might be missing out on good opportunities, or you might end up needing to re-run the tests if they are inconclusive. On the other hand, if you make your test groups larger than necessary, you’ll waste money by applying sub-optimal treatments unnecessarily, and collecting fewer dollars.
So, how do you determine the necessary size of your test groups? Luckily, science has your back. Statisticians and data scientists have thought about this problem, and there is a precise tool for answering it: statistical power.
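As a rough illustration of what that looks like in practice, the sketch below uses the standard normal-approximation formula for comparing two proportions with equal group sizes. Everything specific here is an assumption for the sake of the example: the function name `required_group_size` is hypothetical, and the 10% and 15% collection rates, the 5% significance level, and the 80% target power are placeholder values, not figures from any real collections data.

```python
import math
from statistics import NormalDist


def required_group_size(rate_a, rate_b, alpha=0.05, power=0.80):
    """Approximate subjects needed per group for a two-sided two-proportion z-test.

    rate_a, rate_b: assumed true success rates of the two treatments.
    alpha: acceptable false-positive rate.
    power: desired probability of detecting the difference if it is real.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    pooled = (rate_a + rate_b) / 2
    numerator = (
        z_alpha * math.sqrt(2 * pooled * (1 - pooled))
        + z_power * math.sqrt(rate_a * (1 - rate_a) + rate_b * (1 - rate_b))
    )
    return math.ceil((numerator / (rate_b - rate_a)) ** 2)


# Hypothetical scenario: the current strategy collects from 10% of debtors,
# and we want to detect an alternative that collects from 15%.
print(required_group_size(0.10, 0.15))

# A smaller improvement is harder to see and needs far larger groups.
print(required_group_size(0.10, 0.12))
```

Under these assumed numbers the first call comes out a little under 700 debtors per group, and the second is several times larger. The point is the trade-off rather than the exact figures: the smaller the improvement you want to be able to detect, the more debtors you have to assign to a possibly sub-optimal treatment.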