When my Ph.D. advisor prodded me into my dissertation research project, Bootstrapping Medical Research Contingency Tables, I had very little knowledge about the topic. Simply stated, bootstrapping is a statistical technique that involves resampling with replacement from data already collected. My first reaction, before reading a paper by the American statistician Bradley Efron, was disbelief! If it really worked, it had to be due to magic, not mathematics! Over the years I have seen lots of magic performed, and sometimes it seemed as though my math profs had studied with Houdini.
Speaking of which, the following video was sent to me by a friend and the timing was perfect. Here’s a scene from the Tonight Show when Johnny Carson was the host. Enjoy!
Since my initial exposure to the statistical bootstrap, I have had enough experience with its application to believe it isn’t really like pulling estimates out of a hat… or anywhere else!
Bootstrapping is a highly effective statistical technique that yields better confidence intervals around estimates. Alright, what does that mean? Let’s say you have collected data on mechanical failures for machines in your shop. You’ve collected the data in order to develop a new maintenance policy that maximizes operational availability at minimum cost. It is reasonable to assume your data set will be pretty meager given you are tracking failures; at least I would hope that to be the case! You will likely have an estimate of the mean time between failures (MTBF) with a relatively large variance around that mean. Given the large standard deviation (the square root of the variance), the confidence interval within which the true MTBF lies will also be pretty wide, and therefore not all that helpful in reducing the cost associated with a changed maintenance plan. With the appropriate use of bootstrapping, the same data will normally yield a sharper estimate of the sampling variability and a significantly better-calibrated confidence interval.
So what does that really translate to in English? Using the bootstrap will allow you to devise a more efficient and cost-effective maintenance schedule by reducing the risk of implementing a plan based on a high-variance MTBF estimate.
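To make the MTBF example concrete, here is a minimal sketch of the core resampling idea using only the Python standard library. The failure-gap numbers are entirely hypothetical, and the percentile method shown is just one of several ways to form a bootstrap confidence interval:

```python
import random
import statistics

# Hypothetical hours between successive machine failures; a deliberately small sample.
failure_gaps = [120.0, 310.0, 45.0, 510.0, 95.0, 220.0, 880.0, 150.0]

def bootstrap_ci(data, stat=statistics.mean, n_boot=10_000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for a statistic.

    Repeatedly resamples the data with replacement, computes the statistic
    on each resample, and reads the CI off the sorted bootstrap statistics.
    """
    rng = random.Random(seed)
    n = len(data)
    boot_stats = sorted(
        stat([rng.choice(data) for _ in range(n)]) for _ in range(n_boot)
    )
    lo = boot_stats[int((alpha / 2) * n_boot)]
    hi = boot_stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

low, high = bootstrap_ci(failure_gaps)
print(f"Estimated MTBF: {statistics.mean(failure_gaps):.1f} hours")
print(f"95% bootstrap CI: ({low:.1f}, {high:.1f}) hours")
```

Notice that no distributional assumption is made anywhere: the data stand in for their own unknown distribution, which is the whole point of the method.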
Of course, the range of applications to any enterprise – government, commercial or non-profit – is not limited to this example. The range of appropriate applications is very large indeed and something to consider when your dataset is small.
Bootstrapping is generally worth considering:
- When you don’t know the underlying distribution of your data or of the specific statistic you would like to estimate. Bootstrapping makes no assumptions about that distribution and can provide more insight into the parameters of interest.
- When the data sample size is very small.
- When you suspect that there are data anomalies, even when you have real confidence that the underlying distribution is well known (for example, the negative exponential distribution of electronic component lifetimes).
- When planning a larger data collection effort and you want to design the experiment to minimize the cost of data collection. Bootstrapping a small pilot test sample can provide better statistical estimates to plan around.
- Related to the previous bullet, when you need to determine the sample size for a data collection effort, the usual methodology involves estimating the standard deviation of the key statistic. Bootstrapping a small test sample will help validate assumptions about that distribution.
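The last two bullets can be sketched together: bootstrap a small pilot sample to estimate the standard error of the mean, back out an implied standard deviation, and plug it into the familiar sample-size formula. The pilot data, the 0.25-hour target margin, and the helper names below are all hypothetical illustrations, not a prescribed procedure:

```python
import math
import random
import statistics

# Hypothetical pilot sample of a key metric (e.g., repair times in hours).
pilot = [3.2, 4.1, 2.8, 5.5, 3.9, 4.4, 2.5, 6.0, 3.3, 4.8]

def bootstrap_se_of_mean(data, n_boot=10_000, seed=7):
    """Bootstrap estimate of the standard error of the sample mean."""
    rng = random.Random(seed)
    n = len(data)
    means = [
        statistics.mean([rng.choice(data) for _ in range(n)])
        for _ in range(n_boot)
    ]
    return statistics.stdev(means)

def required_sample_size(sd, margin, z=1.96):
    """Classic planning formula n = (z * sd / margin)^2, rounded up."""
    return math.ceil((z * sd / margin) ** 2)

se = bootstrap_se_of_mean(pilot)
sd_estimate = se * math.sqrt(len(pilot))   # SE of mean ~ sd / sqrt(n)
n_needed = required_sample_size(sd_estimate, margin=0.25)
print(f"Bootstrap SE of the mean: {se:.3f} hours")
print(f"Implied SD: {sd_estimate:.3f}; sample size for a ±0.25 h margin: {n_needed}")
```

Comparing the bootstrap-implied standard deviation against the plain sample standard deviation of the pilot is a quick sanity check on the distributional assumptions behind the planning formula.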
If your organization is dealing with any of the above situations, then consult us, another operations research professional, or a qualified statistician. We can help improve your bottom line, and the result might just seem like magic!