
Determining bias

When teaching probability, it is customary to use coin tosses as examples. Whether or not it rains on a given day is, to a first approximation, like a coin toss. When there are only two possible outcomes, the binomial distribution is appropriate. This distribution has two parameters: the probability of success and the number of trials (the sample size).
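As a minimal sketch of these two parameters, the following uses `scipy.stats.binom`. The values (10 days, a 30% chance of rain per day) are illustrative assumptions and are not taken from the rain data used later in the recipe:

```python
from scipy import stats

# Illustrative values (assumptions): 10 days, 30% chance of rain each day.
n_trials = 10
p_rain = 0.3

# A binomial distribution is fully specified by these two parameters.
dist = stats.binom(n_trials, p_rain)

# Probability of exactly 3 rainy days out of 10.
prob_three = dist.pmf(3)

# The probabilities of all possible counts (0 to 10) sum to 1.
total = sum(dist.pmf(k) for k in range(n_trials + 1))

print(prob_three, total, dist.mean())
```

The mean of the distribution is the number of trials multiplied by the probability, which is the expected number of rainy days in the period.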

In statistics, there are two generally accepted approaches. In the frequentist approach, we count how often an outcome occurs over a number of trials (for example, coin tosses) and use that relative frequency for further analysis. Bayesian analysis is named after its founder, the Reverend Thomas Bayes. The Bayesian approach is more incremental and requires a prior distribution, which is the distribution we assume before performing experiments. The posterior distribution is the distribution we are interested in, and we obtain it after getting new data from experiments. Let's first have a look at the following equations:
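A sketch of equations (3.7) to (3.9) in standard notation, consistent with how the text describes them. The symbols k (number of successes), N (number of trials), and the interval bounds a and b are assumptions made here for illustration; m and n are the successes and failures mentioned in the text:

```latex
\begin{align}
f(k; N, p) &= \binom{N}{k} p^{k} (1 - p)^{N - k} \tag{3.7} \\
\binom{N}{k} &= \frac{N!}{k!\,(N - k)!} \tag{3.8} \\
P(a < p < b \mid m, n) &=
  \frac{\int_a^b \binom{n + m}{m} p^{m} (1 - p)^{n} \, dp}
       {\int_0^1 \binom{n + m}{m} p^{m} (1 - p)^{n} \, dp} \tag{3.9}
\end{align}
```

In (3.9), the binomial coefficient cancels between numerator and denominator, so the result depends only on the shape of p^m (1 - p)^n over the interval.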

Equations (3.7) and (3.8) describe the probability mass function of the binomial distribution. Equation (3.9) comes from an essay published by Bayes. It concerns an experiment with m successes and n failures and assumes a uniform prior distribution for the probability parameter of the binomial distribution.

How to do it...

In this recipe, we will apply the frequentist and Bayesian approaches to rain data:

  1. The imports are as follows:
    import dautil as dl
    from scipy import stats
    import matplotlib.pyplot as plt
    import numpy as np
    # interact moved from IPython.html.widgets to the separate ipywidgets package
    from ipywidgets import interact
    from IPython.display import HTML
  2. Define the following function to load the data:
    def load():
        rainy = dl.data.Weather.rain_values() > 0
        n = len(rainy)
        nrains = np.cumsum(rainy)
    
        return n, nrains
  3. Define the following function to compute the posterior:
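    def posterior(i, u, data):
        # i is a zero-based index, so i + 1 days have been observed; under a
        # uniform prior this likelihood over u is proportional to the posterior
        return stats.binom(i + 1, u).pmf(data[i])
  4. Define the following function to plot the posterior for the subset of the data:
    def plot_posterior(ax, day, u, nrains):
        ax.set_title('Posterior distribution for day {}'.format(day))
        # day counts from 1, so convert it to a zero-based index
        ax.plot(u, posterior(day - 1, u, nrains),
                label='rainy days in period={}'.format(nrains[day - 1]))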
        ax.set_xlabel('Uniform prior parameter')
        ax.set_ylabel('Probability rain')
        ax.legend(loc='best')
  5. Define the following function to do the plotting:
    def plot(day1=1, day2=30):
        fig, [[upleft, upright], [downleft, downright]] = plt.subplots(2, 2)
        plt.suptitle('Determining bias of rain data')
        x = np.arange(n) + 1
        upleft.set_title('Frequentist Approach')
        upleft.plot(x, nrains/x, label='Probability rain')
        upleft.set_xlabel('Days')
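        # set_ylabel is not defined in this recipe, so label the axis
        # directly (the label text is an assumption)
        upleft.set_ylabel('Probability rain')
    
        max_p = np.zeros(n)
        u = np.linspace(0, 1, 100)
    
        for i in x - 1:
            # look up the grid value of u at which the posterior peaks
            max_p[i] = u[posterior(i, u, nrains).argmax()]
    
        downleft.set_title('Bayesian Approach')
        downleft.plot(x, max_p)
        downleft.set_xlabel('Days')
        downleft.set_ylabel('Probability rain')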
    
        plot_posterior(upright, day1, u, nrains)
        plot_posterior(downright, day2, u, nrains)
        plt.tight_layout()
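  6. The following lines load the data, call the other functions, and place a watermark:
    # plot() reads n and nrains as module-level names, so load them first
    n, nrains = load()
    interact(plot, day1=(1, n), day2=(1, n))
    HTML(dl.report.HTMLBuilder().watermark())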
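
The steps above depend on `dautil` and its weather data. As a rough, self-contained sketch of the same comparison, the following replaces the rain data with synthetic draws. The 365 days, the true rain probability of 0.4, and the random seed are assumptions chosen only for illustration:

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the rain data (assumption): 365 days,
# each rainy with probability 0.4.
rng = np.random.default_rng(42)
rainy = rng.random(365) < 0.4

n = len(rainy)
nrains = np.cumsum(rainy)   # rainy days observed up to and including each day
days = np.arange(n) + 1     # number of days observed so far

# Frequentist estimate: running relative frequency of rainy days.
freq_estimate = nrains / days

# Bayesian estimate with a uniform prior: evaluate the binomial likelihood
# on a grid of candidate probabilities and keep the value where it peaks.
u = np.linspace(0, 1, 101)
bayes_estimate = np.array(
    [u[stats.binom.pmf(k, d, u).argmax()] for d, k in zip(days, nrains)])

# With a uniform prior the posterior is Beta(k + 1, d - k + 1), whose mode
# is k / d, so the two estimates agree up to the grid resolution.
max_gap = np.abs(freq_estimate - bayes_estimate).max()
print(freq_estimate[-1], bayes_estimate[-1], max_gap)
```

Because the prior is uniform, the peak of the posterior coincides with the relative frequency, which is why the frequentist and Bayesian curves in this setting look nearly identical. A non-uniform prior would pull the early Bayesian estimates toward the prior.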

Refer to the following screenshot for the end result (see the determining_bias.ipynb file in this book's code bundle):

See also
