官术网_书友最值得收藏!

Sampling with probability weights

To create the nuclear bomb during the Second World War, physicists needed to perform pretty complicated calculations. Stanislaw Ulam got the idea to treat this challenge as a game of chance. Later, the method he came up with was given the code name Monte Carlo. Games of chance usually have very simple rules, but playing in an optimal way can be difficult. According to quantum mechanics, subatomic particles are also unpredictable. If we simulate many experiments with subatomic particles, we still can get an idea of how they are likely to behave. The Monte Carlo method is not deterministic, but it approaches the correct result for a complex computation for a sufficiently large number of simulations.

The statsmodels.distributions.empirical_distribution.ECDF class defines the cumulative distribution function of a data array. We can use its output to simulate a complex process. This simulation is not perfect, because we lose information in the process.

How to do it...

In this recipe, we will simulate weather processes. In particular, I am interested in annual temperature values. I am interested in finding out whether the simulated data sets also show an upward trend:

  1. The imports are as follows:
    from statsmodels.distributions.empirical_distribution import ECDF
    import dautil as dl
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.utils import check_random_state
    from IPython.html.widgets.interaction import interact
    from IPython.core.display import HTML
  2. Define the following function to calculate the slope:
    def slope(x, y):
        return np.polyfit(x, y, 1)[0]
  3. Define the following function to generate data for a single year:
    def simulate(x, years, rs, p):
        N = len(years)
        means = np.zeros(N)
    
        for i in range(N):
            sample = rs.choice(x, size=365, p=p)
            means[i] = sample.mean()
    
        return means, np.diff(means).mean(), slope(years, means)
  4. Define the following function to run multiple simulations:
    def run_multiple(times, x, years, p):
        sims = []
        rs = check_random_state(20)
    
        for i in range(times):
            sims.append(simulate(x, years, rs, p))
    
        return np.array(sims)
  5. Define the following function, which by default loads temperature values:
    def main(var='TEMP'):
        df = dl.data.Weather.load().dropna()[var]
        cdf = ECDF(df)
        x = cdf.x[1:]
        p = np.diff(cdf.y)
    
        df = df.resample('A')
        years = df.index.year
        sims = run_multiple(500, x, years, p)
    
        sp = dl.plotting.Subplotter(2, 1, context)
        plotter = dl.plotting.CyclePlotter(sp.ax)
        plotter.plot(years, df.values, label='Data')
        plotter.plot(years, sims[0][0], label='Sim 1')
        plotter.plot(years, sims[1][0], label='Sim 2')
        header = dl.data.Weather.get_header(var)
        sp.label(title_params=header, ylabel_params=header)
        sp.ax.legend(loc='best')
    
        sp.next_ax()
        sp.label()
        sp.ax.hist(sims.T[2], normed=True)
        plt.figtext(0.2, 0.3, 'Slope of the Data {:.3f}'.format(slope(years, df.values)))
        plt.tight_layout()

The notebook stored in the sampling_weights.ipynb file in this book's code bundle gives you the option to select other weather variables too. Refer to the following screenshot for the end result:

See also

主站蜘蛛池模板: 菏泽市| 靖江市| 焦作市| 准格尔旗| 昆明市| 隆尧县| 灵丘县| 宝清县| 伽师县| 田林县| 自治县| 靖宇县| 三明市| 洛阳市| 酉阳| 赤水市| 博白县| 潍坊市| 玉田县| 托里县| 融水| 南投市| 安吉县| 堆龙德庆县| 镇赉县| 津南区| 车险| 襄垣县| 榆社县| 张北县| 鞍山市| 新和县| 叙永县| 怀柔区| 海伦市| 犍为县| 平定县| 扎囊县| 长宁县| 黔南| 德清县|