Book: Simulation for Data Science with R
Author: Matthias Templ
Chapter word count: 393
Last updated: 2021-07-14 11:17:05

Why use simulation?

Simulation can save a great deal of time and, with enough replications, can give accurate answers to our questions.

Statistical inference is often handled by asymptotic normal theory, which may provide formulas for the standard errors that allow us to construct confidence intervals around point estimates. For the simple case of the arithmetic mean, we can immediately choose the formula $\bar{x} \pm 1.96 \cdot s / \sqrt{n}$ for an observational vector $x$ with $n$ values, where $\bar{x}$ is the arithmetic mean and $s$ the standard deviation of $x$. However, this formula for the (approximate 95%) confidence interval of the arithmetic mean is only valid for independent and identically distributed samples, drawn by simple random sampling from a population. Moreover, in many situations the (asymptotic) distribution of the parameter of interest might not be known, and often we do not have the expertise to derive even an approximate formula for the standard error of an estimator of interest. For example, this might be true for the Huber mean (Huber 1981) computed from data sampled with a multi-stage cluster sampling design. In other words, if the quantity of interest is a very complex function of the data, or if the data is of a very complex nature, we may benefit substantially from the use of a Monte Carlo simulation. Even when a formula for the confidence interval exists in the statistical literature, we might not be aware of it.
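As a point of reference for the simple case, the normal-theory interval for the arithmetic mean (mean plus or minus 1.96 times s divided by the square root of n) can be computed in a few lines. The following is a minimal sketch in Python using only the standard library; the function name `normal_ci`, the seed, and the simulated sample are illustrative assumptions and are not taken from the book.

```python
import math
import random
import statistics


def normal_ci(x, z=1.96):
    """Normal-theory confidence interval for the arithmetic mean of x.

    z = 1.96 corresponds to an approximate 95% interval and assumes
    independent, identically distributed observations.
    """
    n = len(x)
    mean = statistics.fmean(x)
    s = statistics.stdev(x)  # sample standard deviation (n - 1 denominator)
    half_width = z * s / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical i.i.d. sample; the seed is arbitrary and only fixes the output.
rng = random.Random(1)
x = [rng.gauss(10, 2) for _ in range(50)]

lower, upper = normal_ci(x)
print(f"mean = {statistics.fmean(x):.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

This shortcut is only available because a formula for the standard error of the mean is known. For estimators without such a formula, the interval has to be obtained by other means, such as simulation.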

A very prominent resampling method is the bootstrap, discussed in depth in Chapter 7, Resampling Methods. In this approach, the sampling distribution of the parameter estimate is simulated by repeatedly sampling with replacement from the current data and re-computing the parameter estimate from each resampled data set. The distribution of these estimates expresses the variability of the estimator, so it can be used to construct confidence intervals.
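The resampling idea described above can be sketched as follows. This is a minimal illustration in Python (standard library only) and not the book's own code. It uses the median as the estimator and a simple percentile interval, which is only one of several bootstrap interval types. The helper names `bootstrap_estimates` and `percentile_interval_95`, the seeds, and the simulated sample are assumptions made for this example.

```python
import random
import statistics


def bootstrap_estimates(x, estimator, n_boot=2000, seed=0):
    """Re-compute `estimator` on n_boot samples drawn with replacement from x."""
    rng = random.Random(seed)
    n = len(x)
    return [estimator(rng.choices(x, k=n)) for _ in range(n_boot)]


def percentile_interval_95(estimates):
    """Percentile interval: the 2.5% and 97.5% quantiles of the estimates."""
    cuts = statistics.quantiles(estimates, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


# Hypothetical skewed sample; the seed is arbitrary and only fixes the output.
data_rng = random.Random(42)
x = [data_rng.expovariate(0.5) for _ in range(41)]

estimates = bootstrap_estimates(x, statistics.median)
lower, upper = percentile_interval_95(estimates)
print(f"median = {statistics.median(x):.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

Any function that maps a sample to a single number can be passed as `estimator`, which is what makes the approach attractive when no formula for the standard error is at hand.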

The approach is very similar for hypothesis tests. The distribution of the test statistic is not always known. With the Monte Carlo approach to testing, data is simulated so that it mimics the null hypothesis, with the parameters for data generation estimated from the empirical data. The test statistic is calculated on the observed data and compared with the test statistics calculated on the repeatedly simulated data sets. It is then straightforward to obtain a p-value for the test (a topic covered in Chapter 8, Applications of Resampling Methods and Monte Carlo Tests).
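A Monte Carlo test along these lines might look like the sketch below. It is an illustrative assumption rather than the book's example: the null hypothesis is that the data are normally distributed, the test statistic is the absolute sample skewness, and the mean and standard deviation used for simulation are estimated from the observed data. The names `skewness` and `monte_carlo_p_value`, the seeds, and the sample are hypothetical. The code is Python with the standard library only.

```python
import random
import statistics


def skewness(x):
    """Moment-based sample skewness."""
    n = len(x)
    m = statistics.fmean(x)
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    return m3 / m2 ** 1.5


def monte_carlo_p_value(x, n_sim=999, seed=0):
    """Monte Carlo p-value for H0: x is normal, using |skewness| as statistic."""
    rng = random.Random(seed)
    # Parameters for data generation are taken from the empirical data.
    mu, sigma = statistics.fmean(x), statistics.stdev(x)
    t_obs = abs(skewness(x))
    exceed = 0
    for _ in range(n_sim):
        simulated = [rng.gauss(mu, sigma) for _ in range(len(x))]
        if abs(skewness(simulated)) >= t_obs:
            exceed += 1
    # Count the observed statistic itself so the p-value is never zero.
    return (exceed + 1) / (n_sim + 1)


# Hypothetical, clearly non-normal sample; the seed is arbitrary.
data_rng = random.Random(7)
observed = [data_rng.expovariate(1.0) for _ in range(100)]

p_value = monte_carlo_p_value(observed)
print(f"Monte Carlo p-value: {p_value:.3f}")
```

The p-value is the share of simulated statistics that are at least as extreme as the observed one, so its resolution is limited by the number of simulated data sets, `n_sim`.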
