- Simulation for Data Science with R
- Matthias Templ
- 2021-07-14 11:17:05
Why use simulation?
Simulation can save a great deal of time and can give accurate answers to questions that are difficult to answer analytically.
Statistical inference is often handled by asymptotic normal theory, which may provide formulas for the standard errors that allow us to construct confidence intervals around point estimates. For the simple case of the arithmetic mean, we can immediately write down the standard normal-theory interval

$$\left[\,\bar{x} - z_{1-\alpha/2}\,\frac{s}{\sqrt{n}},\;\; \bar{x} + z_{1-\alpha/2}\,\frac{s}{\sqrt{n}}\,\right]$$

for an observational vector x with n values, with $\bar{x}$ the arithmetic mean, $z_{1-\alpha/2}$ the corresponding quantile of the standard normal distribution, and s being the standard deviation of x. This formula for the confidence interval of the arithmetic mean is only valid for independent and identically distributed samples, drawn with simple random sampling from a population. In many situations, however, the (asymptotic) distribution of the parameter of interest might not be known, and often we do not have the expertise to derive even an approximate formula for the standard error of an estimator of interest. For example, this might be true for the Huber mean (Huber 1981) computed from data sampled with a multi-stage cluster sampling design. In other words, if the quantity of interest is a very complex function of the data, or if the data is of a very complex nature, we may benefit substantially from the use of a Monte Carlo simulation. Even when a formula for the confidence interval exists in the statistical literature, we might not be aware of it.
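The normal-theory confidence interval for the arithmetic mean described above can be sketched in a few lines. This is only an illustration under stated assumptions: it is written in Python with the standard library, it uses a standard normal quantile (a t quantile would be the usual small-sample choice), and the function name `mean_confidence_interval` and the observations in `x` are made up for the example.

```python
import math
import statistics


def mean_confidence_interval(x, level=0.95):
    """Normal-theory confidence interval for the arithmetic mean of x.

    Assumes independent and identically distributed observations.
    """
    n = len(x)
    x_bar = statistics.mean(x)
    s = statistics.stdev(x)  # sample standard deviation (n - 1 denominator)
    # Quantile z_{1 - alpha/2} of the standard normal distribution
    z = statistics.NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical observations, used only to show the call
x = [4.1, 5.3, 4.8, 5.9, 5.0, 4.4, 5.6, 5.2]
lower, upper = mean_confidence_interval(x)
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```

The interval is symmetric around the sample mean by construction, which is exactly the property that may fail for more complex estimators or sampling designs.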
A very prominent resampling method is the bootstrap, discussed in depth in Chapter 7, Resampling Methods. In this approach, the sampling distribution of the parameter estimate is simulated by repeatedly sampling with replacement from the current data and re-computing the parameter estimate from each resampled data set. The distribution of these estimates expresses the variability of the estimator and can therefore be used to construct confidence intervals.
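The bootstrap idea described above can be sketched as follows. This is a minimal illustration, not the implementation used in the book: it is written in Python with the standard library, it uses the simple percentile rule to turn the bootstrap distribution into an interval, and the function name `bootstrap_percentile_ci`, the choice of the median as estimator, and the data in `x` are assumptions made for the example.

```python
import random
import statistics


def bootstrap_percentile_ci(x, estimator, replications=2000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for estimator(x)."""
    rng = random.Random(seed)  # fixed seed so that the result is reproducible
    n = len(x)
    # Resample with replacement and re-compute the estimate each time
    estimates = sorted(
        estimator(rng.choices(x, k=n)) for _ in range(replications)
    )
    alpha = 1 - level
    # Simple percentile rule: cut off alpha/2 of the estimates in each tail
    lower_index = max(int((alpha / 2) * replications), 0)
    upper_index = min(int((1 - alpha / 2) * replications) - 1, replications - 1)
    return estimates[lower_index], estimates[upper_index]


# Hypothetical observations, used only to show the call
x = [4.1, 5.3, 4.8, 5.9, 5.0, 4.4, 5.6, 5.2, 7.9, 4.7]
lower, upper = bootstrap_percentile_ci(x, statistics.median)
print(f"95% bootstrap CI for the median: ({lower:.3f}, {upper:.3f})")
```

Because only the `estimator` argument changes, the same resampling loop works for any statistic that can be computed from a sample, which is the main appeal when no closed-form standard error is available.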
The approach is very similar for hypothesis tests, where the distribution of the test statistic is not always known. With the Monte Carlo approach to testing, data is simulated so that it mimics the null hypothesis, with the parameters for data generation taken from the empirical data. The test statistic is calculated on the observed data and compared with the test statistics calculated on the repeatedly simulated data. It is then straightforward to obtain a p-value for the test (see Chapter 8, Applications of Resampling Methods and Monte Carlo Tests).
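A Monte Carlo test along these lines might look like the sketch below. The setting is an assumption chosen for illustration: the null hypothesis is that the data come from a normal distribution with mean `mu0`, the standard deviation used for data generation is estimated from the empirical data, and the test statistic is the absolute distance between the sample mean and `mu0`. The function name `monte_carlo_p_value` and the data in `x` are hypothetical, and Python's standard library is used throughout.

```python
import random
import statistics


def monte_carlo_p_value(x, mu0, replications=999, seed=1):
    """Monte Carlo p-value for H0: the mean equals mu0 (normal data assumed)."""
    rng = random.Random(seed)  # fixed seed so that the result is reproducible
    n = len(x)
    s = statistics.stdev(x)  # scale parameter taken from the empirical data

    def test_statistic(sample):
        return abs(statistics.mean(sample) - mu0)

    observed = test_statistic(x)
    exceed = 0
    for _ in range(replications):
        # Simulate a data set that mimics the null hypothesis
        simulated = [rng.gauss(mu0, s) for _ in range(n)]
        if test_statistic(simulated) >= observed:
            exceed += 1
    # Count the observed statistic itself, so the p-value is never exactly zero
    return (exceed + 1) / (replications + 1)


# Hypothetical observations, used only to show the call
x = [4.1, 5.3, 4.8, 5.9, 5.0, 4.4, 5.6, 5.2]
print("p-value for H0: mean = 5.0:", monte_carlo_p_value(x, mu0=5.0))
print("p-value for H0: mean = 6.0:", monte_carlo_p_value(x, mu0=6.0))
```

The p-value is the share of simulated test statistics that are at least as extreme as the observed one; a studentized statistic would be a common refinement, which is omitted here to keep the sketch short.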