Sampling

Financial datasets can be very large, which makes model building time-consuming, and every time the model needs to be tweaked the whole process has to be repeated on the full volume of data. It is therefore often better to draw a random or proportionate sample from the population data and build the model on that sample, which is easier and faster. This section discusses how to select a random sample and a stratified sample from the data, a step that plays a critical role when the model is built on sample data drawn from the population.

Random sampling

In random sampling, every observation in the population has an equal chance of being selected. It can be done in two ways: without replacement or with replacement.

A random sample without replacement can be drawn by executing the following code:

> RandomSample <- Sampledata[sample(1:nrow(Sampledata), 10,
+   replace=FALSE),]

This generates the following output:

Figure 2.6: Table showing a random sample without replacement
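The draw above changes on every run. A minimal sketch of a reproducible version follows, assuming a small made-up data frame in place of Sampledata (the Id and Price columns are hypothetical and are not taken from the original data):

```r
# Hypothetical stand-in for Sampledata; the real data set is not shown here.
set.seed(42)  # fix the seed so the same rows are drawn on every run
Sampledata <- data.frame(
  Id    = 1:50,
  Price = round(runif(50, min = 90, max = 110), 2)
)

# Draw 10 distinct row indices, then subset the data frame with them.
idx <- sample(nrow(Sampledata), size = 10, replace = FALSE)
RandomSample <- Sampledata[idx, ]

# Without replacement, no row index can appear twice.
anyDuplicated(idx) == 0
```

Keeping the index vector idx separate from the subsetting step makes it easy to check which rows were chosen, or to build the complementary set with Sampledata[-idx, ].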

With replacement, an observation can be drawn more than once: after it is selected, it is put back into the population and can be selected again. A random sample with replacement can be drawn by executing the following code:

> RandomSample <- Sampledata[sample(1:nrow(Sampledata), 10,
+   replace=TRUE),]

This generates the following output:

Figure 2.7: Table showing random sampling with replacement
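To make the effect of replacement visible, the following sketch draws more rows than the data contains, which is only possible with replace=TRUE. The five-row data frame and its columns are hypothetical and used purely for illustration:

```r
set.seed(1)
# Hypothetical five-row data set; not the Sampledata used in the text.
Sampledata <- data.frame(Id = 1:5, Value = c(10, 20, 30, 40, 50))

# Request 8 rows from 5: this works only because replace = TRUE.
idx <- sample(nrow(Sampledata), size = 8, replace = TRUE)
RandomSample <- Sampledata[idx, ]

# Count how often each original row was drawn (zero counts included).
table(factor(idx, levels = seq_len(nrow(Sampledata))))
```

Because eight draws are taken from five rows, at least one row must repeat. R keeps the row names of the result unique by appending a suffix such as .1 to repeated rows.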

Stratified sampling

In stratified sampling, we divide the population into separate groups, called strata. Then, a probability sample (often a simple random sample) is drawn from each group. Stratified sampling has several advantages over simple random sampling: when the observations within each stratum are similar to one another, it can give better precision for the same sample size, or the same precision with a smaller sample.
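The idea can be sketched in base R before turning to a dedicated package. The example below assumes a hypothetical population with two strata and a 10% sampling fraction in every stratum (proportional allocation); none of these names or values come from the original data:

```r
set.seed(123)
# Hypothetical population with two strata of unequal size.
population <- data.frame(
  Id   = 1:100,
  Flag = rep(c("A", "B"), times = c(80, 20))
)

sampling_fraction <- 0.10  # assumed: take 10% of every stratum

# Split the row indices by stratum, sample within each, then recombine.
idx_by_stratum <- split(seq_len(nrow(population)), population$Flag)
picked <- unlist(lapply(idx_by_stratum, function(idx) {
  n <- round(length(idx) * sampling_fraction)
  idx[sample.int(length(idx), n)]  # simple random sample without replacement
}))
StratSample <- population[picked, ]

table(StratSample$Flag)
```

With these sizes the sample contains 8 rows from stratum A and 2 from stratum B, so the sample keeps the 80:20 proportion of the population.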

Now let us see how many groups the Flag and Sentiments columns define, using the following code:

> library(sampling)
> table(Sampledata$Flag, Sampledata$Sentiments)

The output is as follows:

Figure 2.8: Table showing the frequencies across different groups

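Now you can select a sample from each group according to your requirements. Here, size gives the number of observations to draw from each group, and method="srswor" requests simple random sampling without replacement: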

> Stratsubset <- strata(Sampledata, c("Flag","Sentiments"),
+   size=c(6,5,4,3), method="srswor")
> Stratsubset

The output is as follows:

Figure 2.9: Table showing output for stratified sampling
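The object returned by strata() describes which units were selected rather than holding the full rows. A minimal sketch of recovering the sampled rows with getdata() from the same package is shown below. The 40-row data frame is hypothetical; only the column names Flag and Sentiments are taken from the text. The data is sorted by the stratification columns first, on the assumption that the entries of size are matched to the strata in the order in which they appear in the data:

```r
library(sampling)  # provides strata() and getdata()

set.seed(7)
# Hypothetical data with the two stratification columns used in the text.
Sampledata <- data.frame(
  Id         = 1:40,
  Flag       = rep(c(0, 1), each = 20),
  Sentiments = rep(rep(c("Negative", "Positive"), each = 10), times = 2)
)

# Sort by the stratification columns so the strata appear in a known order.
Sampledata <- Sampledata[order(Sampledata$Flag, Sampledata$Sentiments), ]

Stratsubset <- strata(Sampledata, c("Flag", "Sentiments"),
                      size = c(6, 5, 4, 3), method = "srswor")

# Extract the selected rows, with their original columns, from the data.
StratSample <- getdata(Sampledata, Stratsubset)
table(StratSample$Flag, StratSample$Sentiments)
```

The resulting sample has 18 rows in total, the sum of the requested stratum sizes. Comparing the final table with the requested sizes is a quick way to confirm that each size was applied to the intended group.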
