- Learning Quantitative Finance with R
- Dr. Param Jeet Prashant Vats
- 2021-07-09 19:06:53
Outlier detection
Outliers must be taken into account in any analysis because they can bias the results. There are various ways to detect outliers in R; the most common ones are discussed in this section.
Boxplot
Let us construct a boxplot for the variable Volume of the Sampledata, which can be done by executing the following code:
> boxplot(Sampledata$Volume, main="Volume", boxwex=0.1)
The graph is as follows:
Figure 2.16: Boxplot for outlier detection
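An outlier is an observation that lies far from the rest of the data. In the preceding boxplot, the outliers are the points plotted beyond the whiskers (fences). By default, boxplot() extends each whisker to the most extreme observation that lies within 1.5 times the interquartile range of the box, so anything farther out is drawn as an individual point.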
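As a rough sketch of how the points beyond the whiskers could be extracted programmatically, base R's boxplot.stats() returns them in its out component, using the same 1.5 times IQR rule as boxplot(). The volume vector below is made-up illustrative data standing in for Sampledata$Volume, which is not reproduced here.

```r
# Hypothetical stand-in for Sampledata$Volume (illustrative values only)
volume <- c(seq(900, 1100, length.out = 50), 2000, 100)

# boxplot.stats() applies the same whisker rule as boxplot()
stats <- boxplot.stats(volume)

# Values lying beyond the whiskers
outlier_values <- stats$out

# Positions (row numbers) of those values in the original vector
outlier_rows <- which(volume %in% outlier_values)

print(outlier_values)
print(outlier_rows)
```

With real data, replacing volume by Sampledata$Volume should give the values and row numbers of the points drawn individually in the boxplot.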
LOF algorithm
The local outlier factor (LOF) is used for identifying density-based local outliers. In LOF, the local density of a point is compared with that of its neighbors. If the point lies in a sparser region than its neighbors, it is treated as an outlier. Let us consider some of the variables from the Sampledata and execute the following code:
> library(DMwR)
> Sampledata1 <- Sampledata[,2:4]
> outlier.scores <- lofactor(Sampledata1, k=4)
> plot(density(outlier.scores))
Here, k is the number of neighbors used in the calculation of the local outlier factors.
The graph is as follows:
Figure 2.17: Plot showing outliers by LOF method
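To make the density comparison behind LOF more concrete, the following is a minimal base R sketch of the calculation. It is not the DMwR implementation: lof_sketch and the toy points are hypothetical names and data introduced here, each point is given exactly k neighbors (ties are broken by row order), and duplicate points are assumed to be absent.

```r
# Simplified LOF sketch (assumption: no duplicate rows, exactly k neighbors each)
lof_sketch <- function(x, k) {
  d <- as.matrix(dist(x))   # pairwise Euclidean distances
  n <- nrow(d)
  diag(d) <- Inf            # a point is not its own neighbor

  # Indices of the k nearest neighbors of each point, one row per point
  nn <- matrix(t(apply(d, 1, function(row) order(row)[seq_len(k)])), nrow = n)

  # k-distance: distance from each point to its k-th nearest neighbor
  kdist <- sapply(seq_len(n), function(i) d[i, nn[i, k]])

  # Local reachability density: inverse of the mean reachability distance
  lrd <- sapply(seq_len(n), function(i) {
    reach <- pmax(kdist[nn[i, ]], d[i, nn[i, ]])
    1 / mean(reach)
  })

  # LOF: average density of the neighbors relative to the point's own density
  sapply(seq_len(n), function(i) mean(lrd[nn[i, ]]) / lrd[i])
}

# Toy data: a 5 x 5 grid of points plus one isolated point (row 26)
toy <- rbind(expand.grid(x = 1:5, y = 1:5), data.frame(x = 20, y = 20))
scores <- lof_sketch(toy, k = 4)
print(which.max(scores))
```

Scores close to 1 indicate a point whose density is similar to that of its neighbors, while scores well above 1 indicate a point in a sparser region, which is why the isolated point receives the largest score in this toy example.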
If you want the top five outliers then execute the following code:
> order(outlier.scores, decreasing=TRUE)[1:5]
This returns the row numbers of the five observations with the highest outlier scores:
[1] 50 34 40 33 22
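As a small illustration of what order() returns here, the sketch below ranks a made-up score vector (the values are hypothetical and are not taken from Sampledata) and then looks up the scores at the returned positions.

```r
# Hypothetical outlier scores for six observations
scores <- c(1.02, 3.50, 0.98, 1.75, 1.10, 2.40)

# Positions of the three largest scores, highest first
top <- order(scores, decreasing = TRUE)[1:3]
print(top)          # row positions: 2 6 4

# The scores at those positions
print(scores[top])  # 3.50 2.40 1.75
```

The same indexing pattern, for example Sampledata1[top, ], could be used to inspect the flagged rows themselves.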