Outliers

The simplest way to describe outliers is as data points that just don't fit the rest of your data. On inspection, any value that is very high, very low, or just unusual (within the context of your project) is an outlier. As part of data cleansing, a data scientist would typically identify the outliers and then address them using one of the generally accepted methods:

  • Delete the outlier values or even the actual variable where the outliers exist
  • Transform the values or the variable itself
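
As a minimal sketch of these two options in R, consider a small made-up vector. The values and the names `coin_in`, `is_outlier`, `coin_in_deleted`, and `coin_in_logged` are hypothetical and not taken from the casino data. The 1.5 * IQR rule used to flag outliers and the log transform are assumptions chosen for illustration; they are common conventions, not the only accepted ones.

```r
# Hypothetical values for illustration only; 950 is deliberately extreme
coin_in <- c(12, 15, 14, 13, 16, 15, 14, 950)

# Flag values lying more than 1.5 * IQR beyond the quartiles
# (an assumed convention for this sketch)
q <- quantile(coin_in, probs = c(0.25, 0.75))
fence <- 1.5 * IQR(coin_in)
is_outlier <- coin_in < q[[1]] - fence | coin_in > q[[2]] + fence

# Option 1: delete the outlier values
coin_in_deleted <- coin_in[!is_outlier]

# Option 2: transform the values; a log compresses the extreme value
# while keeping every observation
coin_in_logged <- log(coin_in)

print(coin_in[is_outlier])
print(coin_in_deleted)
```

Deleting discards information, while transforming keeps every row but changes the scale, so which option is appropriate depends on the project.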

Let's look at a real-world example of using R to identify and then address data outliers.

In the world of gaming, slot machines (gambling machines operated by inserting coins into a slot and pulling a handle that determines the payoff) are quite popular. Most slot machines today are electronic and are programmed so that all their activity is continuously tracked. In our example, investors in a casino want to use this data (as well as various supplementary data) to drive adjustments to their profitability strategy. In other words, what makes for a profitable slot machine? Is it the machine's theme or its type? Are newer machines more profitable than older or retro machines? What about the physical location of the machine? Are lower-denomination machines more profitable? We will try to find our answers using the outliers.

We are given a pool of gaming data (formatted as a comma-delimited, or CSV, text file), which includes data points such as the location of the slot machine, its denomination, month, day, year, machine type, age of the machine, promotions, coupons, weather, and coin-in (which is the total amount inserted into the machine less pay-outs). The first step for us as data scientists is to review (sometimes called profiling) the data to determine whether any outliers exist. The second step is to address those outliers.
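
The first step, profiling, might look roughly like the following sketch. Because the actual casino file is not shown here, the code writes a tiny stand-in CSV first; the file contents, the column names (`Location`, `Denomination`, `Age`, `Coinin`), and the variable names are all hypothetical. It uses base R's `boxplot.stats()`, whose `$out` component holds the points beyond the boxplot whiskers, as one possible way to surface candidate outliers.

```r
# Write a tiny stand-in CSV so the sketch runs on its own; the column
# names and values are hypothetical, not the actual casino file
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Location,Denomination,Age,Coinin",
  "Lobby,0.25,3,1200",
  "Lobby,1.00,5,1350",
  "Bar,0.05,2,1100",
  "Bar,0.25,7,1275",
  "Entrance,1.00,4,1325",
  "Entrance,0.25,6,9800"
), csv_path)

# Read the data and review its structure and basic statistics
slots <- read.csv(csv_path, stringsAsFactors = FALSE)
str(slots)
summary(slots$Coinin)

# boxplot.stats() returns, in $out, the values lying beyond the whiskers
outlier_values <- boxplot.stats(slots$Coinin)$out

# Pull the full rows for those values so they can be examined in context
outlier_rows <- slots[slots$Coinin %in% outlier_values, ]
print(outlier_rows)
```

Looking at the whole row, rather than the flagged value alone, helps when deciding whether an unusual coin-in figure is an error or a genuinely interesting machine.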
