Calculating mean, median, and mode with base R

Together, the mean, median, and mode are the most popular measures of central tendency; they indicate where a distribution is centered. The following code block shows how to calculate the first two of them:

mean(small_sample, na.rm = T)
# outputs [1] 7.546716
mean(big_sample, na.rm = T)
# outputs [1] 9.97051
median(small_sample, na.rm = T)
# outputs [1] 8.449614
median(big_sample, na.rm = T)
# outputs [1] 9.979968
Put simply, the arithmetic mean is the sum of all values divided by the number of observations. The median is the middle observation of a sorted sample (or the average of the two middle observations when the sample size is even), and the mode is the value (or values) that appears most frequently in the dataset, if there is one.
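To make these definitions concrete, here is a minimal sketch that recomputes the mean and median by hand and checks them against mean() and median(). It uses a small made-up vector, x, which is purely illustrative and is not one of the chapter's samples.

```r
# Illustrative data (not the chapter's samples). It has an even length,
# so the median is the average of the two middle values.
x <- c(4, 8, 15, 16, 23, 42)

# Arithmetic mean: sum of all values divided by the number of observations
manual_mean <- sum(x) / length(x)

# Median: middle value of the sorted sample. With an even number of
# observations, average the two middle values instead.
sorted_x <- sort(x)
n <- length(sorted_x)
manual_median <- if (n %% 2 == 1) {
  sorted_x[(n + 1) / 2]
} else {
  mean(sorted_x[c(n / 2, n / 2 + 1)])
}

manual_mean    # outputs [1] 18
mean(x)        # outputs [1] 18
manual_median  # outputs [1] 15.5
median(x)      # outputs [1] 15.5
```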

The mean() and median() functions return, respectively, the mean and the median of a numeric vector. If your vector contains any NA and you still want to compute the mean or median, set na.rm = T. Without it, the result will simply be NA. This argument tells the function to remove NAs before doing the computation.

Skip the na.rm = T argument if your data is not supposed to have any NAs. If an NA does sneak in, the result will be NA instead of a number, which signals that something may have gone wrong.
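The following sketch illustrates this behavior with a tiny hypothetical vector that contains one NA.

```r
# Hypothetical data with one missing value
x <- c(2, 4, NA, 6)

mean(x)                  # outputs [1] NA (no error and no warning)
mean(x, na.rm = TRUE)    # outputs [1] 4
median(x)                # outputs [1] NA
median(x, na.rm = TRUE)  # outputs [1] 4
```

The sketch spells out TRUE rather than T. Both work here, but T is an ordinary variable that can be reassigned, so TRUE is the safer habit.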

Because the sample comes from continuous data, it is very unlikely that any single value shows up more than once, even with 100,000 observations. A mode is much more likely to show up if we look at rounded samples. Base R has no dedicated function for the statistical mode (the built-in mode() returns an object's storage mode instead), but we can easily write one. The next code block shows how to do it:

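find_mode <- function(vals) {
  counts <- table(vals)  # frequency of each distinct value
  if (max(counts) == min(counts)) {
    'amodal'  # every value appears equally often: no mode
  } else {
    names(counts)[counts == max(counts)]  # value(s) with the highest frequency
  }
}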

Modes can also be computed for non-numeric data. A distribution has no mode if every value appears as often as any other in the sample. Such distributions are called amodal (with no mode). Now we can supply our newly crafted function, find_mode(), with big_sample:

find_mode(big_sample)
# outputs [1] "amodal"
find_mode(round(big_sample))
# outputs [1] "10"
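Because find_mode() relies on table(), it also works on character data, and it returns every tied value when the sample is multimodal. The sketch below repeats the function so that the block runs on its own. The input vectors are made up for illustration.

```r
# Same logic as find_mode() above, repeated so this block is self-contained
find_mode <- function(vals) {
  counts <- table(vals)  # frequency of each distinct value
  if (max(counts) == min(counts)) {
    'amodal'  # every value appears equally often: no mode
  } else {
    names(counts)[counts == max(counts)]  # value(s) with the highest frequency
  }
}

find_mode(c('red', 'blue', 'blue', 'green'))  # outputs [1] "blue"
find_mode(c(1, 1, 2, 2, 3))                   # outputs [1] "1" "2" (bimodal)
find_mode(c('a', 'b', 'c'))                   # outputs [1] "amodal"
```

The modes always come back as character strings, even for numeric input, because they are taken from the table's names. Wrap the result in as.numeric() if you need numbers.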

Even for large samples of continuous variables, there is a considerable chance of not finding a mode; it is much easier to find one or more modes in a sample of integers. These are not the only measures of central tendency available, though. A package called psych has functions that calculate harmonic and geometric means. The following code block demonstrates how to install psych and perform the calculations:

if(!require(psych)){ install.packages('psych')}
psych::harmonic.mean(big_sample)
# outputs [1] 7.419585
psych::geometric.mean(big_sample)
# outputs [1] 8.793195
# Warning message:
# In log(x) : NaNs produced

Let me break down the preceding code block:

  • if(!require(psych)){ install.packages('psych')} can be read as: if the psych package cannot be loaded (because it is not installed yet), install it
  • psych::harmonic.mean(big_sample) tells R to calculate the harmonic mean from big_sample using the harmonic.mean() function of psych
  • psych::geometric.mean(big_sample) asks for the geometric.mean() function of psych to calculate the geometric mean from big_sample
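Both means have simple closed forms. The geometric mean is exp(mean(log(x))), and the harmonic mean is 1 / mean(1 / x). The sketch below computes them in base R on a hypothetical, strictly positive vector. Both measures are intended for positive data. The "NaNs produced" warning above indicates that big_sample contains negative values, for which log() is undefined, so the geometric mean reported there should be read with caution.

```r
# Hypothetical strictly positive data (both means assume x > 0)
x <- c(1, 2, 4, 8)

# Geometric mean: exponential of the mean of the logs
# (equivalently, the n-th root of the product of the n values)
geo_mean <- exp(mean(log(x)))

# Harmonic mean: reciprocal of the mean of the reciprocals
harm_mean <- 1 / mean(1 / x)

geo_mean   # outputs [1] 2.828427
harm_mean  # outputs [1] 2.133333
```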

It is more common for R users to load the entire package with either library(psych) or require(psych) and then call functions by name alone, without specifying which package they come from.

Using library() or require() to load packages spares you some typing and makes your code cleaner. On the other hand, calling a function as <package name>::<function> makes your code more verbose but more explicit about what is being done, and it also avoids possible naming conflicts.
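To see why the <package name>::<function> form can matter, the following sketch defines a hypothetical user-made function that masks median() (which lives in R's stats package). It then reaches the original function through stats::median().

```r
# A hypothetical function that masks stats::median() in the global environment
median <- function(x, ...) "this is not the median you expected"

x <- c(3, 1, 2)
median(x)         # outputs [1] "this is not the median you expected"
stats::median(x)  # outputs [1] 2

# Remove the masking function so median() refers to stats::median() again
rm(median)
median(x)         # outputs [1] 2
```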

There are many more measures of central tendency than the five presented so far. No single measure fits every situation; different situations benefit from different measures. With that in mind, let's move on to the next section.