
Be careful

Previously, in the Running t-tests with R section, we used the bigger sample to run a t-test, but what about the smaller sample? R will happily run the test on it. The next code block shows how, but you shouldn't trust a sample this small:

t.test(small_sample, mu = 10, alternative = 'two.sided')

# One Sample t-test
#
# data: small_sample
# t = -2.2169, df = 9, p-value = 0.05384
# alternative hypothesis: true mean is not equal to 10
# 95 percent confidence interval:
# 5.043347 10.050084
# sample estimates:
# mean of x
# 7.546716

This time the test came close to rejecting the null hypothesis at the 5% significance level (that is, with 95% confidence): the p-value of 0.05384 is only slightly above 0.05, even though, as we already know, the sample does come from a normal distribution with a mean, μ, equal to 10. The t-test is not all-powerful, I can assure you, but it's more trustworthy when you have lots of observations. In fact, you should be very cautious about making any kind of statistical inference from little data.
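To make the point about sample size concrete, here is a minimal simulation sketch. The true mean of 11, the standard deviation of 3, the sample sizes, and the helper name rejection_rate are all assumptions made up for illustration; none of them comes from the samples used in this chapter. It estimates how often t.test() rejects the hypothesis that the mean is 10 when the mean is actually 11.

```r
set.seed(42)  # arbitrary seed, only so the simulation is reproducible

# Hypothetical helper: fraction of simulated normal samples of size n
# for which t.test() rejects H0: mu = 10 at the 5% significance level.
rejection_rate <- function(n, true_mean, sd = 3, reps = 2000) {
  p_values <- replicate(
    reps,
    t.test(rnorm(n, mean = true_mean, sd = sd), mu = 10)$p.value
  )
  mean(p_values < 0.05)
}

# The true mean (11) differs from the hypothesized mean (10) in both cases;
# only the sample size changes.
power_small <- rejection_rate(n = 10, true_mean = 11)
power_large <- rejection_rate(n = 500, true_mean = 11)

print(c(small = power_small, large = power_large))
```

Under these assumptions, the small samples should miss the difference most of the time, while the large samples should detect it almost always.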

You can also use t.test() to test whether two samples have the same mean (μ). To do so, name the x and y parameters and set at least var.equal = TRUE. The latter tells t.test() to treat the variance as equal for both samples. Equal variance is a necessary assumption for the simple two-sample t-test; without it (var.equal defaults to FALSE), you're committing yourself to Welch's t-test (feel free to do so, as it's probably the better choice in a great variety of situations). You can also set a custom confidence level with the conf.level argument and pick a different alternative hypothesis with the alternative argument. A quick example follows:

t.test(x = small_sample, y = big_sample, var.equal = TRUE)
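The other arguments mentioned above can be sketched as follows. The vectors sample_a and sample_b are hypothetical stand-ins generated here so the snippet runs on its own; they are not the chapter's small_sample and big_sample, and the seed, means, and standard deviations are assumptions.

```r
set.seed(10)  # arbitrary seed, only for reproducibility

# Hypothetical stand-in samples drawn from the same normal distribution.
sample_a <- rnorm(10, mean = 10, sd = 3)
sample_b <- rnorm(500, mean = 10, sd = 3)

# Simple two-sample t-test: assumes equal variances (pooled variance).
pooled <- t.test(x = sample_a, y = sample_b, var.equal = TRUE)

# Welch's t-test: what t.test() runs when var.equal is left at its
# default value of FALSE.
welch <- t.test(x = sample_a, y = sample_b)

# Custom confidence level and a one-sided alternative
# (H1: mean of x is less than mean of y).
one_sided <- t.test(x = sample_a, y = sample_b, var.equal = TRUE,
                    conf.level = 0.99, alternative = "less")

pooled$parameter                        # df = 10 + 500 - 2 = 508
welch$parameter                         # Welch df, usually not an integer
attr(one_sided$conf.int, "conf.level")  # 0.99
```

Each call returns an htest object, so individual pieces such as p.value, conf.int, and parameter can be pulled out with $ instead of being read off the printed output.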

Let me stress that you shouldn't run it with a sample as small as small_sample. Of course, R will run it, but from a statistical point of view this is very poor inference because it is based on very poor evidence (a small sample). Keep in mind that these tests assume your data comes from a normal distribution with unknown standard deviation, and really small samples can be problematic.

One could try something like plot(density(<variable>)) to check whether a variable resembles a normal distribution or not.
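A minimal sketch of such a check might look like the following. The vector values is a hypothetical variable simulated here for illustration. The overlaid normal curve, the Q-Q plot, and the Shapiro-Wilk test are additions that go beyond the plot(density()) suggestion above; they are common companions to it, not part of the original recipe.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical variable to inspect; replace it with your own data.
values <- rnorm(200, mean = 10, sd = 3)

# Kernel density estimate of the data.
dens <- density(values)
plot(dens, main = "Density of values")

# Overlay a normal curve with the same mean and standard deviation
# (dashed line) for a visual comparison.
curve(dnorm(x, mean = mean(values), sd = sd(values)), add = TRUE, lty = 2)

# Q-Q plot: points close to the reference line suggest normality.
qqnorm(values)
qqline(values)

# Shapiro-Wilk test: a small p-value is evidence against normality.
sw <- shapiro.test(values)
sw$p.value
```

Bear in mind that with very small samples these checks have little ability to detect departures from normality, so a clean-looking plot is weak reassurance.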

But what if you do know the populations' variance?
