Introduction

In the previous chapter, we looked at some of the main techniques that are used in data analysis. We saw how hypothesis testing can be used when analyzing data, we got a brief introduction to visualizations, and finally, we explored some concepts related to time series analysis. In this chapter, we will elaborate on some of the topics we've already looked at (such as plotting and hypothesis testing) while introducing new ones coming from probability theory and data transformations.

Nowadays, work relationships are becoming more and more trust-oriented, and conservative contracts (in which working time is strictly monitored) are being replaced with more agile ones in which employees themselves are responsible for accounting for their working time. This liberty may lead to unregulated absenteeism, which can reflect poorly on an employee's candidature even when the absent hours can be explained by genuine reasons. This can significantly undermine healthy working relationships and can also have a negative impact on work productivity.

In this chapter, we'll analyze absenteeism data from a Brazilian courier company, collected between July 2007 and July 2010.

Note

The original dataset can be found here: https://archive.ics.uci.edu/ml/datasets/Absenteeism+at+work.

If you're interested, take a look at the following paper, which approaches the problem from a machine learning perspective: Martiniano, A., Ferreira, R.P., Sassi, R.J., & Affonso, C. (2012). Application of a neuro fuzzy network in prediction of absenteeism at work. In Information Systems and Technologies (CISTI), 7th Iberian Conference on (pp. 1-4). IEEE.

This dataset can also be found on our GitHub repository here: https://packt.live/3e4rorX.

Our goal is to discover hidden patterns in the data that might be useful for distinguishing genuine work absences from fraudulent ones. This chapter addresses the following topics:

  • Introduction to probability, conditional probability, and Bayes' theorem
  • Kolmogorov-Smirnov tests for equality of probability distributions
  • Box-Cox and Yeo-Johnson transformations

We will apply these techniques to our analysis as we try to identify the main drivers for absenteeism.
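As a small preview of the first topic, Bayes' theorem states that P(A|B) = P(B|A) * P(A) / P(B). The following is a minimal sketch of that formula in plain Python. The function name bayes_posterior and all of the probabilities are hypothetical, chosen only for illustration; they are not taken from the absenteeism dataset.

```python
def bayes_posterior(prior, likelihood, evidence):
    """Return P(A|B) = P(B|A) * P(A) / P(B).

    prior      -- P(A), probability of event A
    likelihood -- P(B|A), probability of B given that A occurred
    evidence   -- P(B), overall probability of event B
    """
    if not 0 < evidence <= 1:
        raise ValueError("evidence must be in the interval (0, 1]")
    return likelihood * prior / evidence


# Hypothetical numbers for illustration only (not from the dataset):
# A = "the absence has a particular reason", B = "the absence is a long one".
p_reason = 0.2             # P(A)
p_long_given_reason = 0.5  # P(B|A)
p_long = 0.25              # P(B)

posterior = bayes_posterior(p_reason, p_long_given_reason, p_long)
print(f"P(reason | long absence) = {posterior:.2f}")
```

The same relationship can be applied to columns of a real dataset by estimating each probability from observed frequencies.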
