
Descriptive analysis

The first problem to solve in almost any data science scenario is understanding its nature. We need to know how the system works or what a dataset describes. Without this analysis, our knowledge is too limited to make any assumptions or formulate hypotheses. For example, we can observe a chart of the average temperature in a city over several years. If we are unable to describe the time series by discovering its correlations, seasonalities, and trends, every other question remains unanswered. In our specific context, if we don't discover the similarities between groups of objects, we cannot find a way to summarize their common features. The data scientist has to employ specific tools for each particular problem but, at the end of this stage, all possible (and helpful) questions must have been answered.

Moreover, as this process must have a clear business value, it's important to involve different stakeholders in order to gather their knowledge and convert it into a common language. For example, when working with healthcare data, a physician might talk about hereditary factors, but for our purposes it's preferable to say that there's a correlation among some samples, so we aren't fully justified in treating them as statistically independent elements. In general, the outcome of descriptive analysis is a summary containing all the metric evaluations and conclusions that are necessary to qualify the context and reduce uncertainty. In the example of the temperature chart, the data scientist should be able to answer questions about the autocorrelation, the periodicity of the peaks, the number of potential outliers, and the presence of trends.
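To make the temperature example concrete, here is a minimal sketch using only the Python standard library. It assumes a hypothetical synthetic series of monthly mean temperatures: the generator `make_series`, its seed, seasonal amplitude, warming trend, noise level, and the injected spike at index 50 are all invented for illustration and are not data from the text. The sketch answers the four questions listed above with common, simple techniques (not the only possible ones): the sample autocorrelation function (ACF), the dominant period as the first positive local maximum of the ACF after lag 1, the trend as an ordinary least-squares slope fitted to annual means, and outliers via a 3-sigma rule on residuals after removing the trend and the average profile of each calendar month.

```python
import math
import random
import statistics


def make_series(years=10, seed=42):
    """HYPOTHETICAL data: monthly mean temperatures (deg C) built from a
    seasonal cycle, a warming trend, Gaussian noise and one injected spike."""
    rng = random.Random(seed)
    series = []
    for t in range(12 * years):
        seasonal = -8.0 * math.cos(2 * math.pi * t / 12)  # coldest at month 0, warmest at month 6
        trend = 0.02 * t                                   # assumed +0.02 deg C per month
        series.append(15.0 + seasonal + trend + rng.gauss(0.0, 0.5))
    series[50] += 12.0  # assumed anomaly, e.g. a faulty sensor reading
    return series


def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    m = statistics.fmean(x)
    num = sum((x[t] - m) * (x[t + lag] - m) for t in range(len(x) - lag))
    den = sum((v - m) ** 2 for v in x)
    return num / den


def dominant_period(x, max_lag=24):
    """Lag of the first positive local maximum of the ACF after lag 1."""
    acf = [autocorrelation(x, k) for k in range(max_lag + 2)]
    for k in range(2, max_lag + 1):
        if acf[k] > 0 and acf[k] > acf[k - 1] and acf[k] >= acf[k + 1]:
            return k
    return None  # no periodicity detected within max_lag


def yearly_trend(x, period=12):
    """OLS slope (deg C per year) of the annual means; averaging over whole
    cycles cancels the seasonal component."""
    means = [statistics.fmean(x[i:i + period])
             for i in range(0, len(x) - period + 1, period)]
    years = range(len(means))
    ybar, mbar = statistics.fmean(years), statistics.fmean(means)
    cov = sum((y - ybar) * (v - mbar) for y, v in zip(years, means))
    var = sum((y - ybar) ** 2 for y in years)
    return cov / var


def outliers(x, slope_per_year, period=12, threshold=3.0):
    """Indices whose residual (after removing the trend and the mean of each
    calendar month) exceeds `threshold` standard deviations."""
    detrended = [v - slope_per_year / period * t for t, v in enumerate(x)]
    monthly_mean = [statistics.fmean(detrended[m::period]) for m in range(period)]
    resid = [v - monthly_mean[t % period] for t, v in enumerate(detrended)]
    sd = statistics.pstdev(resid)
    return [t for t, r in enumerate(resid) if abs(r) > threshold * sd]


if __name__ == "__main__":
    series = make_series()
    slope = yearly_trend(series)
    print("lag-1 autocorrelation:", round(autocorrelation(series, 1), 2))
    print("dominant period (months):", dominant_period(series))
    print("trend (deg C/year):", round(slope, 3))
    print("outlier indices:", outliers(series, slope))
```

Fitting the trend on annual means rather than on raw monthly values is a deliberate choice: a straight-line fit to a series that still contains a seasonal cycle can be biased by where the peaks fall relative to the endpoints. The 3-sigma rule is easy to read but can be masked when many outliers inflate the standard deviation; robust alternatives (for example, scores based on the median absolute deviation) are common in practice.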
