官术网_书友最值得收藏!

Steps in EDA

Having understood what EDA is, and its significance, let's understand the various steps involved in data analysis. Basically, it involves four different steps. Let's go through each of them to get a brief understanding of each step:

Problem definition: Before trying to extract useful insight from the data, it is essential to define the business problem to be solved. The problem definition works as the driving force for a data analysis plan execution. The main tasks involved in problem definition are defining the main objective of the analysis, defining the main deliverables, outlining the main roles and responsibilities, obtaining the current status of the data, defining the timetable, and performing cost/benefit analysis. Based on such a problem definition, an execution plan can be created.

Data preparation: This step involves methods for preparing the dataset before actual analysis. In this step, we define the sources of data, define data schemas and tables, understand the main characteristics of the data, clean the dataset, delete non-relevant datasets, transform the data, and divide the data into required chunks for analysis.

Data analysis: This is one of the most crucial steps that deals with descriptive statistics and analysis of the data. The main tasks involve summarizing the data, finding the hidden correlation and relationships among the data, developing predictive models, evaluating the models, and calculating the accuracies. Some of the techniques used for data summarization are summary tables, graphs, descriptive statistics, inferential statistics, correlation statistics, searching, grouping, and mathematical models.

Development and representation of the results: This step involves presenting the dataset to the target audience in the form of graphs, summary tables, maps, and diagrams. This is also an essential step as the result analyzed from the dataset should be interpretable by the business stakeholders, which is one of the major goals of EDA. Most of the graphical analysis techniques include scattering plots, character plots, histograms, box plots, residual plots, mean plots, and others. We will explore several types of graphical representation in Chapter 2, Visual Aids for EDA.   

主站蜘蛛池模板: 简阳市| 昆明市| 永城市| 渭源县| 石台县| 卢氏县| 柳州市| 江永县| 台山市| 宜都市| 青州市| 弋阳县| 颍上县| 富源县| 贡觉县| 龙岩市| 浦城县| 女性| 喀喇沁旗| 甘孜县| 延寿县| 安阳县| 宣威市| 北京市| 清水河县| 九江市| 济南市| 临湘市| 水城县| 灵寿县| 慈利县| 清涧县| 固阳县| 乳山市| 禹州市| 洛浦县| 彰化市| 衡水市| 灵石县| 乌拉特后旗| 民县|