
Exploratory Data Analysis Fundamentals

The main objective of this introductory chapter is to review the fundamentals of Exploratory Data Analysis (EDA): what it is, the key concepts of profiling and quality assessment, the main dimensions of EDA, and its main challenges and opportunities.

Data is a collection of discrete objects, numbers, words, events, facts, measurements, observations, or even descriptions of things. Events and processes in many disciplines generate such data, which is then collected and stored. These disciplines include biology, economics, engineering, marketing, and others. Processing such data yields useful information, and processing that information generates useful knowledge. But an important question remains: how can we extract meaningful and useful information from such data? One answer is EDA. EDA is the process of examining an available dataset to discover patterns, spot anomalies, test hypotheses, and check assumptions using statistical measures. In this chapter, we are going to discuss the steps involved in performing top-notch exploratory data analysis and get our hands dirty using some open source databases.
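To make "spot anomalies" and "check assumptions using statistical measures" concrete, here is a minimal sketch. It assumes a small, hypothetical list of sensor readings and uses only Python's standard `statistics` module. It computes a few summary statistics and flags outliers with Tukey's 1.5 × IQR fences. The fence rule is a common convention, not a method this chapter prescribes. The names `summarize`, `iqr_outliers`, and `readings` are illustrative only.

```python
import statistics


def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    # "inclusive" is one of several quartile conventions; other tools may differ slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "q1": q1,
        "q3": q3,
    }


def iqr_outliers(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical readings; 25.0 is a deliberately injected anomaly.
readings = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2, 25.0, 12.3, 11.7, 12.1]

for name, value in summarize(readings).items():
    print(f"{name:>6}: {value:.3f}")

print("outliers:", iqr_outliers(readings))
```

Comparing the mean (13.35) with the median (12.1) already hints that a single value is pulling the average upward. The IQR rule then isolates that value, 25.0. This kind of quick check typically precedes any formal modeling.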

As mentioned here and in several studies, the primary aim of EDA is to examine what the data can tell us before any formal modeling or hypothesis formulation. John Tukey promoted EDA among statisticians as a way to explore data and generate new hypotheses. Those hypotheses could in turn lead to new approaches to data collection and experimentation.

In this chapter, we are going to learn about and review the following topics:

Understanding data science

The significance of EDA

Making sense of data

Comparing EDA with classical and Bayesian analysis

Software tools available for EDA

Getting started with EDA
