- Applied Supervised Learning with R
- Karthik Ramasubramanian Jojo Moolayil
- 2021-06-11 13:22:31
Understanding the Science Behind EDA
In layman's terms, we can define EDA as the science of understanding data. More formally, EDA is the process of analyzing and exploring a dataset to summarize its characteristics, properties, and latent relationships using statistical, visual, or analytical techniques, or a combination of them.
To cement our understanding, let's break down the definition further. A dataset is typically a combination of numeric and categorical features. To study the data, we may need to explore features individually; to study relationships, we may need to explore features together. Depending on the number and type of features involved, we encounter different types of EDA.
To simplify, we can broadly classify the process of EDA as follows:
- Univariate analysis: Studying a single feature
- Bivariate analysis: Studying the relationship between two features
- Multivariate analysis: Studying the relationship between more than two features
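The first two categories can be made concrete with a minimal sketch. It is written in Python as a language-agnostic illustration (in R, base functions such as `summary()` and `cor()` cover similar ground). The toy columns `ages` and `incomes` are made-up values and the helper names are hypothetical. Univariate analysis describes one column in isolation, while bivariate analysis looks at two columns together, here via the Pearson correlation coefficient for two numeric features.

```python
import math
import statistics

# Hypothetical toy dataset (made-up values): each list is one numeric feature.
ages = [23, 35, 31, 47, 52, 29]
incomes = [28_000, 45_000, 39_000, 61_000, 70_000, 33_000]

def univariate_summary(values):
    """Univariate analysis: describe a single numeric feature on its own."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

def pearson_correlation(x, y):
    """Bivariate analysis: strength of the linear relationship between two numeric features."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

print(univariate_summary(ages))
print(round(pearson_correlation(ages, incomes), 3))
```

A value near +1 indicates that the two features rise together, a value near -1 indicates that one falls as the other rises, and a value near 0 indicates no linear relationship.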
For now, we will restrict the scope of the chapter to univariate and bivariate analysis. A few forms of multivariate analysis, such as regression, will be covered in the upcoming chapters.
To accomplish each of the previously mentioned analyses, we can use visualization techniques such as boxplots, scatter plots, and bar charts; statistical techniques such as hypothesis testing; or simple analytical techniques such as averages, frequency counts, and so on.
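The simple analytical techniques can be sketched in the same style. The sketch assumes hypothetical records with a categorical `segment` feature and a numeric `spend` feature (invented values). Frequency counts summarize the categorical feature on its own, and a per-category average is one simple bivariate summary of a categorical and a numeric feature together. In R, `table()` and `tapply()` are base functions that serve similar purposes.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical records (made-up values) mixing a categorical and a numeric feature.
records = [
    {"segment": "retail", "spend": 120.0},
    {"segment": "online", "spend": 80.0},
    {"segment": "retail", "spend": 150.0},
    {"segment": "online", "spend": 95.0},
    {"segment": "wholesale", "spend": 400.0},
]

def frequency_counts(values):
    """Univariate, categorical: how often each category occurs, most common first."""
    return Counter(values).most_common()

def group_means(rows, group_key, value_key):
    """Bivariate, categorical x numeric: average of the numeric feature within each category."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {category: mean(vals) for category, vals in groups.items()}

segments = [r["segment"] for r in records]
print(frequency_counts(segments))                # counts per segment
print(mean(r["spend"] for r in records))         # overall average spend
print(group_means(records, "segment", "spend"))  # average spend per segment
```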
Breaking this down further, there is another dimension to consider: the type of each feature, numeric or categorical. Within each type of analysis mentioned (univariate and bivariate), the appropriate visual technique may differ depending on the feature type. For univariate analysis of a numeric variable, we could use a histogram or a boxplot, whereas for a categorical variable we might use a frequency bar chart. We will get into the details of EDA using a lazy programming approach; that is, we will explore the context and details of each analysis as and when it occurs in the book.
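The type-based choice of chart can be sketched as a small dispatch. The sketch makes several assumptions. A feature is a plain Python list, and it counts as "numeric" only if every value is a number (booleans excluded). The helper names and the equal-width binning with `n_bins=4` are illustrative choices, not taken from the text. The code computes only the values a chart would display, not the plot itself. In R, `hist()` and `boxplot()` would draw the numeric charts directly.

```python
from collections import Counter
from numbers import Number

def feature_type(values):
    """Treat a feature as numeric if every value is a number (bools excluded), else categorical."""
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "numeric"
    return "categorical"

def suggest_univariate_charts(values):
    """Map a feature's type to the univariate charts mentioned in the text."""
    if feature_type(values) == "numeric":
        return ["histogram", "boxplot"]
    return ["frequency bar chart"]

def histogram_counts(values, n_bins=4):
    """Counts per equal-width bin: the bar heights a histogram would draw."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1  # avoid a zero width when all values are equal
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # the maximum goes into the last bin
        counts[idx] += 1
    return counts

def text_bar_chart(values):
    """Render a frequency bar chart for a categorical feature as plain text."""
    return "\n".join(f"{cat:<10} {'#' * n}" for cat, n in Counter(values).most_common())
```

For example, `suggest_univariate_charts([23, 35, 31, 47, 52, 29])` returns `['histogram', 'boxplot']`, and `histogram_counts` on the same list with four bins returns `[2, 2, 0, 2]`.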
With the background set, let's get ready for a specific EDA exercise.