- Applied Supervised Learning with R
- Karthik Ramasubramanian Jojo Moolayil
- 2021-06-11 13:22:30
Introduction
Chapter 1, R for Advanced Analytics, introduced you to the R language and its ecosystem for data science. We are now ready to enter a crucial part of data science and machine learning: Exploratory Data Analysis (EDA), the art of understanding the data.
In this chapter, we will approach EDA with the same banking dataset used in the previous chapter, but in a more problem-centric way. We will start by defining the problem statement with industry-standard artifacts, design a solution for the problem, and learn how EDA fits into the larger problem framework. We will then carry out EDA for the use case itself, the direct marketing campaigns (phone calls) of a Portuguese banking institution, using a combination of data engineering, data wrangling, and data visualization techniques in R, backed by a business-centric approach.
In any data science use case, understanding the data consumes the bulk of the time and effort. Data science professionals are commonly said to spend around 80% of their time understanding data. Given that this is the most crucial part of your journey, it is important to have a macro view of the overall process for any data science use case.
A typical data science use case follows one of two paths: a core business-analytics problem or a machine-learning problem. Whichever path is taken, EDA is unavoidable. Figure 2.1 shows the life cycle of a basic data science use case. It starts by defining the problem statement using one or more standard frameworks, then moves on to data gathering, and then reaches EDA. Most of the effort and time in any project is consumed by EDA.
Once the process of understanding the data is complete, a project may take a different path depending on the scope of the use case. In most business-analytics use cases, the next step is to assimilate all the observed patterns into meaningful insights. Though this might sound trivial, it is an iterative and arduous task. This step then evolves into storytelling, where the condensed insights are tailored into a meaningful story for the business stakeholders. In scenarios where the objective is to develop a predictive model, the next step is instead to build a machine learning model and then deploy it into a production system or product.

Figure 2.1: Life cycle of a data science use case
Let's take a brief look at the first step, Defining the Problem Statement.