- The Data Science Workshop
- Anthony So Thomas V. Joseph Robert Thas John Andrew Worsley Dr. Samuel Asare
- 2021-06-11 18:27:23
Data-Driven Feature Engineering
The previous section dealt with business-driven feature engineering. Beyond the features we can derive from a business perspective, it is also important to transform the data based on its structure. In this section, we will look at ways of identifying the data types in a dataset and take a first look at some data transformation techniques.
A Quick Peek at Data Types and a Descriptive Summary
Checking the data types (categorical or numeric, for example) and then deriving summary statistics is a good way to get a quick sense of the data before carrying out the downstream feature engineering steps. Let's take a look at an example from our dataset:
# Looking at Data types
print(bankData.dtypes)
# Looking at descriptive statistics
print(bankData.describe())
You should get the following output:

Figure 3.28: Output showing the different data types in the dataset
The preceding output lists each column in the dataset along with its data type. For instance, age is an integer, and so is day.
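Once the data types are known, the columns can be grouped by type so that each group gets suitable transformations. The following is a minimal sketch under stated assumptions: the small `sample` DataFrame is a made-up stand-in for bankData, and its values are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for bankData; the values are made up for illustration
sample = pd.DataFrame({
    "age": [58, 44, 33, 47],
    "day": [5, 5, 6, 7],
    "job": ["management", "technician", "entrepreneur", "blue-collar"],
    "balance": [2143.0, 29.0, 2.0, 1506.0],
})

# Collect column names by broad data type
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("Numeric columns:", numeric_cols)
print("Categorical columns:", categorical_cols)
```

Grouping columns this way makes it easier to apply numeric transformations to one group and categorical encodings to the other.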
The output of describe() is a descriptive summary, which displays basic measures such as the count, mean, standard deviation, and quantile values of the numeric features:

Figure 3.29: Data types and a descriptive summary
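To make the shape of this summary concrete, here is a small sketch on made-up data. The `sample` DataFrame and its values are assumptions for illustration, not the bank dataset. It shows that `describe()` returns a DataFrame whose rows are the statistics, so individual values can be looked up by label.

```python
import pandas as pd

# Hypothetical stand-in for bankData; the values are made up for illustration
sample = pd.DataFrame({
    "age": [58, 44, 33, 47],
    "balance": [2143.0, 29.0, 2.0, 1506.0],
})

summary = sample.describe()

# The rows of the summary are the statistics themselves
print(summary.index.tolist())

# Individual statistics can be read with label-based indexing
mean_age = summary.loc["mean", "age"]
median_balance = summary.loc["50%", "balance"]
print(mean_age, median_balance)
```

Reading values from the summary in this way is handy when a later step needs a specific statistic, such as a quartile boundary.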
The purpose of a descriptive summary is to get a quick feel for the data in terms of its distribution and basic statistics such as the mean and standard deviation. This perspective is critical when deciding what kind of transformation each variable requires.
For instance, in the earlier exercises, we converted numerical data into categorical variables based on the quantile values. The intuition for such transformations comes from the quick summary statistics that we can derive from the dataset.
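As a rough illustration of quantile-based binning, the sketch below uses `pd.qcut` on a made-up series of ages. Both the data and the bin labels are assumptions, and `pd.qcut` is only one way to do this; the earlier exercises may have used a different approach.

```python
import pandas as pd

# Made-up ages for illustration; not taken from the bank dataset
ages = pd.Series([20, 25, 30, 35, 40, 45, 50, 55], name="age")

# Cut the values into four bins at the quartile boundaries
# (the same 25%, 50%, and 75% values that describe() reports)
age_bins = pd.qcut(ages, q=4, labels=["Q1", "Q2", "Q3", "Q4"])

binned = pd.DataFrame({"age": ages, "age_bin": age_bins})
print(binned)

# Each quartile bin holds roughly the same number of rows
bin_counts = age_bins.value_counts().sort_index()
print(bin_counts)
```

Because the bin edges come from the data's own quantiles, the resulting categories are balanced in size, which is often the reason to prefer quantile bins over fixed-width ones.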
In the following sections, we will be looking at the correlation matrix and visualization.